HQ: A pipeline tool for The Joker

This package simplifies creating pipelines that run The Joker on large datasets of radial velocities. The primary way to use this tool is through the command-line interface, hq, which is installed when you pip install this package.

Example pipeline

  • Initialize the run: Create a folder / repository / project for your run of the pipeline as a place to host the input data files and all of the outputs generated by this tool. For example, if I were to run this on APOGEE DR17 visit data, I would create a new repository “apogee-dr17”. I would then put the RV catalog (one RV measurement per row, with a column that contains a unique source ID) in “apogee-dr17/data”. To initialize the HQ run, run hq init and specify the path where you want to store the run configuration files. For example, to store the HQ configuration files in “apogee-dr17/hq-config”, I would run:

    hq init --path apogee-dr17/hq-config

    This will create the path (if it does not already exist) and copy in the template configuration files that hq needs to run the rest of the pipeline. In particular, this will create a “config.yml” file, which contains the configurable values, and a “prior.py” file, which contains the pymc3 model specification of the prior used by The Joker and by MCMC to generate the orbital parameter samplings.

  • Edit the config files: You will need to edit both of these files to set the values you want for the run. A number of the parameters in the generated config.yml file (here, “apogee-dr17/hq-config/config.yml”) are required and have no default values. In particular, you must set the input_data_file parameter to the full path to the radial velocity data file you would like to run on. You may also want to set the cache_path parameter, which sets the location where HQ will store output data files. Here, for example, we may want to set this to “/full/path/to/apogee-dr17/cache”. All of the required parameters are labeled with the comment # REQUIRED.

  • (optional) Define the run environment: All of the hq commands accept the run path containing the configuration files via the --path command-line flag. In this example, this would be the path to “apogee-dr17/hq-config”. However, you can also set this globally by setting the $HQ_RUN_PATH environment variable, so that you do not have to pass the path to every command. For the rest of these examples, I will assume that you have set $HQ_RUN_PATH to your run config path!

  • Create the prior samples cache file: TODO:

    hq make_prior_cache

  • Set up the tasks used to parallelize and deploy the run: TODO:

    hq make_tasks

  • Run The Joker sampler on all stars: TODO:

    hq run_thejoker

  • (optional) Fit a robust constant RV model to all sources: TODO:

    hq run_constant

  • Analyze The Joker samplings to determine which stars are complete and which need to be followed up with standard MCMC:

    hq analyze_joker

  • TODO: HQ_THEANO_PATH=/tmp/theano_cache

  • Run standard MCMC on the unimodal samplings: TODO:

    hq run_mcmc

  • Analyze the MCMC samplings: TODO:

    hq analyze_mcmc

  • Combine the metadata files: TODO:

    hq combine_metadata
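
If you would rather drive the steps above from a script than type each command, something like the following Python sketch may be a useful starting point. It is not part of HQ: the names PIPELINE_STEPS, build_commands, and run_pipeline are hypothetical, and it only uses the subcommands and the --path flag described above. It assumes that hq init has already been run and the config files edited, and it does not show any additional options the individual subcommands may accept.

```python
import os
import shlex
import subprocess

# Subcommands in the order they appear in the example pipeline above.
# The second field marks the steps labeled "(optional)" in this README.
PIPELINE_STEPS = [
    ("make_prior_cache", False),
    ("make_tasks", False),
    ("run_thejoker", False),
    ("run_constant", True),
    ("analyze_joker", False),
    ("run_mcmc", False),
    ("analyze_mcmc", False),
    ("combine_metadata", False),
]


def build_commands(run_path=None, include_optional=True):
    """Return the pipeline as a list of argv lists, in order.

    If run_path is given, it is passed to every command with --path;
    otherwise the commands rely on $HQ_RUN_PATH being set.
    """
    commands = []
    for name, optional in PIPELINE_STEPS:
        if optional and not include_optional:
            continue
        argv = ["hq", name]
        if run_path is not None:
            argv += ["--path", str(run_path)]
        commands.append(argv)
    return commands


def run_pipeline(run_path=None, include_optional=True, dry_run=True):
    """Print each command and, unless dry_run is True, execute it."""
    if run_path is None and "HQ_RUN_PATH" not in os.environ:
        raise RuntimeError("Set $HQ_RUN_PATH or pass run_path explicitly")
    for argv in build_commands(run_path, include_optional):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands that would be executed.
    run_pipeline(run_path="apogee-dr17/hq-config", dry_run=True)
```

With dry_run=True (the default here) the script only prints the commands, so you can check the order before launching anything. On a large dataset you will probably still want to run the sampling steps by hand or through your cluster's job scheduler.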