HQ: A pipeline tool for The Joker
This package simplifies building pipelines that run The Joker on large datasets of radial velocities. The primary way to use this tool is through the command-line interface, hq, which is installed when you pip install this package.
Example pipeline
Initialize the run: Create a folder / repository / project for your run of the pipeline as a place to host the input data files and all of the outputs generated by this tool. For example, if I were to run this on APOGEE DR17 visit data, I would create a new repository “apogee-dr17”. I would then put the RV catalog (one RV measurement per row, with a column that represents a unique source ID) in “apogee-dr17/data”. To initialize the HQ run, we run hq init and specify the path where we want to store the run configuration files. For example, to store the HQ configuration files in “apogee-dr17/hq-config”, I would run:
hq init --path apogee-dr17/hq-config
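As a rough sketch of the initialization step, the snippet below creates the example project layout and then calls hq init. The folder names are the ones used in the example above. The check for whether hq is on the PATH is my own addition so that the snippet fails gently if the package is not installed yet.

```shell
# Example project root, matching the names used in the text above.
project="apogee-dr17"

# Folder that will hold the RV catalog (one RV measurement per row).
mkdir -p "$project/data"

# Only call hq if it is installed; this guard is an addition for illustration.
if command -v hq >/dev/null 2>&1; then
    hq init --path "$project/hq-config"
else
    echo "hq not found on PATH; pip install the package first" >&2
fi
```

After this runs with hq installed, “apogee-dr17/hq-config” should contain the template “config.yml” and “prior.py” files described in this step.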
This will create the path (if it does not already exist) and copy in the template configuration files that hq needs to run the rest of the pipeline. In particular, this will create a “config.yml” file, which contains the actual configurable values, and a “prior.py” file, which contains the pymc3 model specification of the prior used by The Joker and MCMC to generate the orbital parameter samplings.
Edit the config files: You will need to edit both of these files to set the values you want for the run. A number of the parameters in the generated config.yml file (here, “apogee-dr17/hq-config/config.yml”) are required and have no default values. In particular, you must set the input_data_file parameter to the full path to the radial velocity data file you would like to run on. You may also want to set the cache_path parameter, which sets the location that HQ will use to store output data files. Here, for example, we may want to set this to “/full/path/to/apogee-dr17/cache”. All of the required parameters are labeled with the comment # REQUIRED.
(optional) Define the run environment: All of the hq commands accept the run path containing the configuration files via the --path command flag. In this example, this would be the path to “apogee-dr17/hq-config”. However, you can also set this globally by setting the $HQ_RUN_PATH environment variable, so that you do not have to pass the path to every command. For the rest of these examples, I will assume that you have set $HQ_RUN_PATH to your run config path!
Create the prior samples cache file: TODO:
hq make_prior_cache
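Because this and every later command relies on the run path and on the required config values, a quick check of the environment before continuing might look like the sketch below. It assumes the project lives under the current directory and that an absolute path is a sensible value for $HQ_RUN_PATH; neither is stated above. The grep simply lists the lines labeled # REQUIRED so they can be reviewed by eye; it does not validate the values.

```shell
# Assumption: the example project was created in the current directory.
export HQ_RUN_PATH="$PWD/apogee-dr17/hq-config"

config="$HQ_RUN_PATH/config.yml"
if [ -f "$config" ]; then
    # Print every line carrying the "# REQUIRED" label, with line numbers,
    # as a checklist of parameters that need a value (e.g. input_data_file).
    grep -n "# REQUIRED" "$config" || echo "no lines marked # REQUIRED in $config"
else
    echo "config file not found: $config (run hq init first)" >&2
fi
```

With the variable exported, the remaining commands can be run without passing --path each time.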
Set up the tasks used to parallelize and deploy the run: TODO:
hq make_tasks
Run The Joker sampler on all stars: TODO:
hq run_thejoker
(optional) Fit a robust constant RV model to all sources: TODO:
hq run_constant
Analyze The Joker samplings: Determine which stars are complete and which stars need to be followed up with standard MCMC:
hq analyze_joker
TODO: HQ_THEANO_PATH=/tmp/theano_cache
Run standard MCMC on the unimodal samplings: TODO:
hq run_mcmc
Analyze the MCMC samplings: TODO:
hq analyze_mcmc
Combine the metadata files: TODO:
hq combine_metadata
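The order of the steps above can be sketched as a small driver function. This is only an illustration: it assumes $HQ_RUN_PATH is set and that each subcommand can be run with no further flags, which the steps above (still marked TODO) do not confirm. It also runs everything sequentially in one shell, which may not match how the tasks are meant to be parallelized and deployed. The function name run_pipeline and its optional argument are hypothetical.

```shell
# Hypothetical helper: run the hq subcommands in the order listed above.
# $1 is the program to invoke (defaults to hq); passing "echo" gives a dry run.
run_pipeline() {
    hq_cmd="${1:-hq}"
    # run_constant is the optional constant-RV step; remove it to skip that step.
    for step in make_prior_cache make_tasks run_thejoker run_constant \
                analyze_joker run_mcmc analyze_mcmc combine_metadata; do
        echo ">>> $hq_cmd $step"
        # Stop at the first failing step so later steps do not run on bad state.
        "$hq_cmd" "$step" || return 1
    done
}
```

Calling run_pipeline echo prints the commands without running hq, which is a cheap way to check the order before launching a real run with run_pipeline.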