feat: enable user to configure CLI params per rule
Closes #155 (closed)
- added
rule_config.yaml
totests/input_files/
. Here, additional parameters for all rules in main Snakefile and single/paired-end Snakefiles can be set - before each rule a variable
current_rule
is set to the name of the current rule. This workaround is necessary, as currently the snakemake directive {rule} can only be accessed from within the shell calls, whereas we need to access it from theparams
section - Parsing the
rule_config.yaml
:- In main Snakefile,
rule_config.yaml
will be loaded into arule_config dict
; if norule_config.yaml
specified,rule_config dict
will be empty and all tools will run with default parameters. - In main Snakefile, a function
parse_rule_config
is defined, which takes as input the rule_config dict, the current rule name, and a tuple of arguments that must not be changed for the current rule from within the rule config ("immutable" arguments) in order to ensure proper functioning of the pipeline. Immutable arguments are those that either influence a tool's behavior in a way that is critical for pipeline wiring, or those that are defined in the samples table. Returns a string with all additional parameter-value pairs. - in
params
section of each rule the above described function is called. Parameters read from therule_config.yaml
are then injected as a string in the respective shell calls to the tools
- In main Snakefile,
- Adapted config.yaml for tests accordingly
- Adapted pipeline documentation accordingly
- The variable
current_rule
is used for logfile naming in each rule - Adapted
zarp_multiqc_config.py
accordingly
Merge request reports
Activity
Sorry for the vague description. Yes, it is because "the tests need specific values for certain tool params in order to produce exactly the outputs that we expect (and provide and verify the MD5 sums of)". I guess I needn't have mentioned the failing of tests in case of a missing rule_config, because obviously when using a different sample.tsv file the tests will also fail...Sorry
I remember now why I mentioned the possible failing of tests at all: The
samples.tsv
resides intests/input_files
, whereas therule_config.yaml
is inconfig/
. Consequently it might not be obvious why tests might fail. Should we move therule_config.yaml
totests/input_files
? Not sure about the most consistent way here... How will ZARP-cli place these files?
Yes,
rule_config.yaml
should go intests/input_files
and its location should be set intests/input_files/config.yaml
, as it is clearly an input for a specific test. In ZARP-cli, I'm imagining that there will be a CLI option to supply the rule config file. If not provided, a default file will be used (which is going to be located in~/.zarp/rule_config.yaml
). During ZARP-cli initialization, the user will be prompted to create such a defaults file. If the user chooses not to create the file, rules will be run with default params.As for providing a template/example for this, I think we can point to the rule config (or configs, in case we will create more tests) in
tests/input_files
.Do you see any issues with this approach, @herrmchr?
156 156 - Genome sequence file (`.fasta`) 157 157 - Gene annotation file (`.gtf`) 158 158 - **Parameters** 159 - `--sjdbOverhang`: maximum read length - 1; lower values may reduce accuracy, 160 higher values may increase STAR runtime; specify in sample table column 161 `index_size` 159 - **samples.tsv** 160 - `--sjdbOverhang`: maximum read length - 1; lower values may reduce accuracy,higher values may increase STAR runtime; specify in sample table column `index_size` changed this line in version 3 of the diff
- Resolved by Alex Kanitz
1 current_rule = 'pe_remove_adapters_cutadapt' 1 2 rule pe_remove_adapters_cutadapt: couldn't figure out how. Just writing
rule current_rule
gives an error as would be expected. Couldn't find a way to access the variable value in the rule directive. Any ideas? Snakemake doc mentions anonymous rules but I don't quite understand the implications
37 38 adapter_3_mate2 = lambda wildcards: 38 39 get_sample('fq2_3p', search_id='index', search_value=wildcards.sample), 39 40 adapter_5_mate2 = lambda wildcards: 40 get_sample('fq2_5p', search_id='index', search_value=wildcards.sample) 41 get_sample('fq2_5p', search_id='index', search_value=wildcards.sample), 42 additional_params = parse_rule_config(rule_config,current_rule=current_rule, immutable=('-a', '-A', '-g', '-G', '-o', '-p')) Might be clearer/more legible to have these lines wrapped (and more in accordance with PEP-8 line length limitations), e.g.:
additional_params = parse_rule_config( rule_config, current_rule=current_rule, immutable=('-a', '-A', '-g', '-G', '-o', '-p'), )
or even:
additional_params = parse_rule_config( rule_config, current_rule=current_rule, immutable=( '-a', '-A', '-g', '-G', '-o', '-p', ), )
changed this line in version 3 of the diff
mentioned in commit 10e8891c