Skip to content
Snippets Groups Projects

feat: enable user to configure CLI params per rule

Merged CJHerrmann requested to merge rule_config_params into dev
4 unresolved threads

Closes #155 (closed)

  • added rule_config.yaml to tests/input_files/. Here, additional parameters for all rules in main Snakefile and single/paired-end Snakefiles can be set
  • before each rule a variable current_rule is set to the name of the current rule. This workaround is necessary, as currently the snakemake directive {rule} can only be accessed from within the shell calls, whereas we need to access it from the params section
  • Parsing the rule_config.yaml:
    • In main Snakefile, rule_config.yaml will be loaded into a rule_config dict; if no rule_config.yaml specified, rule_config dict will be empty and all tools will run with default parameters.
    • In main Snakefile, a function parse_rule_config is defined, which takes as input the rule_config dict, the current rule name, and a tuple of arguments that must not be changed for the current rule from within the rule config ("immutable" arguments) in order to ensure proper functioning of the pipeline. Immutable arguments are those that either influence a tool's behavior in a way that is critical for pipeline wiring, or those that are defined in the samples table. Returns a string with all additional parameter-value pairs.
    • in params section of each rule the above described function is called. Parameters read from the rule_config.yaml are then injected as a string in the respective shell calls to the tools
  • Adapted config.yaml for tests accordingly
  • Adapted pipeline documentation accordingly
  • The variable current_rule is used for logfile naming in each rule
  • Adapted zarp_multiqc_config.py accordingly
Edited by Alex Kanitz

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
    • This looks great! One question though (before looking at the code in more detail): Can you elaborate why exactly tests fail? Is it because the tests need specific values for certain tool params in order to produce exactly the outputs that we expect (and provide and verify the MD5 sums of)?

    • Author Maintainer

      Sorry for the vague description. Yes, it is because "the tests need specific values for certain tool params in order to produce exactly the outputs that we expect (and provide and verify the MD5 sums of)". I guess I needn't have mentioned the failing of tests in case of a missing rule_config, because obviously when using a different sample.tsv file the tests will also fail...Sorry

    • Author Maintainer

      I remember now why I mentioned the possible failing of tests at all: The samples.tsv resides in tests/input_files, whereas the rule_config.yaml is in config/. Consequently it might not be obvious why tests might fail. Should we move the rule_config.yaml to tests/input_files? Not sure about the most consistent way here... How will ZARP-cli place these files?

    • Please register or sign in to reply
  • Yes, rule_config.yaml should go in tests/input_files and its location should be set in tests/input_files/config.yaml, as it is clearly an input for a specific test. In ZARP-cli, I'm imagining that there will be a CLI option to supply the rule config file. If not provided, a default file will be used (which is going to be located in ~/.zarp/rule_config.yaml). During ZARP-cli initialization, the user will be prompted to create such a defaults file. If the user chooses not to create the file, rules will be run with default params.

    As for providing a template/example for this, I think we can point to the rule config (or configs, in case we will create more tests) in tests/input_files.

    Do you see any issues with this approach, @herrmchr?

  • CJHerrmann added 1 commit

    added 1 commit

    • 019a1037 - rule_config moved to tests/input_files/

    Compare with previous version

  • CJHerrmann changed the description

    changed the description

156 156 - Genome sequence file (`.fasta`)
157 157 - Gene annotation file (`.gtf`)
158 158 - **Parameters**
159 - `--sjdbOverhang`: maximum read length - 1; lower values may reduce accuracy,
160 higher values may increase STAR runtime; specify in sample table column
161 `index_size`
159 - **samples.tsv**
160 - `--sjdbOverhang`: maximum read length - 1; lower values may reduce accuracy,higher values may increase STAR runtime; specify in sample table column `index_size`
  • Alex Kanitz
  • Alex Kanitz
  • 1 current_rule = 'pe_remove_adapters_cutadapt'
    1 2 rule pe_remove_adapters_cutadapt:
  • 37 38 adapter_3_mate2 = lambda wildcards:
    38 39 get_sample('fq2_3p', search_id='index', search_value=wildcards.sample),
    39 40 adapter_5_mate2 = lambda wildcards:
    40 get_sample('fq2_5p', search_id='index', search_value=wildcards.sample)
    41 get_sample('fq2_5p', search_id='index', search_value=wildcards.sample),
    42 additional_params = parse_rule_config(rule_config,current_rule=current_rule, immutable=('-a', '-A', '-g', '-G', '-o', '-p'))
    • Might be clearer/more legible to have these lines wrapped (and more in accordance with PEP-8 line length limitations), e.g.:

      additional_params = parse_rule_config(
          rule_config,
          current_rule=current_rule,
          immutable=('-a', '-A', '-g', '-G', '-o', '-p'),
      )

      or even:

      additional_params = parse_rule_config(
          rule_config,
          current_rule=current_rule,
          immutable=(
              '-a',
              '-A',
              '-g',
              '-G',
              '-o',
              '-p',
          ),
      )
    • changed this line in version 3 of the diff

    • Please register or sign in to reply
  • CJHerrmann added 1 commit

    added 1 commit

    Compare with previous version

  • Alex Kanitz approved this merge request

    approved this merge request

  • Alex Kanitz changed title from Rule config params to feat: enable user to configure CLI params per rule

    changed title from Rule config params to feat: enable user to configure CLI params per rule

  • merged

  • Alex Kanitz mentioned in commit 10e8891c

    mentioned in commit 10e8891c

  • Please register or sign in to reply
    Loading