# Design discussions

## Passing tool parameters

### Question

How can users pass tool parameters into ZARP?

### Constraints

Tool parameters here refer only to those parameters that are not constrained by ZARP's wiring. The proposed solution **MUST** outline a way to ensure that wiring-critical arguments cannot be overridden by users.

Executing arbitrary user input is a significant security risk, and any design proposal **MUST** present a viable strategy to mitigate the risk of code injection.

In addition to preventing the modification of wiring-critical arguments and mitigating code injection, proposed solutions need to balance the following constraints:

* **Flexibility:** Users **SHOULD** be able to customize the behavior of individual tools to fit their use cases to the highest degree possible, and use cases **SHOULD** be limited as little as possible by design decisions.
* **Ease-of-use:** Users **SHOULD** be able to run ZARP as conveniently as possible, with minimal management of tool parameters, with minimal risk of producing faulty runs and from various user agents (e.g., ZARP-cli, Snakemake).
* **Maintainability:** The burden on maintainers with regard to dealing with tool parameters in the future (e.g., responding to requests for interfacing individual parameters and implementing these interfaces, keeping tool parameters up-to-date when bumping tool versions, maintaining parameter descriptions) **SHOULD** be minimized.

### Design proposals

#### Tool-specific config files applied to all samples of a run

##### Specification

Each Snakemake rule that runs a user-customizable tool `{toolname}` (e.g., `STAR`) receives an additional parameter `additional_params`. This parameter is bound to a function `parse_tool_config()` that parses a tool configuration file in YAML format; the path to that file is provided in a variable `{toolname}_config` defined in the main Snakemake configuration. The function returns a string of individually quoted user-provided tool parameters and values (e.g., `"'--my-param-1' 'my-arg-1' '--my-param-2' 'my-arg-2'"`), which Snakemake will inject into the command to be executed.

Users can modify the tool configuration file at will, allowing maximum flexibility in modifying tool behavior. However, they bear the risk of creating nonsensical, misleading or erroneous tool behavior. Providing parameters and values in a tool configuration file is _optional_. If no file is provided for a specific run, a default file will be used, if one was previously configured (e.g., during `zarp-cli --init`). If the tool configuration file is empty or no file is available, `parse_tool_config()` will return an empty string and the tool will run with default parameters.

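As a rough illustration of the specification, the following sketch shows how the string-building part of such a function might look. It is not taken from the prototypes: the name `build_additional_params()` is hypothetical, and the sketch assumes the YAML file has already been loaded into a mapping (for instance with PyYAML's `yaml.safe_load`), that keys include their leading dashes, and that a value of `None` marks a flag without an argument. Note that `shlex.quote` leaves tokens consisting only of safe characters unquoted, so the output differs cosmetically from the fully quoted example string above.

```python
import shlex
from typing import Any, Mapping, Optional


def build_additional_params(tool_config: Optional[Mapping[str, Any]]) -> str:
    """Turn a parsed tool configuration into a shell-safe argument string.

    Assumptions (not from the ZARP prototypes): keys are parameter names
    including their dashes; a value of None marks a flag without argument.
    """
    if not tool_config:
        # Empty or missing configuration: the tool runs with its defaults.
        return ""
    tokens = []
    for name, value in tool_config.items():
        # Quote the parameter name and its value individually.
        tokens.append(shlex.quote(str(name)))
        if value is not None:
            tokens.append(shlex.quote(str(value)))
    return " ".join(tokens)


# Hypothetical parameters, mirroring the example string in the text.
print(build_additional_params({"--my-param-1": "my-arg-1", "--my-flag": None}))
```

In a Snakemake rule, the returned string would then be exposed through the rule's `params` and interpolated into the shell command.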
##### Strategy to prevent overriding wiring-critical arguments

In the function bound to `additional_params`, the parsed tool configuration must be explicitly checked for wiring-critical parameters. If the user tried to override such a parameter, an error should be raised, leading to a premature exit.

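A check of this kind could be sketched as follows. The function name `check_wiring_critical()` and the reserved set are hypothetical; the listed options are real `STAR` options, but whether ZARP treats exactly these as wiring-critical is an assumption made for illustration.

```python
from typing import Any, Iterable, Mapping


def check_wiring_critical(tool_config: Mapping[str, Any],
                          reserved: Iterable[str]) -> None:
    """Raise ValueError if the user configuration sets a reserved parameter."""
    clashes = sorted(set(tool_config) & set(reserved))
    if clashes:
        raise ValueError(
            "wiring-critical parameters cannot be overridden: "
            + ", ".join(clashes)
        )


# Hypothetical reserved set for STAR (an assumption, not ZARP's actual list).
STAR_RESERVED = {"--genomeDir", "--readFilesIn", "--outFileNamePrefix"}

# Passes silently: no reserved parameter is touched.
check_wiring_critical({"--outSAMtype": "BAM"}, STAR_RESERVED)

try:
    check_wiring_critical({"--genomeDir": "/tmp/other"}, STAR_RESERVED)
except ValueError as exc:
    print(exc)
```

An exact-name comparison like this would not catch aliases or abbreviated option names that some tools accept, so the reserved set would likely need to list every accepted spelling.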
##### Strategy to mitigate code injection

Parameters and values will be individually quoted in `parse_tool_config()` before compiling the return string.

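The effect of individual quoting can be checked without running a shell. The sketch below uses Python's `shlex.quote` as one possible quoting routine (the prototypes may quote differently) and `shlex.split` to mimic POSIX shell tokenization; `survives_quoting()` and the `tool --my-param` command prefix are hypothetical.

```python
import shlex


def survives_quoting(value: str) -> bool:
    """Check that a quoted value is tokenized as exactly one argument."""
    command = "tool --my-param " + shlex.quote(value)
    # shlex.split applies POSIX-style tokenization to the command string.
    return shlex.split(command) == ["tool", "--my-param", value]


# Typical injection attempts: command separator, command substitution,
# and an embedded single quote trying to break out of the quoting.
attempts = ["value; rm -rf ~", "$(whoami)", "it's; echo pwned"]
for attempt in attempts:
    print(shlex.quote(attempt), survives_quoting(attempt))
```

Quoting only keeps the shell from interpreting the value; it does not stop a tool from acting on a hostile but syntactically valid argument, such as an unexpected output path.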
##### Assessment of other metrics

| Category | Metric | Assessment |
| --- | --- | --- |
| Flexibility | Tool behavior can be customized? | Yes |
| Flexibility | No arbitrary limitations on tool behavior are imposed? | Yes |
| Ease-of-use | Management of parameters entirely optional? | Yes |
| Ease-of-use | Parameters passed in a structured way? | Yes |
| Ease-of-use | Validation of parameters before tool execution? | No |
| Maintainability | No need to interface individual parameters? | Yes |
| Maintainability | No need to maintain parameters? | Yes |
| Maintainability | No need to maintain parameter descriptions? | Yes |

##### Possible variations

* Attempts to override wiring-critical arguments could be ignored (optionally with a warning) instead of leading to a system exit.
* A single YAML file could probably be used for multiple tools, with each tool having its own section (top-level keyword). It must, however, be obvious to users how to name those sections, and names need to be passed to `parse_tool_config()` so that the right section will be processed for each call.
* Instead of specifying the tool configuration file globally for each Snakemake run, the same approach could also be used to allow users to pass tool parameters individually for each sample. In order to do so, the tool configuration file should be specified in the sample table rather than the main Snakemake config.

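The first two variations could be sketched roughly as below. All names are hypothetical: `select_tool_section()` picks the top-level section for one tool from a combined configuration that is assumed to be already parsed, and `drop_wiring_critical()` ignores reserved parameters with a warning instead of exiting. The section names `star` and `salmon` are illustrative only.

```python
import warnings
from typing import Any, Dict, Iterable, Mapping


def select_tool_section(config: Mapping[str, Any], toolname: str) -> Dict[str, Any]:
    """Return the parameters in the top-level section named after the tool."""
    # A missing or empty section means "no user parameters for this tool".
    section = config.get(toolname) or {}
    if not isinstance(section, Mapping):
        raise TypeError(f"section '{toolname}' must map parameters to values")
    return dict(section)


def drop_wiring_critical(params: Mapping[str, Any],
                         reserved: Iterable[str]) -> Dict[str, Any]:
    """Ignore reserved parameters with a warning instead of exiting."""
    reserved = set(reserved)
    kept = {}
    for name, value in params.items():
        if name in reserved:
            warnings.warn(f"ignoring wiring-critical parameter {name}")
        else:
            kept[name] = value
    return kept


# Hypothetical combined configuration with one section per tool.
combined = {
    "star": {"--outSAMtype": "BAM", "--genomeDir": "/tmp/other"},
    "salmon": None,
}
star_params = select_tool_section(combined, "star")
print(drop_wiring_critical(star_params, {"--genomeDir"}))
```

For the per-sample variation, the same functions could be called with a configuration path read from the sample table instead of the main Snakemake config.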
##### Prototype

See [here](https://git.scicore.unibas.ch/zavolan_group/pipelines/zarp/-/tree/prototype_tool_param_passing/sandbox) for some minimal prototypes that implement the design specification with either:

* a single config file for all tools (`Snakefile.single_tool_config_file`)
* one config file per tool (`Snakefile.multiple_tool_config_files`)

Note that safeguards for preventing wiring-critical arguments from being overridden are not implemented in the prototypes. However, these should be trivial to add.

Some tests for shell injection are also provided in `Snakefile.shell_injection_tests`.

#### ADD DESCRIPTION

##### Specification

ADD

##### Strategy to prevent overriding wiring-critical arguments

ADD

##### Strategy to mitigate code injection

ADD

##### Assessment of other metrics

| Category | Metric | Assessment |
| --- | --- | --- |
| Flexibility | Tool behavior can be customized? | |
| Flexibility | No arbitrary limitations on tool behavior are imposed? | |
| Ease-of-use | Management of parameters entirely optional? | |
| Ease-of-use | Parameters passed in a structured way? | |
| Ease-of-use | Validation of parameters before tool execution? | |
| Maintainability | No need to interface individual parameters? | |
| Maintainability | No need to maintain parameters? | |
| Maintainability | No need to maintain parameter descriptions? | |

##### Possible variations

* ADD

##### Prototype

ADD