Commits · ef2c6624df80ed62388863de0044078ba902f15a · zavolan_group / pipelines / ZARP

Jun 12, 2020
- fix(prepare_inputs): support relative paths · 402da16e
  Alex Kanitz authored 4 years ago
  
Apr 27, 2020

Refactor LabKey to Snakemake script · 556f1e12

Alex Kanitz authored 4 years ago

- clean up command line interface
  - improve descriptions
  - add consistent structure
  - remove or merge superfluous CLI arguments
  - set defaults
  - update test calls
  - update docs
  - when importing data from LabKey, table is saved to 'samples.tsv.labkey' in same directory as Snakemake sample table
- allow user to specify environment variables and relative paths in input table and on CLI
  - relative paths in the input table are interpreted with respect to the directory containing the input table
  - relative paths on the CLI are interpreted with respect to the current working directory; this makes the tests portable but is discouraged in production because the behavior is hard to predict from the user's perspective; consequently, a warning is issued
- set STAR index size to read length - 1
- remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
- rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
- add logging
- add docstrings to module and all functions
- add typing definitions to all functions
- restructure and comment code to improve readability
- linters `flake8` and `mypy` pass
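
The path handling described above could be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `resolve_table_path` and `resolve_cli_path` are hypothetical, and the warning text is an assumption.

```python
import os
import warnings
from pathlib import Path


def resolve_table_path(value: str, table_file: str) -> Path:
    """Resolve a path found in the input table (hypothetical helper).

    Environment variables are expanded first; a relative result is
    interpreted with respect to the directory containing the table.
    """
    expanded = Path(os.path.expandvars(value))
    if expanded.is_absolute():
        return expanded
    return Path(table_file).resolve().parent / expanded


def resolve_cli_path(value: str) -> Path:
    """Resolve a path given on the command line (hypothetical helper).

    A relative result is interpreted with respect to the current working
    directory, and a warning is emitted because the outcome depends on
    where the script is started.
    """
    expanded = Path(os.path.expandvars(value))
    if expanded.is_absolute():
        return expanded
    warnings.warn(
        f"relative path '{value}' is resolved against the current "
        "working directory"
    )
    return Path.cwd() / expanded
```

Keeping the two cases in separate functions makes the differing base directories explicit at each call site.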


Major refactoring · 6cf28511

BIOPZ-Katsantoni Maria authored 4 years ago and

Alex Kanitz committed 4 years ago

* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules that are defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of output file or directory instead
  * remove dependency on sequencing mode from the `FastQC` rule; it now runs separately for each strand
* Changes related to MultiQC and output file/directory structure
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * log directory structure changed to match the results directory structure
* Miscellaneous changes
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start` which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility


Feb 18, 2020

run tests in verbose mode · 0d95577e

Alex Kanitz authored 5 years ago

- trap call functionalized through cleanup() function
- function added to all test scripts
- function prints out exit status of last command before trap
- flag `--verbose` added to Snakemake calls in all test scripts
- script tests renamed to follow naming convention 'test_script_<script_name>_<script_run_mode>'


Feb 15, 2020

get Snakemake input from LabKey API · eea0206f

BIOPZ-Katsantoni Maria authored 5 years ago and

Alex Kanitz committed 5 years ago

- add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
- script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
- add tests for both use cases
- common input files for tests now in 'tests/input_files'
- update all other tests to account for new file locations
- update documentation
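
The two input modes mentioned above might be wired up roughly as in the following sketch. Only the `--remote` flag comes from the commit message; the positional argument `input_table`, the function names, and the omission of the actual API call are assumptions of this example.

```python
import argparse
import csv
from typing import Dict, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the command line; requires either --remote or a dump file."""
    parser = argparse.ArgumentParser(
        description="Prepare Snakemake inputs from a LabKey table (sketch)."
    )
    parser.add_argument(
        "--remote",
        action="store_true",
        help="fetch the table from the LabKey API instead of a file",
    )
    parser.add_argument(
        "input_table",
        nargs="?",
        help="tab-separated LabKey dump (hypothetical argument name)",
    )
    args = parser.parse_args(argv)
    if not args.remote and args.input_table is None:
        parser.error("provide a dump file or use --remote")
    return args


def read_labkey_dump(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated dump into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_rows(args: argparse.Namespace) -> List[Dict[str, str]]:
    """Dispatch on the input mode; the remote branch is left out here."""
    if args.remote:
        raise NotImplementedError("API access is omitted from this sketch")
    return read_labkey_dump(args.input_table)
```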


Feb 14, 2020
- repo follows recommended structure · 1e52fa56
  Alex Kanitz authored 5 years ago
  
- LabKey-like input to Snakemake input · 979e6cdd
  BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago
  
  - separate organism genome architecture (different input folder)
  - change MD5 checksums to match the new output
Feb 07, 2020
- Create two dictionary files for converting LabKey table to Snakemake inputs · 8b8e2774
  BIOPZ-Börsch Anastasiya authored 5 years ago and Alex Kanitz committed 5 years ago
  
Feb 04, 2020

add documentation · 1ef8b6af

Alex Kanitz authored 5 years ago

`README.md` file describes
- aim and background of the project (including the workflow DAG representation)
- how to install requirements (including setting up a `conda` environment for the project)
- how to execute the workflow's integration test
- how to run the workflow on your own samples (including how to auto-generate required params from LabKey metadata)

Additional minor changes:
- minor changes in various test and related files, including updates of paths
- root directory now includes subdirectory `runs/` for a user's workflow runs (contents not version-controlled)


Feb 03, 2020

generate Snakemake inputs from LabKey data table · cd541afe

BIOPZ-Katsantoni Maria authored 5 years ago and

Alex Kanitz committed 5 years ago

Adds script `scripts/labkey_to_snakemake.py` which
- maps LabKey table fields to Snakemake parameters
- assembles required parameters from the table data
- infers required parameters from the input data
- produces files `config.yaml` and `samples.tsv` required by the Snakemake pipeline
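
The mapping and inference steps listed above can be pictured with a small sketch. The LabKey field names, the Snakemake column names, and the rule used to infer the sequencing mode are all hypothetical; they only illustrate the general idea and do not reproduce `scripts/labkey_to_snakemake.py`.

```python
import csv
from typing import Dict, Iterable, TextIO

# Hypothetical mapping of LabKey field names to Snakemake column names.
FIELD_MAP = {
    "Sample Name": "sample",
    "Mate1 File": "fq1",
    "Mate2 File": "fq2",
}
# Output columns: the mapped fields plus one inferred parameter.
COLUMNS = ["sample", "fq1", "fq2", "seqmode"]


def to_snakemake_row(labkey_row: Dict[str, str]) -> Dict[str, str]:
    """Map fields by name, then infer a parameter from the mapped data."""
    row = {new: labkey_row.get(old, "") for old, new in FIELD_MAP.items()}
    # Assumed inference rule: a second mate file implies paired-end data.
    row["seqmode"] = "paired_end" if row["fq2"] else "single_end"
    return row


def write_samples_table(labkey_rows: Iterable[Dict[str, str]],
                        handle: TextIO) -> None:
    """Write the converted rows as a tab-separated sample table."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for labkey_row in labkey_rows:
        writer.writerow(to_snakemake_row(labkey_row))
```

Writing to an open handle rather than a file name keeps the conversion easy to test with an in-memory buffer.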

A self-contained integration test for the script is located at `tests/test_scripts_labkey_to_snakemake` (execute script `test.sh`) and was added to the CI/CD pipeline.

Note that interim changes to the `master` branch were merged into this branch to avoid conflicts during merging.

Closes #39
