Commits · 7c74291bb03cd04bc67f7713b5d3d686ec67483f · zavolan_group / pipelines / ZARP

Jun 11, 2020
- Set appropriate parameter. Fixes #83 · 3a9ec893
  Dominik Burri authored 4 years ago
  
Apr 27, 2020

Refactor LabKey to Snakemake script · 556f1e12

Alex Kanitz authored 4 years ago

- clean up command line interface
  - improve descriptions
  - add consistent structure
  - remove or merge superfluous CLI arguments
  - set defaults
  - update test calls
  - update docs
  - when importing data from LabKey, the table is saved to 'samples.tsv.labkey' in the same directory as the Snakemake sample table
- allow user to specify environment variables and relative paths in input table and on CLI
  - relative paths in the input table are interpreted with respect to the directory containing the input table
  - relative paths on the CLI are interpreted with respect to the current working directory; this is to achieve portability with respect to tests but is discouraged in production because its behavior is not very predictable from the user's perspective; consequently, a warning is issued
- set STAR index size to read length - 1
- remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
- rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
- add logging
- add docstrings to module and all functions
- add typing definitions to all functions
- restructure and comment code to improve readability
- linters `flake8` and `mypy` pass
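The path-handling rules described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from `labkey_to_snakemake.py`: the helper names `resolve_table_path` and `resolve_cli_path` are hypothetical, and reporting the warning through the standard `logging` module is an assumption.

```python
import logging
import os

LOGGER = logging.getLogger(__name__)


def _expand(value: str) -> str:
    """Expand environment variables (e.g. $HOME) and a leading '~'."""
    return os.path.expanduser(os.path.expandvars(value))


def resolve_table_path(value: str, table_path: str) -> str:
    """Resolve a path read from the input table (hypothetical helper).

    Relative paths are interpreted with respect to the directory that
    contains the input table itself.
    """
    expanded = _expand(value)
    if os.path.isabs(expanded):
        return os.path.normpath(expanded)
    table_dir = os.path.dirname(os.path.abspath(table_path))
    return os.path.normpath(os.path.join(table_dir, expanded))


def resolve_cli_path(value: str) -> str:
    """Resolve a path passed on the command line (hypothetical helper).

    Relative paths are interpreted with respect to the current working
    directory; since that is hard to predict, a warning is logged.
    """
    expanded = _expand(value)
    if not os.path.isabs(expanded):
        LOGGER.warning(
            "Relative path '%s' is interpreted with respect to the current "
            "working directory '%s'", value, os.getcwd(),
        )
    return os.path.normpath(os.path.abspath(expanded))
```

Resolving table paths against the table's own directory keeps a sample table and its data movable as a unit, whereas CLI paths depend on where the tool happens to be invoked.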

Major refactoring · 6cf28511

BIOPZ-Katsantoni Maria authored 4 years ago and Alex Kanitz committed 4 years ago

* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of output file or directory instead
  * remove dependency on sequencing mode from the rule for `FastQC`; it now runs separately for each strand
* Changes related to MultiQC and output file/directory structure
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * log directory structure now matches results directory structure
* Miscellaneous changes
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start` which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
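The output layout `samples/{sample_name}/{rule}/{output_file}`, with the log directory mirroring it, can be illustrated with a small sketch. The helper `rule_output` and the example rule and file names are hypothetical, and the assumption is that the layout is applied relative to a base results or log directory.

```python
# General output layout described in the commit message.
OUTPUT_TEMPLATE = "samples/{sample_name}/{rule}/{output_file}"


def rule_output(base_dir: str, sample_name: str, rule: str, output_file: str) -> str:
    """Return a rule's output path under ``base_dir`` (hypothetical helper)."""
    relative = OUTPUT_TEMPLATE.format(
        sample_name=sample_name, rule=rule, output_file=output_file
    )
    return f"{base_dir.rstrip('/')}/{relative}"


# Reusing the same layout for logs makes the log tree mirror the results tree.
results_file = rule_output("results", "sample_A", "fastqc", "fastqc_data.txt")
log_file = rule_output("logs", "sample_A", "fastqc", "stderr.log")
```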

Mar 20, 2020

Fix Poly(A)-trimming rule · 392b04d2

BIOPZ-Katsantoni Maria authored 4 years ago and Alex Kanitz committed 4 years ago

In `labkey_to_snakemake.py`, the parameters were fixed so that every mate has both a 3p and a 5p poly(A)
feature, which map to the `-a`, `-g`, `-A` and `-G` options of cutadapt. Depending on which mate
is the sense and which the antisense mate, the appropriate variable is populated, and the remaining
variables are filled with 'XXXXXXXXXXXX', which leads to no trimming by cutadapt. The poly(A)
trimming rules are fixed to pass all of the `-a`, `-g`, `-A` and `-G` options.
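The parameter scheme above can be sketched as follows. The mapping of options to mates follows cutadapt's documented semantics (`-a`/`-g` are the 3'/5' adapters of mate 1, `-A`/`-G` those of mate 2). Everything else is an assumption for illustration: the function names, the default poly(A)/poly(T) sequences, and the choice that the sense mate is trimmed at its 3' end while the antisense mate is trimmed at its 5' end.

```python
from typing import Dict, List

# Placeholder which, according to the commit message, leads to no trimming.
PLACEHOLDER = "XXXXXXXXXXXX"

# cutadapt: -a / -g are the 3' / 5' adapters of mate 1, -A / -G those of mate 2.
OPTION = {(1, "3p"): "-a", (1, "5p"): "-g", (2, "3p"): "-A", (2, "5p"): "-G"}


def polya_adapters(
    sense_mate: int, polya: str = "A" * 15, polyt: str = "T" * 15
) -> Dict[str, str]:
    """Fill one adapter per cutadapt option (hypothetical helper).

    Assumed: sense mate gets a 3' poly(A) adapter, antisense mate gets a
    5' poly(T) adapter; every other slot gets the placeholder.
    """
    if sense_mate not in (1, 2):
        raise ValueError("sense_mate must be 1 or 2")
    antisense_mate = 3 - sense_mate
    adapters = {opt: PLACEHOLDER for opt in OPTION.values()}
    adapters[OPTION[(sense_mate, "3p")]] = polya
    adapters[OPTION[(antisense_mate, "5p")]] = polyt
    return adapters


def cutadapt_args(adapters: Dict[str, str]) -> List[str]:
    """Always pass all four options, as the fixed trimming rules do."""
    args: List[str] = []
    for opt in ("-a", "-g", "-A", "-G"):
        args += [opt, adapters[opt]]
    return args
```

Always emitting all four options keeps a single rule definition valid for every sample; only the values differ.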

Mar 19, 2020
- MultiQC · fd1e3123
  BIOPZ-Bak Maciej authored 4 years ago and Alex Kanitz committed 4 years ago
  
Mar 18, 2020
- Fix cutadapt log file names · 884afee0
  BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago
  
Mar 17, 2020
- Fix cutadapt overtrimming · bab8f25a
  BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago
  
Mar 13, 2020
- fixed snakemake subworkflow typos/bugs · 4baccc98
  BIOPZ-Katsantoni Maria authored 5 years ago
  
Mar 12, 2020

added rule for · 37fb0fd0

Dominik Burri authored 5 years ago

- renaming bedgraph
- creating ALFA qc plots

removed conda dependence, moved import statement.

included ALFA in finish rule, corrected annotation.gtf and config.yaml, created new .svg

Mar 06, 2020
- Merged paired end and single end rules for star_rpm and... · fb784999
  BIOPZ-Gypas Foivos authored 5 years ago
  
  Merged paired end and single end rules for star_rpm and index_genomic_alignment_samtools. Fixed wiring of calculate tin score: the BAM file should be an input, not a param.
- Generate bedgraph file of normalised coverage. Fixes #45 · a54ff3e8
  Dominik Burri authored 5 years ago and BIOPZ-Gypas Foivos committed 5 years ago
  
- Separate stdout and stderr logs for the majority of rules. Closes #76 and #79 · ff0a39fa
  BIOPZ-Katsantoni Maria authored 5 years ago and BIOPZ-Gypas Foivos committed 5 years ago
  
Feb 21, 2020

Use minified docker images · dc2afcf9
Alex Kanitz authored 5 years ago

Add rule that combines TPM values from Salmon · bb1f9b8f

BIOPZ-Iborra de Toledo Paula authored 5 years ago and Alex Kanitz committed 5 years ago

-  Remove files with non-deterministic output from `tests/test_integration_workflow/expected_output.files`
-  Update MD5 sums in `tests/test_integration_workflow/expected_output.md5`
-  Update new workflow DAG and rule graph images

Feb 20, 2020

create log directories in Snakefile · 5e1ec85e

Alex Kanitz authored 5 years ago

- log directories and, if the workflow is executed on a cluster, cluster log directories are explicitly created in `Snakefile`
- location of main log directory can be configured in `config.yaml` (field `log_dir`, previously `local_log`; this requires changes in script `labkey_to_snakemake.py` as well as in the subworkflows, as the field name is hard-coded there)
- location of cluster log directory can be configured in `cluster.json` (in field `__default__` -> `out`)
- `config.yaml` and `cluster.json` in `tests/input_files` are set such that a directory `logs/` is created in the directory where Snakemake is run (i.e., the directory of each test); cluster logs are stored in a subdirectory `logs/cluster`
- removes instructions to explicitly create log directories from docs and all test scripts
- cleans up main `Snakefile` (apart from Snakemake-specific syntax, now passes `flake8` linter test)
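The explicit creation of log directories can be sketched in plain Python. The field names `log_dir` and `__default__` -> `out` come from the commit message; the function name `create_log_dirs` is hypothetical, and the sketch assumes `config` is the dictionary loaded from `config.yaml` and that `out` holds a directory path rather than a file name pattern.

```python
import json
import os
from typing import Any, Dict, Optional


def create_log_dirs(config: Dict[str, Any], cluster_json: Optional[str] = None) -> None:
    """Create the main log directory and, for cluster runs, the cluster log directory.

    Hypothetical helper; `exist_ok=True` makes repeated runs harmless.
    """
    os.makedirs(config["log_dir"], exist_ok=True)
    if cluster_json is not None:
        with open(cluster_json) as fh:
            cluster = json.load(fh)
        # Assumed to be a directory path, e.g. "logs/cluster".
        os.makedirs(cluster["__default__"]["out"], exist_ok=True)
```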

Feb 17, 2020

add TIN score calculation · c538fe8b

BIOPZ-Bak Maciej authored 5 years ago and Alex Kanitz committed 5 years ago

- add rule for input preparation (GTF to BED12)
- add rule for TIN score calculation
- update rule graph and DAG image
- update Slurm cluster config

Feb 14, 2020
- repo follows recommended structure · 1e52fa56
  Alex Kanitz authored 5 years ago
  
Feb 08, 2020

integrate completed rules into workflow · ca9a2bb4

BIOPZ-Iborra de Toledo Paula authored 5 years ago and Alex Kanitz committed 5 years ago

- Rules now integrated include #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17
- Applies both for single-end and paired-end subworkflows
- DAG image updated
- Test sample tables and outcomes updated
- Unused preprocessing subworkflow removed

Closes #48
Closes #6

Feb 07, 2020

fix various small issues · 17818f4a

Alex Kanitz authored 5 years ago

- remove log files and add '.snakemake' directories to '.gitignore'
- update wrong link in 'README.md'
- delete superfluous script documentation 'scripts/labkey_api.md'
- add Snakemake-specific file extension '.smk' to subworkflows
- remove non-deterministic workflow output from md5 sums

Jan 24, 2020

Fix params.direcionality for kallisto rule · ecc71f7d
BIOPZ-Gypas Foivos authored 5 years ago

update test input files · 4e99b664

BIOPZ-Gypas Foivos authored 5 years ago and Alex Kanitz committed 5 years ago

- replace corrupt input files
- add script `run_test.sh` in the snakemake directory for running the tests locally
- add script to CI/CD pipeline
- fix typo in `paired_end.snakefile`
- update `samples.tsv` with the fake adapter `XXXXXXX` for adapters that should not be trimmed

Dec 20, 2019
- Rename snakemake/paired_end.snakemake to snakemake/paired_end.snakefile. Fix... · 44d3f3e3
  BIOPZ-Gypas Foivos authored 5 years ago
  
  Rename snakemake/paired_end.snakemake to snakemake/paired_end.snakefile. Fix wiring of rules. Add fake tests.
- Clean up unused code in snakemake/Snakefile. Add 2 threads in htseq_qa (paired... · 5094be70
  BIOPZ-Gypas Foivos authored 5 years ago
  
  Clean up unused code in snakemake/Snakefile. Add 2 threads in htseq_qa (paired end mode). Add example of tsv files from LabKey.
- Remove prepare_annotation directory. Rename process_data to snakemake. · cb46aa34
  BIOPZ-Gypas Foivos authored 5 years ago
  
- Paired end sequencing analysis subpipeline · dda2fb8d
  BIOPZ-Katsantoni Maria authored 5 years ago
  