Commits · c535a892cdc78677426191898dfd349ec3e0ccfb · zavolan_group / pipelines / ZARP

Oct 16, 2020

- Delete unused files scripts/fg_extract_transcripts.py,... · c535a892

- Delete unused files scripts/fg_extract_transcripts.py, scripts/heatmap_and_clustermap.py, scripts/perform_PCA.py
- Add rules (pca_kallisto, pca_salmon) that run zpca (https://github.com/zavolanlab/zpca) on genes and transcripts TPM tables from kallisto and salmon.
- The output is wired to multiqc_report, but the plots are not visualized in MultiQC. Update documentation.
- Update dag and rulegraph.
Fixes #140 #142

c535a892

Jun 15, 2020
- refactor: rename LabKey/input table column · 82c4dcea
  BIOPZ-Börsch Anastasiya authored 4 years ago and Alex Kanitz committed 4 years ago
  
  82c4dcea
Apr 27, 2020

Refactor LabKey to Snakemake script · 556f1e12

Alex Kanitz authored 4 years ago

- clean up command line interface
  - improve descriptions
  - add consistent structure
  - remove or merge superfluous CLI arguments
  - set defaults
  - update test calls
  - update docs
  - when importing data from LabKey, table is saved to 'samples.tsv.labkey' in same directory as Snakemake sample table
- allow user to specify environment variables and relative paths in input table and on CLI
  - relative paths in the input table are interpreted with respect to the directory containing the input table
  - relative paths on the CLI are interpreted with respect to the current working directory; this achieves portability with respect to tests but is discouraged in production because the behavior is not very predictable from the user's perspective; consequently, a warning is thrown
- set STAR index size to read length - 1
- remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
- rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
- add logging
- add docstrings to module and all functions
- add typing definitions to all functions
- restructure and comment code to improve readability
- linters `flake8` and `mypy` pass
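
The path handling described above (environment variables allowed, relative paths in the input table resolved against the table's directory) could look roughly like the following. This is a minimal sketch, not the code from the commit: the function name `resolve_path` and its parameters are hypothetical, and POSIX-style paths are assumed.

```python
import os
from pathlib import Path


def resolve_path(value: str, base_dir: str) -> str:
    """Resolve a path entry from an input table (hypothetical helper).

    Environment variables are expanded first; a path that is still
    relative afterwards is interpreted relative to ``base_dir``, e.g.
    the directory containing the input table.
    """
    expanded = os.path.expandvars(value)
    path = Path(expanded)
    if not path.is_absolute():
        path = Path(base_dir) / path
    # Collapse redundant separators and ".." components.
    return os.path.normpath(str(path))
```

For CLI arguments, the same helper would be called with the current working directory (`os.getcwd()`) as `base_dir`, which is the case the commit message flags with a warning.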

556f1e12

Major refactoring · 6cf28511

BIOPZ-Katsantoni Maria authored 4 years ago and Alex Kanitz committed 4 years ago

* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules that are defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of output file or directory instead
  * remove dependency of sequencing mode for rule for `FastQC`; now runs separately for each strand
* Changes related to MultiQC and output file/directory structure
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * change log directory structure matches results directory structure
* Miscellaneous changes
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start` which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
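
The output naming scheme `samples/{sample_name}/{rule}/{output_file}` from the list above can be illustrated with a small helper. This is only a sketch: the function name and the `root` parameter (with its default `results`) are assumptions made for illustration and do not come from the commit.

```python
from pathlib import PurePosixPath


def output_path(sample_name: str, rule: str, output_file: str,
                root: str = "results") -> str:
    """Build a path of the form {root}/samples/{sample_name}/{rule}/{output_file}.

    ``root`` is a hypothetical output directory; the commit message only
    specifies the part starting at ``samples/``.
    """
    return str(PurePosixPath(root, "samples", sample_name, rule, output_file))
```

Because the log directory is described as mirroring the results directory, the same helper could be reused with a different `root`, for example `output_path("s1", "fastqc", "fastqc.log", root="logs")`.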

6cf28511

Add rules for bigWig creation · 907082c3
CJHerrmann authored 4 years ago and Alex Kanitz committed 4 years ago

907082c3

Mar 25, 2020
- Update salmon transcriptome index generation · 4e3cac05
  BIOPZ-Bak Maciej authored 4 years ago and Alex Kanitz committed 4 years ago
  
  4e3cac05
Mar 21, 2020
- use minified container images · e9d50454
  Alex Kanitz authored 4 years ago
  
  e9d50454
Mar 20, 2020
- extend ALFA functionality · f5e2f6ac
  Dominik Burri authored 4 years ago and Alex Kanitz committed 4 years ago
  
  - generate nucleotide distribution for unique reads only
  - new rule to generate PNG image for MultiQC
  f5e2f6ac
Mar 19, 2020
- MultiQC · fd1e3123
  BIOPZ-Bak Maciej authored 4 years ago and Alex Kanitz committed 4 years ago
  
  fd1e3123
Mar 12, 2020

updated DAG and rulegraph · c3fffb6c
BIOPZ-Bak Maciej authored 5 years ago

c3fffb6c

replaced synthetic test by new one. · 46e6e00b

Dominik Burri authored 5 years ago

moved input_files into top-layer test directory for consistency.

corrected removal of test files

46e6e00b

included tests for ALFA qc · ad3a8e52
Dominik Burri authored 5 years ago
corrected md5sum for config.yaml

remove unnecessary file
ad3a8e52

added rule for · 37fb0fd0

Dominik Burri authored 5 years ago

- renaming bedgraph
- creating ALFA qc plots

removed conda dependence, moved import statement.

included ALFA in finish rule, corrected annotation.gtf and config.yaml, created new .svg

37fb0fd0

Mar 06, 2020
- Merged paired end and single end rules for star_rpm and... · fb784999
  BIOPZ-Gypas Foivos authored 5 years ago
  
  Merged paired end and single end rules for star_rpm and index_genomic_alignment_samtools. Fixed wiring of calculate tin score: bam should be input and not params.
  fb784999
- Generate bedgraph file of normalised coverage. Fixes #45 · a54ff3e8
  Dominik Burri authored 5 years ago and BIOPZ-Gypas Foivos committed 5 years ago
  
  a54ff3e8
- Extract transcript sequences from genome (fasta file) and gene annotations (gtf file). Fixes #62 · 0def7b72
  BIOPZ-Iborra de Toledo Paula authored 5 years ago and BIOPZ-Gypas Foivos committed 5 years ago
  
  0def7b72
Feb 24, 2020

Add rule that combines counts from Salmon · 357ed828

BIOPZ-Gypas Foivos authored 5 years ago and Alex Kanitz committed 5 years ago

- compile Salmon gene and transcript count summary tables across all samples in workflow run
- add `pandas` to `install/environment.dev.yml`
- update rule graph and DAG images
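
Compiling a summary table across samples, as described above, amounts to joining per-sample values on the feature name. The commit adds `pandas` to the development environment, so the actual script presumably uses it; the sketch below uses only the standard library to stay self-contained. It assumes Salmon's standard `quant.sf` columns `Name` and `NumReads`; the function names and the choice to fill missing entries with `0.0` are hypothetical.

```python
import csv
from typing import Dict, List, TextIO


def read_quant(handle: TextIO, column: str = "NumReads") -> Dict[str, float]:
    """Read one tab-separated quantification table into {feature name: value}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row[column]) for row in reader}


def merge_samples(per_sample: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Combine per-sample values into one table with a header row.

    Rows are features, columns are samples (both sorted by name).
    Features missing from a sample are filled with 0.0 (an assumption).
    """
    samples = sorted(per_sample)
    names = sorted({name for values in per_sample.values() for name in values})
    table = [["Name"] + samples]
    for name in names:
        table.append([name] + [str(per_sample[s].get(name, 0.0)) for s in samples])
    return table
```

Passing `column="TPM"` to `read_quant` would give the analogous TPM summary.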

357ed828

Feb 21, 2020

Use minified docker images · dc2afcf9
Alex Kanitz authored 5 years ago

dc2afcf9

Add rule that combines TPM values from Salmon · bb1f9b8f

BIOPZ-Iborra de Toledo Paula authored 5 years ago and Alex Kanitz committed 5 years ago

- Remove files with non-deterministic output from `tests/test_integration_workflow/expected_output.files`
- Update MD5 sums in `tests/test_integration_workflow/expected_output.md5`
- Update new workflow DAG and rule graph images
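
The MD5-based check of deterministic outputs mentioned above can be sketched as follows. This assumes that `expected_output.md5` uses the usual `md5sum` layout (checksum, whitespace, file name per line), which the commit message does not state; the repository's tests run as shell scripts, so the Python function names here are purely illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(text: str) -> Dict[str, str]:
    """Parse md5sum-style lines into {file name: checksum}."""
    expected = {}
    for line in text.splitlines():
        if line.strip():
            checksum, name = line.strip().split(maxsplit=1)
            # md5sum marks binary mode with a leading "*" on the name.
            expected[name.lstrip("*")] = checksum
    return expected


def mismatches(expected: Dict[str, str], root: Path) -> List[str]:
    """Return the names of files whose checksum differs from the expected one."""
    return [name for name, checksum in expected.items()
            if md5_of(root / name) != checksum]
```

Files with non-deterministic content (timestamps, random seeds) would fail such a check on every run, which is why they are removed from the list of expected outputs.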

bb1f9b8f

handle polyA processing in input preparation script · c4e20a21

BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago

- fixes some functions in `labkey_to_snakemake.py`
- add optional argument for trimming polyA tails; they are trimmed as follows:
  - if mate is sense, oligo-A is added to sample table for `cutadapt` rule to trim
  - if mate is antisense, oligo-T is added to sample table for `cutadapt` rule to trim
  - if option is set to `--trim_polya`, oligo-X stretch is added to sample table and `cutadapt` will not trim

c4e20a21

Feb 17, 2020

add TIN score calculation · c538fe8b

BIOPZ-Bak Maciej authored 5 years ago and Alex Kanitz committed 5 years ago

- add rule for input preparation (GTF to BED12)
- add rule for TIN score calculation
- update rule graph and DAG image
- update Slurm cluster config

c538fe8b

Feb 15, 2020

get Snakemake input from LabKey API · eea0206f

BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago

- add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
- script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
- add tests for both use cases
- common input files for tests now in 'tests/input_files'
- update all other tests to account for new file locations
- update documentation

eea0206f

Feb 14, 2020

LabKey-like input to Snakemake input · 979e6cdd
BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago
- separate organism genome architecture (different input folder)
- change MD5 checksums to match the new output
979e6cdd

display rule graph instead of DAG · ff08b9c3

CJHerrmann authored 5 years ago and Alex Kanitz committed 5 years ago

- add script `tests/test_rule_graph/test.sh` to generate a rule graph in `images/rule_graph.svg`
- display rule graph created in `README.md` instead of specific workflow DAG
- add test script to GitLab CI config
- renamed test to create workflow DAG from `test_create_dag_chart` to `test_create_dag_image` (also output file is renamed from `images/workflow_dag.svg` to `images/dag_test_workflow.svg`)

ff08b9c3

Feb 09, 2020

replace test files with small synthetic ones · 48e012a0

Alex Kanitz authored 5 years ago

- replaces existing larger libraries and annotations in test cases `test_create_dag_chart` and `test_integration_workflow`
- adds the following new test files:
  - `chr1-10000-20000.fa`: artificial chromosome of length 10'000 (based on human chromosome 1)
  - `chr1-10000-20000.gtf`: matching gene annotation file with two gene and three multi-exon transcript entries
  - `chr1-10000-20000.transcripts.fa`: sequences of the transcripts listed in the gene annotation file
  - `synthetic.mate_?.fastq.gz`: 10 read pairs randomly sampled from the genic regions of the artificial chromosome
  - `synthetic.*.bed`: BED files with expected alignments for each read; names of overlapping genes are specified in a 7th column
- updates file paths in the relevant sample tables
- extends and updates checksum checking of result files in CI/CD pipeline

48e012a0

Feb 08, 2020

integrate completed rules into workflow · ca9a2bb4

BIOPZ-Iborra de Toledo Paula authored 5 years ago and Alex Kanitz committed 5 years ago

- Rules now integrated include #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17
- Applies both for single-end and paired-end subworkflows
- DAG image updated
- Test sample tables and outcomes updated
- Unused preprocessing subworkflow removed

Closes #48
Closes #6

ca9a2bb4

Feb 04, 2020

clean up tests · 6a231363

Alex Kanitz authored 5 years ago

- set up integration test for Snakefile in dedicated folder; current test case was left untouched for the time being, despite requiring large input files
- set up DAG chart creation test in dedicated folder; script creates an SVG representation of the workflow DAG at `images/workflow_dag.svg`
- both tests have been added to the GitLab CI/CD configuration; the latter test ensures that always the latest version of the
- all tests are now located inside subdirectories of `tests/`; test scripts and configuration files for test runs etc. have been moved to the appropriate test directories
- for the time being, required input files for each test are placed within the individual test directories; a layout for common test files should be introduced later and paths and bind paths in tests adapted
- make script `scripts/labkey_api.py` executable

6a231363