Commits · c2ed5f3d1cb1320067aa7543c4d7f7bd5eaa93f8 · zavolan_group / pipelines / ZARP

Feb 11, 2021
- MultiQC plugins for TIN scores and ALFA Fixes #138 · fab75506
  BIOPZ-Bak Maciej authored 4 years ago and BIOPZ-Gypas Foivos committed 4 years ago
  
Jun 23, 2020
- Merge kallisto rules: kallisto_merge_genes and kallisto_merge_transcript · 1f007e19
  BIOPZ-Iborra de Toledo Paula authored 4 years ago and BIOPZ-Gypas Foivos committed 4 years ago
  
  The rules rely on https://github.com/zavolanlab/merge_kallisto
  Update info in pipeline_documentation.md
Apr 27, 2020

BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 4 years ago

* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of the output file or directory name instead
  * remove dependency on sequencing mode from the `FastQC` rule; it now runs separately for each strand
* Changes related to MultiQC and output file/directory structure
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic content to the MD5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * log directory structure now matches results directory structure
* Miscellaneous changes
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start` which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
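
The renamed sequencing modes and the new output layout described above can be illustrated with a minimal Python sketch. It is not taken from the pipeline: the function names `normalize_mode` and `output_path`, the sample name `sample_A` and the file name `report.html` are hypothetical and only serve the example.

```python
from pathlib import PurePosixPath

# Old sequencing-mode labels mapped to the new short labels from the commit.
MODE_ALIASES = {"paired_end": "pe", "single_end": "se"}


def normalize_mode(mode: str) -> str:
    """Return 'pe' or 'se'; the old long labels are accepted as well."""
    mode = mode.strip()
    short = MODE_ALIASES.get(mode, mode)
    if short not in ("pe", "se"):
        raise ValueError(f"unknown sequencing mode: {mode!r}")
    return short


def output_path(sample_name: str, rule: str, output_file: str) -> str:
    """Build a path of the form samples/{sample_name}/{rule}/{output_file}."""
    return str(PurePosixPath("samples") / sample_name / rule / output_file)


# Hypothetical values, for illustration only.
print(normalize_mode("paired_end"))
print(output_path("sample_A", "fastqc", "report.html"))
```

Because the sequencing mode no longer appears in the path, a helper like `output_path` needs only the sample name, the rule name and the file name.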

6cf28511

Add rules for bigWig creation · 907082c3
CJHerrmann authored 5 years ago and Alex Kanitz committed 4 years ago


Mar 19, 2020
- MultiQC · fd1e3123
  BIOPZ-Bak Maciej authored 5 years ago and Alex Kanitz committed 5 years ago
  
Mar 06, 2020
- Extract transcript sequences from genome (fasta file) and gene annotations (gtf file). Fixes #62 · 0def7b72
  BIOPZ-Iborra de Toledo Paula authored 5 years ago and BIOPZ-Gypas Foivos committed 5 years ago
  
Feb 21, 2020

Add rule that combines TPM values from Salmon · bb1f9b8f

BIOPZ-Iborra de Toledo Paula authored 5 years ago and Alex Kanitz committed 5 years ago

- Remove files with non-deterministic output from `tests/test_integration_workflow/expected_output.files`
- Update MD5 sums in `tests/test_integration_workflow/expected_output.md5`
- Update workflow DAG and rule graph images
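
A checksum comparison of the kind referred to above could look roughly like the following Python sketch. It assumes that the expected sums are stored in `md5sum`-style lines (`<md5>  <path>`); the actual format of `expected_output.md5` is not shown in this commit, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse '<md5>  <path>' lines (assumed md5sum style) into {path: md5}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, path = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' before the path.
        expected[path.lstrip("*")] = checksum
    return expected


def find_mismatches(expected, root="."):
    """Return paths whose current MD5 differs from the expected value.

    A missing file raises FileNotFoundError rather than being reported.
    """
    return [
        path
        for path, checksum in expected.items()
        if md5_of_file(Path(root) / path) != checksum
    ]
```

Only files with deterministic content can be checked this way, which is why outputs that differ between runs are removed from the list of expected files.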


Feb 15, 2020

get Snakemake input from LabKey API · eea0206f

BIOPZ-Katsantoni Maria authored 5 years ago and Alex Kanitz committed 5 years ago

- add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
- script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
- add tests for both use cases
- common input files for tests now in 'tests/input_files'
- update all other tests to account for new file locations
- update documentation
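
The dump-file use case can be sketched as below. This is not the script added in this commit: the column names in `COLUMN_MAP` are hypothetical, since the commit message names neither the LabKey columns nor those of 'samples.tsv', and the remote API path is left out entirely.

```python
import csv
import io

# Hypothetical mapping from LabKey column names to sample-table column names;
# the real column names are not given in the commit message.
COLUMN_MAP = {"Sample Name": "sample", "Mode": "seqmode", "Path": "fq1"}


def labkey_dump_to_samples(dump_text):
    """Convert a tab-separated LabKey dump into sample-table text (TSV)."""
    reader = csv.DictReader(io.StringIO(dump_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(COLUMN_MAP.values()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Keep only the mapped columns; other dump columns are ignored.
        writer.writerow({new: row[old] for old, new in COLUMN_MAP.items()})
    return out.getvalue()


# Hypothetical one-row dump, for illustration only.
dump = "Sample Name\tMode\tPath\tExtra\nA\tpe\t/data/a.fq.gz\tx\n"
print(labkey_dump_to_samples(dump), end="")
```

Keeping the conversion a pure text-to-text function makes both use cases easy to test: the remote variant would only need to fetch the rows and pass them through the same mapping.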


Feb 09, 2020

replace test files with small synthetic ones · 48e012a0

Alex Kanitz authored 5 years ago

- replaces existing larger libraries and annotations in test cases `test_create_dag_chart` and `test_integration_workflow`
- adds the following new test files:
  - `chr1-10000-20000.fa`: artificial chromosome of length 10'000 (based on human chromosome 1)
  - `chr1-10000-20000.gtf`: matching gene annotation file with two gene entries and three multi-exon transcript entries
  - `chr1-10000-20000.transcripts.fa`: sequences of the transcripts listed in the gene annotation file
  - `synthetic.mate_?.fastq.gz`: 10 read pairs randomly sampled from the genic regions of the artificial chromosome
  - `synthetic.*.bed`: BED files with expected alignments for each read; names of overlapping genes are specified in a 7th column
- updates file paths in the relevant sample tables
- extends and updates checksum checking of result files in CI/CD pipeline
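
Reading the expected alignments from such a BED file might look like the Python sketch below. It assumes the standard six BED columns (chrom, start, end, name, score, strand) followed by the gene names in column 7; that multiple gene names are comma-separated is an assumption, as the commit message does not state the separator. The function name and the example record are hypothetical.

```python
def parse_bed_with_genes(text):
    """Parse BED lines whose 7th column lists overlapping gene names."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and common BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"expected at least 7 columns, got {len(fields)}")
        chrom, start, end, name, _score, strand, genes = fields[:7]
        records.append(
            {
                "chrom": chrom,
                "start": int(start),  # BED starts are 0-based
                "end": int(end),      # BED ends are exclusive
                "name": name,
                "strand": strand,
                # Assumed separator: comma.
                "genes": [gene for gene in genes.split(",") if gene],
            }
        )
    return records


# Hypothetical record, for illustration only.
example = "chr1\t100\t150\tread_1\t0\t+\tgeneA,geneB\n"
print(parse_bed_with_genes(example))
```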
