- Oct 16, 2020
BIOPZ-Gypas Foivos authored
- Delete unused files `scripts/fg_extract_transcripts.py`, `scripts/heatmap_and_clustermap.py`, `scripts/perform_PCA.py`
- Add rules (`pca_kallisto`, `pca_salmon`) that run zpca (https://github.com/zavolanlab/zpca) on gene and transcript TPM tables from kallisto and salmon
- The output is wired to `multiqc_report`, but the plots are not visualized in MultiQC
- Update documentation
- Update DAG and rule graph

Fixes #140 #142
-
- Jun 15, 2020
- Apr 27, 2020
Alex Kanitz authored
- clean up command line interface
- improve descriptions
- add consistent structure
- remove or merge superfluous CLI arguments
- set defaults
- update test calls
- update docs
- when importing data from LabKey, table is saved to 'samples.tsv.labkey' in same directory as Snakemake sample table
- allow user to specify environment variables and relative paths in input table and on CLI
- relative paths in the input table are interpreted with respect to the directory containing the input table
- relative paths on the CLI are interpreted with respect to the current working directory; this is to achieve portability with respect to tests but is discouraged in production because its behavior is not very predictable from the user's perspective; consequently, a warning is issued
- set STAR index size to read length - 1
- remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
- rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
- add logging
- add docstrings to module and all functions
- add typing definitions to all functions
- restructure and comment code to improve readability
- linters `flake8` and `mypy` pass
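The path-handling rules in this commit message can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the names `resolve_path` and `star_index_size` are hypothetical, and the sketch assumes that the working-directory rule applies to paths that are not read from the input table.

```python
import logging
import os
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_path(value: str, table: Optional[str] = None) -> Path:
    """Expand environment variables and anchor a relative path.

    Hypothetical helper. If `table` is given, a relative path is anchored
    at the directory containing that table; otherwise it is anchored at
    the current working directory and a warning is logged.
    """
    path = Path(os.path.expandvars(value))
    if path.is_absolute():
        return path
    if table is not None:
        return Path(table).resolve().parent / path
    logger.warning(
        "Relative path '%s' is resolved against the current working directory",
        value,
    )
    return Path.cwd() / path


def star_index_size(read_length: int) -> int:
    """Return the STAR index size for a read length (read length - 1)."""
    return read_length - 1
```

For example, `resolve_path("reads.fastq.gz", table="/project/samples.tsv")` would anchor the file next to the sample table, while the same call without `table` falls back to the working directory and logs a warning.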
-
* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules that are defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of output file or directory instead
  * remove dependency on sequencing mode from the `FastQC` rule; it now runs separately for each strand
* Changes related to MultiQC and output file/directory structure:
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * log directory structure changed to match results directory structure
* Miscellaneous changes:
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start`, which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
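The output layout `samples/{sample_name}/{rule}/{output_file}`, with logs mirroring the results, can be illustrated with a short sketch. The helper names and the `results` and `logs` root directories are assumptions made for this example; the workflow itself defines these paths in its Snakefiles.

```python
from pathlib import PurePosixPath


def output_path(sample_name: str, rule: str, output_file: str,
                root: str = "results") -> str:
    """Build a path of the form <root>/samples/<sample>/<rule>/<file>.

    The default root "results" is an assumption made for this example.
    """
    return str(PurePosixPath(root) / "samples" / sample_name / rule / output_file)


def log_path(sample_name: str, rule: str, log_file: str,
             root: str = "logs") -> str:
    """Mirror the results layout under a (hypothetical) log root."""
    return output_path(sample_name, rule, log_file, root=root)


print(output_path("sample_1", "fastqc", "report.html"))
print(log_path("sample_1", "fastqc", "fastqc.stderr.log"))
```

Because neither function takes a sequencing mode, a rule that genuinely depends on it would have to encode `pe` or `se` in the file or directory name it passes in, as the commit message describes.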
-
- Mar 25, 2020
- Mar 21, 2020
Alex Kanitz authored
-
- Mar 20, 2020
- generate nucleotide distribution for unique reads only
- new rule to generate PNG image for MultiQC
-
- Mar 19, 2020
- Mar 12, 2020
BIOPZ-Bak Maciej authored
-
Dominik Burri authored
Moved `input_files` into top-level test directory for consistency. Corrected removal of test files.
-
Dominik Burri authored
Corrected md5sum for `config.yaml`. Removed unnecessary file.
-
Dominik Burri authored
- renaming bedgraph
- creating ALFA qc plots
- removed conda dependence, moved import statement
- included ALFA in finish rule, corrected annotation.gtf and config.yaml, created new .svg
-
- Mar 06, 2020
BIOPZ-Gypas Foivos authored
Merged paired end and single end rules for star_rpm and index_genomic_alignment_samtools. Fixed wiring of calculate tin score: bam should be input and not params.
-
- Feb 24, 2020
- compile Salmon gene and transcript count summary tables across all samples in workflow run
- add `pandas` to `install/environment.dev.yml`
- update rule graph and DAG images
-
- Feb 21, 2020
Alex Kanitz authored
-
- Remove files with non-deterministic output from `tests/test_integration_workflow/expected_output.files`
- Update MD5 sums in `tests/test_integration_workflow/expected_output.md5`
- Update new workflow DAG and rule graph images
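A checksum comparison of this kind can be sketched in Python as shown below. This is only an illustration: it assumes the expected-output file uses the common `md5sum` layout (checksum, whitespace, file name), and the function names are hypothetical. The test scripts in the repository may perform the check differently.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected(md5_file: Path) -> Dict[str, str]:
    """Parse lines of the form '<checksum>  <file name>'."""
    expected = {}
    for line in md5_file.read_text().splitlines():
        if line.strip():
            checksum, name = line.split(maxsplit=1)
            # md5sum marks binary mode with a leading '*'; drop it if present.
            expected[name.lstrip("*")] = checksum
    return expected


def mismatches(md5_file: Path, root: Path) -> List[str]:
    """List files under `root` whose digest differs from the expected one."""
    return [
        name
        for name, checksum in read_expected(md5_file).items()
        if md5_of(root / name) != checksum
    ]
```

Files with non-deterministic content (for example, outputs that embed timestamps) would always show up in `mismatches`, which is why such files have to be excluded from the expected-output list.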
-
- fixes some functions in `labkey_to_snakemake.py`
- add optional argument for trimming polyA tails; they are trimmed as follows:
  - if mate is sense, oligo-A is added to sample table for `cutadapt` rule to trim
  - if mate is antisense, oligo-T is added to sample table for `cutadapt` rule to trim
  - if option is set to `--trim_polya`, oligo-X stretch is added to sample table and `cutadapt` will not trim
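The choice of adapter stretch by mate orientation might look like the sketch below. It is not taken from `labkey_to_snakemake.py`: the function name, the `trim` argument (a stand-in for however the script decides whether to trim), and the stretch length of 15 are all assumptions made for illustration.

```python
def polya_adapter(mate_orientation: str, trim: bool, length: int = 15) -> str:
    """Return the stretch to write to the sample table for `cutadapt`.

    Hypothetical helper: sense mates get an oligo-A stretch, antisense
    mates an oligo-T stretch. Without trimming, a stretch of X is used as
    a placeholder that matches no read.
    """
    if not trim:
        return "X" * length
    bases = {"sense": "A", "antisense": "T"}
    if mate_orientation not in bases:
        raise ValueError(f"Unknown mate orientation: {mate_orientation!r}")
    return bases[mate_orientation] * length


print(polya_adapter("sense", trim=True))
print(polya_adapter("antisense", trim=True))
print(polya_adapter("sense", trim=False))
```

Writing a placeholder rather than leaving the field empty keeps the sample table uniform, so the `cutadapt` rule can always be given a value for this column.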
-
- Feb 17, 2020
- add rule for input preparation (GTF to BED12)
- add rule for TIN score calculation
- update rule graph and DAG image
- update Slurm cluster config
-
- Feb 15, 2020
- add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
- script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
- add tests for both use cases
- common input files for tests now in 'tests/input_files'
- update all other tests to account for new file locations
- update documentation
-
- Feb 14, 2020
- separate organism genome architecture (different input folder)
- change MD5 checksums to match the new output
-
- add script `tests/test_rule_graph/test.sh` to generate a rule graph in `images/rule_graph.svg`
- display the generated rule graph in `README.md` instead of a specific workflow DAG
- add test script to GitLab CI config
- rename test that creates the workflow DAG from `test_create_dag_chart` to `test_create_dag_image` (the output file is also renamed from `images/workflow_dag.svg` to `images/dag_test_workflow.svg`)
-
- Feb 09, 2020
Alex Kanitz authored
- replaces existing larger libraries and annotations in test cases `test_create_dag_chart` and `test_integration_workflow`
- adds the following new test files:
  - `chr1-10000-20000.fa`: artificial chromosome of length 10'000 (based on human chromosome 1)
  - `chr1-10000-20000.gtf`: matching gene annotation file with two gene and three multi-exon transcript entries
  - `chr1-10000-20000.transcripts.fa`: sequences of the transcripts listed in the gene annotation file
  - `synthetic.mate_?.fastq.gz`: 10 read pairs randomly sampled from the genic regions of the artificial chromosome
  - `synthetic.*.bed`: BED files with expected alignments for each read; names of overlapping genes are specified in a 7th column
- updates file paths in the relevant sample tables
- extends and updates checksum checking of result files in CI/CD pipeline
-
- Feb 08, 2020
- Feb 04, 2020
Alex Kanitz authored
- set up integration test for Snakefile in dedicated folder; current test case was left untouched for the time being, despite requiring large input files
- set up DAG chart creation test in dedicated folder; script creates an SVG representation of the workflow DAG at `images/workflow_dag.svg`
- both tests have been added to the GitLab CI/CD configuration; the latter test ensures that always the latest version of the
- all tests are now located inside subdirectories of `tests/`; test scripts and configuration files for test runs etc. have been moved to the appropriate test directories
- for the time being, required input files for each test are placed within the individual test directories; a layout for common test files should be introduced later and paths and bind paths in tests adapted
- make script `scripts/labkey_api.py` executable
-