- Jun 12, 2020
Alex Kanitz authored
-
- Apr 27, 2020
Alex Kanitz authored
- clean up command line interface
- improve descriptions
- add consistent structure
- remove or merge superfluous CLI arguments
- set defaults
- update test calls
- update docs
- when importing data from LabKey, table is saved to 'samples.tsv.labkey' in same directory as Snakemake sample table
- allow user to specify environment variables and relative paths in input table and on CLI
- relative paths in the input table are interpreted with respect to the directory containing the input table
- relative paths on the CLI are interpreted with respect to the current working directory; this is to achieve portability with respect to tests but is discouraged in production because its behavior is not very predictable from the user's perspective; consequently a warning is issued
- set STAR index size to read length - 1
- remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
- rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
- add logging
- add docstrings to module and all functions
- add typing definitions to all functions
- restructure and comment code to improve readability
- linters `flake8` and `mypy` pass
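The path handling described in this commit could look roughly like the sketch below. It is an illustration only, not the code in the repository: the helper name `resolve_path` and its `table_dir` parameter are hypothetical, and it assumes that relative paths outside the input table (i.e., on the CLI) are the ones resolved against the current working directory.

```python
import os
import warnings
from typing import Optional


def resolve_path(path: str, table_dir: Optional[str] = None) -> str:
    """Expand environment variables and make a path absolute.

    Hypothetical helper: if `table_dir` is given, a relative path is
    anchored at that directory (input table case); otherwise it is
    anchored at the current working directory and a warning is issued.
    """
    expanded = os.path.expandvars(path)
    if os.path.isabs(expanded):
        return os.path.normpath(expanded)
    if table_dir is not None:
        return os.path.normpath(os.path.join(table_dir, expanded))
    warnings.warn(
        f"Relative path '{path}' is resolved against the current working "
        "directory; prefer absolute paths in production."
    )
    return os.path.normpath(os.path.join(os.getcwd(), expanded))
```

Returning normalized absolute paths keeps the sample table independent of where Snakemake is later invoked.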
-
* Sequencing mode-related changes:
  * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
  * remove sequencing mode from output paths for each rule
  * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules that are defined in the main `Snakefile` in the project root directory)
  * where absolutely necessary, sequencing mode is added as part of output file or directory instead
  * remove dependency on sequencing mode from rule for `FastQC`; it now runs separately for each strand
* Changes related to MultiQC and output file/directory structure:
  * moving and renaming outputs for MultiQC is no longer required
  * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
  * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
  * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
  * log directory structure now matches results directory structure
* Miscellaneous changes:
  * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
  * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
  * add rule `start` which copies sample data to the results directory and enforces uniform naming
  * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
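As a rough illustration of the naming scheme `samples/{sample_name}/{rule}/{output_file}` and of the renamed sequencing modes, consider the sketch below. The function `output_path`, the constant `SEQ_MODES`, and the example sample, rule and file names are hypothetical and only serve to show the layout.

```python
import posixpath

# Old and new sequencing mode labels, as listed in the commit message
SEQ_MODES = {"paired_end": "pe", "single_end": "se"}


def output_path(sample_name: str, rule: str, output_file: str,
                root: str = "samples") -> str:
    """Build a path of the form samples/{sample_name}/{rule}/{output_file}."""
    return posixpath.join(root, sample_name, rule, output_file)


# Example with made-up names
print(output_path("sample1", "fastqc", "report.html"))
print(SEQ_MODES["paired_end"])
```

Because the sequencing mode is no longer part of the path, the same function covers both `pe` and `se` samples.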
-
-
- Mar 25, 2020
- Mar 20, 2020
- generate nucleotide distribution for unique reads only
- new rule to generate PNG image for MultiQC
-
In `labkey_to_snakemake.py`, fixed the parameters so that there is a 3p as well as a 5p polya feature for every mate; these can be matched to the `-a`, `-g`, `-A` and `-G` options of cutadapt. Depending on which mate is sense and which is antisense, the appropriate variable is populated and the remaining variables are filled with 'XXXXXXXXXXXX', which leads to no trimming by cutadapt. The poly-A trimming rules are fixed to contain all of the `-a`, `-g`, `-A` and `-G` options.
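The assignment described above might be sketched as follows. This is not the code from `labkey_to_snakemake.py`: the function name `polya_adapters` and the oligo `length` are hypothetical, and the sketch assumes that the sense mate gets an oligo-A at its 3' end and the antisense mate an oligo-T at its 5' end. In cutadapt, `-a`/`-g` are the 3'/5' adapters of the first read and `-A`/`-G` those of the second read.

```python
from typing import Dict

# Placeholder from the commit message; per the message it leads to no trimming
NO_TRIM = "XXXXXXXXXXXX"


def polya_adapters(mate1_is_sense: bool, length: int = 15) -> Dict[str, str]:
    """Return values for cutadapt's -a/-g (mate 1) and -A/-G (mate 2).

    Assumption: the sense mate carries a 3' oligo-A and the antisense
    mate a 5' oligo-T; `length` is a hypothetical oligo length.
    """
    adapters = {"-a": NO_TRIM, "-g": NO_TRIM, "-A": NO_TRIM, "-G": NO_TRIM}
    if mate1_is_sense:
        adapters["-a"] = "A" * length  # 3' end of mate 1 (sense)
        adapters["-G"] = "T" * length  # 5' end of mate 2 (antisense)
    else:
        adapters["-g"] = "T" * length  # 5' end of mate 1 (antisense)
        adapters["-A"] = "A" * length  # 3' end of mate 2 (sense)
    return adapters
```

Always emitting all four options keeps the trimming rule's command line identical for every sample; only the values differ.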
-
- Mar 19, 2020
- Mar 17, 2020
- Mar 12, 2020
BIOPZ-Bak Maciej authored
-
Dominik Burri authored
moved input_files into top-layer test directory for consistency. corrected removal of test files
-
Dominik Burri authored
corrected md5sum for config.yaml
remove unnecessary file
-
Dominik Burri authored
- renaming bedgraph
- creating ALFA qc plots
removed conda dependence, moved import statement.
included ALFA in finish rule, corrected annotation.gtf and config.yaml, created new .svg
-
- Mar 06, 2020
BIOPZ-Gypas Foivos authored
-
-
-
- Feb 21, 2020
Alex Kanitz authored
-
- Remove files with non-deterministic output from `tests/test_integration_workflow/expected_output.files`
- Update MD5 sums in `tests/test_integration_workflow/expected_output.md5`
- Update new workflow DAG and rule graph images
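A check of result files against a list of expected MD5 sums could be sketched as below. The test scripts in the repository may do this differently; the function names are hypothetical, and the sketch assumes the checksum file uses the usual `md5sum` line format (`<hash>  <path>`).

```python
import hashlib
from pathlib import Path
from typing import List


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_checks(md5_file: Path, base_dir: Path) -> List[str]:
    """Return the listed files whose MD5 sum differs from the expected one."""
    failed = []
    for line in md5_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with '*'
        if md5_of(base_dir / name) != expected:
            failed.append(name)
    return failed
```

Files with non-deterministic content (for example, reports containing timestamps) would fail such a check on every run, which is why they are removed from the list of expected outputs.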
-
- fixes some functions in `labkey_to_snakemake.py`
- add optional argument for trimming polyA tails; they are trimmed as follows:
  - if mate is sense, oligo-A is added to sample table for `cutadapt` rule to trim
  - if mate is antisense, oligo-T is added to sample table for `cutadapt` rule to trim
  - if option is set to `--trim_polya`, oligo-X stretch is added to sample table and `cutadapt` will not trim
-
- Feb 20, 2020
Alex Kanitz authored
- log and, if workflow is executed on cluster, cluster log directories are explicitly created in `Snakefile`
- location of main log directory can be configured in `config.yaml` (field `log_dir`, previously: `local_log`; requires change in script `labkey_to_snakemake.py` as well as subworkflows as field name is hard-coded there)
- location of cluster log directory can be configured in `cluster.json` (in field `__default__` -> `out`)
- `config.yaml` and `cluster.json` in `tests/input_files` are set such that a directory `logs/` is created in the directory where Snakemake is run (i.e., the directory of each test); cluster logs are stored in a subdirectory `logs/cluster`
- removes instructions to explicitly create log directories from docs and all test scripts
- cleans up main `Snakefile` (apart from Snakemake-specific syntax, now passes `flake8` linter test)
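Creating the log directories from the two configuration sources might look like the following sketch. It is not the code from the `Snakefile`; the function name `create_log_dirs` is hypothetical, and the configurations are assumed to be already parsed into dictionaries with the fields named above (`log_dir`, and `__default__` -> `out`).

```python
import os
from typing import Any, Dict, List, Optional


def create_log_dirs(config: Dict[str, Any],
                    cluster_config: Optional[Dict[str, Any]] = None) -> List[str]:
    """Create the main log directory and, if given, the cluster log directory."""
    log_dirs = [config["log_dir"]]
    if cluster_config is not None:
        # Only relevant when the workflow is executed on a cluster
        log_dirs.append(cluster_config["__default__"]["out"])
    for directory in log_dirs:
        os.makedirs(directory, exist_ok=True)  # no error if it already exists
    return log_dirs
```

With `log_dir` set to `logs` and `out` set to `logs/cluster`, this yields the layout used by the tests.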
-
- Feb 18, 2020
Alex Kanitz authored
- trap call wrapped in a `cleanup()` function
- function added to all test scripts
- function prints out exit status of last command before trap
- flag `--verbose` added to Snakemake calls in all test scripts
- script tests renamed to follow naming convention 'test_script_<script_name>_<script_run_mode>'
-
- Feb 17, 2020
- add rule for input preparation (GTF to BED12)
- add rule for TIN score calculation
- update rule graph and DAG image
- update Slurm cluster config
-
- Feb 15, 2020
- add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
- script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
- add tests for both use cases
- common input files for tests now in 'tests/input_files'
- update all other tests to account for new file locations
- update documentation
-
- Feb 14, 2020
Alex Kanitz authored
-
Alex Kanitz authored
-
- separate organism genome architecture (different input folder)
- change MD5 checksums to match the new output
-
- add script `tests/test_rule_graph/test.sh` to generate a rule graph in `images/rule_graph.svg`
- display rule graph created in `README.md` instead of specific workflow DAG
- add test script to GitLab CI config
- rename test to create workflow DAG from `test_create_dag_chart` to `test_create_dag_image` (output file is also renamed from `images/workflow_dag.svg` to `images/dag_test_workflow.svg`)
-
Alex Kanitz authored
add tests to workflow integration test `tests/test_workflow_integration` that
- verify that STAR alignments match expected alignments (based on "ground truth" files)
- verify that Salmon gene quantification assigns the correct number of reads to each gene (based on "ground truth" files)

resolves #49
-
- Feb 09, 2020
Alex Kanitz authored
- replaces existing larger libraries and annotations in test cases `test_create_dag_chart` and `test_integration_workflow`
- adds the following new test files:
  - `chr1-10000-20000.fa`: artificial chromosome of length 10'000 (based on human chromosome 1)
  - `chr1-10000-20000.gtf`: matching gene annotation file with two gene and three multi-exon transcript entries
  - `chr1-10000-20000.transcripts.fa`: sequences of the transcripts listed in the gene annotation file
  - `synthetic.mate_?.fastq.gz`: 10 read pairs randomly sampled from the genic regions of the artificial chromosome
  - `synthetic.*.bed`: BED files with expected alignments for each read; names of overlapping genes are specified in a 7th column
- updates file paths in the relevant sample tables
- extends and updates checksum checking of result files in CI/CD pipeline
-
- Feb 08, 2020
- Feb 07, 2020
Alex Kanitz authored
- remove log files and add '.snakemake' directories to '.gitignore'
- update wrong link in 'README.md'
- delete superfluous script documentation 'scripts/labkey_api.md'
- add Snakemake-specific file extension '.smk' to subworkflows
- remove non-deterministic workflow output from md5 sums
-
- Feb 04, 2020
Alex Kanitz authored
`README.md` file describes
- aim and background of the project (including the workflow DAG representation)
- how to install requirements (including setting up a `conda` environment for the project)
- how to execute the workflow run integration test
- how to run the workflow on your own samples (including how to auto-generate required params from LabKey metadata)

Additional minor changes:
- minor changes in various test and related files, including updates of paths
- root directory now includes subdirectory `runs/` for a user's workflow runs (contents not version-controlled)
-
Alex Kanitz authored
- set up integration test for Snakefile in dedicated folder; current test case was left untouched for the time being, despite requiring large input files
- set up DAG chart creation test in dedicated folder; script creates an SVG representation of the workflow DAG at `images/workflow_dag.svg`
- both tests have been added to the GitLab CI/CD configuration; the latter test ensures that always the latest version of the
- all tests are now located inside subdirectories of `tests/`; test scripts and configuration files for test runs etc. have been moved to the appropriate test directories
- for the time being, required input files for each test are placed within the individual test directories; a layout for common test files should be introduced later and paths and bind paths in tests adapted
- make script `scripts/labkey_api.py` executable
-
- Feb 03, 2020
Adds script `scripts/labkey_to_snakemake.py` which
- maps LabKey table fields to Snakemake parameters
- assembles required parameters from the table data
- infers required parameters from the input data
- produces files `config.yaml` and `samples.tsv` required by the Snakemake pipeline

A self-contained integration test for the script is located at `tests/test_scripts_labkey_to_snakemake` (execute script `test.sh`) and was added to the CI/CD pipeline. Note that changes made to the `master` branch in the meantime were merged into this branch to avoid conflicts during merging.

Closes #39
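The field-mapping step can be illustrated with the sketch below. It does not reproduce `scripts/labkey_to_snakemake.py`: the function name and, in particular, all field names in `FIELD_MAP` are made up for illustration, since the actual LabKey and Snakemake column names are not given here.

```python
import csv
import io

# Hypothetical mapping from LabKey fields to sample table columns
FIELD_MAP = {"Sample Name": "sample", "Path to FASTQ": "fq1", "Mode": "seqmode"}


def labkey_to_samples(labkey_tsv: str) -> str:
    """Convert a tab-separated LabKey dump into a tab-separated sample table."""
    reader = csv.DictReader(io.StringIO(labkey_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(FIELD_MAP.values()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Keep only the mapped fields and rename them; other columns are dropped
        writer.writerow({new: row[old] for old, new in FIELD_MAP.items()})
    return out.getvalue()
```

Keeping the mapping in a single dictionary makes it easy to adapt when LabKey field names change.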
-
- Jan 24, 2020
- replace corrupt input files
- add script `run_test.sh`, which runs locally, to the snakemake directory
- add script to CI/CD pipeline
- fix typo in `paired_end.snakefile`
- update samples.tsv to use the fake adapter `XXXXXXX` where adapters should not be trimmed
-
- Jan 10, 2020
- Dec 20, 2019
BIOPZ-Gypas Foivos authored
Rename snakemake/paired_end.snakemake to snakemake/paired_end.snakefile. Fix wiring of rules. Add fake tests.
-
BIOPZ-Gypas Foivos authored
Clean up unused code in snakemake/Snakefile. Add 2 threads in htseq_qa (paired end mode). Add example of tsv files from LabKey.
-