  1. Apr 27, 2020
    • Refactor LabKey to Snakemake script · 556f1e12
      Alex Kanitz authored
      - clean up command line interface
        - improve descriptions
        - add consistent structure
        - remove or merge superfluous CLI arguments
        - set defaults
        - update test calls
        - update docs
        - when importing data from LabKey, table is saved to 'samples.tsv.labkey' in same directory as Snakemake sample table
      - allow user to specify environment variables and relative paths in input table and on CLI
        - relative paths in the input table are interpreted with respect to the directory containing the input table
        - relative paths given on the CLI are interpreted with respect to the current working directory; this keeps the tests portable but is discouraged in production because the behavior is hard to predict from the user's perspective; consequently, a warning is issued
      - set STAR index size to read length - 1
      - remove `gtf_filtered` and `tr_fasta_filtered` and update Snakefiles and test sample tables accordingly
      - rename some MultiQC report-related parameters and update Snakefiles and test config files accordingly
      - add logging
      - add docstrings to module and all functions
      - add typing definitions to all functions
      - restructure and comment code to improve readability
      - linters `flake8` and `mypy` pass
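The path handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the script: the function names `resolve_table_path` and `resolve_cli_path` are hypothetical, and the sketch assumes that the working-directory rule (with its warning) applies to paths given on the CLI.

```python
import os
import warnings
from pathlib import Path


def resolve_table_path(value: str, table_path: str) -> Path:
    """Resolve a path found in the input table (hypothetical helper).

    Environment variables are expanded first; a relative result is
    anchored at the directory that contains the input table.
    """
    expanded = Path(os.path.expandvars(value))
    if expanded.is_absolute():
        return expanded
    table_dir = Path(os.path.abspath(table_path)).parent
    return table_dir / expanded


def resolve_cli_path(value: str) -> Path:
    """Resolve a path given on the CLI (hypothetical helper).

    Assumption: relative CLI paths are anchored at the current working
    directory, and a warning is emitted because the result depends on
    where the script is started.
    """
    expanded = Path(os.path.expandvars(value))
    if expanded.is_absolute():
        return expanded
    warnings.warn(
        f"relative path '{value}' is resolved against the current "
        "working directory"
    )
    return Path.cwd() / expanded
```

Expanding variables before checking for an absolute path means that a value such as `$DATA_DIR/sample.fq.gz` is only treated as relative if the variable itself expands to a relative path.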
    • Major refactoring · 6cf28511
      BIOPZ-Katsantoni Maria authored and Alex Kanitz committed
      * Sequencing mode-related changes:
        * allowed sequencing modes in Snakemake input table changed from `paired_end` and `single_end` to `pe` and `se`, respectively
        * remove sequencing mode from output paths for each rule
        * corresponding wildcards removed entirely from all rules that do not depend on sequencing mode (currently all rules that are defined in the main `Snakefile` in the project root directory)
        * where absolutely necessary, sequencing mode is added as part of output file or directory instead
        * remove dependence on sequencing mode from the `FastQC` rule; it now runs separately for each strand
      * Changes related to MultiQC and output file/directory structure
        * moving and renaming outputs for MultiQC is no longer required
        * code to create MultiQC custom config externalized into script `scripts/rhea_multiqc_config.py`
        * add MultiQC output files with deterministic output to md5 sum checks performed during execution of `tests/test_integration_workflow/test.{local,slurm}.sh`
        * output filenames for each rule now follow this general structure: `samples/{sample_name}/{rule}/{output_file}`
        * change log directory structure to match results directory structure
      * Miscellaneous changes
        * consistent, PEP8-compliant formatting in most parts, including Snakemake files, where allowed
        * remove rule `extract_decoys_salmon`; equivalent file `chrName.txt` produced by `star_index` is used instead
        * add rule `start` which copies sample data to the results directory and enforces uniform naming
        * refactoring of ALFA rules and modification of the CI/CD test to ensure compatibility
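As an illustration of the naming conventions introduced in this commit, the sketch below builds output paths following `samples/{sample_name}/{rule}/{output_file}` and validates the new sequencing mode values `pe` and `se`. All function names, the `results` and `logs` root directories, and the idea of translating the old mode names are assumptions made for the example; the commit itself only states that the allowed values changed.

```python
from pathlib import Path

# Old values from earlier sample tables, mapped to the new short forms.
# Accepting the old names at all is an assumption of this sketch.
LEGACY_MODES = {"paired_end": "pe", "single_end": "se"}
ALLOWED_MODES = {"pe", "se"}


def normalize_mode(mode: str) -> str:
    """Return 'pe' or 'se', translating the old long names if given."""
    mode = LEGACY_MODES.get(mode, mode)
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported sequencing mode: {mode!r}")
    return mode


def rule_output(sample_name: str, rule: str, output_file: str,
                root: str = "results") -> Path:
    """Build samples/{sample_name}/{rule}/{output_file} below a root."""
    return Path(root) / "samples" / sample_name / rule / output_file


def rule_log_dir(sample_name: str, rule: str, root: str = "logs") -> Path:
    """Mirror the results layout for the logs of the same rule."""
    return Path(root) / "samples" / sample_name / rule
```

Because the sequencing mode is no longer part of the path, rules that do not depend on it can address their inputs without a mode wildcard.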
  2. Mar 12, 2020
    • added rule for · 37fb0fd0
      Dominik Burri authored
      - renaming bedgraph
      - creating ALFA qc plots
      
      removed conda dependence, moved import statement.
      
      included ALFA in finish rule, corrected annotation.gtf and config.yaml, created new .svg
  3. Feb 20, 2020
    • create log directories in Snakefile · 5e1ec85e
      Alex Kanitz authored
      - log and, if workflow is executed on cluster, cluster log directories are explicitly created in `Snakefile`
      - location of main log directory can be configured in `config.yaml` (field `log_dir`, previously: `local_log`; requires change in script `labkey_to_snakemake.py` as well as subworkflows as field name is hard-coded there)
      - location of cluster log directory can be configured in `cluster.json` (in field `__default__` -> `out`)
      - `config.yaml` and `cluster.json` in `tests/input_files` are set such that a directory `logs/` is created in the directory where Snakemake is run (i.e., the directory of each test); cluster logs are stored in a subdirectory `logs/cluster`
      - removes instructions to explicitly create log directories from docs and all test scripts
      - cleans up main `Snakefile` (apart from Snakemake-specific syntax, now passes `flake8` linter test)
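A minimal sketch of how the log directories could be created from the two configuration sources named above. The function `create_log_dirs` and its `on_cluster` argument are hypothetical; only the field names `log_dir` and `__default__` -> `out` come from the commit message, and the sketch assumes that `out` holds a directory path.

```python
import os
from typing import Any, Dict, List


def create_log_dirs(config: Dict[str, Any],
                    cluster_config: Dict[str, Any],
                    on_cluster: bool) -> List[str]:
    """Create the main log directory and, on a cluster, the cluster one.

    Returns the list of directories that are guaranteed to exist
    after the call.
    """
    dirs = [config["log_dir"]]
    if on_cluster:
        # Assumption: the 'out' field names the cluster log directory.
        dirs.append(cluster_config["__default__"]["out"])
    for directory in dirs:
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(directory, exist_ok=True)
    return dirs
```

With the test settings described above, this would create `logs/` and, for cluster runs, `logs/cluster` in the directory where Snakemake is started.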
  4. Feb 15, 2020
    • get Snakemake input from LabKey API · eea0206f
      BIOPZ-Katsantoni Maria authored and Alex Kanitz committed
      - add script that prepares Snakemake input files 'samples.tsv' and 'config.yaml' from LabKey table
      - script either connects to API directly (with '--remote' and related options) or processes a tab-separated LabKey dump file
      - add tests for both use cases
      - common input files for tests now in 'tests/input_files'
      - update all other tests to account for new file locations
      - update documentation
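The dump-file use case can be pictured with the following sketch, which reads a tab-separated LabKey dump and writes a reduced sample table. It is not the script added in this commit: the function names are hypothetical, the column mapping is supplied by the caller because the actual LabKey column names are not given here, and the remote API path is left out entirely.

```python
import csv
from typing import Dict, List


def read_labkey_dump(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated LabKey dump into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def write_samples_table(rows: List[Dict[str, str]],
                        columns: Dict[str, str],
                        out_path: str) -> None:
    """Write a sample table such as 'samples.tsv'.

    `columns` maps output column names to LabKey column names; both
    sides are placeholders chosen by the caller.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(list(columns.keys()))
        for row in rows:
            writer.writerow([row[source] for source in columns.values()])
```

Keeping the column mapping outside the functions means the same code can serve rows read from a dump file and rows fetched from the API, as long as both arrive as dictionaries.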