diff --git a/README.md b/README.md
index 409479e1986ae6f00bed015ea33e57d2cbaddfa5..b255cc7e60b8c28c0d57a8d35c3284896649a670 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,16 @@
-# MetagenomicSnake
+# MetaSnk
 
 [](https://snakemake.bitbucket.io)
 [](https://travis-ci.org/snakemake-workflows/metagenomicsnake)
 
 ## Description
 
-MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic datasets from human microbiomes.
+MetaSnk is a modularized Snakemake workflow for the analysis of metagenomic datasets from human microbiomes.
+
+### Modules:
+  - [rawQC](README_rawQC.md)
+  - [preQC](README_preQC.md)
+  - taxProf
 
 ## Authors
 
@@ -17,8 +22,10 @@ MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic dataset
 - Paired-end Illumina sequences
 
 ### Dependencies
-- Snakemake >= 5.4.4
+- Snakemake >= 5.5.0
 - Singularity >= 2.6
+- python >= 3.6.8
+- conda >= 4.6
 
 ## Usage
 
@@ -27,40 +34,61 @@ MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic dataset
 #### Step 1: Install workflow
 
 If you simply want to use this workflow, download and extract the [latest release](https://github.com/snakemake-workflows/metagenomicsnake/releases).
+
+    git clone https://git.scicore.unibas.ch/TBRU/MetagenomicSnake.git path/to/MetaSnk
+    cd path/to/MetaSnk
+    echo -e "#MetaSnk directory\nmetasnk=$(pwd)\nexport metasnk">>$HOME/.bashrc
+    source $HOME/.bashrc
+
 If you intend to modify and further extend this workflow or want to work under version control, fork this repository as outlined in [Advanced](#advanced). The latter way is recommended.
 
 In any case, if you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository and, if available, its DOI (see above).
 
+#### Step 2: Create minimal environment
+
+Some rules will use this environment.
+
+    conda env create -f ./envs/MetaSnk.yaml
+
 #### Step 2: Configure workflow
 
 Configure the workflow according to your needs via editing the file `config.yaml`.
 
+##### Basic configuration
+- Make a copy of the config.yaml:
+```
+    cp ./config.yaml path/to/my_config.yaml
+```
+
 #### Step 3: Execute workflow
 
+Activate the environment via
+
+    conda activate MetaSnk
 Test your configuration by performing a dry-run via
 
-    snakemake --use-conda -n
+    snakemake --use-singularity -n
 
 Execute the workflow locally via
 
-    snakemake --use-conda --cores $N
+    snakemake --use-singularity --cores $N
 
-using `$N` cores or run it in a cluster environment via
+using `$N` cores or run it in a cluster environment controlled by SGE (Sun Grid Engine) via
 
-    snakemake --use-conda --cluster qsub --jobs 100
+    snakemake --use-singularity --cluster qsub --jobs 100
 
 or
 
-    snakemake --use-conda --drmaa --jobs 100
+    snakemake --use-singularity --drmaa --jobs 100
 
-If you not only want to fix the software stack but also the underlying OS, use
+or, in a cluster environment controlled by SLURM workload manager via
 
-    snakemake --use-conda --use-singularity
+    snakemake --profile ./profiles/slurm --use-singularity
 
 in combination with any of the modes above.
 See the [Snakemake documentation](https://snakemake.readthedocs.io/en/stable/executable.html) for further details.
 
-# Step 4: Investigate results
+#### Step 4: Investigate results
 
 After successful execution, you can create a self-contained interactive HTML report with all results via:
 
@@ -87,3 +115,5 @@ The following recipe provides established best practices for running and extendi
 ## Testing
 
 Tests cases are in the subfolder `.test`. They are automtically executed via continuous integration with Travis CI.
+
+snakemake --use-singularity -n -s Snakefile_test
diff --git a/README_preQC.md b/README_preQC.md
new file mode 100644
index 0000000000000000000000000000000000000000..85d68d827b36205ac666694bdc586ed75109e499
--- /dev/null
+++ b/README_preQC.md
@@ -0,0 +1,46 @@
+# MetaSnk: preQC
+
+## Description
+FastQC only performs a quality check but no QC processing is done.
+The **preQC** rule runs a multi-step pre-processing of the paired fastq files. It includes:
+
+- **trim_adapters**: adapter-trimming with "fastp". Fastp performs a quality check
+  and both paired fastq files are processed as follows\:
+
+    + remove adapters: here we provide the Nextera XT adapters,
+    + base correction in overlapped regions
+    + trimming of the last base in read 1
+    + discard reads shorter than a minimum length, after trimming
+    + a report with quality check, before and after processing
+- **filter_human**: removal of reads derived from human DNA with BBTools' [bbsplit]
+- **dedupe**: removal of duplicated reads with BBTools' [clumpify]
+- **trim_3end**: 3\'-end quality trimming with "fastp"
+- **concatenate_fastqs**: merges fastq files corresponding to the same sample into a single pair of fastq files
+- **summarize_preQC**: creates summarizing tables and plots
+
+[bbsplit]: http://seqanswers.com/forums/showthread.php?t=41288
+[clumpify]: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/clumpify-guide/
+
+<div style="text-align:center">
+  <img src="./images/preQC_dag.png" width="350" height="500" />
+</div>
+
+## Usage
+
+### Case 1:
+From within the MetaSnk directory (i.e., where it was installed and where the Snakefile is located):
+
+    snakemake --use-singularity --directory='path/to/workdir' -n preQC
+
+--directory : specifies the working directory, which is where snakemake stores its files for tracking the status of the workflow before/during/after execution.
+
+After preQC is completed we can generate an HTML report:
+
+    snakemake --directory='path/to/workdir' -n preQC_make_report
+
+The report will be created in the path specified for OUT_DIR in the configuration file.
+
+### Case 2:
+Execute preQC from a working directory outside the MetaSnk directory:
+
+    snakemake --use-singularity -s path/to/MetaSnk/Snakefile -n preQC
diff --git a/README_rawQC.md b/README_rawQC.md
new file mode 100644
index 0000000000000000000000000000000000000000..e1b7a53f35faf9c8f3bce667c936ad449adc5e42
--- /dev/null
+++ b/README_rawQC.md
@@ -0,0 +1,30 @@
+# MetaSnk: rawQC
+
+## Description
+rawQC runs FastQC on a random sample of R1 reads from the paired fastq-format files. MultiQC generates a single report per dataset.
+
+rawQC is an independent module; its output files are not required as inputs in other MetaSnk rules.
+
+<div style="text-align:center">
+  <img src="./images/rawQC_dag.png" width="130" height="250" />
+</div>
+
+## Usage
+
+### Case 1:
+From within the MetaSnk directory (i.e., where it was installed and where the Snakefile is located):
+
+    snakemake --use-singularity --directory='path/to/workdir' -n rawQC
+
+--directory : specifies the working directory, which is where snakemake stores its files for tracking the status of the workflow before/during/after execution.
+
+After rawQC is completed we can generate an HTML report:
+
+    snakemake --directory='path/to/workdir' -n rawQC_make_report
+
+The report will be created in the path specified for OUT_DIR in the configuration file.
+
+### Case 2:
+Execute rawQC from a working directory outside the MetaSnk directory:
+
+    snakemake --use-singularity -s path/to/MetaSnk/Snakefile -n rawQC
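
The module READMEs added in this patch all invoke a target the same way: `snakemake`, `--use-singularity`, a `--directory` working directory, and the target name. Every example includes `-n`, Snakemake's dry-run flag, which lists the jobs without running them. The following is a minimal sketch of that pattern, not part of the patch. The helper name `metasnk_cmd` and its `run` argument are hypothetical, and the function only prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MetaSnk): print the snakemake command line
# for one module target, following the pattern used in the module READMEs.
# Arguments: TARGET (e.g. rawQC, preQC), WORKDIR, optional MODE ("run" drops -n).
metasnk_cmd() {
    local target="$1" workdir="$2" mode="${3:-dry}"
    local cmd="snakemake --use-singularity --directory=${workdir}"
    # -n is Snakemake's dry-run flag: jobs are listed but not executed.
    if [ "$mode" != "run" ]; then
        cmd="$cmd -n"
    fi
    printf '%s %s\n' "$cmd" "$target"
}

# Print the dry-run and the real invocation for the preQC module.
metasnk_cmd preQC path/to/workdir
metasnk_cmd preQC path/to/workdir run
```

Because the helper only prints, the command can be inspected before it is run. The same reasoning applies to the report targets (`preQC_make_report`, `rawQC_make_report`): with `-n` present Snakemake performs a dry run, so the flag would need to be dropped for the report to actually be written.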