Commit 9a3f646e authored by Ticlla Ccenhua Monica Roxana

update README for proper installation and usage

parent fff348a2
# MetagenomicSnake
# MetaSnk
[![Snakemake](https://img.shields.io/badge/snakemake-≥5.4.4-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Build Status](https://travis-ci.org/snakemake-workflows/metagenomicsnake.svg?branch=master)](https://travis-ci.org/snakemake-workflows/metagenomicsnake)
## Description
MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic datasets from human microbiomes.
MetaSnk is a modularized Snakemake workflow for the analysis of metagenomic datasets from human microbiomes.
### Modules:
- [rawQC](README_rawQC.md)
- [preQC](README_preQC.md)
- taxProf
## Authors
@@ -17,8 +22,10 @@ MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic dataset
- Paired-end Illumina sequences
### Dependencies
- Snakemake >= 5.4.4
- Snakemake >= 5.5.0
- Singularity >= 2.6
- python >= 3.6.8
- conda >= 4.6
## Usage
@@ -27,40 +34,61 @@ MetagenomicSnake is a Snakemake workflow for the analysis of metagenomic dataset
#### Step 1: Install workflow
If you simply want to use this workflow, download and extract the [latest release](https://github.com/snakemake-workflows/metagenomicsnake/releases).
git clone https://git.scicore.unibas.ch/TBRU/MetagenomicSnake.git path/to/MetaSnk
cd path/to/MetaSnk
echo -e "#MetaSnk directory\nmetasnk=$(pwd)\nexport metasnk">>$HOME/.bashrc
source $HOME/.bashrc
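To confirm that the `metasnk` variable exported above points at a usable checkout, a small check can be run in a new shell. This is a minimal sketch and not part of MetaSnk: the function name `check_metasnk` is hypothetical, and it assumes the Snakefile sits at the top level of the cloned directory.

```shell
# Hypothetical helper (not shipped with MetaSnk): verify that $metasnk is set
# and that the directory it names contains a Snakefile.
check_metasnk() {
    if [ -z "${metasnk:-}" ]; then
        echo "metasnk is not set; re-run 'source \$HOME/.bashrc'" >&2
        return 1
    fi
    if [ ! -f "$metasnk/Snakefile" ]; then
        echo "no Snakefile found in $metasnk" >&2
        return 1
    fi
    echo "MetaSnk found at $metasnk"
}
```

Calling `check_metasnk` after opening a new terminal should print the installation path; a non-zero exit status suggests the `.bashrc` entry was not written or not sourced.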
If you intend to modify and further extend this workflow or want to work under version control, fork this repository as outlined in [Advanced](#advanced). The latter way is recommended.
In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, if available, its DOI (see above).
#### Step 2: Create minimal environment
Some rules will use this environment.
conda env create -f ./envs/MetaSnk.yaml
#### Step 2: Configure workflow
Configure the workflow according to your needs by editing the file `config.yaml`.
##### Basic configuration
- Make a copy of the config.yaml:
```
cp ./config.yaml path/to/my_config.yaml
```
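One way to use the copied file is to pass it to Snakemake with the standard `--configfile` option. The sketch below only assembles and prints such a command so that it can be inspected before running. The helper name `dry_run_cmd` is hypothetical, `path/to/my_config.yaml` is the placeholder path from the copy step, and whether MetaSnk expects the configuration to be supplied this way is an assumption.

```shell
# Hypothetical helper: print (do not run) a dry-run command that uses a
# copied configuration file. --configfile and -n are standard Snakemake options.
dry_run_cmd() {
    config="$1"
    printf 'snakemake --use-singularity --configfile %s -n\n' "$config"
}

dry_run_cmd path/to/my_config.yaml
```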
#### Step 3: Execute workflow
Activate the environment via
conda activate MetaSnk
Test your configuration by performing a dry-run via
snakemake --use-conda -n
snakemake --use-singularity -n
Execute the workflow locally via
snakemake --use-conda --cores $N
snakemake --use-singularity --cores $N
using `$N` cores or run it in a cluster environment via
using `$N` cores or run it in a cluster environment controlled by SGE (Sun Grid Engine) via
snakemake --use-conda --cluster qsub --jobs 100
snakemake --use-singularity --cluster qsub --jobs 100
or
snakemake --use-conda --drmaa --jobs 100
snakemake --use-singularity --drmaa --jobs 100
If you want to fix not only the software stack but also the underlying OS, use
or, in a cluster environment controlled by the SLURM workload manager, via
snakemake --use-conda --use-singularity
snakemake --profile ./profiles/slurm --use-singularity
in combination with any of the modes above.
See the [Snakemake documentation](https://snakemake.readthedocs.io/en/stable/executable.html) for further details.
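The execution modes above differ only in a few flags, so they can be collected behind one small wrapper. The following is a sketch under stated assumptions: the function name `metasnk_cmd` and its mode keywords are hypothetical, and the flags are copied from the commands listed in this step. It prints the command rather than running it.

```shell
# Hypothetical wrapper: print the Snakemake command for one of the execution
# modes listed above. Nothing is executed; pipe the output to `sh` to run it.
metasnk_cmd() {
    mode="$1"
    cores="${2:-1}"   # only used by the "local" mode
    case "$mode" in
        dry)   echo "snakemake --use-singularity -n" ;;
        local) echo "snakemake --use-singularity --cores $cores" ;;
        sge)   echo "snakemake --use-singularity --cluster qsub --jobs 100" ;;
        drmaa) echo "snakemake --use-singularity --drmaa --jobs 100" ;;
        slurm) echo "snakemake --profile ./profiles/slurm --use-singularity" ;;
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `metasnk_cmd local 8` prints the local command with eight cores, and `metasnk_cmd slurm | sh` would submit through the SLURM profile.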
# Step 4: Investigate results
#### Step 4: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:
@@ -87,3 +115,5 @@ The following recipe provides established best practices for running and extendi
## Testing
Test cases are in the subfolder `.test`. They are automatically executed via continuous integration with Travis CI.
snakemake --use-singularity -n -s Snakefile_test
# MetaSnk: preQC
## Description
FastQC only performs a quality check; it does no QC processing. The **preQC**
rule runs a multi-step pre-processing of the paired fastq files, which includes:
- **trim_adapters**: adapter trimming with "fastp". fastp performs a quality check
and processes both paired fastq files as follows:
+ remove adapters: here we provide the Nextera XT adapters,
+ base correction in overlapped regions
+ trimming of the last base in read 1
+ discard reads shorter than a minimum length, after trimming
+ a report with quality check, before and after processing
- **filter_human**: removal of reads derived from human DNA with BBTools' [bbsplit]
- **dedupe**: removal of duplicated reads with BBTools' [clumpify]
- **trim_3end**: 3'-end quality trimming with "fastp"
- **concatenate_fastqs**: merges fastq files corresponding to the same sample into a single pair of fastq files
- **summarize_preQC**: creates summarizing tables and plots
[bbsplit]: http://seqanswers.com/forums/showthread.php?t=41288
[clumpify]: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/clumpify-guide/
<div style="text-align:center">
<img src="./images/preQC_dag.png" width="350" height="500" />
</div>
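The sub-steps of **trim_adapters** map onto documented fastp options. The sketch below prints one possible fastp call for a single sample; it is illustrative only. The function name `trim_adapters_cmd`, the file names, the adapter FASTA name and the default minimum length of 50 are all assumptions, and the actual MetaSnk rule may use different options or values.

```shell
# Illustrative only: print a fastp call covering the trim_adapters sub-steps.
# File names, the adapter FASTA and the minimum length are placeholder
# assumptions, not values taken from the MetaSnk rules.
trim_adapters_cmd() {
    sample="$1"
    min_len="${2:-50}"
    # --adapter_fasta:   adapters to remove (e.g. Nextera XT sequences)
    # --correction:      base correction in overlapped regions
    # --trim_tail1 1:    trim the last base of read 1
    # --length_required: discard reads shorter than this after trimming
    # --html / --json:   quality report before and after processing
    echo "fastp" \
        "--in1 ${sample}_R1.fastq.gz --in2 ${sample}_R2.fastq.gz" \
        "--out1 ${sample}_R1.trimmed.fastq.gz --out2 ${sample}_R2.trimmed.fastq.gz" \
        "--adapter_fasta nextera_xt_adapters.fa" \
        "--correction" \
        "--trim_tail1 1" \
        "--length_required ${min_len}" \
        "--html ${sample}_fastp.html --json ${sample}_fastp.json"
}
```

Running `trim_adapters_cmd sampleA 40` prints the full command for a hypothetical sample named `sampleA` with a 40-base minimum length.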
## Usage
### Case 1:
From within the MetaSnk directory (i.e., where it was installed and where the Snakefile is located):
snakemake --use-singularity --directory='path/to/workdir' -n preQC
`--directory` specifies the working directory, where Snakemake stores the files it uses to track the status of the workflow before, during, and after execution.
After preQC is completed, an HTML report can be generated:
snakemake --directory='path/to/workdir' -n preQC_make_report
The report will be created in the path specified for OUT_DIR in the configuration file.
### Case 2:
Execute preQC from a working directory outside the MetaSnk directory:
snakemake --use-singularity -s path/to/MetaSnk/Snakefile -n preQC
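When working outside the installation directory, the `metasnk` variable exported during installation can stand in for `path/to/MetaSnk`. The helper below is a hypothetical sketch (the name `module_cmd` is not part of MetaSnk) that prints the dry-run command for a given target.

```shell
# Hypothetical helper: print the dry-run command for a MetaSnk target when
# called from outside the installation directory. Assumes $metasnk was
# exported during installation.
module_cmd() {
    target="$1"
    if [ -z "${metasnk:-}" ]; then
        echo "metasnk is not set" >&2
        return 1
    fi
    echo "snakemake --use-singularity -s $metasnk/Snakefile -n $target"
}
```

For instance, `module_cmd preQC` prints the Case 2 command with the real installation path filled in.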
# MetaSnk: rawQC
## Description
The rawQC module runs FastQC on a random sample of R1 reads from the paired fastq-format files. MultiQC then generates a single report per dataset.
rawQC is an independent module; its output files are not required as inputs in other MetaSnk rules.
<div style="text-align:center">
<img src="./images/rawQC_dag.png" width="130" height="250" />
</div>
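To illustrate what sampling R1 reads means, the sketch below keeps each FASTQ record with a given probability using `awk`. It shows the idea only: how MetaSnk actually samples reads is not described here, the function name `sample_fastq` and the default seed are hypothetical, and the code assumes strict 4-line FASTQ records read from standard input.

```shell
# Illustrative only: keep each 4-line FASTQ record with probability $1.
# MetaSnk may sample reads differently; this only shows the idea.
sample_fastq() {
    fraction="$1"
    seed="${2:-42}"   # fixed seed so repeated runs pick the same records
    awk -v p="$fraction" -v seed="$seed" '
        BEGIN { srand(seed) }
        # Decide once per record, on its header line (lines 1, 5, 9, ...).
        NR % 4 == 1 { keep = (rand() < p) }
        keep { print }
    '
}
```

A typical call would be `zcat sample_R1.fastq.gz | sample_fastq 0.1 > subset_R1.fastq`, where the file names are placeholders.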
## Usage
### Case 1:
From within the MetaSnk directory (i.e., where it was installed and where the Snakefile is located):
snakemake --use-singularity --directory='path/to/workdir' -n rawQC
`--directory` specifies the working directory, where Snakemake stores the files it uses to track the status of the workflow before, during, and after execution.
After rawQC is completed, an HTML report can be generated:
snakemake --directory='path/to/workdir' -n rawQC_make_report
The report will be created in the path specified for OUT_DIR in the configuration file.
### Case 2:
Execute rawQC from a working directory outside the MetaSnk directory:
snakemake --use-singularity -s path/to/MetaSnk/Snakefile -n rawQC