Riboseq pipeline
Requirements
- wget
- git
- singularity
- Slurm Workload Manager
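Before installing anything, it can help to confirm that the tools listed above are available. The following is a minimal sketch, not part of the pipeline: the helper name `check_tools` is hypothetical, and looking for `sbatch` as a sign that Slurm is installed is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which of the
# given commands cannot be found on PATH.
# Prints one "missing: NAME" line per absent tool and returns 1 if any is absent.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: the presence of sbatch indicates a usable Slurm installation.
check_tools wget git singularity sbatch \
    || echo "Install the missing tools before continuing."
```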
Features
Pipeline for Ribo-Seq data. It consists of two snakemake workflows:
- prepare_annotation: Prepares the annotation files
- process_data: Processes the Ribo-Seq data
Installation
The recommended way is to create a virtual environment via conda and install the snakemake dependencies. To run the workflows, you need a system where singularity is available.
Step 1: Download miniconda 3 installation file (if not already installed)
for Linux:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Step 2: Install miniconda 3
Make sure that you run the 'bash' shell and execute:
for Linux:
bash Miniconda3-latest-Linux-x86_64.sh
Step 3: Create a new conda environment
Create the environment with snakemake installed from the bioconda and conda-forge channels
conda create --name riboseq_pipeline --channel bioconda --channel conda-forge snakemake=4.8.1
Activate the virtual environment
conda activate riboseq_pipeline
You can later deactivate the virtual environment with
conda deactivate
Check if snakemake was installed properly
snakemake --help
Step 4: Clone the repository
git clone ssh://git@git.scicore.unibas.ch:2222/AnnotationPipelines/riboseq_pipeline.git
Configure pipeline
Download annotation files
Go to the snakemake directory and create a new annotation directory
cd riboseq_pipeline/snakemake
mkdir annotation
Download an annotation file (e.g. gtf from ENSEMBL) and uncompress it
wget ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.chr.gtf.gz
gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
Download a chromosome sequence file (e.g. soft-masked fasta from ENSEMBL) and uncompress it
wget ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
Then download the rRNA sequences (e.g. from RefSeq). Members of the group can use the following file:
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/txome_rRNAs_joao.fa .
Finally, copy or create a file with oligos. Members of the group can use the following file:
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/oligos.txt .
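Once the four files are in place, a quick check that each one exists and is non-empty can catch a failed download before the workflows are configured. This is a minimal sketch: the helper name `check_inputs` is hypothetical, and it assumes it is run from the directory into which the files above were downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): verify that every
# given file exists and has a size greater than zero.
# Prints one "missing or empty: NAME" line per problem file and returns 1 if any is found.
check_inputs() {
    local bad=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            bad=1
        fi
    done
    return "$bad"
}

# File names as used in the download commands above.
check_inputs \
    Homo_sapiens.GRCh38.90.chr.gtf \
    Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
    txome_rRNAs_joao.fa \
    oligos.txt \
    || echo "Fix the files listed above before filling in config.yaml."
```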
Configure and run workflows
As mentioned earlier, two snakemake workflows are available: one that prepares the annotation files (e.g. generates index files) and one that processes the Ribo-Seq data.
Prepare annotation workflow
First, go to the 'snakemake/prepare_annotation' directory and fill in the 'config.yaml' file. To check that everything is configured properly, create a DAG (directed acyclic graph) of the workflow.
bash create_snakemake_flowchart.sh
Finally, run the pipeline. This script is configured for the Slurm Workload Manager.
nohup bash run_snakefile.sh &
Process data workflow
Once the prepare_annotation workflow is complete, move to the 'snakemake/process_data' directory. Copy the Ribo-Seq samples you want to process into the 'samples' directory, or create hard links to them there. Then fill in the 'config.yaml' file.
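Hard-linking avoids duplicating large sample files on disk. The sketch below shows one way to do it; the helper name `link_samples` and the `.fastq.gz` extension are assumptions, and hard links only work when source and destination are on the same filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): hard-link every
# *.fastq.gz file (assumed extension) from a source directory into a
# destination directory, skipping files that are already present there.
link_samples() {
    local src_dir=$1 dest_dir=$2 f name
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # pattern matched nothing
        name=$(basename "$f")
        [ -e "$dest_dir/$name" ] && continue
        ln "$f" "$dest_dir/$name"        # hard link: no data is copied
    done
}
```

For example, `link_samples /path/to/raw_data samples`, where `/path/to/raw_data` is a placeholder for the directory holding your reads.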
Create the DAG
bash create_snakemake_flowchart.sh
Finally, run the pipeline. This script is configured for the Slurm Workload Manager.
nohup bash run_snakefile.sh &
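Both workflows are started in the background with nohup, so it is easy to lose track of the process and its output. The sketch below is one possible wrapper, not part of the pipeline: the helper name `launch_in_background`, the variable `LAST_PID`, and the log file name are hypothetical. It sends all output to an explicit log file and records the process ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run a script under
# nohup in the background, sending stdout and stderr to a log file.
# The background process ID is stored in the global variable LAST_PID.
launch_in_background() {
    local script=$1 logfile=$2
    nohup bash "$script" > "$logfile" 2>&1 &
    LAST_PID=$!
    echo "started $script with PID $LAST_PID, logging to $logfile"
}

# Only launch when the run script is present in the current directory.
if [ -f run_snakefile.sh ]; then
    launch_in_background run_snakefile.sh run_snakefile.log
fi
```

Progress can then be followed with `tail -f run_snakefile.log`, and jobs submitted to Slurm can be listed with `squeue -u "$USER"`.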