Riboseq pipeline

Requirements

  • wget
  • git
  • singularity
  • Slurm Workload Manager
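
A quick way to confirm that these tools are available on the current machine is to look each one up on the PATH. This is a minimal sketch; the function name is hypothetical, and checking for Slurm through its `sbatch` command is an assumption about how Slurm is installed on your system.

```shell
# Report which required command-line tools are missing from PATH.
# Assumption: Slurm availability is approximated by the presence of `sbatch`.
check_requirements() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only if every tool was found.
  [ "$missing" -eq 0 ]
}
```

For example, `check_requirements wget git singularity sbatch` prints one line per tool and returns a non-zero status if any of them is missing.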

Features

A pipeline for processing Ribo-Seq data. It consists of two Snakemake workflows:

  • prepare_annotation: Prepares the annotation files
  • process_data: Processes the Ribo-Seq data

Installation

The recommended way is to create a virtual environment via conda and install the Snakemake dependencies in it. To run the workflows, you need a system on which Singularity is available.

Step 1: Download the Miniconda 3 installation file (if Miniconda is not already installed)

for Linux:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

Step 2: Install miniconda 3

Make sure that you run the 'bash' shell and execute:

for Linux:

bash Miniconda3-latest-Linux-x86_64.sh
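
Steps 1 and 2 can also be scripted. The sketch below skips the download when a `conda` command is already on the PATH and otherwise runs the installer from Step 1 in batch mode (`-b`, which accepts the license without prompting) into the prefix given by `-p`. The function name and the default prefix `$HOME/miniconda3` are assumptions; choose any location you like.

```shell
# Download and install Miniconda 3 only if conda is not already available.
# Assumption: $HOME/miniconda3 is used as the default installation prefix.
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}"
  local installer="Miniconda3-latest-Linux-x86_64.sh"
  if command -v conda >/dev/null 2>&1; then
    echo "conda already installed: $(command -v conda)"
    return 0
  fi
  wget "https://repo.continuum.io/miniconda/$installer" || return 1
  # -b: batch (non-interactive) mode; -p: installation prefix.
  bash "$installer" -b -p "$prefix" || return 1
  echo "Miniconda installed in $prefix"
}
```

A batch-mode install does not initialise your shell, so afterwards run `"$HOME/miniconda3/bin/conda" init bash` and open a new shell before using `conda activate`.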

Step 3: Create a new conda environment

Create an environment named riboseq_pipeline with Snakemake 4.8.1 from the bioconda and conda-forge channels:

conda create --name riboseq_pipeline --channel bioconda --channel conda-forge snakemake=4.8.1

Activate the virtual environment

conda activate riboseq_pipeline

You can later deactivate the virtual environment with:

conda deactivate

Check that Snakemake was installed properly:

snakemake --help
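
Because the environment pins Snakemake 4.8.1, it can be worth confirming that the `snakemake` on your PATH is the one from the activated environment. A minimal sketch with a hypothetical function name; it relies on `snakemake --version` printing just the version number.

```shell
# Compare the installed Snakemake version with the one pinned when the
# riboseq_pipeline environment was created (4.8.1 by default).
check_snakemake_version() {
  local expected="${1:-4.8.1}" found
  if ! found="$(snakemake --version 2>/dev/null)"; then
    echo "snakemake not found; is the riboseq_pipeline environment activated?" >&2
    return 1
  fi
  if [ "$found" = "$expected" ]; then
    echo "OK: snakemake $found"
  else
    echo "Warning: found snakemake $found, expected $expected" >&2
    return 1
  fi
}
```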

Step 4: Clone the repository

git clone ssh://git@git.scicore.unibas.ch:2222/AnnotationPipelines/riboseq_pipeline.git

Configure the pipeline

Download annotation files

Go into the snakemake directory and create a new annotation directory:

cd riboseq_pipeline/snakemake
mkdir annotation

Download an annotation file (e.g. a GTF file from Ensembl) and uncompress it:

wget ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.chr.gtf.gz
gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
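
A quick sanity check of the uncompressed GTF file can catch truncated or corrupted downloads before any workflow is configured. The sketch below (function name hypothetical) uses `awk` to verify that every non-comment line has the nine tab-separated GTF columns and counts the features per type, taken from the third column.

```shell
# Sanity-check a GTF file: every non-comment line must have 9 tab-separated
# fields; print "<feature type><TAB><count>" for each type in column 3.
check_gtf() {
  local gtf="$1"
  awk -F'\t' '
    /^#/ { next }            # skip header/comment lines
    NF != 9 { bad++; next }  # count malformed lines
    { count[$3]++ }
    END {
      for (t in count) print t "\t" count[t]
      if (bad > 0) { print bad " malformed line(s)" > "/dev/stderr"; exit 1 }
    }' "$gtf"
}
```

For example, `check_gtf Homo_sapiens.GRCh38.90.chr.gtf | sort -k2,2nr` lists the feature types from most to least frequent.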

Download the chromosome sequences (e.g. the soft-masked genome FASTA file from Ensembl) and uncompress it:

wget ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
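
The annotation and the genome sequence are used together, so the sequence names in both files have to match (Ensembl uses names such as `1`, `X` and `MT`, without a `chr` prefix). The sketch below, with a hypothetical function name, lists sequence names that occur in the GTF but not among the FASTA headers; it assumes the sequence name is the first whitespace-separated word of each FASTA header. It uses bash process substitution.

```shell
# List sequence names used in a GTF (column 1) that have no matching
# header in a FASTA file (first word after '>'). Empty output = consistent.
missing_seqnames() {
  local gtf="$1" fasta="$2"
  comm -23 \
    <(grep -v '^#' "$gtf" | cut -f1 | sort -u) \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u)
}
```

For example, `missing_seqnames Homo_sapiens.GRCh38.90.chr.gtf Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa` prints nothing if every sequence name in the annotation has a matching sequence.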

Then download the rRNA sequences (e.g. from RefSeq). Members of the group can use the following file:

cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/txome_rRNAs_joao.fa .

Finally, copy or create a file with the oligo sequences. Members of the group can use the following file:

cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/oligos.txt .
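
Before filling in the configuration files, you may want to confirm that all four input files are in place and non-empty. A minimal sketch with a hypothetical function name; the file names to pass are the ones used in the examples above, so adapt them if you downloaded a different release or organism.

```shell
# Check that each expected annotation file exists in a directory and is non-empty.
# Usage: check_annotation_files DIR FILE...
check_annotation_files() {
  local dir="$1" f status=0
  shift
  for f in "$@"; do
    if [ -s "$dir/$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, from the directory where you placed the files: `check_annotation_files . Homo_sapiens.GRCh38.90.chr.gtf Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa txome_rRNAs_joao.fa oligos.txt`.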

Configure and run workflows

As mentioned earlier, two Snakemake workflows are available: one that prepares the annotation files (e.g. generates index files) and one that processes the Ribo-Seq data.

Prepare annotation workflow

First, go to the 'snakemake/prepare_annotation' directory and fill in the 'config.yaml' file. To make sure that everything is configured properly, create a DAG (directed acyclic graph) of the workflow:

bash create_snakemake_flowchart.sh
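
The contents of create_snakemake_flowchart.sh are not shown here. As a hedged sketch of what such a step typically involves, Snakemake's `--dag` option writes the workflow graph in Graphviz dot format, which Graphviz's `dot` (installed separately) can render to PDF. The function name, the output file name and the assumption that the workflow file is the default `Snakefile` in the current directory are illustrative only. A dry run (`snakemake --dryrun`) is another way to check the configuration without executing any jobs.

```shell
# Render the workflow DAG to a PDF file (default: dag.pdf).
# Assumptions: the workflow file is ./Snakefile and Graphviz `dot` is on PATH.
render_dag() {
  local out="${1:-dag.pdf}" dotfile status
  dotfile="$(mktemp)" || return 1
  # Write the dot graph to a file first so that a Snakemake error
  # (e.g. a bad config.yaml) is not hidden by a pipe into `dot`.
  snakemake --dag > "$dotfile" && dot -Tpdf "$dotfile" -o "$out"
  status=$?
  rm -f "$dotfile"
  return "$status"
}
```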

Finally, run the pipeline with the following script, which is configured for the Slurm Workload Manager:

nohup bash run_snakefile.sh &
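
Because the run script is started with `nohup` in the background, its output is appended to nohup.out in the current directory (nohup's default when standard output is a terminal). The sketch below, with a hypothetical function name, inspects that log to tell whether the workflow has finished. It assumes Snakemake's usual messages (`... steps (100%) done` on success, `Exiting because a job execution failed` on failure), so treat it as a convenience rather than a definitive status check; `squeue -u "$USER"` additionally lists your queued and running Slurm jobs.

```shell
# Summarise the state of a Snakemake run from its log file (default: nohup.out).
# Assumption: the log contains Snakemake's standard progress/failure messages.
pipeline_status() {
  local log="${1:-nohup.out}"
  if [ ! -f "$log" ]; then
    echo "no log found"
    return 1
  elif grep -q 'Exiting because a job execution failed' "$log"; then
    echo "failed"
  elif grep -q 'steps (100%) done' "$log"; then
    echo "complete"
  else
    echo "running"
  fi
}
```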

Process data workflow

Once the prepare_annotation workflow has completed, move to the 'snakemake/process_data' directory. Copy the Ribo-Seq samples you want to process into the 'samples' directory, or create hard links to them there. Then fill in the 'config.yaml' file.
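
Hard links avoid duplicating large sequencing files, but they only work when the source files and the 'samples' directory are on the same filesystem. The sketch below (function name hypothetical) tries a hard link first and falls back to a copy.

```shell
# Hard-link each given sample file into a samples directory, falling back
# to a copy when a hard link is not possible (e.g. across filesystems).
# Usage: link_samples DEST_DIR FILE...
link_samples() {
  local dest="$1" f
  shift
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    if ln "$f" "$dest/" 2>/dev/null; then
      echo "linked: $f"
    elif cp "$f" "$dest/"; then
      echo "copied: $f"
    else
      return 1
    fi
  done
}
```

For example, from inside 'snakemake/process_data': `link_samples samples /path/to/raw_data/*.fastq.gz`, where the source path is a placeholder for wherever your samples are stored.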

Create the DAG:

bash create_snakemake_flowchart.sh

Finally, run the pipeline with the following script, which is configured for the Slurm Workload Manager:

nohup bash run_snakefile.sh &