# Riboseq pipeline

## Requirements

* wget
* git
* singularity
* Slurm Workload Manager

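A quick way to confirm the requirements before going further is to look each tool up on the `PATH`. The following is a minimal sketch, not part of the pipeline; it assumes that the presence of `sbatch` is a reasonable stand-in for "Slurm is available", and the function name `missing_tools` is made up for illustration.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: sbatch is used as a proxy for a working Slurm installation.
missing=$(missing_tools wget git singularity sbatch)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are printed on one line.
    echo "Missing required tools:" $missing
else
    echo "All required tools found."
fi
```
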
## Features

A pipeline for processing Ribo-Seq data. It consists of two Snakemake workflows:
* prepare_annotation: Prepares the annotation files
* process_data: Processes the Ribo-Seq data

## Installation

The recommended way is to create a virtual environment with conda and install the Snakemake dependencies in it.
**The workflows must be run on a system where Singularity is available.**

### Step 1: Download miniconda 3 installation file (if not already installed)


For Linux:
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```

### Step 2: Install miniconda 3

Make sure you are running the `bash` shell, then execute the installer.

For Linux:
```
bash Miniconda3-latest-Linux-x86_64.sh
```

### Step 3: Create a new conda environment

Create a new conda environment with Snakemake installed:
```
conda create --name riboseq_pipeline --channel bioconda --channel conda-forge snakemake=4.8.1
```

Activate the virtual environment:
```
conda activate riboseq_pipeline
```

You can later deactivate the virtual environment with:
```
conda deactivate
```

Check that Snakemake was installed properly:
```
snakemake --help
```
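Beyond `--help`, it can be useful to confirm that the active environment provides the version pinned in Step 3. This is only a sketch: `check_version` is a hypothetical helper, and it relies on `snakemake --version` printing the bare version number.

```shell
# Version pinned in the "conda create" command above.
EXPECTED_VERSION="4.8.1"

# Compare an installed version string ($1) against the expected one ($2).
check_version() {
    if [ "$1" = "$2" ]; then
        echo "snakemake $1 matches the pinned version"
    else
        echo "snakemake version is '$1', expected '$2'"
        return 1
    fi
}

if command -v snakemake >/dev/null 2>&1; then
    check_version "$(snakemake --version)" "$EXPECTED_VERSION" || true
else
    echo "snakemake not found; is the riboseq_pipeline environment active?"
fi
```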

### Step 4: Clone the repository
```
git clone ssh://git@git.scicore.unibas.ch:2222/AnnotationPipelines/riboseq_pipeline.git
```

## Configure pipeline

### Download annotation files

Go to the snakemake directory and create a new annotation directory:
```
cd riboseq_pipeline/snakemake
mkdir annotation
```

Download an annotation file (e.g. a GTF file from Ensembl) and uncompress it:
```
wget ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.chr.gtf.gz
gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
```

Download the chromosome sequences (e.g. the soft-masked FASTA file from Ensembl) and uncompress the file:
```
wget ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
```
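Large FTP downloads are occasionally truncated, so it may be worth testing an archive before uncompressing it. The sketch below uses `gzip -t`, which checks the integrity of a compressed file without extracting it; `safe_gunzip` is a hypothetical helper name, not something the pipeline provides.

```shell
# Uncompress a .gz file only if it passes the gzip integrity test.
safe_gunzip() {
    if gzip -t "$1" 2>/dev/null; then
        gunzip "$1"
    else
        echo "Corrupt or incomplete download: $1" >&2
        return 1
    fi
}

# Usage, in place of the plain gunzip calls above:
#   safe_gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
#   safe_gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
```

If the test fails, download the file again rather than trying to repair it.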

Then download the rRNA sequences (e.g. from RefSeq). Members of the group can use the following file:
```
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/txome_rRNAs_joao.fa .
```

Finally, copy or create a file with oligos. Members of the group can use the following file:
```
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/oligos.txt .
```
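Before configuring the workflows, you may want to confirm that all four files are in place and non-empty. This is a minimal sketch that uses the example file names from this section; `ANNOTATION_DIR` and `missing_files` are hypothetical names, and the directory defaults to the current one, which is where the commands above leave the files.

```shell
# Directory holding the annotation files (assumption: the current directory).
ANNOTATION_DIR="${ANNOTATION_DIR:-.}"

# Print every listed file that is missing or empty in the given directory.
missing_files() {
    dir="$1"
    shift
    for name in "$@"; do
        [ -s "$dir/$name" ] || echo "$name"
    done
}

missing=$(missing_files "$ANNOTATION_DIR" \
    Homo_sapiens.GRCh38.90.chr.gtf \
    Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa \
    txome_rRNAs_joao.fa \
    oligos.txt)

if [ -n "$missing" ]; then
    echo "Missing or empty annotation files:"
    echo "$missing"
else
    echo "All annotation files are present."
fi
```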

### Configure and run workflows

As mentioned earlier, two Snakemake workflows are available: one that prepares the annotation files (e.g. generates the index files) and one that processes the Ribo-Seq data.

#### Prepare annotation workflow

First, go to the 'snakemake/prepare_annotation' directory and fill in the 'config.yaml' file. To make sure that everything is configured properly, create a DAG (directed acyclic graph) of the workflow:

```
bash create_snakemake_flowchart.sh
```

Finally, run the pipeline. The script is configured for the Slurm Workload Manager:
```
nohup bash run_snakefile.sh &
```

#### Process data workflow

Once the prepare_annotation workflow is complete, move to the 'snakemake/process_data' directory. Copy the Ribo-Seq samples you want to process into the 'samples' directory, or create hard links to them there. Then fill in the 'config.yaml' file.

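Hard links avoid duplicating large sample files, but `ln` fails when the source and the 'samples' directory are on different file systems. The sketch below tries a hard link first and falls back to a copy; `link_or_copy` and the sample path in the usage comment are hypothetical.

```shell
# Hard-link a file into a target directory; fall back to a copy when the
# link cannot be created (e.g. source and target are on different file systems).
link_or_copy() {
    src="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir"
    ln "$src" "$dest_dir/" 2>/dev/null || cp "$src" "$dest_dir/"
}

# Usage (hypothetical sample path), run from 'snakemake/process_data':
#   link_or_copy /path/to/sample.fastq.gz samples
```
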
Create the DAG:
```
bash create_snakemake_flowchart.sh
```

Finally, run the pipeline. The script is configured for the Slurm Workload Manager:
```
nohup bash run_snakefile.sh &
```
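Because both workflows are started in the background with `nohup`, it helps to keep the output and the process ID somewhere predictable. The following is a sketch of one way to do that, not a description of what `run_snakefile.sh` itself does; the log and PID file names and both helper functions are hypothetical.

```shell
# Hypothetical file names for the log and the process ID.
LOG="${LOG:-run_snakefile.log}"
PIDFILE="${PIDFILE:-run_snakefile.pid}"

# Run a command under nohup in the background, saving its output and PID.
start_in_background() {
    nohup "$@" > "$LOG" 2>&1 &
    echo "$!" > "$PIDFILE"
}

# Succeeds while the recorded process is still alive.
is_running() {
    kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Usage:
#   start_in_background bash run_snakefile.sh
#   is_running && echo "workflow still running"
#   tail -n 20 "$LOG"      # most recent output
#   squeue -u "$USER"      # jobs currently submitted to Slurm
```

With the plain `nohup bash run_snakefile.sh &` form shown above, output goes to `nohup.out` in the current directory when the command is started from a terminal.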