# Riboseq pipeline

## Requirements

* wget
* git
* singularity
* Slurm Workload Manager

## Features

Pipeline for Ribo-Seq data. It consists of two snakemake workflows:

* prepare_annotation: Prepares the annotation files
* process_data: Processes the Ribo-Seq data

## Installation

The recommended way is to create a virtual environment via conda and install the snakemake dependencies.

**The workflows must be run on a system where singularity is available.**

### Step 1: Download miniconda 3 installation file (if not already installed)

for Linux:

```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```

### Step 2: Install miniconda 3

Make sure that you run the 'bash' shell and execute:

for Linux:

```
bash Miniconda3-latest-Linux-x86_64.sh
```

### Step 3: Create a new conda environment

Create a new conda environment

```
conda create --name riboseq_pipeline --channel bioconda --channel conda-forge snakemake=4.8.1
```

Activate the virtual environment

```
conda activate riboseq_pipeline
```

You can later deactivate the virtual environment with

```
conda deactivate
```

Check that snakemake was installed properly

```
snakemake --help
```

### Step 4: Clone the repository

```
git clone ssh://git@git.scicore.unibas.ch:2222/AnnotationPipelines/riboseq_pipeline.git
```

## Configure pipeline

### Download annotation files

Go to the snakemake directory and create a new annotation directory

```
cd riboseq_pipeline/snakemake
mkdir annotation
```

Download an annotation file (e.g. gtf from ENSEMBL) and uncompress it

```
wget ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.chr.gtf.gz
gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
```

Download a chromosome sequences file (e.g. soft-masked fasta from ENSEMBL) and uncompress it

```
wget ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
```

Then download rRNAs (e.g. from RefSeq). Members of the group can use the following file:

```
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/txome_rRNAs_joao.fa .
```

Finally, copy or create a file with oligos. Members of the group can use the following file:

```
cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/oligos.txt .
```

### Configure and run workflows

As mentioned earlier, two snakemake workflows are available: one that prepares the annotation files (e.g. generation of index files) and one that processes the Ribo-Seq data.

#### Prepare annotation workflow

First, go to the 'snakemake/prepare_annotation' directory and fill in the 'config.yaml' file.

To make sure that everything is configured properly, create a DAG of the workflow.

```
bash create_snakemake_flowchart.sh
```

Finally, run the pipeline. This script is configured for the Slurm Workload Manager

```
nohup bash run_snakefile.sh &
```

#### Process data workflow

Once the prepare_annotation pipeline is complete, you can move to the 'snakemake/process_data' directory.

Copy the Ribo-Seq samples you want to process into the 'samples' directory, or create hard links to them there.

Fill in the 'config.yaml' file.

Create the DAG

```
bash create_snakemake_flowchart.sh
```

Finally, run the pipeline. This script is configured for the Slurm Workload Manager

```
nohup bash run_snakefile.sh &
```
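
Because both run scripts are started in the background with nohup, a missing tool may only surface later in the logs. The following is a minimal pre-flight sketch, not part of the repository: `check_tools` is a hypothetical helper name, and it only tests whether each command is found on the `PATH`.

```shell
# Hypothetical helper (not part of the pipeline): print every command
# from the argument list that cannot be found on PATH.
# Returns 0 if all commands are available, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as `check_tools wget git singularity sbatch snakemake` after activating the conda environment. Using `sbatch` as the Slurm command to look for is an assumption, since the contents of `run_snakefile.sh` are not shown here.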