diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d189025ed5458cb2db04c32f5b2fa88a691eb27f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,127 @@
+# Riboseq pipeline
+
+## Requirements
+
+* wget
+* git
+* singularity
+* Slurm Workload Manager
+
+## Features
+
+Pipeline for Ribo-Seq data. It consists of two snakemake workflows:
+* prepare_annotation: Prepares the annotation files
+* process_data: Processes the Ribo-Seq data
+
+## Installation
+
+The recommended way is to create a virtual environment with conda and install snakemake and its dependencies in it.
+**The workflows must be run on a system where singularity is available.**
+
+### Step 1: Download the Miniconda 3 installation file (if not already installed)
+
+For Linux:
+```
+wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
+```
+
+### Step 2: Install Miniconda 3
+
+Make sure that you are running the 'bash' shell and execute:
+
+For Linux:
+```
+bash Miniconda3-latest-Linux-x86_64.sh
+```
+
+### Step 3: Create a new conda environment
+
+Create a new conda environment with snakemake installed
+```
+conda create --name riboseq_pipeline --channel bioconda --channel conda-forge snakemake=4.8.1
+```
+
+Activate the virtual environment
+```
+conda activate riboseq_pipeline
+```
+
+You can later deactivate the virtual environment with
+```
+conda deactivate
+```
+
+Check that snakemake was installed properly
+```
+snakemake --help
+```
+
+### Step 4: Clone the repository
+```
+git clone ssh://git@git.scicore.unibas.ch:2222/AnnotationPipelines/riboseq_pipeline.git
+```
+
+## Configure pipeline
+
+### Download annotation files
+
+Go to the snakemake directory and create a new annotation directory
+```
+cd riboseq_pipeline/snakemake
+mkdir annotation
+```
+
+Download an annotation file (e.g. a GTF file from ENSEMBL) and uncompress it
+```
+wget ftp://ftp.ensembl.org/pub/release-90/gtf/homo_sapiens/Homo_sapiens.GRCh38.90.chr.gtf.gz
+gunzip Homo_sapiens.GRCh38.90.chr.gtf.gz
+```
+
+Download a chromosome sequence file (e.g. a soft-masked FASTA file from ENSEMBL) and uncompress it
+```
+wget ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
+gunzip Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
+```
+
+Then download rRNA sequences (e.g. from RefSeq). Members of the group can use the following file:
+```
+cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/txome_rRNAs_joao.fa .
+```
+
+Finally, copy or create a file with oligos. Members of the group can use the following file:
+```
+cp /scicore/home/zavolan/gypas/projects/resources/human/rRNA/oligos.txt .
+```
+
+### Configure and run workflows
+
+As mentioned earlier, two snakemake workflows are available: one that prepares the annotation files (e.g. generates index files) and one that processes the Ribo-Seq data.
+
+#### Prepare annotation workflow
+
+First, go to the 'snakemake/prepare_annotation' directory and fill in the 'config.yaml' file. To make sure that everything is configured properly, create a DAG of the workflow.
+
+```
+bash create_snakemake_flowchart.sh
+```
+
+Finally, run the pipeline. This script is configured for the Slurm Workload Manager.
+```
+nohup bash run_snakefile.sh &
+```
+
+#### Process data workflow
+
+Once the prepare_annotation workflow is complete, move to the 'snakemake/process_data' directory. Copy or hard-link the Ribo-Seq samples you want to process into the 'samples' directory. Fill in the 'config.yaml' file.
+
+Create the DAG
+```
+bash create_snakemake_flowchart.sh
+```
+
+Finally, run the pipeline. This script is configured for the Slurm Workload Manager.
+```
+nohup bash run_snakefile.sh &
+```
+
+
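The requirements and annotation downloads described in this README could be verified before launching either workflow. The following is a minimal sketch and not part of the repository: the function names `missing_tools` and `missing_files` are hypothetical, `sbatch` is assumed to be the Slurm command that the run scripts rely on, and the file names are the ones from the download examples. It assumes it is run from the directory that holds the downloaded files.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative sketch, not shipped with the pipeline).

# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print every path from the argument list that is missing or empty.
missing_files() {
    for path in "$@"; do
        [ -s "$path" ] || echo "$path"
    done
}

# Tools from the Requirements section; sbatch is an assumption about
# how the run scripts submit jobs to Slurm.
tools_missing=$(missing_tools wget git singularity sbatch)

# File names from the download examples; adjust them to whatever
# paths your config.yaml actually points to.
files_missing=$(missing_files \
    Homo_sapiens.GRCh38.90.chr.gtf \
    Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa)

# Report problems on stderr; stay silent when everything is in place.
if [ -n "$tools_missing" ] || [ -n "$files_missing" ]; then
    echo "Missing tools: ${tools_missing:-none}" >&2
    echo "Missing files: ${files_missing:-none}" >&2
fi
```

The check only reports and never aborts, so it can be sourced from an interactive shell or run before `nohup bash run_snakefile.sh &` without side effects.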