Commit 8742cd72 authored by Alex Kanitz

install dependencies via Conda

- add 3 separate Conda environment files for installing
  - essential dependencies without Singularity (most users)
  - essential dependencies including Singularity (root users on Linux only)
  - non-essential dependencies (required for tests etc.)
- install Python dependencies via `pip` _within_ Conda environment to avoid #56
- update Conda-based installation instructions
  - describe usage of different environment files
  - add supported OS versions (macOS not yet added, because only installation works)
- update CI config to use same setup (based on Miniconda Docker image)
parent 4ad34d82
1 merge request: !29 Add pipeline updates into MultiQC dev branch
-image: snakemake/snakemake:v5.9.1
+image: continuumio/miniconda3:4.7.12
 before_script:
-  - apt update
-  - apt install -y unzip bedtools
-  - pip install -r scripts/requirements.txt
+  - apt update && apt install -y gcc
+  - conda init bash && source ~/.bashrc && echo $CONDA_DEFAULT_ENV
+  - conda env create -f install/environment.root.yml
+  - conda activate rnaseq_pipeline && echo $CONDA_DEFAULT_ENV
+  - conda env update -f install/environment.dev.yml
 test:
   script:
......
 # RNA-Seq pipeline
-[Snakemake] workflow for general purpose RNA-Seq library annotation developed
-by the [Zavolan lab].
+[Snakemake][snakemake] workflow for general purpose RNA-Seq library annotation
+developed by the [Zavolan lab][zavolan-lab].
 Reads are processed, aligned, quantified and analyzed with state-of-the-art
 tools to give meaningful initial insights into various aspects of an RNA-Seq
 library while cutting down on hands-on time for bioinformaticians.
-The scheme below is a visual representation of an example run of the
-workflow:
-> ![rule_graph](images/rule_graph.svg)
+Below is a visual representation of the individual workflow steps ("pe"
+refers to "paired-end"):
+> ![rule_graph][rule-graph]
+## Requirements
+Currently the workflow is only available for Linux distributions. It was tested
+on the following distributions:
+- CentOS 7.5
+- Debian 10
+- Ubuntu 16.04, 18.04
 ## Installation
@@ -24,69 +33,82 @@ git clone ssh://git@git.scicore.unibas.ch:2222/zavolan_group/pipelines/rnaseqpip
 cd rnaseqpipeline
 ```
-### Installing Singularity
-For improved reproducibility and reusability of the workflow, as well as an
-easy means to run it on a high performance computing (HPC) cluster managed,
-e.g., by [Slurm], each individual step of the workflow runs in its own
-container. Specifically, containers are created out of [Singularity] images
-built for each software used within the workflow. As a consequence, running
-this workflow has very few individual dependencies. It does, however, require
-that Singularity be installed. See the links below for installation
-instructions for the most up-to-date (as of writing) as well as for the tested
-version (2.6.1) of Singularity:
+### Installing Conda
+Workflow dependencies can be conveniently installed with the [Conda][conda]
+package manager. We recommend that you install
+[Miniconda][miniconda-installation] for your system (Linux). Be sure to select
+Python 3 option. The workflow was built and tested with `miniconda 4.7.12`.
+Other versions are not guaranteed to work as expected.
+### Installing dependencies
-- [Singularity v3.5](https://sylabs.io/guides/3.5/user-guide/quick_start.html)
-- [Singularity v2.6](https://sylabs.io/guides/2.6/user-guide/installation.html)
+For improved reproducibility and reusability of the workflow,
+each individual step of the workflow runs in its own [Singularity][singularity]
+container. As a consequence, running this workflow has very few
+individual dependencies. It does, however, require Singularity to be installed
+on the system running the workflow. As the functional installation of
+Singularity requires root privileges, and Conda currently only provides
+Singularity for Linux architectures, the installation instructions are
+slightly different depending on your system/setup:
-If you have root privileges, you can directly install Singularity together with snakemake in a virtual environment (see next section)
-### Setting up a Snakemake virtual environment
+#### For most users
+If you do *not* have root privileges on the machine you want to run the
+workflow on *or* if you do not have a Linux machine, please [install
+Singularity][singularity-install] separately and in privileged mode, depending
+on your system. You may have to ask an authorized person (e.g., a systems
+administrator) to do that. This will almost certainly be required if you want
+to run the workflow on a high-performance computing (HPC) cluster. We have
+successfully tested the workflow with the following Singularity versions:
-In addition to Singularity, [Snakemake] needs to be installed. We strongly
-recommended to do so via a virtual environment. Here we describe the steps
-necessary to set up such a virtual environment with a recent version (v4.4+) of
-the `conda` package manager. If you prefer to use another solution, such as
-`virtualenv`, adapt the steps according to the specific instructions of your
-preferred solution.
+- `v2.4.5`
+- `v2.6.2`
+- `v3.5.2`
+After installing Singularity, install the remaining dependencies with:
+```bash
+conda env create -f install/environment.yml
+```
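Because the "most users" route assumes Singularity was installed separately, it can help to confirm that it is on the `PATH` before creating the Conda environment. The following is a minimal sketch, not part of the commit: the helper names `require_cmd` and `extract_version` are hypothetical, and the version parsing simply assumes that the output of `singularity --version` contains an `x.y.z` number somewhere.

```bash
#!/usr/bin/env bash
# Sketch: check for a separately installed Singularity before creating the
# Conda environment. Helper names are hypothetical, not part of the repository.

# Succeeds if the named command can be found on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the first x.y.z version number found in the given text (assumed format).
extract_version() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

if require_cmd singularity; then
    echo "Found Singularity $(extract_version "$(singularity --version)")"
else
    echo "Singularity not found; install it separately before running the workflow" >&2
fi
```

The reported version can then be compared by eye against the versions listed as tested in the README.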
-If you do not have `conda` installed for Python3, we recommend to install the
-minimal version (Python and package manager) [Miniconda] (see the link for
-installation instructions). Be sure to select the correct version for your
-operating system and ensure that you select the Python 3 option.
+#### As root user on Linux
-To create and activate a snakemake environment, run:
+If you have a Linux machine, as well as root privileges, (e.g., if you plan to
+run the workflow on your own computer), you can execute the following command
+to include Singularity in the Conda environment:
 ```bash
-conda create -n rnaseq_pipeline \
--c bioconda \
--c conda-forge \
-snakemake=5.10.0
-conda activate rnaseq_pipeline
+conda env create -f install/environment.root.yml
 ```
-or, to create a conda environment containing Snakemake AND Singularity (currently not working on MacOS):
-> Note: Singularity has to be installed as root, so wherever you don't have root privileges, use the installation methods described above!
+### Activate environment
+Activate the Conda environment with:
 ```bash
-conda create -n rnaseq_pipeline \
--c bioconda \
--c conda-forge \
-snakemake=5.10.0 \
-singularity=3.5.2
-conda activate rnaseq_pipeline
+conda activate rnaseq_pipeline
 ```
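The CI configuration in this commit echoes `$CONDA_DEFAULT_ENV` after activation to show which environment is active. The same variable can guard later commands that must run inside the environment. This is a minimal sketch under that assumption; `assert_conda_env` is a hypothetical helper name, not something the repository provides.

```bash
# Sketch: fail early unless the expected Conda environment is active.
# `assert_conda_env` is a hypothetical helper; CONDA_DEFAULT_ENV is the
# variable that `conda activate` sets to the active environment's name.
assert_conda_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "Expected Conda environment '$1', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

For example, `assert_conda_env rnaseq_pipeline && conda env update -f install/environment.dev.yml` would only run the update when `rnaseq_pipeline` is the active environment.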
+### Installing non-essential dependencies
-All installation requirements should now be met with.
+Most tests have additional dependencies. If you are planning to run tests, you
+will need to install these by executing the following command _in your active
+Conda environment_:
+```bash
+conda env update -f install/environment.dev.yml
+```
 ## Testing the installation
-We have prepared several tests to check the integrity of the workflow. The
-most important one lets you execute the workflow on a small set of example
-input files.
+We have prepared several tests to check the integrity of the workflow, its
+components and non-essential processing scripts. These can be found in
+subdirectories of the `tests/` directory. The most critical of these tests
+lets you execute the entire workflow on a small set of example input files.
+Note that for this and other tests to complete without issues,
+[additional dependencies](#installing-non-essential-dependencies) need to be
+installed.
 ### Run workflow on local machine
@@ -98,7 +120,8 @@ bash tests/test_integration_workflow/test.local.sh
 ### Run workflow via Slurm
-Execute the following command to run the test workflow on a Slurm-managed HPC.
+Execute the following command to run the test workflow on a Slurm-managed
+high-performance computing (HPC) cluster:
 ```bash
 bash tests/test_integration_workflow/test.slurm.sh
@@ -297,10 +320,12 @@ Cycles | cycles
 Molecule | molecule
 Contaminant sequences | contaminant_seqs
+[conda]: <https://docs.conda.io/projects/conda/en/latest/index.html>
 [cluster execution]: <https://snakemake.readthedocs.io/en/stable/executing/cluster-cloud.html#cluster-execution>
 [LabKey]: <https://www.labkey.com/>
-[Miniconda]: <https://docs.conda.io/en/latest/miniconda.html>
-[Snakemake]: <https://snakemake.readthedocs.io/en/stable/>
+[miniconda-installation]: <https://docs.conda.io/en/latest/miniconda.html>
+[rule-graph]: images/rule_graph.svg
+[snakemake]: <https://snakemake.readthedocs.io/en/stable/>
 [Singularity]: <https://sylabs.io/singularity/>
 [Slurm]: <https://slurm.schedmd.com/documentation.html>
-[Zavolan lab]: <https://www.biozentrum.unibas.ch/research/researchgroups/overview/unit/zavolan/research-group-mihaela-zavolan/>
+[zavolan-lab]: <https://www.biozentrum.unibas.ch/research/researchgroups/overview/unit/zavolan/research-group-mihaela-zavolan/>
install/environment.dev.yml (new file)
name: rnaseq_pipeline
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bedtools=2.29.2
  - unzip=6.0
  - pip=20.0.2
  - pip:
    - biopython==1.76
    - labkey==1.2.0
install/environment.root.yml (new file)
name: rnaseq_pipeline
channels:
  - conda-forge
  - defaults
dependencies:
  - graphviz=2.40.1
  - pip=20.0.2
  - python=3.7.4
  - singularity=3.5.2
  - pip:
    - pandas==1.0.1
    - snakemake==5.10.0
install/environment.yml (new file)
name: rnaseq_pipeline
channels:
  - defaults
dependencies:
  - graphviz=2.40.1
  - pip=20.0.2
  - python=3.7.4
  - pip:
    - pandas==1.0.1
    - snakemake==5.10.0