Christoph Stritt authored 1 year ago

Genome assembly and variant calling from PacBio HiFi reads

This folder contains two Snakemake workflows:

assembly: from PacBio HiFi consensus reads to annotated genome assemblies
variantcalling: combine assemblies into a pangenome graph and call variants from the graph

This is ongoing work; some things will change.

Requirements

On the sciCORE cluster, the pipeline is installed in the GROUP folder (/scicore/home/gagneux/GROUP/PacbioSnake) and ready to run.

In other contexts, four things need to be set up before the pipeline can be run:

1. Install Snakemake and Singularity
2. Build the singularity container for the assembly pipeline
3. Download the bakta database for genome annotation
4. Pull the singularity container for the variant calling pipeline

These steps are detailed below.

1. Install Snakemake and Singularity

Follow the installation instructions on the Snakemake and Singularity websites.
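Once both tools are installed, a quick check that they are visible on the PATH can save a failed run later. The following is a minimal sketch, not part of the pipeline; the helper name `missing_tools` is hypothetical.

```shell
# Hypothetical helper (not part of this repository):
# print every requested command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two tools required by both workflows.
missing=$(missing_tools snakemake singularity)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "Snakemake and Singularity found."
fi
```

The check only confirms that the commands exist; it does not verify versions.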

2. Build the singularity container for the assembly pipeline

cd assembly/container
sudo singularity build assemblySC.sif assemblySC.def

3. Download the bakta database for genome annotation

Light-weight (1.3 GB) and full (33.1 GB) databases for the bakta annotation tool can be downloaded from https://zenodo.org/records/7669534. The extracted folder should be located at assembly/resources/bakta_db/. If the database is stored elsewhere, set its path in the assembly config file.
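The download and extraction could be scripted roughly as follows. This is a sketch under assumptions: the archive names `db-light.tar.gz` and `db.tar.gz` and the direct file URLs are not stated in this README and should be checked against the Zenodo record, the helper names `bakta_db_url` and `unpack_bakta_db` are hypothetical, and the archive is assumed to contain a single top-level folder.

```shell
# Assumption: archive names and file URLs on the Zenodo record;
# verify them on https://zenodo.org/records/7669534 before use.
bakta_db_url() {
    case "$1" in
        light) echo "https://zenodo.org/records/7669534/files/db-light.tar.gz" ;;
        full)  echo "https://zenodo.org/records/7669534/files/db.tar.gz" ;;
        *)     echo "usage: bakta_db_url light|full" >&2; return 1 ;;
    esac
}

# Unpack an archive into the target directory, dropping the archive's
# top-level folder so the database files sit directly in the target.
unpack_bakta_db() {
    archive=$1
    target=$2
    mkdir -p "$target" && tar -xzf "$archive" -C "$target" --strip-components=1
}

# Typical use from the repository root (commented out: large download).
# wget "$(bakta_db_url light)"
# unpack_bakta_db db-light.tar.gz assembly/resources/bakta_db
```

Stripping the top-level folder avoids a rename step, so the database lands at the default location expected by the assembly config.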

4. Pull the singularity container for the variant calling pipeline

singularity pull docker://ghcr.io/pangenome/pggb:latest