update documentation
+ 60
− 57
@@ -21,6 +21,7 @@ This document describes the individual rules of the pipeline for information pur
@@ -38,7 +39,7 @@ This document describes the individual rules of the pipeline for information pur
The pipeline consists of three snakefiles: a main Snakefile and an individual Snakefile for each sequencing mode (single-end and paired-end), because the parameters passed to individual tools differ between the sequencing modes. The main Snakefile contains general rules for the creation of indices, rules that are applicable to both sequencing modes, and rules that deal with summary steps and with combining results across the samples of a run.
@@ -55,13 +56,17 @@ Parameter name | Description
@@ -74,18 +79,13 @@ soft_clip | "Local": standard local alignment with soft-clipping allowed. "EndTo
Create index for STAR alignments. Supply the reference genome sequences (FASTA files) and annotations (GTF file), from which STAR generates genome indexes that are utilized in the 2nd (mapping) step. The genome indexes are saved to disk and need only be generated once for each genome/annotation/index size combination. [STAR manual](http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf#section.2)
Create index for STAR alignments. Supply the reference genome sequences (FASTA files) and annotations (GTF file), from which STAR generates genome indexes that are utilized in the 2nd (mapping) step. The genome indexes are saved to disk and need to be generated only once for each genome/annotation/index size combination. [STAR manual](http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf#section.2)
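As a rough illustration of the index-creation step described above, the following Python sketch assembles a STAR genome-generation command line. The helper name `star_index_command`, the file paths, and the mapping of the pipeline's "index size" onto STAR's `--sjdbOverhang` option are assumptions made for this example; the actual rule in the pipeline may pass different or additional parameters.

```python
def star_index_command(genome_fasta, annotation_gtf, index_dir,
                       sjdb_overhang=100, threads=1):
    """Return the argument list for STAR's genome-generation mode.

    Hypothetical helper: it only builds the command and does not run STAR.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",             # build an index instead of mapping
        "--genomeDir", str(index_dir),             # where the index is written
        "--genomeFastaFiles", str(genome_fasta),   # reference genome sequences
        "--sjdbGTFfile", str(annotation_gtf),      # gene annotations
        "--sjdbOverhang", str(sjdb_overhang),      # assumed to correspond to "index size"
        "--runThreadN", str(threads),
    ]


# Illustrative paths only; one index per genome/annotation/index size combination.
cmd = star_index_command("genome.fa", "annotation.gtf", "STAR_index/74",
                         sjdb_overhang=74, threads=4)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Because the index only has to be built once per combination, keeping the index size in the output path (as in the made-up path above) is one way to let several indexes coexist on disk.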
@@ -100,7 +100,7 @@ Create transcriptome from genome and gene annotations using [gffread](https://gi
Create index for [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) quantification. If you want to use Salmon in mapping-based mode, then you first have to build a salmon index for your transcriptome. This will build the mapping-based index, using an auxiliary k-mer hash over k-mers of length 31. While the mapping algorithms will make use of arbitrarily long matches between the query and reference, the k size selected here will act as the minimum acceptable length for a valid match. Thus, a smaller value of k may slightly improve sensitivity. Apparently a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
Create index for [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) quantification. A Salmon index of the transcriptome is required for Salmon's mapping-based mode. The index is built using an auxiliary k-mer hash over k-mers of length 31. While the mapping algorithms make use of arbitrarily long matches between the query and reference, the k-mer size selected here acts as the minimum acceptable length for a valid match. A k-mer size of 31 seems to work well for reads of 75 bp or longer, although a smaller size might slightly improve sensitivity. A smaller k-mer size is suggested when working with shorter reads.
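To make the effect of the k-mer size more concrete, the short Python sketch below counts how many k-mers a read of a given length contains. It is only an illustration of why shorter reads benefit from a smaller k, not a description of how Salmon indexes or maps reads internally; the function names are invented for this example.

```python
def kmers(sequence, k):
    """Return all overlapping substrings of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count(read_length, k):
    """Number of k-mers in a read: read_length - k + 1, or 0 if the read is too short."""
    return max(0, read_length - k + 1)


print(kmers("ACGTAC", 4))   # ['ACGT', 'CGTA', 'GTAC']
print(kmer_count(75, 31))   # 45
print(kmer_count(30, 31))   # 0: the read is shorter than k, so no match of length k is possible
print(kmer_count(30, 21))   # 10
```

Since k is the minimum acceptable match length, a read shorter than k cannot produce a valid match at all, and a read only slightly longer than k offers few candidate positions. On the command line, the size is set with the `-k` option of `salmon index`.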
@@ -108,58 +108,56 @@ Create index for [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) q
Index the genomic alignment with [samtools index](http://quinlanlab.org/tutorials/samtools/samtools.html#samtools-index). Indexing a genome sorted BAM file allows one to quickly extract alignments overlapping particular genomic regions. Moreover, indexing is required by genome viewers such as IGV so that the viewers can quickly display alignments in each genomic region to which you navigate.
Index the genomic alignment with [samtools index](http://quinlanlab.org/tutorials/samtools/samtools.html#samtools-index). Indexing a genome-sorted BAM file enables quick extraction of alignments overlapping particular genomic regions. It is also required by genome viewers such as IGV, which use the index to quickly display read coverage in the genomic regions chosen by the user.
Given a set of BAM files and a gene annotation BED file, calculates the Transcript Integrity Number (TIN) for each transcript. [GitLab repository](https://git.scicore.unibas.ch/zavolan_group/tools/tin_score_calculation). TIN is conceptually similar to RIN (RNA integrity number) but provides transcript level measurement of RNA quality and is more sensitive to measure low quality RNA samples:
Calculates the Transcript Integrity Number (TIN) for each transcript ([GitLab repository](https://git.scicore.unibas.ch/zavolan_group/tools/tin_score_calculation)). Requires a set of BAM files and a BED file containing the gene annotation. TIN is conceptually similar to RIN (RNA integrity number) but provides a transcript-level measurement of RNA quality and is more sensitive when measuring low-quality RNA samples:
@@ -167,7 +165,7 @@ Given a set of BAM files and a gene annotation BED file, calculates the Transcri
@@ -175,7 +173,7 @@ Concatenates the tsv files of all samples into one wider table.
@@ -183,20 +181,20 @@ Generates sample-wise [boxplots](https://en.wikipedia.org/wiki/Box_plot) of TIN
@@ -204,40 +202,45 @@ Create ALFA index files used for running [ALFA](https://github.com/biocompibens/
@@ -246,7 +249,7 @@ Creates an interactive report after the pipeline is finished. [MultiQC](https://
[FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
@@ -254,11 +257,11 @@ Creates an interactive report after the pipeline is finished. [MultiQC](https://
@@ -297,7 +300,7 @@ Spliced Transcripts Alignment to a Reference; Read the [Publication](https://www
@@ -323,7 +326,7 @@ Spliced Transcripts Alignment to a Reference; Read the [Publication](https://www
--validateMappings: Enables selective alignment of the sequencing reads when mapping them to the transcriptome. This can improve both the sensitivity and specificity of mapping and, as a result, can [improve quantification accuracy](https://salmon.readthedocs.io/en/latest/salmon.html#validatemappings).
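For orientation, a Salmon quantification call that includes this flag could be assembled roughly as in the Python sketch below. The helper name `salmon_quant_command`, the file paths, and the choice of automatic library-type detection (`A`) are assumptions made for this example; the pipeline's own rules may set further options and differ between the single-end and paired-end Snakefiles.

```python
def salmon_quant_command(index_dir, output_dir, reads, mates=None,
                         lib_type="A", validate_mappings=True):
    """Return the argument list for `salmon quant` (hypothetical helper).

    `reads` is the only FASTQ file in single-end mode; in paired-end mode
    it is the first mate file and `mates` is the second.
    """
    cmd = ["salmon", "quant", "--index", str(index_dir), "--libType", lib_type]
    if mates is None:
        cmd += ["--unmatedReads", str(reads)]                     # single-end
    else:
        cmd += ["--mates1", str(reads), "--mates2", str(mates)]   # paired-end
    if validate_mappings:
        cmd.append("--validateMappings")                          # selective alignment
    cmd += ["--output", str(output_dir)]
    return cmd


# Illustrative paths only.
print(" ".join(salmon_quant_command("salmon_index", "quant/sample1",
                                    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz")))
```

Building the command as a list keeps the single-end and paired-end variants in one place and avoids shell-quoting problems when the list is passed to `subprocess.run`.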