From 54d45d79beef3298c575e839b06af2f5ad1d08d3 Mon Sep 17 00:00:00 2001
From: BIOPZ-Herrmann Christina <christina.herrmann@unibas.ch>
Date: Fri, 6 Mar 2020 17:50:55 +0100
Subject: [PATCH] added star_rpm_paired_end

---
 pipeline_steps.md | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/pipeline_steps.md b/pipeline_steps.md
index 0a913fb..98f235b 100644
--- a/pipeline_steps.md
+++ b/pipeline_steps.md
@@ -5,6 +5,7 @@
 * read samples table
 * create log directories
 * **create_index_star**
+* **extract_transcriptome**
 * **create_index_salmon**
 * **create_index_kallisto**
 * **extract_transcripts_as_bed12**
@@ -20,6 +21,7 @@
 * **pe_index_genomic_alignment_samtools**
 * **pe_quantification_salmon**
 * **pe_genome_quantification_kallisto**
+* **star_rpm_paired_end**
 
 
 
@@ -76,6 +78,10 @@ Create index for STAR alignments. Supply the reference genome sequences (FASTA f
 **Parameters:** sjdbOverhang (This is the `index_size` specified in the samples table).
 **Output:** chrNameLength.txt will be used for STAR mapping; chrName.txt
 
+#### extract_transcriptome
+> TODO
+
+
 #### create_index_salmon
 Create index for Salmon quantification. If you want to use Salmon in mapping-based mode, then you first have to build a salmon index for your transcriptome. This will build the mapping-based index, using an auxiliary k-mer hash over k-mers of length 31. While the mapping algorithms will make use of arbitrarily long matches between the query and reference, the k size selected here will act as the minimum acceptable length for a valid match. Thus, a smaller value of k may slightly improve sensitivty. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
 [Salmon manual](https://salmon.readthedocs.io/en/latest/salmon.html)
@@ -203,7 +209,7 @@ Spliced Transcripts Alignment to a Reference
 
 #### (pe_)index_genomic_alignment_samtools
 Index the genomic alignment with [samtools index](http://quinlanlab.org/tutorials/samtools/samtools.html#samtools-index). Indexing a genome sorted BAM file allows one to quickly extract alignments overlapping particular genomic regions. Moreover, indexing is required by genome viewers such as IGV so that the viewers can quickly display alignments in each genomic region to which you navigate.
-Needed for TIN score calculation.
+Needed for TIN score calculation and bedgraph coverage calculation.
 
 **Input:** bam file
 **Output:** bam.bai index file
@@ -246,4 +252,20 @@ Needed for TIN score calculation.
 
 *additionally for single end:*
 * -l: fragment length, user specified as `mean`
-* -s: fragment length SD, user specified as `sd`
\ No newline at end of file
+* -s: fragment length SD, user specified as `sd`
+
+
+
+#### star_rpm_paired_end
+Create stranded bedgraph coverage with STAR's RPM normalisation.
+Described [here](https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html)
+
+**Input:** .bam, .bam.bai index
+**Output:** coverage bedGraphs
+
+**Arguments not influenceable by user:**
+--outWigStrand "Stranded"
+--outWigNorm "RPM"
+
+
+*Same for single- and paired-end.*
\ No newline at end of file
-- 
GitLab