CJHerrmann
--- a/pipeline_steps.md

+ 24

− 2
+++ b/pipeline_steps.md

 @@ -5,6 +5,7 @@
 * read samples table
 * create log directories
 * **create_index_star**
+* **extract_transcriptome**
 * **create_index_salmon**
 * **create_index_kallisto**
 * **extract_transcripts_as_bed12**
 @@ -20,6 +21,7 @@
 * **pe_index_genomic_alignment_samtools**
 * **pe_quantification_salmon**
 * **pe_genome_quantification_kallisto**
+* **star_rpm_paired_end**
 @@ -76,6 +78,10 @@ Create index for STAR alignments. Supply the reference genome sequences (FASTA f
 **Parameters:** sjdbOverhang (This is the `index_size` specified in the samples table).    
 **Output:** chrNameLength.txt will be used for STAR mapping; chrName.txt
+#### extract_transcriptome
+> TODO
 #### create_index_salmon
 Create index for Salmon quantification. If you want to use Salmon in mapping-based mode, then you first have to build a salmon index for your transcriptome. This will build the mapping-based index, using an auxiliary k-mer hash over k-mers of length 31. While the mapping algorithms will make use of arbitrarily long matches between the query and reference, the k size selected here will act as the minimum acceptable length for a valid match. Thus, a smaller value of k may slightly improve sensitivity. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads. [Salmon manual](https://salmon.readthedocs.io/en/latest/salmon.html)   
 @@ -203,7 +209,7 @@ Spliced Transcripts Alignment to a Reference
 #### (pe_)index_genomic_alignment_samtools
 Index the genomic alignment with [samtools index](http://quinlanlab.org/tutorials/samtools/samtools.html#samtools-index). Indexing a genome sorted BAM file allows one to quickly extract alignments overlapping particular genomic regions. Moreover, indexing is required by genome viewers such as IGV so that the viewers can quickly display alignments in each genomic region to which you navigate.    
-Needed for TIN score calculation.    
+Needed for TIN score calculation and bedgraph coverage calculation.    
 **Input:** bam file    
 **Output:** bam.bai index file    
 @@ -246,4 +252,20 @@ Needed for TIN score calculation.
 *additionally for single end:*    
 * -l: fragment length, user specified as `mean`
 * -s: fragment length SD, user specified as `sd` 
 \ No newline at end of file
+#### star_rpm_paired_end
+Create stranded bedGraph coverage tracks using STAR's RPM (reads per million of mapped reads) normalisation.    
+Described [here](https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html)    
+**Input:** .bam, .bam.bai index    
+**Output:** coverage bedGraphs    
+**Arguments not configurable by the user:**    
+--outWigStrand "Stranded"    
+--outWigNorm "RPM"    
+*Same for single- and paired-end.*
+\ No newline at end of file
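
For reviewers who want to see what the `star_rpm_paired_end` step amounts to, the following is a minimal sketch of how such a STAR call could be assembled. It is not the pipeline's actual rule: the function name `star_bedgraph_command`, the file name `sample.bam`, and the output prefix are hypothetical. It assumes STAR's `inputAlignmentsFromBAM` run mode together with the `--outWigType`, `--outWigStrand`, and `--outWigNorm` options.

```python
import shlex


def star_bedgraph_command(bam, out_prefix, star_bin="STAR"):
    """Build a STAR call that turns a sorted BAM into stranded, RPM-normalised bedGraphs.

    Hypothetical helper for illustration; paths and prefix are placeholders.
    """
    return [
        star_bin,
        "--runMode", "inputAlignmentsFromBAM",  # read existing alignments, do not re-map
        "--inputBAMFile", str(bam),
        "--outWigType", "bedGraph",             # write bedGraph signal tracks
        "--outWigStrand", "Stranded",           # fixed by the pipeline: one track per strand
        "--outWigNorm", "RPM",                  # fixed by the pipeline: reads per million
        "--outFileNamePrefix", str(out_prefix),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without STAR installed.
    print(shlex.join(star_bedgraph_command("sample.bam", "sample_")))
```

Returning the command as an argument list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting concerns.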