diff --git a/pipeline_steps.md b/pipeline_steps.md
index 0a913fba55657f8393abd10bad129aa7cb96bde4..98f235b23328080d7a7de0e1a2acc2af130783f1 100644
--- a/pipeline_steps.md
+++ b/pipeline_steps.md
@@ -5,6 +5,7 @@
 * read samples table
 * create log directories
 * **create_index_star**
+* **extract_transcriptome**
 * **create_index_salmon**
 * **create_index_kallisto**
 * **extract_transcripts_as_bed12**
@@ -20,6 +21,7 @@
 * **pe_index_genomic_alignment_samtools**
 * **pe_quantification_salmon**
 * **pe_genome_quantification_kallisto**
+* **star_rpm_paired_end**
 
 
 
@@ -76,6 +78,10 @@ Create index for STAR alignments. Supply the reference genome sequences (FASTA f
 **Parameters:** sjdbOverhang (This is the `index_size` specified in the samples table).
 **Output:** chrNameLength.txt will be used for STAR mapping; chrName.txt
 
+#### extract_transcriptome
+> TODO
+
+
 #### create_index_salmon
 Create index for Salmon quantification. If you want to use Salmon in mapping-based mode, then you first have to build a salmon index for your transcriptome. This will build the mapping-based index, using an auxiliary k-mer hash over k-mers of length 31. While the mapping algorithms will make use of arbitrarily long matches between the query and reference, the k size selected here will act as the minimum acceptable length for a valid match. Thus, a smaller value of k may slightly improve sensitivty. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
 [Salmon manual](https://salmon.readthedocs.io/en/latest/salmon.html)
@@ -203,7 +209,7 @@ Spliced Transcripts Alignment to a Reference
 #### (pe_)index_genomic_alignment_samtools
 Index the genomic alignment with [samtools index](http://quinlanlab.org/tutorials/samtools/samtools.html#samtools-index). Indexing a genome sorted BAM file allows one to quickly extract alignments overlapping particular genomic regions.
 Moreover, indexing is required by genome viewers such as IGV so that the viewers can quickly display alignments in each genomic region to which you navigate.
-Needed for TIN score calculation.
+Needed for TIN score calculation and bedgraph coverage calculation.
 
 **Input:** bam file
 **Output:** bam.bai index file
@@ -246,4 +252,20 @@ Needed for TIN score calculation.
 
 *additionally for single end:*
 * -l: fragment length, user specified as `mean`
-* -s: fragment length SD, user specified as `sd`
\ No newline at end of file
+* -s: fragment length SD, user specified as `sd`
+
+
+
+#### star_rpm_paired_end
+Create stranded bedgraph coverage with STAR's RPM normalisation.
+Described [here](https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html)
+
+**Input:** .bam, .bam.bai index
+**Output:** coverage bedGraphs
+
+**Arguments not influenceable by user:**
+--outWigStrand "Stranded"
+--outWigNorm "RPM"
+
+
+*Same for single- and paired-end.*
\ No newline at end of file
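
Note on `star_rpm_paired_end`: the fixed arguments documented in this change could be assembled roughly as in the sketch below. This is a minimal illustration and not the pipeline's actual rule. The helper name `build_star_rpm_command`, the example paths, and the choice of `--runMode inputAlignmentsFromBAM` with `--outWigType bedGraph` are assumptions about how STAR would be invoked on an existing BAM file. Only `--outWigStrand "Stranded"` and `--outWigNorm "RPM"` come from the documentation above.

```python
from typing import List


def build_star_rpm_command(bam_path: str, out_prefix: str) -> List[str]:
    """Assemble (but do not run) a STAR command that writes stranded,
    RPM-normalised bedGraph coverage from an existing BAM file.

    Hypothetical helper for illustration; the pipeline's real rule may differ.
    """
    return [
        "STAR",
        "--runMode", "inputAlignmentsFromBAM",  # assumption: reuse an existing alignment
        "--inputBAMfile", bam_path,
        "--outWigType", "bedGraph",             # assumption: bedGraph output requested
        "--outWigStrand", "Stranded",           # fixed, per the documentation above
        "--outWigNorm", "RPM",                  # fixed, per the documentation above
        "--outFileNamePrefix", out_prefix,
    ]


# Example paths are placeholders, not files used by the pipeline.
cmd = build_star_rpm_command("sample.bam", "results/sample_")
print(" ".join(cmd))
```

Returning an argument list instead of a shell string keeps the two fixed options in one place and allows the command to be passed to `subprocess.run` without shell quoting.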