Workflow for computing per-intron inclusion rates from bulk RNA-seq
To make our simulations more realistic, we could replace the constant probability of intron inclusion with the probability of inclusion observed in bulk RNA-seq data. This could be done in two steps: first, use the RNA-seq data to identify the most expressed transcript for each gene; second, quantify the rate of inclusion for each intron of that transcript. The rate could be defined as the mean read coverage per position in the intron relative to the mean read coverage per position in its flanking exons.
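One possible way to write this ratio is given below. It is a sketch under an assumption: the exon mean is pooled over all positions of both flanking exons, whereas the text does not say whether the two exons should be pooled or averaged separately. The symbols are introduced here for illustration only: c(p) is the read coverage at position p, I is the set of positions in the intron, and E_up and E_down are the positions of the upstream and downstream flanking exons.

```latex
\[
r_I \;=\;
\frac{\dfrac{1}{|I|}\sum_{p \in I} c(p)}
     {\dfrac{1}{|E_{\mathrm{up}}| + |E_{\mathrm{down}}|}
      \sum_{p \in E_{\mathrm{up}} \cup E_{\mathrm{down}}} c(p)}
\]
```

Under this definition the rate is undefined when the flanking exons have no coverage, so such introns would need to be skipped or flagged.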
Input:
- BAM file with genome alignments
- GTF file of gene annotations
Output: rate of inclusion for all introns in the most expressed transcript per gene
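
The two steps can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. It assumes that per-position coverage has already been extracted from the BAM file into a list, that exon and intron coordinates have already been parsed from the GTF file as half-open (start, end) intervals, and that some transcript-level abundance estimate is available for step one. The function names, the argument layout, and the choice to pool both flanking exons into a single mean are all hypothetical choices made for this example.

```python
from statistics import mean


def most_expressed_transcripts(expression, gene_of):
    """Step 1: pick the transcript with the highest abundance for each gene.

    expression: dict transcript_id -> abundance (any consistent unit; assumed input)
    gene_of:    dict transcript_id -> gene_id
    Ties are resolved in favour of the transcript seen first.
    """
    best = {}
    for transcript_id, abundance in expression.items():
        gene_id = gene_of[transcript_id]
        if gene_id not in best or abundance > expression[best[gene_id]]:
            best[gene_id] = transcript_id
    return best


def intron_inclusion_rate(coverage, left_exon, intron, right_exon):
    """Step 2: mean intron coverage divided by mean flanking-exon coverage.

    coverage: list of per-position read depths, indexed by position
    left_exon, intron, right_exon: half-open (start, end) intervals
    Returns None when the flanking exons have zero coverage.
    """
    intron_depths = coverage[intron[0]:intron[1]]
    # Assumption: both flanking exons are pooled into one per-position mean.
    exon_depths = (coverage[left_exon[0]:left_exon[1]]
                   + coverage[right_exon[0]:right_exon[1]])
    if not intron_depths or not exon_depths:
        raise ValueError("intervals must be non-empty")
    exon_mean = mean(exon_depths)
    if exon_mean == 0:
        return None
    return mean(intron_depths) / exon_mean


# Toy example: two exons at depth 10 flanking an intron at depth 2.
toy_coverage = [10] * 5 + [2] * 10 + [10] * 5
toy_rate = intron_inclusion_rate(toy_coverage, (0, 5), (5, 15), (15, 20))
toy_best = most_expressed_transcripts(
    {"tx1": 3.0, "tx2": 7.5, "tx3": 1.0},
    {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"},
)
```

In the toy example the intron mean is 2 and the pooled exon mean is 10, so the rate is 2/10. Returning None for uncovered exons keeps unexpressed genes from producing a division by zero; whether to drop or impute those introns is left open.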