Generate transcript structures
Given the number of transcripts to be sampled from each of a set of genes, generate their intron/exon structures and counts, allowing some of the introns to be included in the transcripts.
Input:
- CSV-formatted file ("ID,Count") with counts for individual transcripts.
- Probability of intron inclusion.
- GTF-formatted file with the intron/exon coordinates of the transcripts represented in the count file.
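The two input files listed above could be read along the following lines. This is a minimal sketch, not a prescribed implementation: the function names `read_counts` and `read_exons` are hypothetical, the count file is assumed to possibly start with an "ID,Count" header row, and the GTF file is assumed to follow the usual nine tab-separated columns with `exon` features and a `transcript_id` attribute.

```python
import csv
import io
import re
from collections import defaultdict

def read_counts(handle):
    """Read "ID,Count" rows into a dict mapping transcript ID to count."""
    counts = {}
    for row in csv.reader(handle):
        if not row or row[0] == "ID":
            # Skip blank lines and a header row (assumption: header is optional).
            continue
        counts[row[0]] = int(row[1])
    return counts

def read_exons(handle):
    """Collect sorted (start, end) exon coordinates per transcript from GTF text."""
    exons = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue  # only exon features define the structure
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tid: sorted(coords) for tid, coords in exons.items()}

# Small in-memory demonstration using made-up data.
demo_counts = io.StringIO("ID,Count\nT1,3\n")
print(read_counts(demo_counts))
```

Sorting by start coordinate gives exons in genomic order regardless of strand, which is enough for locating the introns between neighbouring exons.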
Output:
- GTF-formatted file containing the unique intron/exon structures that have been generated.
- CSV-formatted file ("NewTranscriptID,ID,Count") giving, for each unique transcript structure, its new ID, the ID of the parent transcript (the input transcript without any intron inclusions), and its copy number.
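One possible way to produce the two output files is sketched below. Several details are assumptions rather than part of the specification: the new ID scheme (`<parent ID>_<n>`), the `sampled` source label, the function names, and the representation of each structure as a tuple of (start, end) exon coordinates mapped to its copy number.

```python
import csv
import io

def write_counts(handle, parent_id, structure_counts):
    """Write "NewTranscriptID,ID,Count" rows; return a dict of new ID -> structure."""
    writer = csv.writer(handle, lineterminator="\n")
    new_ids = {}
    for i, (structure, count) in enumerate(structure_counts.items(), start=1):
        new_id = f"{parent_id}_{i}"  # hypothetical naming scheme
        new_ids[new_id] = structure
        writer.writerow([new_id, parent_id, count])
    return new_ids

def gtf_exon_lines(new_id, structure, seqname, strand, gene_id, source="sampled"):
    """Format one GTF exon line per (start, end) pair of a structure."""
    attributes = f'gene_id "{gene_id}"; transcript_id "{new_id}";'
    return [
        "\t".join([seqname, source, "exon", str(start), str(end),
                   ".", strand, ".", attributes])
        for start, end in structure
    ]

# Demonstration with made-up structures for a parent transcript "T1".
out = io.StringIO()
ids = write_counts(out, "T1", {((100, 400),): 2, ((100, 200), (300, 400)): 3})
for new_id, structure in ids.items():
    for line in gtf_exon_lines(new_id, structure, "chr1", "+", "G1"):
        print(line)
```

Keeping the parent ID in the second CSV column lets each generated structure be traced back to the input transcript it was derived from.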
The structure of each sampled transcript is generated individually, starting from the exons of the input transcript and allowing for intron inclusion. This is done by walking along the introns implied by the transcript's intron/exon structure and deciding, independently for each intron and with the specified probability, whether to include it. If an intron is selected for inclusion, a new exon is created that covers the selected intron together with the exons immediately preceding and following it. The unique exon/intron structures generated are written to a new GTF file, and the number of times each unique transcript structure was generated is written to a new CSV-formatted file.
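The sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `sample_structure` and `sample_transcripts` are hypothetical names, exons are given as (start, end) tuples sorted by start, and when two adjacent introns are both included the sketch merges all three flanking exons into a single exon.

```python
import random
from collections import Counter

def sample_structure(exons, p_inclusion, rng):
    """Generate one exon structure from a sorted list of (start, end) exons."""
    merged = [exons[0]]
    for start, end in exons[1:]:
        # The intron lies between the previous exon's end and this exon's start.
        if rng.random() < p_inclusion:
            # Intron included: extend the previous exon across the intron
            # and the exon that follows it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return tuple(merged)

def sample_transcripts(exons, count, p_inclusion, seed=None):
    """Sample `count` structures and tally how often each unique one occurs."""
    rng = random.Random(seed)
    return Counter(sample_structure(exons, p_inclusion, rng) for _ in range(count))

# Demonstration with made-up coordinates: three exons, two introns.
exons = [(100, 200), (300, 400), (500, 600)]
for structure, n in sample_transcripts(exons, 20, 0.3, seed=0).items():
    print(n, structure)
```

Returning each structure as a tuple makes it hashable, so identical structures collapse into one `Counter` entry whose value is the copy number.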