Generate transcripts
Generate new transcript structures from sampled exons that result from intron inclusion. Write the abundances of the new structures and their annotations to separate files.
As input, take the processed annotations dataframe and the abundances dataframe.
For each transcript, filter the annotations dataframe to include only the exons of the given transcript ID. From the number of exons and the number of samples to generate, create a random number array with values in the range [0, 1] and size (#exons, #samples). From this, create a boolean array by thresholding with the intron inclusion probability: arr < incl_prob. Count the unique columns in the boolean array and save each unique vector together with its count. Generate a new transcript name for each unique vector.
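The sampling and counting step can be sketched with NumPy as below. This is a minimal sketch, not the reference implementation: the function name sample_inclusion_patterns and the rng parameter are assumptions, and Generator.random draws from the half-open interval [0, 1), which makes no practical difference for the threshold. Each column is one sample; numpy.unique with axis=1 collapses identical columns and returns how often each occurred.

```python
import numpy as np


def sample_inclusion_patterns(n_exons, n_samples, incl_prob, rng=None):
    """Sample intron-inclusion patterns for one transcript (hypothetical helper).

    Returns the unique boolean column vectors, shape (n_exons, n_unique),
    and the number of samples that produced each of them.
    """
    rng = np.random.default_rng(rng)
    # Uniform draws in [0, 1) of shape (#exons, #samples)
    arr = rng.random((n_exons, n_samples))
    # True where the intron next to that exon is retained in that sample
    included = arr < incl_prob
    # Collapse identical columns (samples) and count how often each occurs
    unique_cols, counts = np.unique(included, axis=1, return_counts=True)
    return unique_cols, counts
```

New names could then be derived per unique column, for example f"{transcript_id}_{k}" for k in range(unique_cols.shape[1]); this naming scheme is likewise only an illustrative assumption.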
For each unique vector, copy the filtered dataframe. Pick the exons from the dataframe for which the intron inclusion vector is True. If the strand is "+", change "end" to the next exon's "start" - 1. If the strand is "-", change "start" to the previous exon's "end" + 1. Generate IDs for the newly created exons and set the ID of the newly generated transcript. Accumulate the generated transcript annotation dataframes, reverse-parse the free-text columns, and write the result to a file. Write the abundances ("new_id", "count") to a file.
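A possible pandas sketch of the coordinate update for one unique vector follows. It assumes, beyond what is stated above, that the filtered exons are sorted by ascending "start" (so "next" and "previous" refer to neighbouring rows), that the columns are named "start", "end", "strand", "exon_id" and "transcript_id", and that an inclusion flag is simply ignored when the exon has no neighbour in the required direction. The exon ID scheme and the helper names apply_inclusion and abundance_table are hypothetical.

```python
import pandas as pd


def apply_inclusion(exons, inclusion, new_transcript_id):
    """Build the annotation of one new transcript from one inclusion vector."""
    # Copy, so the filtered annotation can be reused for the other vectors
    new = exons.copy().reset_index(drop=True)
    strand = new.loc[0, "strand"]
    for i in range(len(new)):
        if not inclusion[i]:
            continue
        if strand == "+" and i + 1 < len(new):
            # Retain the intron: extend the exon up to next exon's "start" - 1
            new.loc[i, "end"] = new.loc[i + 1, "start"] - 1
        elif strand == "-" and i > 0:
            # Retain the intron: extend the exon down to previous "end" + 1
            new.loc[i, "start"] = new.loc[i - 1, "end"] + 1
        else:
            continue  # assumption: no neighbouring exon, so nothing to retain
        # Hypothetical ID scheme for the modified exon
        new.loc[i, "exon_id"] = f"{new.loc[i, 'exon_id']}_{new_transcript_id}"
    new["transcript_id"] = new_transcript_id
    return new


def abundance_table(new_ids, counts):
    """Abundances of the new transcripts as ("new_id", "count") rows."""
    return pd.DataFrame({"new_id": list(new_ids), "count": list(counts)})
```

The per-vector frames could then be accumulated with pd.concat(frames, ignore_index=True) before the free-text columns are reverse-parsed, and the abundance table written with DataFrame.to_csv; file paths and separators are left open here.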