Terminal fragment selection
Given a set of full-length DNAs, fragment them and select the end fragments whose length falls within a given range. Each copy of a transcript is fragmented independently of the others.
Input:
- Fasta-formatted file of transcript sequences
- Csv-formatted file ("TranscriptID,GeneID,Count") with transcript copy numbers
- Mean fragment length
- Standard deviation of fragment lengths
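As an illustration of how the two file inputs might be read, here is a minimal sketch using only the Python standard library. The function names `read_fasta` and `read_copy_numbers` are hypothetical and not part of this specification. The sketch assumes that the transcript ID is the first whitespace-delimited token of each Fasta header, and that a Csv line whose third field is not an integer (for example a "TranscriptID,GeneID,Count" header line) can be skipped.

```python
import csv
import io


def read_fasta(handle):
    """Return {transcript_id: sequence} from a Fasta-formatted text stream."""
    parts_by_id = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token of the header line.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            parts_by_id[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            parts_by_id[current_id].append(line)
    return {tid: "".join(parts) for tid, parts in parts_by_id.items()}


def read_copy_numbers(handle):
    """Return {transcript_id: count} from TranscriptID,GeneID,Count rows."""
    counts = {}
    for row in csv.reader(handle):
        if len(row) < 3:
            continue
        try:
            count = int(row[2])
        except ValueError:
            # Assumption: a non-integer count marks a header line; skip it.
            continue
        counts[row[0]] = counts.get(row[0], 0) + count
    return counts


if __name__ == "__main__":
    fasta = io.StringIO(">tx1 example\nACGT\nAC\n>tx2\nGG\n")
    table = io.StringIO("TranscriptID,GeneID,Count\ntx1,g1,3\ntx2,g1,1\n")
    print(read_fasta(fasta))
    print(read_copy_numbers(table))
```

In practice the same functions would be called on file objects opened with `open(path)` instead of the in-memory streams used here.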
Output: Fasta-formatted file with the terminal fragments of the input transcripts whose length falls within the desired range (mean +/- 2 standard deviations).
First, a number of break points should be chosen so that the expected fragment length equals the provided mean (input 3). Then, the locations of these points in the transcript should be sampled. Finally, if the length of the terminal fragment of the respective transcript falls within 2 standard deviations (input 4) of the provided mean (input 3), the fragment is saved in the output file. The process should be repeated independently for each copy (input 2) of each transcript (input 1).
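The procedure described above can be sketched in Python as follows. This is one possible reading of the specification, not a definitive implementation, and it rests on several assumptions that the text does not fix: the number of break points is set deterministically to round(L / mean) - 1 for a transcript of length L (so that the resulting fragments average roughly the requested mean), break points are sampled uniformly without replacement among interior positions, and the "terminal" fragment is taken to be the last (3'-most) one. The names `terminal_fragment` and `select_terminal_fragments`, as well as the `_copyN` suffix in the output IDs, are hypothetical.

```python
import random


def terminal_fragment(sequence, mean_length, rng):
    """Fragment one transcript copy and return its last fragment."""
    length = len(sequence)
    # Assumption: n break points give n + 1 fragments of average length
    # L / (n + 1), so n = round(L / mean) - 1 targets the requested mean.
    n_breaks = max(0, round(length / mean_length) - 1)
    n_breaks = min(n_breaks, max(0, length - 1))
    if n_breaks == 0:
        return sequence
    # Assumption: break points are uniform over interior positions 1..L-1.
    breaks = rng.sample(range(1, length), n_breaks)
    # Assumption: the terminal fragment starts at the last break point.
    return sequence[max(breaks):]


def select_terminal_fragments(sequences, copy_numbers, mean_length,
                              sd_length, seed=None):
    """Return (id, fragment) pairs whose length is within mean +/- 2 sd."""
    rng = random.Random(seed)
    low = mean_length - 2 * sd_length
    high = mean_length + 2 * sd_length
    selected = []
    for transcript_id, sequence in sequences.items():
        # Each copy is fragmented independently of the others.
        for copy in range(copy_numbers.get(transcript_id, 0)):
            fragment = terminal_fragment(sequence, mean_length, rng)
            if low <= len(fragment) <= high:
                selected.append((f"{transcript_id}_copy{copy + 1}", fragment))
    return selected


if __name__ == "__main__":
    sequences = {"tx1": "ACGT" * 250}
    fragments = select_terminal_fragments(sequences, {"tx1": 5}, 100, 20, seed=1)
    for name, fragment in fragments:
        # Write in Fasta format to standard output.
        print(f">{name}\n{fragment}")
```

A stochastic alternative would be to draw the number of break points from a distribution with the same mean. The deterministic choice is used here only because it keeps the sketch short and easy to check.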