Exon-length comparison to choose the best representative transcript
Submodule 1 (J):
- Remove the gene-length criterion.
- Compare transcripts on their total exon length instead of the total gene length.
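The difference matters because a genomic span counts intronic bases, while the summed exon length only counts sequence retained in the transcript. A minimal sketch with hypothetical exon coordinates, assuming GTF's 1-based, inclusive coordinates (so each exon contributes end - start + 1 bases):

```python
def span_length(exons):
    """Genomic span from the first exon start to the last exon end (includes introns)."""
    return max(end for _, end in exons) - min(start for start, _ in exons) + 1


def exon_length(exons):
    """Sum of exon lengths; GTF coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end in exons)


# Hypothetical transcript: two exons separated by a 99-base intron
exons = [(100, 200), (300, 350)]
print(span_length(exons))  # 251
print(exon_length(exons))  # 152
```

Two transcripts with the same span can therefore differ widely in exon length, which is the quantity this module compares.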
Submodule 2 (HG):
- Create a dictionary as output: {Gene_name: transcripts_with_best_confidence}
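The notes do not say how confidence is measured. The sketch below assumes it is the `transcript_support_level` attribute found in Ensembl/GENCODE GTF files (1 = best supported, `NA` = no support) and that every `transcript` line carries `gene_name` and `transcript_id` attributes. Both are assumptions, and `best_confidence_transcripts` is a hypothetical name; swap in whatever criterion Submodule 2 actually uses.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def tsl_rank(value):
    """Lower is better; 'NA' or missing support levels rank last (assumption)."""
    token = value.split()[0] if value else "NA"
    return int(token) if token.isdigit() else 99


def best_confidence_transcripts(gtf_lines):
    """Return {gene_name: [transcript_ids sharing the best support level]}."""
    ranked = defaultdict(list)  # gene_name -> [(rank, transcript_id), ...]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records carry the per-transcript level we need
        attrs = dict(ATTR_RE.findall(cols[8]))
        rank = tsl_rank(attrs.get("transcript_support_level", "NA"))
        ranked[attrs["gene_name"]].append((rank, attrs["transcript_id"]))

    result = {}
    for gene, items in ranked.items():
        best = min(rank for rank, _ in items)
        # Keep every tied transcript; Submodule 3 breaks ties by exon length
        result[gene] = [tid for rank, tid in items if rank == best]
    return result
```

Keeping all tied transcripts as a list is what makes the output match the plural `transcripts_with_best_confidence` and leaves the final choice to Submodule 3.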
Submodule 3:
- Inputs: the dict from Submodule 2 and the original GTF file
- For each key in the Submodule 2 dict, if the gene has more than one transcript, calculate the total exon length of each of its transcripts
- Choose the transcript with the longest total exon length
- Output: a dict with one transcript ID per gene name {Gene_name: transcript_with_best_confidence}
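The steps above can be sketched as follows, assuming the Submodule 2 dict maps each gene name to a list of transcript IDs and the GTF is read as an iterable of lines. `exon_lengths` and `pick_longest` are hypothetical helper names, not existing code in the repository, and the tie-break rule is an assumption.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exon_lengths(gtf_lines, wanted):
    """Sum exon lengths (1-based, inclusive coordinates) for transcript IDs in `wanted`."""
    totals = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        tid = dict(ATTR_RE.findall(cols[8])).get("transcript_id")
        if tid in wanted:
            totals[tid] += int(cols[4]) - int(cols[3]) + 1
    return totals


def pick_longest(candidates, gtf_lines):
    """Return {gene_name: transcript_id}, keeping the transcript with the longest exon length."""
    # Only genes with several candidates need their exons measured
    wanted = {tid for tids in candidates.values() if len(tids) > 1 for tid in tids}
    totals = exon_lengths(gtf_lines, wanted)
    chosen = {}
    for gene, tids in candidates.items():
        if len(tids) == 1:
            chosen[gene] = tids[0]  # single candidate, nothing to compare
        else:
            # max() returns the first of any tied transcripts (tie-break is an assumption)
            chosen[gene] = max(tids, key=lambda t: totals[t])
    return chosen
```

Reading the GTF once and summing only the wanted transcripts keeps the pass linear in file size, rather than rescanning the file for each gene.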
Submodule 4:
(Laura already has a script in our repository that does this, but it takes a different input.)
- Inputs: the dict output by Submodule 3 and the original GTF file
- Using the genes in the dict, generate a new GTF file containing only the representative transcripts
- Output: the GTF file of representative transcripts
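This is an independent sketch, not Laura's existing script, whose interface is not described here. It assumes the Submodule 3 dict maps gene names to single transcript IDs. It keeps header comments and the `gene` records of retained genes (an assumption, since some downstream tools expect gene lines), plus every record whose `transcript_id` is representative. `filter_gtf` and `write_representative_gtf` are hypothetical names.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_gtf(gtf_lines, representatives):
    """Yield GTF lines belonging to the representative transcripts.

    representatives: {gene_name: transcript_id}, as produced by Submodule 3.
    """
    keep_tids = set(representatives.values())
    keep_genes = set(representatives)
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header comments unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip blank or malformed lines
        attrs = dict(ATTR_RE.findall(cols[8]))
        if cols[2] == "gene":
            # Gene records have no transcript_id; keep them for retained genes (assumption)
            if attrs.get("gene_name") in keep_genes:
                yield line
        elif attrs.get("transcript_id") in keep_tids:
            yield line  # transcript, exon, CDS, UTR, ... of a representative transcript


def write_representative_gtf(in_path, out_path, representatives):
    """Stream the filtered records from in_path into out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf(src, representatives))
```

Using a generator lets the whole GTF be streamed from disk to disk without holding it in memory.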