Exon length comparison to choose the best representative transcript

Submodule 1 (J) :

Remove gene length

Use the total exon length of each transcript instead of the total gene length.
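
As a rough illustration of what "total exon length" could mean here, the sketch below sums the exon records of each transcript in a GTF. It assumes standard 9-column, tab-separated GTF lines with 1-based inclusive coordinates and a `transcript_id "..."` attribute; the function name `exon_lengths` is hypothetical and not taken from the repository.

```python
import re
from collections import defaultdict

# Assumption: attributes follow the usual GTF form  transcript_id "XYZ";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exon_lengths(gtf_lines):
    """Return {transcript_id: summed exon length} for an iterable of GTF lines."""
    totals = defaultdict(int)
    for line in gtf_lines:
        # Skip header comments and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records contribute to the length.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive, hence the +1.
        totals[match.group(1)] += end - start + 1
    return dict(totals)
```

Because introns are never counted, two transcripts spanning the same genomic region can end up with very different lengths, which is the point of dropping gene length.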

Submodule 2 (HG) :

Create a dictionary as output {Gene_name; transcripts_with_best_confidence}

Submodule 3 :

  1. Inputs : dict from Sub2, original gtf file
  2. For each key in the dict from Sub2, if the gene has more than one transcript, calculate the total exon length of each of its transcripts
  3. Choose the transcript with the longest total exon length
  4. Output : a dict with one transcript ID per gene name {Gene_name; transcript_with_best_confidence}
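
The selection step above can be sketched as follows. This is only a minimal illustration: it assumes the Sub2 dict maps each gene name to a list of transcript IDs and that a `{transcript_id: total_exon_length}` dict has already been computed from the gtf. The name `pick_longest` and the tie-breaking rule (first transcript listed wins) are assumptions, not decisions recorded in this note.

```python
def pick_longest(candidates, lengths):
    """Return {gene_name: transcript_id}, keeping the longest transcript per gene.

    candidates: {gene_name: [transcript_id, ...]}  (assumed shape of the Sub2 output)
    lengths:    {transcript_id: total exon length}
    """
    best = {}
    for gene, transcripts in candidates.items():
        if not transcripts:
            # Nothing to choose from; leave the gene out of the result.
            continue
        if len(transcripts) == 1:
            # A single candidate needs no length comparison.
            best[gene] = transcripts[0]
        else:
            # max() keeps the first transcript in case of a tie (assumption).
            # Transcripts missing from `lengths` are treated as length 0.
            best[gene] = max(transcripts, key=lambda t: lengths.get(t, 0))
    return best
```

Genes that already have a single transcript pass through unchanged, so the exon lengths only matter for the genes that Sub2 could not resolve.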

Submodule 4 :

(Laura already has a script in our repository that does this, but with a different input.)

  1. Input : dict output from Sub3, original gtf file
  2. For each gene in the dict, copy the records of its representative transcript from the original gtf into a new gtf
  3. Output : the gtf file containing only the representative transcripts
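
One possible shape for this filtering step is sketched below. It is not Laura's script and makes its own assumptions: header comment lines are kept, records without a `transcript_id` attribute (for example gene-level lines) are dropped, and the names `filter_gtf` and `write_filtered_gtf` are hypothetical.

```python
import re

# Assumption: attributes follow the usual GTF form  transcript_id "XYZ";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def filter_gtf(gtf_lines, representatives):
    """Yield the GTF lines that belong to a representative transcript.

    representatives: {gene_name: transcript_id}  (assumed shape of the Sub3 output)
    """
    keep = set(representatives.values())
    for line in gtf_lines:
        if line.startswith("#"):
            # Keep header comments so the output stays a valid GTF.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        # Records with no transcript_id (e.g. gene lines) are dropped here.
        if match and match.group(1) in keep:
            yield line


def write_filtered_gtf(in_path, out_path, representatives):
    """Stream the original gtf to a new file, keeping representative transcripts only."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf(src, representatives))
```

Reading the file line by line avoids loading the whole gtf into memory. If gene-level records are needed downstream, the drop rule in `filter_gtf` would have to be adapted.
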
Edited by Hugo Gillet