Transcripts to consider for analysis
There seems to be an inconsistency between the GTF file and the cDNA file from Ensembl. See here
- Which transcripts should we use for the pipeline?
- Add scripts and/or rules to select the transcripts
changed the description
At this point I wouldn't worry about this and would leave it to the user to decide. We can of course discuss this for internal use...
added Discuss label
added To Do label
removed To Do label
changed milestone to %v0.1.0 release
Oh, I misread the issue; I thought it was about filtering transcripts/annotations. As far as I know, Ensembl splits its transcript FASTA files into separate files for cDNA and non-coding RNA. Even then, though, I think there may be inconsistencies with the gene annotation file. Using scripts to keep the files consistent is tedious. The easiest and sanest solution here should be to use the gene annotation file as the single source of truth and derive the transcriptome from it with the help of the genome. This has the additional advantage that the workflow requires one fewer input file.
As there is already an issue for this (#62 (closed)), I will close this one.
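For reference, the idea of deriving the transcriptome from the annotation file and the genome could look roughly like the sketch below. It is only an illustration under some assumptions: the function names (`parse_exons`, `extract_transcripts`) are made up for this example, the genome is assumed to be already loaded as a dict of chromosome name to sequence, and the attribute parsing is simplified (it does not handle semicolons inside quoted values). In the actual workflow an established tool would probably be the better choice.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_exons(gtf_lines):
    """Collect exon coordinates per transcript from GTF lines.

    Returns a dict: transcript_id -> list of (chrom, start, end, strand),
    with 1-based inclusive coordinates as in the GTF format.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute to the transcript sequence
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Simplified attribute parsing: 'key "value";' pairs
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        exons[attrs["transcript_id"]].append((chrom, start, end, strand))
    return exons


def extract_transcripts(exons, genome):
    """Build spliced transcript sequences from exon coordinates.

    `genome` is assumed to be a dict: chromosome name -> sequence string.
    """
    transcripts = {}
    for tx_id, parts in exons.items():
        strand = parts[0][3]
        # Concatenate exons in genomic order, then flip minus-strand transcripts
        ordered = sorted(parts, key=lambda p: p[1])
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in ordered)
        transcripts[tx_id] = reverse_complement(seq) if strand == "-" else seq
    return transcripts
```

Since every transcript sequence comes straight from the exon records, the annotation and the transcriptome cannot disagree about which transcripts exist, which is the point of treating the annotation as the single source of truth.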
closed
assigned to @kanitz