Handling genome resources
The script prepare_inputs.py, which parses the LabKey table into a sample table that then serves as the input of the pipeline, makes (too?) many assumptions about the location of resources.
- Currently, the name of the directory containing the FASTA and GTF files has to match the name of the organism used for sequencing.
- Currently, the name of the FASTA file has to be genome.fasta and the name of the GTF file has to be annotation.gtf.
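For reference, the two assumptions above can be sketched roughly as follows. This is only an illustration of the layout as described in this issue, not the actual code of prepare_inputs.py; the function name expected_resource_paths and the example organism name are made up.

```python
from pathlib import Path


def expected_resource_paths(resources_dir, organism):
    """Build the paths implied by the current naming conventions.

    Hypothetical helper: it assumes the resources directory contains one
    subdirectory per organism, holding genome.fasta and annotation.gtf.
    """
    organism_dir = Path(resources_dir) / organism
    return organism_dir / "genome.fasta", organism_dir / "annotation.gtf"


# Example with a made-up organism directory name
fasta, gtf = expected_resource_paths("resources", "homo_sapiens")
print(fasta)
print(gtf)
```

Any deviation in the directory or file names would break such a lookup, which is the motivation for the question below.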
Shall we relax these conditions?
Suggestions:
- We already assume that all samples processed in one run originate from the same organism, and we already specify the directory containing the resources via the --resources-dir argument. Shall we use this argument to point directly at the directory with the FASTA and GTF files?
- Shall we assume that exactly one FASTA file and one GTF file are provided, and then automatically retrieve their names from the specified resources directory?
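A minimal sketch of the second suggestion, assuming --resources-dir points directly at the directory holding the files. The function names and the accepted file extensions are assumptions for illustration; none of them come from the current prepare_inputs.py.

```python
from pathlib import Path

# Hypothetical extension lists; which extensions to accept is still to be decided.
FASTA_SUFFIXES = (".fasta", ".fa")
GTF_SUFFIXES = (".gtf",)


def find_single_file(resources_dir, suffixes):
    """Return the only file in resources_dir whose suffix is in suffixes.

    Raises ValueError if no file or more than one file matches, so that an
    ambiguous resources directory is reported instead of silently guessed.
    """
    directory = Path(resources_dir)
    matches = sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one file with suffix {suffixes} in "
            f"{directory}, found {len(matches)}"
        )
    return matches[0]


def find_resources(resources_dir):
    """Return (fasta_path, gtf_path) discovered in resources_dir."""
    return (
        find_single_file(resources_dir, FASTA_SUFFIXES),
        find_single_file(resources_dir, GTF_SUFFIXES),
    )
```

With this approach the file names no longer matter, but the directory must contain exactly one file of each type; compressed files (for example .fa.gz) would not be matched by this sketch.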
Edited by Alex Kanitz