Handling genome resources

The script prepare_inputs.py, which parses a LabKey table into the sample table used as input to the pipeline, makes (too?) many assumptions about where resources are located.

  1. Currently, the directory containing the FASTA and GTF files must be named after the organism used for sequencing.
  2. Currently, the FASTA file must be named genome.fasta and the GTF file must be named annotation.gtf.
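
To make the two assumptions concrete, here is a minimal sketch of how the paths are implied by the current convention. The function name and its arguments are hypothetical and are not taken from prepare_inputs.py; only the file names genome.fasta and annotation.gtf and the organism-named directory come from the description above.

```python
from pathlib import Path


def current_resource_paths(resources_dir, organism):
    """Build the paths implied by the current naming convention.

    Hypothetical helper: the real script may construct these differently.
    """
    # Assumption 1: the subdirectory is named after the organism.
    organism_dir = Path(resources_dir) / organism
    # Assumption 2: the file names are fixed.
    return organism_dir / "genome.fasta", organism_dir / "annotation.gtf"
```

Any resource layout that deviates from this, such as a differently named FASTA file, would not be found.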

Shall we relax these conditions?

Suggestions:

  1. We already assume that all samples processed in one run originate from the same organism, and we already specify the location of the resources via the --resources-dir argument. Shall we use this argument to point directly at the directory containing the FASTA and GTF files?
  2. Shall we assume that exactly one FASTA file and one GTF file are provided, and then automatically detect their names in the specified resources directory?
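
If both suggestions were adopted, the lookup could be sketched roughly as follows. This is only an illustration: the function names and the lists of accepted extensions are assumptions, not part of the existing script, and compressed files (e.g. .fa.gz) are deliberately not handled.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever the pipeline should accept.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")
GTF_SUFFIXES = (".gtf",)


def find_single_file(directory, suffixes):
    """Return the only file in `directory` with one of `suffixes`.

    Raises ValueError if there is no such file or more than one.
    """
    matches = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one file with suffix {suffixes} in "
            f"{directory}, found {len(matches)}"
        )
    return matches[0]


def discover_resources(resources_dir):
    """Return (fasta_path, gtf_path) found directly in `resources_dir`."""
    fasta = find_single_file(resources_dir, FASTA_SUFFIXES)
    gtf = find_single_file(resources_dir, GTF_SUFFIXES)
    return fasta, gtf
```

Failing loudly when zero or several candidates are found keeps the "exactly one FASTA and one GTF" assumption explicit instead of silently picking a file.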
Edited by Alex Kanitz