Construct parameters object
Write a function that parses the config file for the run and populates a parameters object with the values. The parameters are:
- CSV-formatted table “GeneID,Counts” specifying the average number of transcripts expressed per gene in a given cell type. These values can come, for example, from a bulk RNA-seq experiment on sorted cells of that type.
- File with the genome sequence
- GFF/GTF-formatted file with the transcript annotation of the genome
- Output directory
- Number of reads to sequence
- Number of cells to simulate
- Mean and standard deviation of RNA fragment length
- Read length
- Probability of intron inclusion: treated as constant across introns to start with; can later be extended to intron-specific values. In the latter case, estimates could be obtained from bulk RNA-seq data by dividing the average per-position coverage of a given intron by the average per-position coverage of the gene, or of the flanking exons.
- Option to add poly(A) tails to transcripts, with an associated function for generating these tails (with a specific length distribution and non-A nucleotide frequency).
- Parameters for evaluating internal priming: primer sequence and a function implementing the constraints on priming sites (accessibility, energy of interaction, perfect match at the last primer position, etc.).
- Number of replicates (cells) for which to run the simulation.
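Two of the parameters above are described as an estimate or a generating function rather than a plain value. The following is a minimal sketch of both, not a prescribed implementation. The function names, the clamping of the estimate to 1.0, and the choice of a normal distribution for tail length are assumptions made for illustration only.

```python
import random


def intron_inclusion_estimate(intron_coverage, reference_coverage):
    """Estimate the inclusion probability of one intron.

    Divides the mean per-position coverage of the intron by the mean
    per-position coverage of a reference region (the whole gene or the
    flanking exons). Clamping to 1.0 is an assumption of this sketch.
    """
    if not intron_coverage or not reference_coverage:
        raise ValueError("coverage lists must not be empty")
    intron_mean = sum(intron_coverage) / len(intron_coverage)
    reference_mean = sum(reference_coverage) / len(reference_coverage)
    if reference_mean == 0:
        raise ValueError("reference region has zero coverage")
    return min(1.0, intron_mean / reference_mean)


def make_polya_tail(mean_length, sd_length, non_a_freq, rng):
    """Generate one poly(A) tail.

    Tail length is drawn from a normal distribution (an assumption; the
    spec only asks for "a specific length distribution"). Each position
    is replaced by a random non-A nucleotide with probability non_a_freq.
    """
    length = max(0, round(rng.gauss(mean_length, sd_length)))
    return "".join(
        rng.choice("CGT") if rng.random() < non_a_freq else "A"
        for _ in range(length)
    )
```

Passing a seeded `random.Random` instance as `rng` keeps the generated tails reproducible between runs.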
Input: config file (txt)
Output: Class holding the parameters for the run.
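A minimal sketch of such a parser follows. The spec does not fix the config file syntax, so this assumes one `key = value` pair per line with `#` starting a comment. All field names, the defaults, and the choice of which parameters are required are hypothetical. The priming-constraint and poly(A) functions are not stored here; only the scalar settings are.

```python
from dataclasses import MISSING, dataclass, fields
from pathlib import Path


@dataclass
class RunParameters:
    """Parameters for one simulation run (all field names are hypothetical)."""
    counts_table: Path            # CSV table "GeneID,Counts"
    genome: Path                  # genome sequence file
    annotation: Path              # GFF/GTF transcript annotation
    output_dir: Path
    n_reads: int
    n_cells: int
    fragment_length_mean: float
    fragment_length_sd: float
    read_length: int
    intron_inclusion_prob: float = 0.0   # constant per intron to start with
    add_polya: bool = False
    primer_sequence: str = ""
    n_replicates: int = 1


def _convert(raw, target_type):
    """Convert a raw config string to the type declared on the dataclass."""
    if target_type is bool:
        return raw.lower() in {"1", "true", "yes", "on"}
    return target_type(raw)


def parse_config(text):
    """Parse 'key = value' lines (assumed syntax) into RunParameters."""
    values = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {line_number}: expected 'key = value'")
        key, raw = (part.strip() for part in line.split("=", 1))
        values[key] = raw

    spec = {f.name: f for f in fields(RunParameters)}
    unknown = sorted(set(values) - set(spec))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    missing = sorted(
        name for name, f in spec.items()
        if f.default is MISSING and name not in values
    )
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    params = RunParameters(
        **{key: _convert(raw, spec[key].type) for key, raw in values.items()}
    )
    if not 0.0 <= params.intron_inclusion_prob <= 1.0:
        raise ValueError("intron_inclusion_prob must be between 0 and 1")
    return params


def load_config(path):
    """Read the config file from disk and parse it."""
    return parse_config(Path(path).read_text())
```

A dataclass is used instead of a plain dictionary so that a misspelled or missing parameter fails at load time rather than partway through a simulation. A call would look like `params = load_config("run_config.txt")`, where the file name is only an example.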