Construct parameters object

Write a function to parse the config file for the run and populate a dictionary with parameter values. The list of parameters is as follows:

  1. CSV-formatted table with columns “GeneID,Counts” specifying the average number of transcripts expressed for each gene in a given cell type. These values can come, for example, from a bulk RNA-seq experiment on sorted cells of that type.
  2. File with the genome sequence
  3. GFF/GTF-formatted file with the transcript annotation of the genome
  4. Output directory
  5. Number of reads to sequence
  6. Number of cells to simulate
  7. Mean and standard deviation of RNA fragment length
  8. Read length
  9. Probability of intron inclusion: treated as constant across introns to start with, and can later be extended to intron-specific values. In the latter case, estimates could be obtained from bulk RNA-seq data by dividing the average per-position coverage of a given intron by the average per-position coverage of the gene, or of the flanking exons.
  10. Option to add poly(A) tails to transcripts, with an associated function for generating these tails (with a specified length distribution and non-A nucleotide frequency).
  11. Parameters for evaluating internal priming: the primer sequence and a function implementing the constraints on priming sites (accessibility, energy of interaction, perfect matching at the last primer position, etc.).
  12. Number of replicates (cells) for which to run the simulation.
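
The intron-specific estimate described in item 9 could be sketched as below. This is only an illustration: the function name, the list-of-coverage-values input and the clipping of the ratio to the range [0, 1] are assumptions, not part of the specification above.

```python
from typing import Sequence


def intron_inclusion_probability(
    intron_coverage: Sequence[float],
    reference_coverage: Sequence[float],
) -> float:
    """Estimate the inclusion probability of one intron from bulk coverage.

    intron_coverage: per-position read coverage within the intron.
    reference_coverage: per-position coverage of the whole gene, or of
        the flanking exons (either choice is mentioned in item 9).
    """
    if len(intron_coverage) == 0 or len(reference_coverage) == 0:
        raise ValueError("coverage vectors must not be empty")

    intron_mean = sum(intron_coverage) / len(intron_coverage)
    reference_mean = sum(reference_coverage) / len(reference_coverage)
    if reference_mean == 0:
        raise ValueError("reference coverage is zero; ratio is undefined")

    # Assumption: clip to [0, 1] so the ratio can be used as a probability.
    return min(1.0, max(0.0, intron_mean / reference_mean))
```

For example, an intron with mean coverage 1 inside a gene with mean coverage 10 would get an inclusion probability of about 0.1 under this sketch.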

Input: config file (txt)

Output: Class holding the parameters for the run.
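
A minimal sketch of such a parser follows. The config file format is not fixed above, so this assumes plain `key = value` lines with `#` comments. All key names, the `RunParameters` class and the function names are hypothetical, and only a subset of the listed parameters is covered (the function-valued options in items 10 and 11 are left out).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunParameters:
    # Field names are hypothetical; they mirror the parameter list above.
    counts_table: Path           # CSV table "GeneID,Counts"
    genome: Path                 # genome sequence file
    annotation: Path             # GFF/GTF transcript annotation
    output_dir: Path
    n_reads: int
    n_cells: int
    fragment_mean: float
    fragment_sd: float
    read_length: int
    intron_inclusion_prob: float
    add_polya: bool = False
    primer_seq: str = ""


def _to_bool(text: str) -> bool:
    return text.strip().lower() in {"1", "true", "yes", "on"}


# One converter per config key (assumed key names).
CONVERTERS = {
    "counts_table": Path, "genome": Path, "annotation": Path,
    "output_dir": Path, "n_reads": int, "n_cells": int,
    "fragment_mean": float, "fragment_sd": float, "read_length": int,
    "intron_inclusion_prob": float, "add_polya": _to_bool,
    "primer_seq": str,
}


def parse_config_text(text: str) -> RunParameters:
    """Parse 'key = value' lines into a RunParameters instance."""
    values = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in CONVERTERS:
            raise ValueError(f"line {lineno}: unknown parameter '{key}'")
        values[key] = CONVERTERS[key](value)
    # Missing required parameters raise TypeError here.
    return RunParameters(**values)


def load_config(path: str) -> RunParameters:
    """Read the config file from disk and parse it."""
    return parse_config_text(Path(path).read_text())
```

A dataclass is used here rather than a bare dictionary so that missing or misspelled parameters fail early; converting the result back to a dictionary with `dataclasses.asdict` remains possible if a dictionary is preferred.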