Workflow for generating synthetic data

Encapsulate all scRNA-seq data generation functionality in a single workflow, written in Nextflow (https://github.com/nextflow-io/nextflow).

Inputs (I#):

  1. Path to genome sequence file (fasta)
  2. Path to genome annotation file (gtf)
  3. Path to gene expression values (csv: geneID,count)
  4. Total number of transcripts to sample
  5. Probability of intron inclusion
  6. Script containing function for constructing poly(A) tails
  7. Length of poly(A) tails
  8. Dictionary with nucleotide frequencies in poly(A) tails
  9. Primer sequence
  10. Threshold for the energy of primer-mRNA interaction needed for priming
  11. Mean and standard deviation of fragment length
  12. Read length (number of sequencing cycles)
  13. Number of cells to simulate
  14. Directory for storing output files
  15. Software for predicting energy of primer-target interaction
  16. Pattern specifying the reads file name for an individual cell
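
To make the input list concrete, here is a minimal Python sketch that bundles the sixteen inputs into one parameter object with basic sanity checks. It is only an illustration, not the workflow's actual interface: all field names, the example values, and the rule that the poly(A) nucleotide frequencies sum to 1 are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorkflowInputs:
    """Hypothetical container for inputs I1-I16 (field names are assumptions)."""
    genome_fasta: str               # I1
    annotation_gtf: str             # I2
    expression_csv: str             # I3 (geneID,count)
    n_transcripts: int              # I4
    p_intron: float                 # I5
    polya_script: str               # I6
    polya_length: int               # I7
    polya_freqs: Dict[str, float]   # I8
    primer: str                     # I9
    energy_threshold: float         # I10
    fragment_mean: float            # I11 (mean)
    fragment_sd: float              # I11 (standard deviation)
    read_length: int                # I12
    n_cells: int                    # I13
    outdir: str                     # I14
    energy_tool: str                # I15
    reads_pattern: str              # I16

    def validate(self) -> None:
        """Raise ValueError if a value is outside its plausible range."""
        if not 0.0 <= self.p_intron <= 1.0:
            raise ValueError("p_intron must be a probability in [0, 1]")
        for name in ("n_transcripts", "polya_length", "read_length", "n_cells"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")
        if self.fragment_mean <= 0 or self.fragment_sd < 0:
            raise ValueError("fragment length mean must be > 0 and sd >= 0")
        # Assumption: the dictionary holds relative frequencies that sum to 1.
        if abs(sum(self.polya_freqs.values()) - 1.0) > 1e-6:
            raise ValueError("polya_freqs must sum to 1")


# Placeholder values for illustration only; none come from the issue.
EXAMPLE = WorkflowInputs(
    genome_fasta="genome.fa", annotation_gtf="annotation.gtf",
    expression_csv="expression.csv", n_transcripts=1000, p_intron=0.1,
    polya_script="polya.py", polya_length=250,
    polya_freqs={"A": 0.9, "C": 0.04, "G": 0.03, "T": 0.03},
    primer="TTTTTTTTTTTTTTT", energy_threshold=-10.0,
    fragment_mean=300.0, fragment_sd=50.0, read_length=100, n_cells=10,
    outdir="results", energy_tool="energy_tool_placeholder",
    reads_pattern="cell_{id}_reads.fasta",
)
EXAMPLE.validate()
```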

Output of this issue: Nextflow code for executing the workflow

Outputs (O#) of the entire workflow:

  1. Path to sampled transcript structures (gtf)
  2. Path to transcript counts (csv: transcriptID,count)
  3. Path to sampled transcript sequences (fasta)
  4. Path to annotated internal priming sites (gtf)
  5. Path to unique cDNA sequences
  6. Path to cDNA count table
  7. Path to sequences of terminal fragments (fasta)
  8. Path to read sequences

The workflow will include the following steps:

  • Repeat the simulation for the required number of cells (I13)
    • Generate transcript structures (#2)
      • Inputs I2,I3,I5
      • Outputs O1,O2
    • Extract transcript sequences (#3)
      • Inputs I1,I6,I7,I8,O1
      • Outputs O3
    • Predict priming sites (#4)
      • Inputs I9,I10,I15,O3
      • Outputs O4
    • Generate cDNAs (#5)
      • Inputs O2,O3,O4
      • Outputs O5,O6
    • Terminal fragment selection (#6)
      • Inputs O5,O6,I11
      • Outputs O7
    • Read sequencing (#7)
      • Inputs O7,I12
      • Outputs O8
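
The dependencies between the per-cell steps can be checked mechanically. The Python sketch below (not Nextflow code) records each step's inputs and outputs as listed above, then derives an execution order in which every O# is produced before it is consumed. The step identifiers and the function name are made up for illustration; the per-cell loop would wrap this whole sequence.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step name -> (inputs, outputs), copied from the step list.
# The step identifiers themselves are hypothetical.
STEPS = {
    "generate_transcript_structures": ({"I2", "I3", "I5"}, {"O1", "O2"}),
    "extract_transcript_sequences": ({"I1", "I6", "I7", "I8", "O1"}, {"O3"}),
    "predict_priming_sites": ({"I9", "I10", "I15", "O3"}, {"O4"}),
    "generate_cdnas": ({"O2", "O3", "O4"}, {"O5", "O6"}),
    "select_terminal_fragments": ({"O5", "O6", "I11"}, {"O7"}),
    "sequence_reads": ({"O7", "I12"}, {"O8"}),
}


def execution_order(steps):
    """Return step names ordered so that producers run before consumers."""
    # Map every intermediate output to the step that produces it.
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    graph = {}
    for name, (inputs, _) in steps.items():
        deps = set()
        for item in inputs:
            if item.startswith("O"):  # intermediate file, not a user input
                if item not in producer:
                    raise ValueError(f"{name} needs {item}, which no step produces")
                deps.add(producer[item])
        graph[name] = deps
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(STEPS):
        print(step)
```

Because each step consumes an output of the step before it, the order is fully determined and matches the order of the list. In Nextflow the same dependencies would be expressed by connecting process outputs to downstream process inputs.
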
Edited by MihaelaZavolan