Changes

Bastian Wagner · ebfe3944
--- a/Project-Design.md
+++ b/Project-Design.md
 **Input:** fasta-formatted file of transcript sequences; gtf-formatted file with potential priming sites for individual transcripts, with associated probabilities; file with the copy number of each unique transcript subjected to the cDNA synthesis.

-**Output:** fasta-formatted file with DNA copies of the transcripts, ending at the one of the possible priming sites for each transcript. Priming sites are sampled in proportion to their probability of being used within a transcript. Each copy of a unique transcript is independently sampled, but only unique DNA sequences are saved to the output file.Csv-formatted file with the copy number of each unique DNA copy.
+**Output:** fasta-formatted file with DNA copies of the transcripts, each ending at one of the possible priming sites for its transcript. Priming sites are sampled in proportion to their probability of being used within a transcript. Each copy of a unique transcript is independently sampled, but only unique DNA sequences are saved to the output file. CSV-formatted file with the copy number of each unique DNA copy.

 **Simulating cDNA synthesis**

 This is done by reverse transcribing starting from the primer sequence. For each transcript we have the sequence and the copy number, so for each copy of the transcript we have to sample a priming site in proportion to its probability, calculated at the previous step. The cDNAs are then all the sequences generated from the initial pool of transcripts by copying the initial transcript sequence up to the chosen priming site.
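The sampling described above could be sketched roughly as follows for a single transcript. This is only an illustration, not the project's implementation: the function name `sample_cdna_copies`, the `seed` parameter, and the example sequence are hypothetical. It also assumes that a priming site is a 1-based end coordinate, so that the cDNA is the transcript prefix of that length, and it copies the sequence as-is without reverse complementing.

```python
import random
from collections import Counter


def sample_cdna_copies(sequence, copy_number, priming_sites,
                       priming_probabilities, seed=None):
    """Return a Counter mapping each unique cDNA sequence to its copy number.

    Hypothetical helper: names and coordinate convention are assumptions.
    """
    rng = random.Random(seed)
    # One independent draw per transcript copy, weighted by site probability.
    chosen_sites = rng.choices(priming_sites,
                               weights=priming_probabilities,
                               k=copy_number)
    # Assumption: a site is a 1-based end coordinate, so the cDNA is the
    # transcript prefix of that length.
    return Counter(sequence[:site] for site in chosen_sites)


# Made-up transcript and priming sites, for illustration only.
transcript = "GATGCGGAAGCGCGGCTCTTGCGG"
counts = sample_cdna_copies(transcript, 100, [8, 15, 24],
                            [0.33, 0.27, 0.40], seed=0)
for cdna, n in counts.items():
    print(cdna, n)
```

Because every copy is drawn independently, the counts per unique cDNA vary from run to run around the expected values (copy number times site probability); only their sum is fixed at the transcript's copy number.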

 **Design**

-1. Extract transcritpt_sequences, transcritpt_copy_number, priming_sites and priming_probabilities from input files.
+1. Extract transcript_sequences, transcript_copy_number, priming_sites and priming_probabilities from input files.

- transcritpt_sequences = GATGCGG… , AAGCGCGG…, CTCTTGCGG… \[...\]
- transcritpt_copy_number = 100, 40, 30 \[...\]
+- transcript_sequences = GATGCGG… , AAGCGCGG…, CTCTTGCGG… \[...\]
+- transcript_copy_number = 100, 40, 30 \[...\]
 - priming_sites = 220, 260, 390 \[...\]
 - priming_probabilities = 0.33, 0.27, 0.40 \[...\]

-2. Generate a list of unique_transcripts based on transcritpt_sequences + priming_sites and add the list to the fasta output file.
+2. Generate a list of unique_transcripts based on transcript_sequences + priming_sites and add the list to the FASTA output file.

 - TTTACGGT…
 - CCATACGG…
 - CGGGGCG…

-3. Generate list of copy numbers for each unique transcript based on priming_probabilities + transcritpt_copy_number
+3. Generate a list of copy numbers for each unique transcript based on priming_probabilities + transcript_copy_number.

 - TTTACGGT… 33
 - CCATACGG… 27