Commit 10d0c9ec authored by sunhollyjolly

update

parent cbf2077d
# Terminal fragment selector
Simulating single cell RNA library generation (scRNA-seq)
This repository is part of the course "Programming for Life Science" (43513, uniBasel).
As part of the project, we generate *synthetic data* to test the accuracy of scRNA-seq. That is, to evaluate whether our predictions fall within generally accepted bounds, we generate ground-truth data sets and determine whether the computational analysis can recover the properties of the data that were assumed in the simulation. These simulations are more than a trivial exercise: they help build intuition as to which steps of the experiment have the largest consequences for the outcome, where specific behaviors come from, and so on.
In this sub-project we work on selecting terminal fragments. Details of the distribution we use for selecting fragments can be found in the following paper (see its supplementary information):
[Scientific Reports article srep04532](https://www.nature.com/articles/srep04532#MOESM1)
> Next Generation Sequencing (NGS) technology is based on cutting DNA into small fragments and their massive parallel sequencing. The multiple overlapping segments termed “reads” are assembled into a contiguous sequence. To reduce sequencing errors, every genome region should be sequenced several dozen times. This sequencing approach is based on the assumption that genomic DNA breaks are random and sequence-independent. However, previously we showed that for the sonicated restriction DNA fragments the rates of double-stranded breaks depend on the nucleotide sequence. In this work we analyzed genomic reads from NGS data and discovered that fragmentation methods based on the action of the hydrodynamic forces on DNA, produce similar bias. Consideration of this non-random DNA fragmentation may allow one to unravel what factors and to what extent influence the non-uniform coverage of various genomic regions.
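
To make the idea of sequence-dependent fragmentation concrete, here is a minimal Python sketch, not the project's actual implementation. It cuts a transcript between adjacent bases with a probability that depends on the dinucleotide at the cut site, then picks the 3'-most piece. The names `fragment`, `terminal_fragment`, `BREAK_PROB` and `DEFAULT_BREAK_PROB` are hypothetical, the probability values are purely illustrative and not taken from the paper, and reading "terminal" as "3'-most" is an assumption.

```python
import random

# Hypothetical per-dinucleotide break probabilities (illustrative values only,
# NOT taken from the paper). Unlisted dinucleotides use DEFAULT_BREAK_PROB.
DEFAULT_BREAK_PROB = 0.01
BREAK_PROB = {"CG": 0.05, "TA": 0.03}


def fragment(seq, rng, break_prob=BREAK_PROB, default=DEFAULT_BREAK_PROB):
    """Cut seq between adjacent bases with a dinucleotide-dependent probability."""
    fragments = []
    start = 0
    for i in range(1, len(seq)):
        # Probability of a break between positions i-1 and i.
        p = break_prob.get(seq[i - 1:i + 1], default)
        if rng.random() < p:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])  # last piece, up to the 3' end
    return fragments


def terminal_fragment(fragments):
    """Return the 3'-most fragment (assumed meaning of 'terminal' here)."""
    return fragments[-1]


rng = random.Random(0)  # fixed seed so the run is reproducible
transcript = "".join(rng.choice("ACGT") for _ in range(500))
pieces = fragment(transcript, rng)
print(len(pieces), "fragments; terminal fragment length:",
      len(terminal_fragment(pieces)))
```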
Overall, we will implement a procedure for sampling reads from mRNA sequences that incorporates several sources of “noise”: the presence of multiple transcript isoforms from a given gene, some of which are incompletely spliced; stochastic binding of primers to RNA fragments; and stochastic sampling of DNA fragments for sequencing. We will then use standard methods to estimate gene expression from the simulated data. We will repeat the process multiple times, each repetition corresponding to a single cell. We will then compare the estimates obtained from the simulated cells with the gene expression values assumed in the simulation. We will also explore which steps in the sample preparation have the largest impact on the accuracy of gene expression estimates.
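
The simulate-then-compare loop described above can be sketched as follows. This is a deliberately simplified illustration under stated assumptions, not the project's pipeline. The gene names and expression values in `TRUE_EXPRESSION` are hypothetical, and only one noise source is modelled, the stochastic sampling of reads. The "estimate" is the plain fraction of reads per gene, which stands in for the standard estimation methods mentioned in the text. `simulate_cell` and `mean_abs_error` are illustrative helper names.

```python
import random
from collections import Counter

# Hypothetical ground-truth relative expression assumed by the simulation.
TRUE_EXPRESSION = {"geneA": 0.6, "geneB": 0.3, "geneC": 0.1}


def simulate_cell(true_expr, n_reads, rng):
    """Sample n_reads reads in proportion to true_expr; return read fractions."""
    genes = list(true_expr)
    weights = [true_expr[g] for g in genes]
    reads = rng.choices(genes, weights=weights, k=n_reads)
    counts = Counter(reads)
    # Counter returns 0 for genes that received no reads.
    return {g: counts[g] / n_reads for g in genes}


def mean_abs_error(estimate, truth):
    """Average absolute difference between estimated and true expression."""
    return sum(abs(estimate[g] - truth[g]) for g in truth) / len(truth)


rng = random.Random(1)  # fixed seed so the run is reproducible
# Each repetition corresponds to one simulated cell.
cells = [simulate_cell(TRUE_EXPRESSION, 200, rng) for _ in range(5)]
errors = [mean_abs_error(cell, TRUE_EXPRESSION) for cell in cells]
for i, err in enumerate(errors):
    print(f"cell {i}: mean absolute error = {err:.3f}")
```

Raising `n_reads` in this sketch should shrink the errors on average, which is one simple way to see how a single step (sequencing depth) affects the accuracy of the estimates.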