Commit 5b073670 authored by Niels Schlusser

Typesetting corrections README 2

parent 93064d1d
@@ -13,12 +13,14 @@ There are deep learning scripts for essentially three different use cases:
- MPRA data from Sample et al. (2019)
- endogenous (riboseq/RNAseq) data based on Alexaki et al. (2020)
- ClinVar variants based on Landrum et al. (2020)
are provided in the directory `HEK293_training_data/`.
## Scripts
1. turn the output of RNAseq and ribosome profiling data into translation efficiency estimates
2. append non-sequential features to a given data set
3. construct a data set based on a VCF file
can be found in the directory `training_data_preprocessing/`.
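The third step can be pictured with a minimal Python sketch, assuming plus-strand transcripts whose 5'UTR is one contiguous genomic block and only single-nucleotide variants. The function names and the coordinate convention are hypothetical and not taken from the scripts in `training_data_preprocessing/`.

```python
# Hypothetical sketch: apply a single-nucleotide VCF variant to a 5'UTR.
# Assumes a plus-strand transcript whose 5'UTR is one contiguous genomic
# block starting at the 1-based coordinate `utr_start`.


def parse_vcf_line(line):
    """Return (chrom, pos, ref, alt) from a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    return chrom, int(pos), ref, alt


def apply_snv(utr_seq, utr_start, pos, ref, alt):
    """Substitute `alt` for `ref` at 1-based genomic position `pos`."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single-nucleotide variants are handled here")
    offset = pos - utr_start  # 0-based index into the UTR string
    if not 0 <= offset < len(utr_seq):
        raise ValueError("variant lies outside the 5'UTR")
    if utr_seq[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    return utr_seq[:offset] + alt + utr_seq[offset + 1:]


if __name__ == "__main__":
    chrom, pos, ref, alt = parse_vcf_line("chr1\t1004\trs0\tG\tA\t.\tPASS\t.")
    print(apply_snv("ACGTGCATG", 1000, pos, ref, alt))  # prints ACGTACATG
```

Indels and minus-strand transcripts would need extra handling (sequence shifts and reverse complementing), which this sketch deliberately leaves out.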
@@ -30,12 +32,14 @@ The preprocessing procedure for MPRA data calculates and appends the non-sequent
- `number_inframe_uAUGs`
- `normalized_5p_folding_energy`
- `GC_content`
to the input file.
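A minimal sketch of how two of the features above could be computed, assuming the 5'UTR string ends directly before the main start codon. The functions are hypothetical illustrations, and `normalized_5p_folding_energy` is left out because it needs an RNA folding tool and a normalization that this README does not specify.

```python
# Hypothetical sketch of two non-sequential 5'UTR features.
# Assumes `utr` ends directly before the start codon of the main ORF.


def gc_content(utr):
    """Fraction of G and C nucleotides in the sequence (0.0 for empty input)."""
    seq = utr.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def number_inframe_uaugs(utr):
    """Count upstream AUGs that are in frame with the main start codon.

    An AUG starting at index i is in frame if the distance from i to the
    end of the UTR (where the main start codon begins) is a multiple of 3.
    """
    seq = utr.upper().replace("U", "T")  # accept RNA or DNA alphabet
    n = len(seq)
    return sum(
        1
        for i in range(n - 2)
        if seq[i:i + 3] == "ATG" and (n - i) % 3 == 0
    )


if __name__ == "__main__":
    print(gc_content("GGCC"), number_inframe_uaugs("AUGAUG"))
```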
### Endogenous data
The preprocessing procedure for endogenous data takes mapping files from:
- riboseq data analysis
- RNAseq data analysis
as *input* and turns them into a file with:
- translation efficiencies
- 5'UTR sequences
@@ -49,6 +53,7 @@ As input from the riboseq side, you need:
- BAM and BAI files of the riboseq mapping
- an alignment JSON file that contains the P-site offsets for different RPF lengths
- a TSV file that links gene IDs and transcript IDs
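How P-site offsets are typically applied to riboseq reads can be sketched as follows. The JSON layout (read length mapped to offset) and the coordinate convention (0-based leftmost alignment position of an unspliced read) are assumptions for illustration, not the documented format of the alignment file.

```python
import json

# Hypothetical alignment JSON layout: RPF length -> P-site offset (nt).
EXAMPLE_JSON = '{"28": 12, "29": 12, "30": 13}'


def load_offsets(json_text):
    """Parse the JSON into {read_length: offset} with integer keys."""
    return {int(length): int(offset) for length, offset in json.loads(json_text).items()}


def p_site_position(read_start, read_length, offsets, strand="+"):
    """Return the 0-based P-site coordinate, or None if no offset is known.

    read_start is the 0-based leftmost aligned position of an unspliced read.
    """
    offset = offsets.get(read_length)
    if offset is None:
        return None  # RPF length without a calibrated offset
    if strand == "+":
        return read_start + offset
    # On the minus strand the read's 5' end is its rightmost base.
    read_end = read_start + read_length - 1
    return read_end - offset


if __name__ == "__main__":
    offsets = load_offsets(EXAMPLE_JSON)
    print(p_site_position(100, 28, offsets))  # prints 112
```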
From the RNAseq side, you need:
- `transcripts_numreads.tsv` (output from kallisto)
- a file with the TIN scores (possibly one per replicate)
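The README does not spell out the formula for translation efficiency. One common definition is the log ratio of library-size-normalized riboseq and RNAseq counts; the sketch below implements that definition plus a TIN-based filter. The function names and the TIN cutoff are hypothetical.

```python
import math


def translation_efficiency(ribo_counts, rna_counts, pseudocount=1.0):
    """Return log2 TE per transcript from riboseq and RNAseq read counts.

    ribo_counts, rna_counts: dicts mapping transcript id -> read count.
    Counts are scaled to counts per million (CPM) of each library and a
    pseudocount avoids log(0). Only transcripts present in both are kept.
    """
    ribo_total = sum(ribo_counts.values())
    rna_total = sum(rna_counts.values())
    if ribo_total == 0 or rna_total == 0:
        raise ValueError("both libraries need at least one read")
    te = {}
    for tx in ribo_counts.keys() & rna_counts.keys():
        ribo_cpm = (ribo_counts[tx] + pseudocount) / ribo_total * 1e6
        rna_cpm = (rna_counts[tx] + pseudocount) / rna_total * 1e6
        te[tx] = math.log2(ribo_cpm / rna_cpm)
    return te


def filter_by_tin(te, tin_scores, min_tin=50.0):
    """Keep transcripts whose TIN score is at least min_tin (hypothetical cutoff)."""
    return {tx: value for tx, value in te.items() if tin_scores.get(tx, 0.0) >= min_tin}


if __name__ == "__main__":
    te = translation_efficiency({"a": 30, "b": 10}, {"a": 10, "b": 30})
    print(filter_by_tin(te, {"a": 70.0, "b": 20.0}))
```

With replicate-wise TIN scores, one option is to average them per transcript before filtering; the README leaves this choice open.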
@@ -86,6 +91,7 @@ There are a few parameters to specify in the middle of the script:
- the path to the directory where the scalers are saved
- the path for saving the trained model
- the path to the pretrained model (for transfer learning only)
All these scripts can be run
1. in normal mode (training and testing), or
2. with the argument `predict` after `python3 <scriptname>`, for prediction and scatter plot creation only.
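The two run modes could be dispatched roughly as in this sketch, which checks whether `predict` is the first command-line argument. The function names and path constants are hypothetical placeholders for the parameters listed above, not the actual contents of the scripts.

```python
import sys

# Hypothetical placeholders for the paths set in the middle of a script.
SCALER_DIR = "path/to/scalers/"
MODEL_PATH = "path/to/trained_model"


def select_mode(argv):
    """Return 'predict' if the first argument is 'predict', else 'normal'."""
    return "predict" if len(argv) > 1 and argv[1] == "predict" else "normal"


def train_and_test():
    # Placeholder: fit the model, save scalers to SCALER_DIR and model to MODEL_PATH.
    print("training and testing")


def predict_and_plot():
    # Placeholder: load the saved model, predict, and draw the scatter plot.
    print("prediction and scatter plot only")


if __name__ == "__main__":
    if select_mode(sys.argv) == "predict":
        predict_and_plot()
    else:
        train_and_test()
```

Usage, under these assumptions: `python3 <scriptname>` trains and tests, while `python3 <scriptname> predict` only predicts and plots.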
...