Typesetting corrections README 2

5b073670 · Niels Schlusser · 93064d1d · 5b073670
Commit 5b073670 authored 9 months ago by Niels Schlusser
--- a/README.md
+++ b/README.md
@@ -13,12 +13,14 @@ There are deep learning scripts for essentially three different usecases:
 - MPRA from Sample et al. (2019)
 - endogenous (riboseq/RNAseq) data based on Alexaki et al. (2020)
 - clinvar variations based on Landrum et al. (2020)
 are provided in the directory HEK293_training_data/
 ## Scripts
 1. turn the output of RNAseq and ribosome profiling data into translation efficiency estimates
 2. append non-sequential features to a given data set
 3. construct a data set based on a vcf file
 can be found in the directory training_data_preprocessing/.
@@ -30,12 +32,14 @@ The preprocessing procedure for MPRA data calculates and appends the non-sequent
 - number_inframe_uAUGs
 - normalized_5p_folding_energy
 - GC_content
 to the input file using.
 ### Endogenous data
 The preprocessing procedure for endogenous data takes mapping files from:
 - riboseq data analysis
 - RNAseq data analysis
 as *input* and turns them into a file with:
 - translation efficiencies
 - 5'UTR sequences
@@ -49,6 +53,7 @@ As an input from the riboseq side, you need:
 - bam and bai file of the mapping done in riboseq
 - an alignment json file that contains the p-site offsets for different RPF lengths
 - a tsv file that links gene id and transcript id
 From the RNAseq side, you need:
 - transcripts_numreads.tsv (output from kallisto)
 - a file with the TIN scores (potentially per replicate)
@@ -86,6 +91,7 @@ There are a few parameters to specify in the middle of the script:
 - the path to the directory where to save the scalers
 - the path for saving the trained model
 - the path for the pretrained model (for transfer learning, only)
 All these scripts can be run in two modes:
 1. normal mode (training and testing)
 2. prediction mode (prediction and scatterplot creation only), selected by appending `predict` after `python3 <scriptname>`
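
The two run modes described in the README could be selected along the following lines. This is a minimal sketch, not the repository's actual implementation: the function name `select_mode` and the printed messages are hypothetical, and the only detail taken from the README is that the word `predict` follows `python3 <scriptname>` on the command line.

```python
import sys


def select_mode(argv):
    """Return 'predict' if the first argument after the script name is
    'predict'; otherwise return 'train' (normal mode).

    Hypothetical helper for illustration; the real scripts may parse
    their arguments differently.
    """
    # argv[0] is the script name, so the optional mode word is argv[1].
    if len(argv) > 1 and argv[1] == "predict":
        return "predict"
    return "train"


if __name__ == "__main__":
    mode = select_mode(sys.argv)
    if mode == "predict":
        # Assumed behaviour: load the saved model and scalers, then predict and plot.
        print("prediction and scatterplot creation only")
    else:
        # Assumed behaviour: train the model, then evaluate it on the test set.
        print("normal mode: training and testing")
```

With such a check, `python3 <scriptname>` would run training and testing, while `python3 <scriptname> predict` would skip training and go straight to prediction.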