Closes #160
Introduce a dictionary to infer the "directionality parameters" of the different tools from a single provided Salmon library type:
```python
# Maps a (single-end) Salmon library type to each tool's directionality parameter
directionality_dict = {
    "SF": {
        "kallisto": "--fr-stranded",
        "alfa": "fr-secondstrand",
        "alfa_plus": "str1",
        "alfa_minus": "str2",
    },
    "SR": {
        "kallisto": "--rf-stranded",
        "alfa": "fr-firststrand",
        "alfa_plus": "str2",
        "alfa_minus": "str1",
    },
}
```
This is based on the following tool documentation and was checked by Dominik and Mihaela:
- Salmon
- Kallisto
- ALFA
- Zarp issue created by Dominik when first incorporating ALFA
- Blog about strandedness in RNA-seq
For single-end sequencing, the two codes `SF` and `SR` are used; for paired-end sequencing, one of `I` (inward), `O` (outward) or `M` (matching) is prepended to `SF` or `SR`. However, these three orientations are treated the same when translated for the other tools, since only Salmon has dedicated codes for them.
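A minimal sketch of the reduction described above, assuming that the only codes to support are `SF`, `SR` and their `I`/`O`/`M`-prefixed paired-end variants. `strand_key` is a hypothetical helper for illustration, not the function used in zarp; any other libtype (including the old `A`) is rejected.

```python
# Hypothetical helper: reduce a Salmon libtype to the strand key ("SF" or "SR")
# that the per-tool lookup is keyed on.
PAIRED_END_PREFIXES = {"I", "O", "M"}
STRAND_CODES = {"SF", "SR"}


def strand_key(libtype: str) -> str:
    """Return 'SF' or 'SR' for a single-end or paired-end Salmon libtype."""
    if libtype in STRAND_CODES:
        # Single-end: the libtype already is the strand key.
        return libtype
    prefix, rest = libtype[:1], libtype[1:]
    if prefix in PAIRED_END_PREFIXES and rest in STRAND_CODES:
        # Paired-end: the orientation prefix (I/O/M) is irrelevant for
        # kallisto and ALFA, so it is dropped.
        return rest
    raise ValueError(f"unsupported libtype: {libtype!r}")


for lt in ["SF", "SR", "ISF", "OSR", "MSF"]:
    # prints: SF -> SF, SR -> SR, ISF -> SF, OSR -> SR, MSF -> SF (one per line)
    print(lt, "->", strand_key(lt))
```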
Note: With these changes, the `libtype` column in `samples.tsv` has to be specified; the previous value `A` (automatic inference) is no longer valid.
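As an illustration of this requirement, a sketch of a pre-run check on the `libtype` column, assuming a tab-separated `samples.tsv` with a header row. `invalid_libtype_rows` is a hypothetical helper (zarp may validate differently, or not at all), and the accepted set assumes only the stranded codes listed above.

```python
import csv
import io

# Accepted libtypes after this change: SF/SR for single-end, and the same
# codes prefixed with I, O or M for paired-end. "A" is no longer accepted.
VALID_LIBTYPES = {"SF", "SR"} | {p + s for p in "IOM" for s in ("SF", "SR")}


def invalid_libtype_rows(tsv_text: str) -> list[tuple[int, str]]:
    """Return (row_number, libtype) for every data row with an unaccepted libtype."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad = []
    for row_number, row in enumerate(reader, start=1):
        libtype = (row.get("libtype") or "").strip()
        if libtype not in VALID_LIBTYPES:
            bad.append((row_number, libtype))
    return bad


example = "libtype\tfq2\nSF\tXXXXXXXXXXXXX\nA\tXXXXXXXXXXXXX\n"
print(invalid_libtype_rows(example))  # → [(2, 'A')]
```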
- `get_directionality(libtype, tool)` finds the correct parameter for each tool, given the Salmon libtype. Calls that read the different tools' directionality parameters from `samples.tsv` are replaced by `get_directionality`.
- New function in `prepare_inputs.py`: `get_libtype`, which infers the Salmon libtype from the LabKey parameters 'SENSE or ANTISENSE' and 'pe or se'.
- `test_scripts_prepare_inputs_table`: the columns `kallisto_directionality`, `alfa_directionality`, `alfa_plus` and `alfa_minus` are removed.
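A minimal sketch of what `get_directionality` and `get_libtype` might look like. This is not the code from this change: the prefix-stripping logic, the exact LabKey value strings (`SENSE`/`ANTISENSE`, `pe`/`se`), and the use of `I` (inward) as the paired-end orientation in `get_libtype` are assumptions made for illustration.

```python
# Same mapping as in the description (single-end keys only).
directionality_dict = {
    "SF": {"kallisto": "--fr-stranded", "alfa": "fr-secondstrand",
           "alfa_plus": "str1", "alfa_minus": "str2"},
    "SR": {"kallisto": "--rf-stranded", "alfa": "fr-firststrand",
           "alfa_plus": "str2", "alfa_minus": "str1"},
}


def get_directionality(libtype: str, tool: str) -> str:
    """Return the directionality parameter for `tool` given a Salmon libtype."""
    # Paired-end libtypes (ISF, OSR, MSF, ...) share the entry of their
    # single-end strand code, so only the last two characters are kept.
    key = libtype[1:] if len(libtype) == 3 and libtype[0] in "IOM" else libtype
    try:
        return directionality_dict[key][tool]
    except KeyError:
        raise ValueError(f"no parameter for libtype={libtype!r}, tool={tool!r}") from None


def get_libtype(sense: str, seqmode: str) -> str:
    """Build a Salmon libtype from LabKey-style values (assumed encoding)."""
    # Assumption: "SENSE" means reads come from the transcript strand (SF),
    # "ANTISENSE" from the opposite strand (SR).
    strand = {"SENSE": "SF", "ANTISENSE": "SR"}[sense.upper()]
    mode = seqmode.lower()
    if mode == "se":
        return strand
    if mode == "pe":
        # Assumption: paired-end reads are inward-facing ("I").
        return "I" + strand
    raise ValueError(f"unknown sequencing mode: {seqmode!r}")


print(get_directionality("ISR", "kallisto"))  # → --rf-stranded
print(get_libtype("SENSE", "pe"))  # → ISF
```

Keeping the lookup keyed on `SF`/`SR` only mirrors the point above that the `I`/`O`/`M` orientation matters to Salmon alone.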
Apart from the zarp integration tests, I also ran the pipeline on the `test_alfa` files that Dominik created when originally incorporating ALFA, and was able to reproduce the plots he got when specifying the correct or the reverse parameters.

```
# Old samples.tsv
libtype fq1_polya_3p fq1_polya_5p kallisto_directionality alfa_directionality alfa_plus alfa_minus fq2
A AAAAAAAAAAAAAAAAA XXXXXXXXXXXXX --rf fr-firststrand str2 str1 XXXXXXXXXXXXX

# New samples.tsv
libtype fq1_polya_3p fq1_polya_5p fq2
SF AAAAAAAAAAAAAAAAA XXXXXXXXXXXXX XXXXXXXXXXXXX
```
I obtained the same results with the old and the new pipeline versions. When specifying `SR` instead of `SF`, ALFA classifies most reads as 'opposite strand', and kallisto and Salmon cannot align most reads, as expected.