Commit 2a934740 authored by BIOPZ-Katsantoni Maria

Remove sample specific options and use them as rule specific.

Partially solves issue #158.
The STAR-related options `soft_clip`, `pass_mode` and `multimappers` are
moved from the samples table to the rule config, where each is given a
default value. The documentation is updated, and the corresponding STAR
flags are removed from the immutable lists of the affected rules.
parent 8098698b
1 merge request: !98 Remove sample specific options and use them as rule specific.
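The commit message describes moving options into the rule config and dropping them from the rules' immutable lists. As a rough illustration of what that means, here is a minimal sketch of a helper that turns one rule's section of the rule config into a command-line fragment and rejects flags listed as immutable. The function name `sketch_parse_rule_config` and its error behavior are assumptions made for this example; the workflow's actual `parse_rule_config` is not shown in this diff and may behave differently.

```python
from typing import Mapping, Optional, Sequence


def sketch_parse_rule_config(
    rule_config: Optional[Mapping[str, Mapping[str, str]]],
    current_rule: str,
    immutable: Sequence[str] = (),
) -> str:
    """Hypothetical stand-in for parse_rule_config (assumed behavior).

    Returns the options configured for `current_rule` as one string,
    e.g. "--alignEndsType EndToEnd --twopassMode Basic".
    """
    # No config at all, or no section for this rule: nothing to add.
    if not rule_config or not rule_config.get(current_rule):
        return ""
    parts = []
    for flag, value in rule_config[current_rule].items():
        # Assumption: flags hard-wired in the rule's shell command
        # must not be overridden through the rule config.
        if flag in immutable:
            raise ValueError(
                f"'{flag}' is fixed by rule '{current_rule}' and cannot be set"
            )
        parts.append(str(flag))
        # An empty value denotes a bare switch without an argument.
        if value != "":
            parts.append(str(value))
    return " ".join(parts)


# Example rule config mirroring the defaults introduced by this commit.
example_config = {
    "map_genome_star": {
        "--alignEndsType": "EndToEnd",
        "--twopassMode": "Basic",
        "--outFilterMultimapNmax": "10",
    }
}
extra = sketch_parse_rule_config(
    example_config,
    current_rule="map_genome_star",
    immutable=("--genomeDir", "--readFilesIn"),
)
```

Under these assumptions, once `--twopassMode`, `--outFilterMultimapNmax` and `--alignEndsType` are no longer in the `immutable` tuple, they pass through the helper and reach STAR via `{params.additional_params}`.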
@@ -116,9 +116,6 @@ gtf_filtered | Required for [Salmon](#third-party-software-used). Path to filter
  genome | Required for [STAR](#third-party-software-used). Path to genome `.fa` file. File needs to be in subdirectory corresponding to `organism` field. Example: `/path/to/GRCh38/genome.fa` | `str`
  sd | Required for [kallisto](#third-party-software-used) and [Salmon](#third-party-software-used), but only for single-end libraries. Estimated standard deviation of fragment length distribution. Can be assessed from, e.g., BioAnalyzer profiles. Value ignored for paired-end libraries. | `int`
  mean | Required for [kallisto](#third-party-software-used) and [Salmon](#third-party-software-used), but only for single-end libraries. Estimated mean of fragment length distribution. Can be assessed, e.g., from BioAnalyzer profiles. Value ignored for paired-end libraries. | `int`
- multimappers | Required for [STAR](#third-party-software-used). Maximum number of multiple alignments allowed for a read; if exceeded, the read is considered unmapped. | `int`
- soft_clip | Required for [STAR](#third-party-software-used). One of `Local` (standard local alignment with soft-clipping allowed) or `EndToEnd` (force end-to-end read alignment, do not soft-clip). | `str`
- pass_mode | Required for [STAR](#third-party-software-used). One of `None` (1-pass mapping) or `Basic` (basic 2-pass mapping, with all 1st-pass junctions inserted into the genome indices on the fly). | `str`
  libtype | Required for [Salmon](#third-party-software-used). See [Salmon manual][docs-salmon] for allowed values. If in doubt, enter `A` to automatically infer the library type. | `str`
  kallisto_directionality | Required for [kallisto](#third-party-software-used) and [ALFA](#third-party-software-used). One of `--fr-stranded` (strand-specific reads, first read forward) and `--rf-stranded` (strand-specific reads, first read reverse) | `str`
  fq1_polya3p | Required for [Cutadapt](#third-party-software-used). Stretch of `A`s or `T`s, depending on read orientation. Trimmed from the 3' end of the read. Use value such as `XXXXXXXXXXXXXXX` if no poly(A) stretch present or if no trimming is desired. | `str`
@@ -599,13 +596,12 @@ Align short reads to reference genome and/or transcriptome with
  [**remove_polya_cutadapt**](#remove_polya_cutadapt)
  - Index; from [**create_index_star**](#create_index_star)
  - **Parameters**
- - **samples.tsv**
- - `--outFilterMultimapNmax`: maximum number of multiple alignments allowed; if exceeded, read is considered unmapped; specify in sample table column `multimappers`
- - `--alignEndsType`: one of `Local` (standard local alignment with soft-clipping allowed) or `EndToEnd` (force end-to-end read alignment, do not soft-clip); specify in sample table column `soft_clip`
- - `--twopassMode`: one of `None` (1-pass mapping) or `Basic` (basic 2-pass mapping, with all 1st-pass junctions inserted into the genome indices on the fly); specify in sample table column `pass_mode`
  - **rule_config.yaml**
  - `--outFilterMultimapScoreRange=0`: the score range below the maximum score for multimapping alignments (default 1)
  - `--outFilterType=BySJout`: reduces the number of ”spurious” junctions
+ - `--outFilterMultimapNmax`: maximum number of multiple alignments allowed; if exceeded, read is considered unmapped; specify in sample table column `multimappers`
+ - `--alignEndsType`: one of `Local` (standard local alignment with soft-clipping allowed) or `EndToEnd` (force end-to-end read alignment, do not soft-clip); specify in sample table column `soft_clip`
+ - `--twopassMode`: one of `None` (1-pass mapping) or `Basic` (basic 2-pass mapping, with all 1st-pass junctions inserted into the genome indices on the fly); specify in sample table column `pass_mode`
  - **Output**
  - Aligned reads file (`.bam`); used in
  [**calculate_TIN_scores**](#calculate_TIN_scores),
......
@@ -122,12 +122,24 @@ map_genome_star:
  --outFilterMultimapScoreRange: '0'
  # keep only those reads that contain junctions that passed filtering into SJ.out.tab. (default 'Normal', ZARP recommends 'BySJout', as this reduces the number of ”spurious” junctions )
  --outFilterType: 'BySJout'
+ # type of read ends alignment: force end-to-end read alignment, do not soft-clip
+ --alignEndsType: 'EndToEnd'
+ # extract junctions, insert them into the genome index and re-map reads in a 2nd mapping pass
+ --twopassMode: Basic
+ # alignments (all of them) will be output only if the read maps to no more loci than 10
+ --outFilterMultimapNmax: '10'
  pe_map_genome_star:
  # the score range below the maximum score for multimapping alignments (default 1, ZARP recommends 0)
  --outFilterMultimapScoreRange: '0'
  # keep only those reads that contain junctions that passed filtering into SJ.out.tab. (default 'Normal', ZARP recommends 'BySJout', as this reduces the number of ”spurious” junctions )
  --outFilterType: 'BySJout'
+ # type of read ends alignment: force end-to-end read alignment, do not soft-clip
+ --alignEndsType: 'EndToEnd'
+ # extract junctions, insert them into the genome index and re-map reads in a 2nd mapping pass
+ --twopassMode: Basic
+ # alignments (all of them) will be output only if the read maps to no more loci than 10
+ --outFilterMultimapNmax: '10'
  quantification_salmon:
  # correct for sequence specific biases](https://salmon.readthedocs.io/en/latest/salmon.html#seqbias
......
- sample seqmode fq1 index_size kmer fq1_3p fq1_5p organism gtf genome sd mean multimappers soft_clip pass_mode libtype fq1_polya_3p fq1_polya_5p kallisto_directionality alfa_directionality alfa_plus alfa_minus fq2 fq2_3p fq2_5p fq2_polya_3p fq2_polya_5p
- synthetic_10_reads_paired_synthetic_10_reads_paired pe ../input_files/project1/synthetic.mate_1.fastq.gz 75 31 AGATCGGAAGAGCACA XXXXXXXXXXXXX homo_sapiens ../input_files/homo_sapiens/annotation.gtf ../input_files/homo_sapiens/genome.fa 100 250 10 EndToEnd None A AAAAAAAAAAAAAAAAA XXXXXXXXXXXXXXXXX --fr fr-firststrand str1 str2 ../input_files/project1/synthetic.mate_2.fastq.gz AGATCGGAAGAGCGT XXXXXXXXXXXXX XXXXXXXXXXXXXXXXX TTTTTTTTTTTTTTTTT
- synthetic_10_reads_mate_1_synthetic_10_reads_mate_1 se ../input_files/project2/synthetic.mate_1.fastq.gz 75 31 AGATCGGAAGAGCACA XXXXXXXXXXXXX homo_sapiens ../input_files/homo_sapiens/annotation.gtf ../input_files/homo_sapiens/genome.fa 100 250 10 EndToEnd None A AAAAAAAAAAAAAAAAA XXXXXXXXXXXXXXXXX --fr fr-firststrand str1 str2 XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX
+ sample seqmode fq1 index_size kmer fq1_3p fq1_5p organism gtf genome sd mean libtype fq1_polya_3p fq1_polya_5p kallisto_directionality alfa_directionality alfa_plus alfa_minus fq2 fq2_3p fq2_5p fq2_polya_3p fq2_polya_5p
+ synthetic_10_reads_paired_synthetic_10_reads_paired pe ../input_files/project1/synthetic.mate_1.fastq.gz 75 31 AGATCGGAAGAGCACA XXXXXXXXXXXXX homo_sapiens ../input_files/homo_sapiens/annotation.gtf ../input_files/homo_sapiens/genome.fa 100 250 A AAAAAAAAAAAAAAAAA XXXXXXXXXXXXXXXXX --fr fr-firststrand str1 str2 ../input_files/project1/synthetic.mate_2.fastq.gz AGATCGGAAGAGCGT XXXXXXXXXXXXX XXXXXXXXXXXXXXXXX TTTTTTTTTTTTTTTTT
+ synthetic_10_reads_mate_1_synthetic_10_reads_mate_1 se ../input_files/project2/synthetic.mate_1.fastq.gz 75 31 AGATCGGAAGAGCACA XXXXXXXXXXXXX homo_sapiens ../input_files/homo_sapiens/annotation.gtf ../input_files/homo_sapiens/genome.fa 100 250 A AAAAAAAAAAAAAAAAA XXXXXXXXXXXXXXXXX --fr fr-firststrand str1 str2 XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX XXXXXXXXXXXXX
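The samples table above loses the `multimappers`, `soft_clip` and `pass_mode` columns. For existing sample tables, the same change could be applied with a small script along the following lines. This is only a sketch: the function name `drop_star_columns` is made up for this example, and it assumes a tab-separated table with a header row.

```python
import csv
import io

# Columns removed from the samples table by this commit.
STAR_COLUMNS = ("multimappers", "soft_clip", "pass_mode")


def drop_star_columns(tsv_text: str) -> str:
    """Return the table with the STAR-specific columns removed.

    Hypothetical helper; assumes tab-separated text with a header row.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in (reader.fieldnames or []) if name not in STAR_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        delimiter="\t",
        extrasaction="ignore",  # silently skip the dropped columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Shortened example table (only a few of the real columns).
old_table = (
    "sample\tsd\tmean\tmultimappers\tsoft_clip\tpass_mode\tlibtype\n"
    "S1\t100\t250\t10\tEndToEnd\tNone\tA\n"
)
new_table = drop_star_columns(old_table)
```

Values that differed from the new defaults would then have to be set in the rule config instead, which applies them to all samples of a run rather than per sample.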
@@ -251,36 +251,18 @@ rule pe_map_genome_star:
  "{sample}",
  "map_genome",
  "{sample}.pe."),
- multimappers = lambda wildcards:
- get_sample(
- 'multimappers',
- search_id='index',
- search_value=wildcards.sample),
- soft_clip = lambda wildcards:
- get_sample(
- 'soft_clip',
- search_id='index',
- search_value=wildcards.sample),
- pass_mode = lambda wildcards:
- get_sample(
- 'pass_mode',
- search_id='index',
- search_value=wildcards.sample),
  additional_params = parse_rule_config(
  rule_config,
  current_rule=current_rule,
  immutable=(
- '--twopassMode',
  '--genomeDir',
  '--readFilesIn',
  '--readFilesCommand',
- '--outFilterMultimapNmax',
  '--outFileNamePrefix',
  '--outSAMattributes',
  '--outStd',
  '--outSAMtype',
  '--outSAMattrRGline',
- '--alignEndsType',
  )
  )
@@ -301,18 +283,15 @@ rule pe_map_genome_star:
  shell:
  "(STAR \
- --twopassMode {params.pass_mode} \
  --runThreadN {threads} \
  --genomeDir {params.index} \
  --readFilesIn {input.reads1} {input.reads2} \
  --readFilesCommand zcat \
- --outFilterMultimapNmax {params.multimappers} \
  --outFileNamePrefix {params.outFileNamePrefix} \
  --outSAMattributes All \
  --outStd BAM_SortedByCoordinate \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattrRGline ID:rnaseq_pipeline SM:{params.sample_id} \
- --alignEndsType {params.soft_clip} \
  {params.additional_params} \
  > {output.bam};) \
  2> {log.stderr}"
......
@@ -195,36 +195,18 @@ rule map_genome_star:
  "{sample}",
  "map_genome",
  "{sample}.se."),
- multimappers = lambda wildcards:
- get_sample(
- 'multimappers',
- search_id='index',
- search_value=wildcards.sample),
- soft_clip = lambda wildcards:
- get_sample(
- 'soft_clip',
- search_id='index',
- search_value=wildcards.sample),
- pass_mode = lambda wildcards:
- get_sample(
- 'pass_mode',
- search_id='index',
- search_value=wildcards.sample),
  additional_params = parse_rule_config(
  rule_config,
  current_rule=current_rule,
  immutable=(
- '--twopassMode',
  '--genomeDir',
  '--readFilesIn',
  '--readFilesCommand',
- '--outFilterMultimapNmax',
  '--outFileNamePrefix',
  '--outSAMattributes',
  '--outStd',
  '--outSAMtype',
  '--outSAMattrRGline',
- '--alignEndsType',
  )
  )
@@ -245,18 +227,15 @@ rule map_genome_star:
  shell:
  "(STAR \
- --twopassMode {params.pass_mode} \
  --runThreadN {threads} \
  --genomeDir {params.index} \
  --readFilesIn {input.reads} \
  --readFilesCommand zcat \
- --outFilterMultimapNmax {params.multimappers} \
  --outFileNamePrefix {params.outFileNamePrefix} \
  --outSAMattributes All \
  --outStd BAM_SortedByCoordinate \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattrRGline ID:rnaseq_pipeline SM:{params.sample_id} \
- --alignEndsType {params.soft_clip} \
  {params.additional_params} \
  > {output.bam};) \
  2> {log.stderr}"
......