diff --git a/pipeline_documentation.md b/pipeline_documentation.md index 41bc09574a0795e94593744c4be876cfc55f52f8..3774eb0fbc930234fea3eb9c38ac3eb065d545e5 100644 --- a/pipeline_documentation.md +++ b/pipeline_documentation.md @@ -39,7 +39,7 @@ This document describes the individual rules of the pipeline for information pur ## Detailed description of steps The pipeline consists of three snakefiles: A main Snakefile and an individual Snakefile for each sequencing mode (single-end and paired-end), as parameters to individual tools differ between the sequencing modes. The main Snakefile contains some general rules for the creation of indices, rules that are applicable to both sequencing modes, and rules that deal with summary steps and combining results across samples of the run. Individual rules of the pipeline are described briefly, and links to the respective software manuals are given. If parameters can be influenced by the user (via the samples table) they are also described. -Description of paired- and single-end rules are combined, only differences are highlighted. +Description of paired and single-end rules are combined, only differences are highlighted. ### General @@ -74,9 +74,10 @@ soft_clip | "Local": standard local alignment with soft-clipping allowed. "EndTo pass_mode | "None": 1-pass mapping; "Basic": basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly; for star mapping (type=STRING) libtype | "A": automatically infer. For more info see [salmon manual](https://salmon.readthedocs.io/en/latest/salmon.html) (type=STRING) kallisto_directionality | "--fr-stranded":Strand specific reads, first read forward. 
"--rf-stranded": Strand specific reads, first read reverse; for kallisto (type=STRING) -fq1_polya | stretch of As or Ts, depending on read orientation; for cutadapt (type=STRING) -fq2_polya | stretch of As or Ts, depending on read orientation; for cutadapt (type=STRING) - +fq1_polya3p | stretch of As or Ts, depending on read orientation, trimmed from the 3' end of the read; for cutadapt (type=STRING) +fq1_polya5p | stretch of As or Ts, depending on read orientation, trimmed from the 5' end of the read; for cutadapt (type=STRING) +fq2_polya3p| stretch of As or Ts, depending on read orientation, trimmed from the 3' end of the read; for cutadapt (type=STRING) +fq2_polya5p| stretch of As or Ts, depending on read orientation, trimmed from the 5' end of the read; for cutadapt (type=STRING) #### create log directories Currently not implemented as Snakemake rule, but general statement. @@ -260,33 +261,26 @@ Creates an interactive report after the pipeline is finished. [MultiQC](https:// **Output:** fastq files with adapters removed, reads shorter than 10nt will be discarded. -**Arguments not influencable by user:** +**Non-customisable arguments:** -e 0.1 maximum error-rate of 10% -j 8 use 8 threads -m 10 Discard processed reads that are shorter than 10 --n 3 search for all the given adapter sequences repeatedly, either until no adapter match was found or until 3 rounds have been performed. +-n 2 search for all the given adapter sequences repeatedly, either until no adapter match was found or until 2 rounds have been performed. *paired end:* ---pair-filter=both filtering criteria must apply to both reads in order for a read pair to be discarded - -*single end:* --O 1 minimal overlap of 1 +--pair-filter=any filtering criteria must apply to any of the two reads in order for a read pair to be discarded #### (pe_)remove_polya_cutadapt -Here, [Cutadapt](https://cutadapt.readthedocs.io/en/stable/)t is used to remove poly(A) tails. 
+Here, [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) is used to remove poly(A) tails. **Input:** fastq reads **Parameters:** Adapters to be removed, specified by user in the columns 'fq1_polya', 'fq2_polya', respectively. **Output:** fastq files with poly(A) tails removed, reads shorter than 10nt will be discarded. -**Arguments like in remove_adapters_cutadapt and additionally:** ---match-read-wildcards This option is used to allow matching wildcard characters also within reads, because if no tail should be trimmed "XXXXXX" is specified in the samples table, which doesn't match any nucleotides, and thus nothing will be done here. --n 2 search for all the given adapter sequences repeatedly, either until no adapter match was found or until 2 rounds have been performed. --q 6 trim low-quality 3'ends with a cutoff of 6 nucleotides - - +**Arguments similar to remove_adapters_cutadapt and additionally:** +-n 1 search for all the given adapter sequences repeatedly, either until no adapter match was found or until 1 round has been performed. *paired end:* ---pair-filter=both filtering criteria must apply to both reads in order for a read pair to be discarded +--pair-filter=any filtering criteria must apply to any of the two reads in order for a read pair to be discarded *single end:* -O 1 minimal overlap of 1 @@ -318,8 +312,6 @@ Spliced Transcripts Alignment to a Reference; Read the [Publication](https://www *Same for single- and paired-end.* - - #### (pe_)quantification_salmon [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) is a tool for wicked-fast transcript quantification from RNA-seq data. 
@@ -358,4 +350,3 @@ Spliced Transcripts Alignment to a Reference; Read the [Publication](https://www * -l: fragment length, user specified as `mean` * -s: fragment length SD, user specified as `sd` - diff --git a/tests/test_integration_workflow/expected_output.md5 b/tests/test_integration_workflow/expected_output.md5 index dd4c974ae2e254917c30287c94ff8da005a2232d..31cbf6460fa03ea3e0cdc1edd6d132791d44a8ba 100644 --- a/tests/test_integration_workflow/expected_output.md5 +++ b/tests/test_integration_workflow/expected_output.md5 @@ -19,7 +19,7 @@ ea36f062eedc7f54ceffea2b635a25a8 results/star_indexes/homo_sapiens/75/STAR_inde 500dd49da40b16799aba62aa5cf239ba results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.remove_adapters_mate1.fastq e90e31db1ce51d930645eb74ff70d21b results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.remove_adapters_mate2.fastq 500dd49da40b16799aba62aa5cf239ba results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.remove_polya_mate1.fastq -e90e31db1ce51d930645eb74ff70d21b results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.remove_polya_mate2.fastq +1c0796d7e0bdab0e99780b2e11d80c19 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.remove_polya_mate2.fastq d41d8cd98f00b204e9800998ecf8427e results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/map_genome/synthetic_10_reads_paired_synthetic_10_reads_paired_SJ.out.tab f551ff091e920357ec0a76807cb51dba results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/mate1_fastqc/synthetic.mate_1_fastqc/fastqc_data.txt c0df759ceab72ea4b1a560f991fe6497 
results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/mate1_fastqc/synthetic.mate_1_fastqc/fastqc.fo @@ -45,8 +45,8 @@ b28aac49f537b8cba364b6422458ad28 results/samples/synthetic_10_reads_paired_synt 69b70e3f561b749bf10b186dd2480a8a results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/mate2_fastqc/synthetic.mate_2_fastqc/Images/per_sequence_quality.png b28aac49f537b8cba364b6422458ad28 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/mate2_fastqc/synthetic.mate_2_fastqc/Images/per_tile_quality.png 5b950b5dfe3c7407e9aac153db330a38 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/mate2_fastqc/synthetic.mate_2_fastqc/Images/sequence_length_distribution.png -5e07e870d516a91647808bd84068d829 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/quant_kallisto/abundance.tsv -6180a904511292b0f173794ae98af991 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/quant_kallisto/pseudoalignments.bam +2e77276535976efccb244627231624bf results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/quant_kallisto/abundance.tsv +d013650f813b815a790c9e6a51c7559b results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/quant_kallisto/pseudoalignments.bam d41d8cd98f00b204e9800998ecf8427e results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/quant_kallisto/synthetic_10_reads_paired_synthetic_10_reads_paired.kallisto.pseudo.sam c77480e0235761f2d7f80dbceb2e2806 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/salmon_quant/synthetic_10_reads_paired_synthetic_10_reads_paired/lib_format_counts.json 989d6ee63b728fced9ec0249735ab83d results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/salmon_quant/synthetic_10_reads_paired_synthetic_10_reads_paired/aux_info/ambig_info.tsv @@ -78,10 +78,10 @@ e72f5d798c99272f8c0166dc77247db1 results/samples/synthetic_10_reads_mate_1_synt 92bcd0592d22a6a58d0360fc76103e56 
results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/salmon_quant/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/aux_info/observed_bias 92bcd0592d22a6a58d0360fc76103e56 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/salmon_quant/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/aux_info/observed_bias_3p d41d8cd98f00b204e9800998ecf8427e results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/salmon_quant/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/aux_info/unmapped_names.txt -0139e75ddbfe6eb081c2c2d9b9108ab4 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.str1.out.bg -c266d31e0a2ad84975cb9de335891e64 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.str2.out.bg -0139e75ddbfe6eb081c2c2d9b9108ab4 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.Unique.str1.out.bg -c266d31e0a2ad84975cb9de335891e64 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.Unique.str2.out.bg +16652c037090f3eed1123618a2e75107 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.str1.out.bg +90ae442ebf35015eab2dd4e804c2bafb results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.str2.out.bg +16652c037090f3eed1123618a2e75107 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.Unique.str1.out.bg +90ae442ebf35015eab2dd4e804c2bafb 
results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/STAR_coverage/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.Unique.str2.out.bg ea91b4f85622561158bff2f7c9c312b3 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/STAR_coverage/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.UniqueMultiple.str1.out.bg bcccf679a8c083d01527514c9f5680a0 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/STAR_coverage/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.UniqueMultiple.str2.out.bg ea91b4f85622561158bff2f7c9c312b3 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/STAR_coverage/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.Unique.str1.out.bg @@ -89,9 +89,9 @@ bcccf679a8c083d01527514c9f5680a0 results/samples/synthetic_10_reads_mate_1_synt 3ce47cb1d62482c5d62337751d7e8552 results/transcriptome/homo_sapiens/transcriptome.fa 6b44c507f0a1c9f7369db0bb1deef0fd results/alfa_indexes/homo_sapiens/75/ALFA/sorted_genes.stranded.ALFA_index 2caebc23faf78fdbbbdbb118d28bd6b5 results/alfa_indexes/homo_sapiens/75/ALFA/sorted_genes.unstranded.ALFA_index -c1254a0bae19ac3ffc39f73099ffcf2b results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired.ALFA_feature_counts.tsv -c266d31e0a2ad84975cb9de335891e64 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.out.minus.bg -0139e75ddbfe6eb081c2c2d9b9108ab4 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.out.plus.bg +53fd53f884352d0493b2ca99cef5d76d results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired.ALFA_feature_counts.tsv +90ae442ebf35015eab2dd4e804c2bafb 
results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.out.minus.bg +16652c037090f3eed1123618a2e75107 results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/ALFA/synthetic_10_reads_paired_synthetic_10_reads_paired_Signal.UniqueMultiple.out.plus.bg c1254a0bae19ac3ffc39f73099ffcf2b results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/ALFA/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1.ALFA_feature_counts.tsv bcccf679a8c083d01527514c9f5680a0 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/ALFA/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.UniqueMultiple.out.minus.bg -ea91b4f85622561158bff2f7c9c312b3 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/ALFA/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.UniqueMultiple.out.plus.bg +ea91b4f85622561158bff2f7c9c312b3 results/samples/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1/ALFA/synthetic_10_reads_mate_1_synthetic_10_reads_mate_1_Signal.UniqueMultiple.out.plus.bg \ No newline at end of file diff --git a/tests/test_integration_workflow/test.local.sh b/tests/test_integration_workflow/test.local.sh index ac6e45ec44898fb8204019cc26e3d85faaaecb62..6b641f2923a28beb5937a1668b54b38ae627716b 100755 --- a/tests/test_integration_workflow/test.local.sh +++ b/tests/test_integration_workflow/test.local.sh @@ -7,7 +7,7 @@ cleanup () { rm -rf .java/ rm -rf .snakemake/ rm -rf logs/ - rm -rf results/ + # rm -rf results/ cd $user_dir echo "Exit status: $rc" } diff --git a/tests/test_scripts_labkey_to_snakemake_table/test.sh b/tests/test_scripts_labkey_to_snakemake_table/test.sh index 37014eda5e9fb41b564f6e4b78743c09511e5f3c..dd2707e95c8f98156c994440c63b4f15239e53b2 100755 --- a/tests/test_scripts_labkey_to_snakemake_table/test.sh +++ b/tests/test_scripts_labkey_to_snakemake_table/test.sh @@ -6,6 +6,7 @@ cleanup () { rm -rf .snakemake/ rm -rf 
config.yaml rm -rf samples.tsv + rm -rf logs cd $user_dir echo "Exit status: $rc" }
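
The changes to `expected_output.md5` above update the checksums that the integration test compares against the pipeline's output files. As a rough illustration of what such a comparison involves, here is a minimal Python sketch. It assumes the manifest follows the usual `md5sum` layout (hash, whitespace, relative path); the function names `md5_of_file`, `parse_md5_manifest` and `find_mismatches` are hypothetical and not part of the repository, whose tests may well call `md5sum` directly instead.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_manifest(text):
    """Parse 'hash  path' lines (assumed md5sum layout) into (hash, path) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, path = line.split(None, 1)
        # md5sum marks binary mode with a leading '*' on the path
        entries.append((expected.lower(), path.lstrip("*")))
    return entries


def find_mismatches(manifest_text, base_dir="."):
    """Return manifest paths that are missing or whose MD5 differs."""
    mismatches = []
    for expected, rel_path in parse_md5_manifest(manifest_text):
        target = Path(base_dir) / rel_path
        if not target.is_file() or md5_of_file(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

Calling `find_mismatches(Path("expected_output.md5").read_text())` from the test directory would list every file whose content no longer matches the manifest. Note that `d41d8cd98f00b204e9800998ecf8427e`, which appears several times in the manifest, is the MD5 of an empty file, so those entries only assert that the file exists and is empty.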