Skip to content
Snippets Groups Projects
Commit bc6a3bd2 authored by cstritt's avatar cstritt
Browse files

Assembly pipeline including a short-read polishing step, for older PacBio Reads

parent d9b6e7bc
Branches rs2
No related tags found
No related merge requests found
# Genome assembly workflow
Adapted for older PacBio reads: includes a short-read polishing step. The sample table (samples.tsv) requires to additional columns, sr1 and sr2, indicating paths to forward and reverse short reads.
The genome assembly workflow includes the following tools/steps:
- [LongQC](https://doi.org/10.1534/g3.119.400864): Get some read summary statistics. The reads are not modified in any way before assembly,
- [Flye](https://doi.org/10.1038/s41587-019-0072-8): Assembly.
- [Polypolish](https://github.com/rrwick/Polypolish): Polishing.
- [circlator](https://doi.org/10.1186/s13059-015-0849-0): Reorient the assembly such that it begins with dnaA.
- [bakta](https://doi.org/10.1099/mgen.0.000685): Annotate the reoriented assembly.
- [minimap2](https://doi.org/10.1093/bioinformatics/bty191): Map the long reads back against the assembly. The resulting alignments can be used to check for inconsistencies between reads and assemblies.
......
......@@ -6,6 +6,8 @@ include: "rules/readQC.smk"
include: "rules/assemble.smk"
include: "rules/polish.smk"
include: "rules/circularize.smk"
if config["annotate"] == "Yes":
......
......@@ -13,7 +13,7 @@ rule flye:
shell:
"""
flye \
--pacbio-hifi {input} \
--pacbio-raw {input} \
--genome-size {params.genome_size} \
--iterations {params.iterations} \
--threads {threads} \
......
rule circlator_mapreads:
input:
assembly = config["outdir"] + "/{sample}/flye/assembly.fasta",
assembly = config["outdir"] + "/{sample}/polypolish/assembly.polished.fasta",
reads = lambda wildcards: expand(samples[wildcards.sample].longread_fastq)
output: config["outdir"] + "/{sample}/circlator/01.mapreads.bam"
params:
......
rule align_SR:
input:
assembly = config["outdir"] + "/{sample}/flye/assembly.fasta",
sr1 = lambda wildcards: samples[wildcards.sample].sr1,
sr2 = lambda wildcards: samples[wildcards.sample].sr2
output:
aln1 = config["outdir"] + "/{sample}/polypolish/aln_R1.sam",
aln2 = config["outdir"] + "/{sample}/polypolish/aln_R2.sam"
threads: config["threads_per_job"]
shell:
"""
bwa index {input.assembly}
bwa mem -t {threads} -a {input.assembly} {input.sr1} > {output.aln1}
bwa mem -t {threads} -a {input.assembly} {input.sr2} > {output.aln2}
"""
rule insert_filter:
input:
aln1 = config["outdir"] + "/{sample}/polypolish/aln_R1.sam",
aln2 = config["outdir"] + "/{sample}/polypolish/aln_R2.sam"
output:
alnfilt1 = config["outdir"] + "/{sample}/polypolish/aln_R1.filtered.sam",
alnfitl2 = config["outdir"] + "/{sample}/polypolish/aln_R2.filtered.sam"
shell:
"""
polypolish filter --in1 {input.aln1} --in2 {input.aln2} --out1 {output.alnfilt1} --out2 {output.alnfilt2}
"""
rule polypolish:
input:
assembly = config["outdir"] + "/{sample}/flye/assembly.fasta",
alnfil1 = config["outdir"] + "/{sample}/polypolish/aln_R1.filtered.sam",
alnfil2 = config["outdir"] + "/{sample}/polypolish/aln_R2.filtered.sam"
output: config["outdir"] + "/{sample}/polypolish/assembly.polished.fasta"
shell:
"""
polypolish polish {input.assembly} {input.alnfilt1} {input.alnfilt2} > {output}
"""
\ No newline at end of file
......@@ -8,7 +8,7 @@ rule longQC:
shell:
"""
rm -r {params.outfolder}
python /LongQC/longQC.py sampleqc -x pb-sequel -o {params.outfolder}/ -p {params.threads} {input}
python /LongQC/longQC.py sampleqc -x pb-r2 -o {params.outfolder}/ -p {params.threads} {input}
"""
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment