Commit dca40749 authored by Christoph Stritt

Duplicate read removal step added in circularize rule

parent 3e441ff6
@@ -16,7 +16,7 @@ default-resources:
restart-times: 3
max-jobs-per-second: 10
max-status-checks-per-second: 1
-local-cores: 1
+local-cores: 20
latency-wait: 60
jobs: 500
keep-going: True
......
@@ -2,8 +2,10 @@
#
##############################
-samples: config/samples.tsv
-outdir: ./results
+samples: config/samples.tsv # overwritten by run_assembly_pipeline.py
+outdir: ./results # overwritten by run_assembly_pipeline.py
annotate: "No"
ref:
genome_size: 4.4m
@@ -12,7 +14,7 @@ ref:
bakta_db: /scicore/home/gagneux/GROUP/PacbioSnake_resources/databases/bakta_db
container: /scicore/home/gagneux/GROUP/PacbioSnake_resources/containers/assemblySC.sif
-threads_per_job: 4
+threads_per_job: 10 # Max. 20
assembly_iterations: 3
......
@@ -24,8 +24,27 @@ rule circlator_bam2reads:
"""
+rule circlator_removeduplicates:
+    input: config["outdir"] +"/{sample}/circlator/02.bam2reads.fasta"
+    output: config["outdir"] +"/{sample}/circlator/02.bam2reads.nodup.fasta"
+    run:
+        import sys
+        from Bio import SeqIO
+        record_dict = {}
+        for record in SeqIO.parse(input[0], "fasta"):
+            record_dict[record.id] = record
+        # record_dict = SeqIO.to_dict(SeqIO.parse(input[0], "fasta")) # Does not allow duplicate entries...
+        with open(output[0], "w") as output_handle:
+            SeqIO.write(record_dict.values(), output_handle, "fasta")
 rule circlator_localassembly:
-    input: config["outdir"] + "/{sample}/circlator/02.bam2reads.fasta"
+    input: config["outdir"] + "/{sample}/circlator/02.bam2reads.nodup.fasta"
     output: config["outdir"] + "/{sample}/circlator/03.assemble/assembly.fasta"
     params:
         outdir = config["outdir"] + "/{sample}/circlator/03.assemble",
......
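Note on the new `circlator_removeduplicates` rule: it keys each FASTA record by its ID in a plain dict, so a later record with the same ID silently replaces an earlier one, while output order follows the first appearance of each ID. The sketch below reproduces that behaviour with the standard library only, as a stand-in for the Biopython calls in the rule. The function names `read_fasta` and `dedup_by_id` and the sample reads are hypothetical and not part of the commit.

```python
def read_fasta(lines):
    """Yield (record_id, header, sequence) tuples from an iterable of FASTA lines.

    The record ID is taken as the first whitespace-delimited token of the
    header, which is assumed here to mirror how record IDs are derived
    in the rule above.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield _record_id(header), header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield _record_id(header), header, "".join(chunks)


def _record_id(header):
    """Return the first token of a header, or an empty string if there is none."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


def dedup_by_id(records):
    """Keep one record per ID; a later duplicate overwrites an earlier one."""
    unique = {}
    for rec_id, header, seq in records:
        unique[rec_id] = (header, seq)
    return unique


# Hypothetical input with one duplicated read ID.
fasta = """>read1 first copy
ACGT
>read2
GGCC
>read1 second copy
TTTT
"""

unique = dedup_by_id(read_fasta(fasta.splitlines()))
for rec_id, (header, seq) in unique.items():
    print(f">{header}\n{seq}")
```

If duplicated IDs can carry different sequences, the last-one-wins behaviour may drop data; keeping the first copy instead would only require skipping IDs already present in the dict.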