Snakemake vs MultiQC reports

How tied are you to Snakemake when you use Snakemake reports? Is it compatible with the Common Workflow Language (CWL)?

And how tied are you to MultiQC? Would it be easy to replace its functionality? After all, it was developed specifically for bioinformatics analyses.

Which one would be easier to maintain? This ties into the discussion I had with @kanitz about using ALFA rather than creating the plots ourselves.

Completely tied: it is a built-in mechanism of Snakemake, triggered by the "report" keyword inside the Snakefile, as in the description. Therefore there is no compatibility with CWL.

We are not tied to MultiQC at all; it is just an existing aggregator that became popular. It is very convenient because it automatically parses the output of many well-known tools. Parsing the output of custom scripts, plots, or logs is a problem, though. No, in my opinion it would not be easy to replace the functionality of MultiQC. But this also depends on what we mean by "replacing its functionality". Would we aim to build an aggregator of our own, or just create custom HTML+CSS to "report the summaries"? Do we want an interactive layout and plots as nice as MultiQC's, or just a simple HTML 1.0 page?

In my opinion, Snakemake reports would be easier to maintain, mainly because it would be easier to keep everything under control without the unnecessary overhead of coding the data presentation ourselves.

Let me summarise the thoughts I have on the issue so far:

I personally think we should not mix result plots across the two reports. We should either choose one of the reporting mechanisms or keep both: MultiQC for information related to data processing, and the Snakemake report for information related to workflow execution. If we choose to keep both, I am against plugging custom plots (TIN/ALFA) into the Snakemake report.

MultiQC is nice because of its advanced interface and its automatic parsing of the output of the tools we use (extracting statistics, etc.). I honestly doubt we could reach a similar level of clarity in presenting results if we were to code the report ourselves. Also, we could later set up a nice server to collect the reports of all ZARP runs in MegaQC (https://github.com/ewels/MegaQC).

Snakemake reports are nice because they are a built-in mechanism of the workflow management system and therefore have easy access to technical information about workflow execution (notably, MultiQC has no way of learning about the execution of a "workflow"). They can also provide access to summary plots (related to a specific rule), but the presentation is nowhere near as fancy. A big advantage is that there is no constraint on the type of plot to show. However, as far as I recall, these cannot be interactive plots as in MultiQC. Also, we would have to write some summarising rules ourselves.

@katsanto I would propose a short transition period to see how the two play out in our pipeline.
I would happily implement a Snakemake report at the end, containing only the technical information related to workflow execution. Once I get a feel for it, I will be able to give you more insight into whether we should replace one with the other.

Sounds good!

OK, I'm on it

assigned to @bakma

I have included a mechanism to create Snakemake reports at the end of zarp. The merge request with these changes is !64 (merged).

To create a Snakemake report, the bash test scripts execute:

# Create a Snakemake report after the workflow execution
snakemake \
    --snakefile="../../Snakefile" \
    --configfile="../input_files/config.yaml" \
    --report="snakemake_report.html"

Importantly, Snakemake reports have to be created after the workflow finishes, as a separate command-line call. As a consequence, the path to the HTML file has to be provided by the user and cannot be read from a config file. This is a potential inconvenience. However, since we would like to package the pipeline output as an RO-Crate, it should not be a problem: we will move and package the files anyway, probably with another post-pipeline script(?)
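
A post-pipeline step of that kind might wrap the same report call in a small bash function. This is a minimal, hypothetical sketch, not part of zarp: the function name make_report, the SNAKEMAKE override variable and the output-directory argument are assumptions, while the Snakefile and config paths default to the ones used by the test scripts. It only changes who chooses the report path: the caller passes a directory, and the HTML file lands there, ready to be packaged.

```shell
# Hypothetical helper for a post-pipeline script (not part of zarp):
# generate the Snakemake report into a caller-chosen directory.
set -euo pipefail

# Snakemake executable to call; can be overridden, e.g. with a stub for testing.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

make_report() {
    # Directory that should receive the report (required).
    local outdir="${1:?usage: make_report OUTDIR [SNAKEFILE] [CONFIGFILE]}"
    # Defaults mirror the paths used in the bash test scripts.
    local snakefile="${2:-../../Snakefile}"
    local configfile="${3:-../input_files/config.yaml}"

    mkdir -p "$outdir"
    # The report path must be given on the command line; it is not taken
    # from the config file. Call this only after the workflow has finished.
    "$SNAKEMAKE" \
        --snakefile="$snakefile" \
        --configfile="$configfile" \
        --report="${outdir}/snakemake_report.html"
}
```

For example, running make_report results/zarp_run from the same directory as the test scripts would write results/zarp_run/snakemake_report.html, which the packaging step could then pick up together with the rest of the output.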

So do we close this (at least for now), now that !64 has been merged? @bakma @katsanto @gypas?

Yes, IMO close for now; we might revisit this later if we have to incorporate more results into the reports or if the workflow expands.

closed
