MultiQC is very useful for obtaining summary reports of runs.
However, Snakemake also provides a report, as pointed out by @bakma in issue #118 (closed). In that case, all summary figures are created by custom rules, and there is the added benefit of having runtime reports. The question now is: do we want to keep only one of the two, keep the best parts of each to avoid the overhead of figure creation for Snakemake reports, or have a temporary solution for now and a plan for extension in v1.0.0? This is most pertinent to @zavolan, @bakma and @kanitz, but if you have insights on use and scenarios, contribute here.
Completely tied - it is a built-in mechanism of Snakemake that is triggered by a report keyword inside the Snakefile, as in the description. Therefore there is no "compatibility with CWL".
We are not tied to MultiQC at all; it is just an existing aggregator that became popular. It is very nice, because it automatically parses the output of many well-known tools. Parsing the output of custom scripts, plots and logs is a problem, though.
No, in my opinion it would not be easy to replace the functionality of MultiQC. But this also depends on what "to replace a functionality" means. Would we aim to have an aggregator of our own, or just create custom HTML+CSS to "report the summaries"? Do we want to make as nice an interactive layout and plots as MultiQC, or just a simple HTML 1.0 page?
In my opinion it would be easier to maintain Snakemake reports - mainly because it would be easier to keep everything under control without unnecessary overhead related to coding the data presentation.
Let me summarise some thoughts I have so far on the issue:
I personally think we should not mix up result plots in reports. We should either choose one of the reporting mechanisms or keep both - MultiQC for the information related to the data processing, Snakemake report for the information related to workflow execution. If we choose to keep both, I am against plugging custom plots (TIN/ALFA) into the Snakemake report.
MultiQC is nice because of the advanced interface and the automatic parsing of the output of the tools we use (extracting statistics etc.). I honestly doubt we could reach a similar level of clarity in results presentation if we were to code the report ourselves. Also, we could later have a nice server to collect all the reports of all ZARP runs in MegaQC (https://github.com/ewels/MegaQC).
Snakemake reports are nice because they are a built-in mechanism of the workflow management system, which therefore has easy access to technical information about workflow execution (notably: MultiQC has no way to learn the information about the execution of a "workflow"). They can also provide access to summary plots (related to a specific rule), but the presentation is nowhere near as fancy. A big advantage is that there is no constraint on the type of plot to show. However (as far as I recall), these cannot be interactive plots as in MultiQC. Also, we would have to write some summarising rules ourselves.
@katsanto
I would propose to have a small transition period and see how the two play out in our pipeline.
I would happily implement a Snakemake report at the end with only the technical information related to workflow execution. After I get some feel for it I would be able to give you more insight into "should we replace one with the other?".
I have included a mechanism to create Snakemake reports at the end of zarp.
The merge request with these changes is !64 (merged).
In order to create a Snakemake report the bash test scripts execute:
# Create a Snakemake report after the workflow execution
snakemake \
    --snakefile="../../Snakefile" \
    --configfile="../input_files/config.yaml" \
    --report="snakemake_report.html"
Importantly, notice that Snakemake reports have to be created after the workflow finishes, as a separate call from the command line. Therefore the problem arises that the path to the HTML file has to be provided by the user and cannot be parsed from a config file. This is a potential inconvenience. However, since we would like to RO-crate the pipeline output, it should not be a problem. We will move & package the files anyway, probably with another post-pipeline script(?)
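One possible way around the inconvenience, sketched here only as an idea: a small bash wrapper could read the report path from the config file itself and pass it to the separate report call. Everything specific in it is an assumption, not something that exists in the pipeline: the config key `report_path` is hypothetical, as are the function names, and the parsing only handles a flat `key: value` line without trailing comments.

```shell
#!/usr/bin/env bash
# Sketch: read the report path from the workflow config, then build the report.
# ASSUMPTION: `report_path` is a hypothetical top-level key; it is not part of
# the existing config file.

# Print the value of a top-level scalar key from a flat YAML file.
# Prints nothing when the key is absent.
get_config_value() {
    local key="$1" config="$2"
    sed -n "s/^${key}:[[:space:]]*//p" "$config" | head -n 1 | tr -d "\"'"
}

# Build the report in a separate call, after the workflow run has finished.
create_report() {
    local snakefile="$1" config="$2" report
    report="$(get_config_value report_path "$config")"
    # Fall back to the file name used in the test scripts when the key is absent.
    report="${report:-snakemake_report.html}"
    snakemake \
        --snakefile="$snakefile" \
        --configfile="$config" \
        --report="$report"
}
```

A test script could then call `create_report "../../Snakefile" "../input_files/config.yaml"` once the workflow run is done. A post-pipeline packaging script could reuse `get_config_value` to find the same file.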