"We calculate TIN scores with a containerized in-house script. MultiQC have a limited support for external tools/data. The most reasonable way for now would be to include a .png with TIN score boxplots, generated by another scripts of ours. I don't think it makes sense to prepare a separate container though, so how about I write a simple scripts that generates boxplots, we put it under the same TIN container we already have for TIN calculation and I add a rule with plot generation to Snakefile?"
I would need to write two scrips:
merge TIN tables, as these are calculated separately for every file
Create boxplots from the merged table
I think these two scripts should be placed in the same container as the TIN score calculation since these are highly-related, consecutive operations. Therefore we would have an encapsulated functionality: "TIN scores".
Sounds good. Gave you "developer" status on the TIN score calculation repo, so you can add your scripts, add tests and update Dockerfile and CI accordingly
@kanitz
Please take a look at the structure of that repo.
I am about to add 2 scripts under src and 3 TSV files under tests. I will adjust README, CI yaml and contributors.
What about requirements? Why do we have two versions of this file? Is the script in Python 2 or 3 in the end? It matters for me since I guess I will have to write my scripts in the same Python version as the TIN calculation...
We have many Dockerfiles, should I update all three of them?
As for requirements: the original script was written in Python2. After linting re-factoring I think it also works in Python3, but I haven't extensively tested it - so no guarantees. I've left the requirements in in case somebody was interested to try/use it in Python3. But as I said, you'd have to try it out on a real library and compare it to what's expected from Python2. Might be worth it actually, given that Python2 is now deprecated...
As for Dockerfiles: Nah, just change the one that we're eventually gonna use. That should probably be the "slim" one. And please, try to keep it slim :)
Hi Maciek, I've removed the milestone as I don't think having the plots for TIN score is critical for the first version, ideally by next Friday (March 20). If by then you have the TIN plots included as well - great. If not it will be included in one of the next versions. I'll send an email about this later to put a stop on adding issues for v0.1.0 unless they are critical (bugs).
I am already working on this and the branch I develop for MultiQC is already designed in a sense to expect an external boxplot and it makes no sense to throw this away. I am working on these issues in parallel and it would require even more work from me to revert to a version where I actually do not have TIN boxplots. I am adding the milestone again,
@bakma: Great then - the point is that from an management point of view it's not a requirement to have this in the milestone. If it is a requirement for you that's a different story (you can't expect us to always know what your internal dependencies are, you need to manage that yourself). Nobody said it should not be in, it's just not necessary. The important thing is that we will have MultiQC in. If anybody can just keep on adding whatever they think is important to a milestone, it kinda defeats the purpose of milestones. But of course, any issue that is included by then makes the release better
I was testing the pipeline and found that calculated TIN scores are located not in the actual output file, but in the log (see here: /scicore/home/zavolan/boersch/ALS_project/rnaseqpipeline/logs_test/paired_end/MN4_d28_FUS_KO_A). The log file calculate_TIN_scores.log looks like this:
The actual file with TIN scores is empty, plots are also empty. See files TIN_scores_boxplot.pdf, TIN_scores_boxplot.png and TIN_scores_merged.tsv here:
/scicore/home/zavolan/boersch/ALS_project/rnaseqpipeline/results_test
There was a minor bug in Ralf script: sys.stderr instead of sys.stdout at the end. We did not notice it previously since running the script in the terminal with no stream redirection resulted in everything nicely printed back in the command line. Then in this pipeline we have been previously redirecting both streams to the same file (&>) which resulted in a similar misinterpretation. It was easy to overlook then... This is a perfect example why it is important to keep separate streams, though :)
We will have to ask @kanitz to merge and re-build the dockerfiles once again, sorry, Alex ;)
This repo will not be affected, however you might need to rebuild the environment to pull and updated singularity image. Please let me know if the problem won't be fixed