ZARP wiki
Meeting notes
Scope statement
What similar workflows are out there?
Design discussions
Discussion points
- What is the user story?
- What are the defined use cases?
- What sets ZARP apart from other similar efforts?
Road map
- Test current version with real-world data
- In the meantime: Plan future development
  - Plan upstream API
    - Interface with SRA
    - Derive library parameters from data
    - Plan user interaction (CLI, UI, website)
    - Prepare for integration into web service?
  - Discuss code organization, especially with respect to extending functionality
    - Tools for individual samples
    - Tools for multiple samples
    - Tools for comparing groups of samples
  - Discuss reuse and storage of results
    - Minimize storage
    - Minimize reruns
    - How to determine whether samples have been run (database, folder structure, hashes)
    - How to pass data between consecutive stages/pipelines (compare code organization)
  - Plan downstream API
    - How to extend MultiQC report?
    - Snakemake report?
    - Directory and file naming structure
    - Prepare for integration into web service?
  - Overall code organization
    - How to coordinate different pipelines
    - How to deal with interactive parts in between
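One of the options raised in the road map for determining whether samples have been run is hashing. The following is a minimal sketch of that idea, not a decided design: it derives a key from the input file contents and the run parameters, and uses a marker file inside a per-key results folder. The function names, the `DONE` marker file, and the folder layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_key(input_paths, params):
    """Derive a run key from input contents and parameters (hypothetical scheme).

    The order of input_paths is kept, so swapping mate files changes the key.
    """
    payload = {
        "inputs": [file_digest(p) for p in input_paths],
        "params": params,
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def already_run(results_root, key):
    """A run counts as finished if its folder holds the (assumed) DONE marker."""
    return (Path(results_root) / key / "DONE").is_file()


def mark_done(results_root, key):
    """Create the per-key results folder and drop the DONE marker into it."""
    run_dir = Path(results_root) / key
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "DONE").touch()
```

A caller would compute `run_key(...)` before scheduling a sample and skip it when `already_run(...)` returns true. Hashing file contents rather than file names avoids reruns when the same data is supplied under a different path, at the cost of reading every input once.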
Other RNA-seq analysis pipelines
Pipeline comparison table: https://docs.google.com/spreadsheets/d/1P4eefWLBTbCLYizajXTrG6lC016ASDMEuW06makzZvo/edit?usp=sharing
Published
Year | Name | Link |
---|---|---|
2017 | aRNApipe | https://academic.oup.com/bioinformatics/article/33/11/1727/2929343 |
2017 | RNACocktail | https://www.nature.com/articles/s41467-017-00050-4 |
2018 | VIPER | https://link.springer.com/article/10.1186/s12859-018-2139-9 |
2018 | hppRNA | https://academic.oup.com/bib/article-abstract/19/4/622/2918128 |
2019 | ARMOR | https://www.g3journal.org/content/9/7/2089 |
2019 | UTAP | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2728-2 |
2019 | BISR | https://link.springer.com/article/10.1186/s12859-019-3251-1 |
2019 | GEO2RNAseq | https://www.biorxiv.org/content/10.1101/771063v1 |
2019 | Shiny-Seq | https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-019-4471-1 |
n/a | nf-core/rnaseq | https://github.com/nf-core/rnaseq |
n/a | NGS-pipe | https://github.com/cbg-ethz/NGS-pipe/wiki |
Metrics for comparing pipelines for RNA-seq analysis
- Input
  - size of the input that needs to be provided
  - type: accessions (repositories) / FASTQ files
  - automated inferences: type of sample preparation, adapters
  - custom genomes and annotations possible?
- Output
  - sample quality metrics
    - complexity
    - mapping rate
    - duplication level
    - saturation level
    - RNA integrity
  - BAM files
  - quantification
    - gene level
    - isoform level
    - annotation categories
    - proportion structural RNAs
    - includes multi-mappers
  - inference
    - novel isoforms: splicing/polyadenylation
    - SNPs/mutations
- Workflow engine
  - Modern (CWL, WDL, Snakemake, Galaxy, Nextflow)
  - Legacy (anything else that's not homegrown)
  - Custom
- User interface
  - Browser app
  - Desktop app
  - Mobile app
  - Command line interface (programmable)
  - HTTP/"REST" API available (programmable)
- Supported platforms
  - Cloud (AWS, GCP, Azure, ...)
  - Kubernetes / OpenShift
  - HPC (Slurm, SGE/OGE/UGE, ...)
  - Desktop/laptop
- Installation
  - Type
    - VM
    - Containerized
    - Virtual env cross-language (conda)
    - Virtual env single language (e.g., virtualenv)
    - Package manager without virtualization (e.g., pip, Bioconductor)
    - Manual install
  - Instructions & auxiliary scripts available for supported platforms
- Openness
  - License
  - Open source development encouraged?
  - Contribution instructions available?
  - Contact options
    - Issue tracker
    - Q/A forum (e.g., Biostars)
    - Chat (e.g., Slack or Gitter channel)
- Maintenance
  - Actively developed?
    - number of contributors
    - last update
    - frequency of commits
  - Tests available
  - Test code coverage published?
- Usability
  - Clean, well-documented, programmable, easy-to-use, intuitive API for input and outputs?
  - Workflow can be started quickly for one sample (incl. preparation of configs)
  - Workflow can be started quickly for multiple samples (incl. preparation of configs)
  - Supports local data
  - Supports uploading data
  - Supports cloud data (e.g., Amazon S3)
  - Interfacing with SRA and/or other relevant data repos
  - Interfacing with Ensembl and/or other genome resource repos
  - Tooling available to filter genome resources or compile custom ones?
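To keep the comparison consistent across pipelines, the metrics above could be recorded in a structured form and exported as rows for the comparison table. The sketch below is only one possible encoding: the `PipelineRecord` class, its field names, and the small subset of metrics it covers are assumptions chosen for illustration, and the example values are placeholders rather than an assessment of any real pipeline.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class PipelineRecord:
    """A few of the comparison metrics as typed fields (hypothetical subset)."""

    name: str
    workflow_engine: str  # "modern", "legacy" or "custom"
    has_cli: bool
    containerized: bool
    license: str
    tests_available: bool
    interfaces_with_sra: bool


def to_csv(records):
    """Serialize records to CSV text, one row per pipeline."""
    buffer = io.StringIO()
    fieldnames = [f.name for f in fields(PipelineRecord)]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder entry; the values are made up for the example.
example = PipelineRecord(
    name="example-pipeline",
    workflow_engine="modern",
    has_cli=True,
    containerized=True,
    license="Apache-2.0",
    tests_available=False,
    interfaces_with_sra=False,
)
print(to_csv([example]))
```

The resulting CSV text can be pasted into or imported by a spreadsheet, so that every pipeline is scored against the same set of columns.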