var3d

Stitch it together: the variant annotation pipeline of your dreams. var3d is a textbook example of the strategy pattern in software design. It provides standardized interfaces to 1) annotate sequences, 2) import variants from arbitrary sources, 3) annotate those variants using only sequence features, 4) import structures from arbitrary sources and 5) annotate variants using structural features as well. Implementations of these interfaces are provided and can be complemented with your own to build flexible variant annotation workflows.

var3d annotation workflow

To make this work, var3d implements the {class}var3d.base.Variant and {class}var3d.base.Structure classes. Processing is performed using the following standardized interfaces:

  • {class}var3d.base.VarImporter: Imports variants from arbitrary sources
  • {class}var3d.base.StructImporter: Imports structures from arbitrary sources
  • {class}var3d.base.SeqAnno: Annotates sequences (variant independent)
  • {class}var3d.base.StructAnno: Annotates structures (variant independent)
  • {class}var3d.base.VarSeqAnno: Annotates variants using only sequence features
  • {class}var3d.base.VarStructAnno: Annotates variants using sequence and structural features
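These interfaces follow the strategy pattern: calling code depends only on an abstract interface, so concrete implementations can be swapped or combined freely. The following is a minimal sketch of that idea using a hypothetical, heavily simplified VarSeqAnno-like interface. The class names, the `annotate` signature and the toy annotations are assumptions for illustration only and do not reflect var3d's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-in for var3d.base.Variant; the real class differs.
@dataclass
class Variant:
    ref: str  # reference amino acid (one-letter code)
    pos: int  # 1-based position in the sequence
    alt: str  # variant amino acid (one-letter code)


class VarSeqAnno(ABC):
    """Hypothetical interface: annotate variants using only the sequence."""

    @abstractmethod
    def annotate(self, sequence: str, variants: list[Variant]) -> dict:
        ...


class RefCheckVarSeqAnno(VarSeqAnno):
    """Toy strategy: does the reference residue match the sequence?"""

    def annotate(self, sequence, variants):
        return {f"{v.ref}{v.pos}{v.alt}": sequence[v.pos - 1] == v.ref
                for v in variants}


class HydrophobicSwitchVarSeqAnno(VarSeqAnno):
    """Toy strategy: does the substitution switch hydrophobic class?"""

    HYDROPHOBIC = set("AVILMFWC")  # illustrative grouping only

    def annotate(self, sequence, variants):
        return {f"{v.ref}{v.pos}{v.alt}":
                (v.ref in self.HYDROPHOBIC) != (v.alt in self.HYDROPHOBIC)
                for v in variants}


def run(annotators: list[VarSeqAnno], sequence: str,
        variants: list[Variant]) -> dict:
    # Only the interface is used here, so any implementation can be plugged in.
    return {type(a).__name__: a.annotate(sequence, variants) for a in annotators}


if __name__ == "__main__":
    print(run([RefCheckVarSeqAnno(), HydrophobicSwitchVarSeqAnno()],
              "MKTAYIAK", [Variant("K", 2, "E"), Variant("T", 3, "I")]))
```

Adding a new annotation then means writing one more subclass; `run` stays unchanged.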

A full annotation workflow consists of two steps, implemented by {class}var3d.pipeline.DataImportPipeline and {class}var3d.pipeline.AnnotationPipeline. Together, they take user-defined lists of interface implementations and derive a standardized annotation output in JSON format.
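The two-step idea (import first, then annotate and serialize) can be pictured with plain functions standing in for interface implementations. This is a hypothetical sketch: it is not how DataImportPipeline or AnnotationPipeline are implemented, and the function names and JSON layout are assumptions.

```python
import json
from typing import Callable, Iterable

# Hypothetical signatures; var3d uses its own interface classes instead.
Importer = Callable[[str], list]          # sequence -> list of variant strings
Annotator = Callable[[str, list], dict]   # (sequence, variants) -> {name: values}


def data_import(sequence: str, importers: Iterable[Importer]) -> dict:
    """Step 1 (hypothetical): collect variants from all importers."""
    variants = []
    for importer in importers:
        for v in importer(sequence):
            if v not in variants:  # deduplicate while keeping the order
                variants.append(v)
    return {"sequence": sequence, "variants": variants}


def annotate(data: dict, annotators: Iterable[Annotator]) -> str:
    """Step 2 (hypothetical): run every annotator and serialize to JSON."""
    annotations = {}
    for annotator in annotators:
        annotations.update(annotator(data["sequence"], data["variants"]))
    return json.dumps({**data, "annotations": annotations}, indent=2)


if __name__ == "__main__":
    imported = data_import("MKTAYIAK", [lambda s: ["K2E", "T3I"]])
    print(annotate(imported, [lambda s, vs: {"position": {v: int(v[1:-1]) for v in vs}}]))
```

Both steps only see lists of callables, which mirrors how the pipelines accept user-defined lists of interface implementations.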

Executing var3d code

Relying on many external tools often brings dependencies that are hard to satisfy in an HPC environment, so we opted for a Singularity container approach. No container is shipped with the repository; building one is a mandatory setup step. See the README in the singularity directory for instructions.

The v3d wrapper script in the root directory of the var3d repository simplifies container execution. It expects a container named var3d.sif in the singularity directory, i.e. the container built by following those instructions. Run it with --help for details:

$ <path_to_var3d_repo>/v3d --help
usage: v3d [-h] [--singularity_exec SINGULARITY_EXEC] [--sif SIF]
           [--module_mount_path MODULE_MOUNT_PATH] [--mount MOUNT] [--test]
           [--doc] [--doc_out DOC_OUT] [--notebook]

Var3D runscript - Convenience wrapper to execute Python scripts using ost in a
container. All arguments preceding an item ending with ".py" are considered
arguments for v3d, the rest is considered input arguments for ost, i.e. you
can execute "./v3d my_script.py --script_arg XYZ script_arg1 script_arg2".
Path to default container is estimated based on this scripts location (see sif
argument). The var3d Python module becomes available through mounting magic
(see module_mount_path argument). We can thus edit var3d code on your local
machine and it has a direct effect when running the code in the container.
More mounting is required if the executed script requires access to any
directory beyond what Singularity mounts by default (see Singularity docs).
E.g. If you want to access file /this/is/an/absolute/path/x you must
explicitely mount /this/is/an/absolute/path with the mount argument.

optional arguments:
  -h, --help            show this help message and exit
  --singularity_exec SINGULARITY_EXEC
                        Path to singularity executable, defaults to
                        singularity (i.e. takes what is in the system path)
  --sif SIF             Path to singularity image file, defaults to
                        <dir_containing_v3d_exec>/singularity/var3d.sif
  --module_mount_path MODULE_MOUNT_PATH
                        Path IN the container where the var3d module gets
                        mounted. Must be in the container's
                        PYTHONPATH.Defaults to the first path returned by
                        'singularity exec <sif> printenv PYTHONPATH' if not
                        given. The var3d module from the local host is
                        <dir_containing_v3d_exec>/var3d.
  --mount MOUNT         Mounts all given paths into the singularity container,
                        i.e. adds '--bind <p>:<p>' for each p to the
                        singularity exec command.
  --test                Runs the unit tests
  --doc                 Builds HTML documentation, out dir is set with
                        --doc_out (default: var3d_doc)
  --doc_out DOC_OUT     Output directory if --doc enabled
  --notebook            instead of ost, a jupyter notebook server is started
                        with "jupyter notebook --no-browser". Additional args
                        are appended to that command.
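The argument handling described in the help text can be sketched as follows. This is a hypothetical re-implementation for illustration, not the actual code of the v3d script: arguments are split at the first item ending in ".py", and every --mount path p is turned into a `--bind p:p` pair for `singularity exec`.

```python
def split_v3d_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first item ending with ".py" (hypothetical sketch).

    Items before it are wrapper (v3d) arguments; the script and everything
    after it are passed on to ost.
    """
    for i, arg in enumerate(argv):
        if arg.endswith(".py"):
            return argv[:i], argv[i:]
    return argv, []  # no script given, e.g. "--test" or "--doc"


def bind_args(mounts: list[str]) -> list[str]:
    """Turn each mount path p into "--bind p:p" for singularity exec."""
    args = []
    for p in mounts:
        args += ["--bind", f"{p}:{p}"]
    return args


if __name__ == "__main__":
    print(split_v3d_args(["--mount", "/path/a", "my_script.py", "--script_arg", "XYZ"]))
    print(bind_args(["/path/a"]))
```

Splitting at the script name means v3d options must come before the script, while everything after it reaches the script untouched.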

With the wrapper you can run the project's unit tests, i.e. all tests defined in <path_to_var3d_repo>/tests:

$ <path_to_var3d_repo>/v3d --test

build the HTML documentation:

$ <path_to_var3d_repo>/v3d --doc --doc_out var3d_html_doc

start a Jupyter notebook server for data analysis:

$ <path_to_var3d_repo>/v3d --notebook jupyter_arg1 jupyter_arg2

or execute an arbitrary Python script that makes use of the var3d Python module:

$ <path_to_var3d_repo>/v3d my_script.py arg1 arg2 arg3
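A minimal, hypothetical skeleton for my_script.py, assuming ost forwards the arguments after the script name to `sys.argv` the way a regular Python interpreter does. The argument name `inputs` is illustrative, and the place where var3d pipelines would be set up is left as a comment.

```python
import argparse
import sys


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the positional arguments given after my_script.py."""
    parser = argparse.ArgumentParser(description="Hypothetical var3d script")
    parser.add_argument("inputs", nargs="*", help="e.g. arg1 arg2 arg3")
    return parser.parse_args(argv)


def main(argv: list[str]) -> None:
    args = parse_args(argv)
    # Set up and run your var3d pipelines here; this sketch only echoes input.
    for item in args.inputs:
        print(f"processing {item}")


if __name__ == "__main__":
    main(sys.argv[1:])
```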

Be aware of how Singularity mounts host directories into the container; the Singularity documentation describes which directories are available by default. If your pipeline requires access to files outside these directories, you have to make them available manually by mounting the directories that contain them with the --mount flag of the v3d runscript. For example, if your script requires access to databases in two directories that Singularity does not mount automatically, the call might look like:

$ <path_to_var3d_repo>/v3d --mount /path/a --mount /path/b my_script.py arg1 arg2 arg3
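Since the directory containing a file (not the file itself) must be mounted, a small helper can derive the --mount flags from a list of required files. `mount_flags` is a hypothetical convenience function for illustration; it is not part of var3d or the v3d script.

```python
import os


def mount_flags(required_files: list[str]) -> list[str]:
    """Return v3d --mount flags for the parent directories of the given files.

    Each directory appears once, in order of first occurrence.
    """
    dirs = []
    for f in required_files:
        d = os.path.dirname(os.path.abspath(f))
        if d not in dirs:
            dirs.append(d)
    flags = []
    for d in dirs:
        flags += ["--mount", d]
    return flags


if __name__ == "__main__":
    print(mount_flags(["/path/a/db1.fasta", "/path/b/db2.hhm", "/path/a/db3.fasta"]))
```

Placing the returned flags before the script name reproduces a call like the one above.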

Interface Implementations

  • VarImporter
    • {class}var3d.var_importer.HRVarImporter - Parses variants from human readable (HR) strings, e.g. A123AB
    • {class}var3d.var_importer.UniprotEntryVarImporter - Fetches and parses variants from a UniProt entry
  • StructImporter
    • {class}var3d.struct_importer.SMRStructImporter - Fetches structures from the SWISS-MODEL repository (SMR)
    • {class}var3d.struct_importer.FilesystemStructImporter - Loads user-defined structures from disk
  • SeqAnno
    • {class}var3d.seq_anno.EntropySeqAnno - Shannon entropy based on a multiple sequence alignment (MSA)
    • {class}var3d.seq_anno.ConsurfSeqAnno - Annotation with the ConSurf-DB pipeline [1]
    • {class}var3d.seq_anno.AAIndexSeqAnno - Annotations based on the AAindex DB [2]
  • StructAnno
    • {class}var3d.struct_anno.AccessibilityStructAnno - Annotations based on solvent accessibilities
    • {class}var3d.struct_anno.TransmembraneStructAnno - Classifies whether a structure has transmembrane-like properties. If so, the optimal membrane positioning is added as well.
    • {class}var3d.struct_anno.InterfaceStructAnno - Annotates interface residues
    • {class}var3d.struct_anno.PLIPStructAnno - Annotates protein-ligand interactions with PLIP [3]
  • VarSeqAnno
    • {class}var3d.var_seq_anno.ProveanVarSeqAnno - Variant annotation based on PROVEAN [4]
    • {class}var3d.var_seq_anno.AAIndexVarSeqAnno - Variant annotation based on the AAindex DB [2]
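For reference, the Shannon entropy of an MSA column with residue frequencies p_a is H = -sum_a p_a * log2(p_a); low values indicate conserved positions. The sketch below computes it per column, assuming gaps are simply ignored and base-2 logarithms are used. It illustrates the measure only; EntropySeqAnno may treat gaps, sequence weighting or normalization differently.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (in bits) of one MSA column, ignoring gap characters."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def msa_entropy(msa: list[str]) -> list[float]:
    """Per-column entropy for an MSA given as equally long aligned strings."""
    if not msa or len({len(s) for s in msa}) != 1:
        raise ValueError("expected a non-empty MSA of equally long sequences")
    return [column_entropy("".join(col)) for col in zip(*msa)]


if __name__ == "__main__":
    # Fully conserved first column, mixed second and third columns.
    print(msa_entropy(["MKV", "MRV", "MKL", "M-V"]))
```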

References

[1] Chorin A.B., Masrati G., Kessel A., Narunsky A., Sprinzak J., Lahav S., Ashkenazy H., Ben-Tal N. (2019). ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins. Protein Sci.

[2] Kawashima S. and Kanehisa M. (2000). AAindex: amino acid index database. Nucleic Acids Res.

[3] Adasme M.F., Linnemann K.L., Bolz S.N., Kaiser F., Salentin S., Haupt V.J., Schroeder M. (2021). PLIP 2021: expanding the scope of the protein–ligand interaction profiler to DNA and RNA. Nucleic Acids Res.

[4] Choi Y., Sims G.E., Murphy S., Miller J.R., Chan A.P. (2012). Predicting the Functional Effect of Amino Acid Substitutions and Indels. PLoS ONE.