ModelCIF validation tool

This is a tool to check that the formatting of ModelCIF files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option --extend-validated-file). For more basic mmCIF validation, the dictionary of the underlying PDBx/mmCIF format is also available.

The easiest way to run validation is from a Docker container.

The tool itself is a wrapper around the CifCheck tool by RCSB.

If you have questions about ModelCIF validation, feel free to contact the MA team.

How to run the validation tool

This is just a description of the validation tool itself. When running it from inside a container, the command needs to be prefixed with the instructions to start the container. Find information for running the validation Docker container in "How to run the Docker container".

If the command runs without errors, the validation tool returns a concise report in JSON format upon completion. That output is meant as input for a website or any other nicely formatted report. It can also be stored in a JSON-formatted file. If the tested ModelCIF file is fully compliant with the ModelCIF format, the JSON output has

  • status "completed"
  • no messages in the cifcheck-errors list
  • no messages in the diagnosis list
  • versions of the dictionaries the file was tested against

Format violations will be listed in diagnosis.

cifcheck-errors gathers errors from the CifCheck command. This has nothing to do with wrong formatting: messages in this list mean that CifCheck has "crashed". This should not happen, since possible issues with CifCheck should be caught by the validation tool. Feel free to report such messages to the MA team.

The most basic way to invoke the validation tool is just with a ModelCIF file (example shows the command plus possible output):

$ validate-mmcif-file model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 
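
A script consuming the report can apply the compliance criteria listed above directly. The following is a minimal sketch, not part of the validation tool; the function name is_compliant is hypothetical, and the report is a shortened copy of the example output:

```python
import json


def is_compliant(report_text):
    """Return True if the JSON report describes a fully compliant file."""
    report = json.loads(report_text)
    return (
        report.get("status") == "completed"
        and not report.get("cifcheck-errors")  # CifCheck itself ran cleanly
        and not report.get("diagnosis")        # no format violations listed
        and bool(report.get("versions"))       # dictionary versions present
    )


# Shortened copy of the example output (only one dictionary kept)
example = (
    '{"cifcheck-errors":[],"status":"completed","diagnosis":[],'
    '"versions":[{"title":"mmcif_ma.dic","version":"1.4.3",'
    '"location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/'
    'mmcif_ma-v1.4.3.dic"}]}'
)
print(is_compliant(example))
```

The same function works on a report stored with --out-file, after reading the file content into a string.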

Add dictionary information used for validation to ModelCIF file

Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. When they are extended, a lot of thought goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary version 1.3 is still valid with dictionary version 1.4. But the version number also tells you which features to expect in a ModelCIF file, so it is a good idea to keep the version inside the file.

The validation tool can add the version upon positive validation, enabled by --extend-validated-file (-e).

-e can take an alternative file name to write the validated ModelCIF file to, e.g. if one wants to keep the original ModelCIF file unaltered:

$ validate-mmcif-file -e validated_model.cif model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 

The last command will generate a new file validated_model.cif upon positive validation (diagnosis points to an empty list), with the versions added to the _audit_conform list inside the file.

To add the validation dictionaries to _audit_conform in the original ModelCIF file, invoke -e without an alternative file name. There is one catch: because of the way Python handles this kind of command line argument, -e consumes what follows it as a file name, as long as it does not start with a -. So validate-mmcif-file -e model.cif means that -e takes model.cif as its file name, and the command then fails because the ModelCIF file to be validated is missing. The solution is to put -e either at the beginning of the arguments list, ahead of other options, or, if there are no other command line arguments, after the ModelCIF file name at the very end:

$ validate-mmcif-file model.cif -e
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 
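
The argument-consuming behaviour can be reproduced with Python's argparse module. The parser below is a hypothetical stand-in, assuming -e is declared with an optional value (nargs="?"); it is not the tool's actual code, and the "<in place>" placeholder is made up for illustration:

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical stand-in for the tool's parser, not its actual code
    parser = argparse.ArgumentParser(prog="validate-mmcif-file")
    parser.add_argument(
        "-e", "--extend-validated-file",
        nargs="?",           # optional value: zero or one file name
        const="<in place>",  # placeholder used when -e has no value
        default=None,
    )
    parser.add_argument("-a", "--associates-dir")
    parser.add_argument("model_cif")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    with contextlib.redirect_stderr(io.StringIO()):  # hide the usage message
        try:
            return build_parser().parse_args(argv)
        except SystemExit:
            return None


# -e swallows model.cif, so the positional argument is missing
print(try_parse(["-e", "model.cif"]))
# -e at the very end has nothing to consume
print(try_parse(["model.cif", "-e"]))
# -e followed by another option has nothing to consume either
print(try_parse(["-e", "-a", "extra", "model.cif"]))
```

Under these assumptions, the first call is rejected while the other two parse, with -e falling back to its placeholder value.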

Base directory for associated files

For a ModelCIF file using the _ma_entry_associated_files category, the validation tool tries to merge associated data into the ModelCIF file, if _ma_entry_associated_files.file_format is cif and _ma_entry_associated_files.file_content is local pairwise QA scores. That way the outsourced data is validated, too.

The command line argument --associates-dir (-a) declares the base directory that associated files are stored in. Inside the directory, the path must follow what is defined in _ma_entry_associated_files.file_url. If the URL is just the file name, the file must be stored right in the associates directory. The following example works for _ma_entry_associated_files.file_url model_pae.cif (grep and ls are just used to illustrate the data situation):

$ grep _ma_entry_associated_files.file_url model.cif
_ma_entry_associated_files.file_url model_pae.cif
$ ls extra
model_pae.cif
$ validate-mmcif-file -a extra model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 

If the URL points to a subdirectory, this must be reflected by the associates directory tree declared to the validation tool. The following example illustrates that the extra directory needs a pae directory storing the associated file as expected by _ma_entry_associated_files.file_url:

$ grep _ma_entry_associated_files.file_url model.cif
_ma_entry_associated_files.file_url pae/model_pae.cif
$ ls extra
pae
$ ls extra/pae
model_pae.cif
$ validate-mmcif-file -a extra model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 
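
The relation between the associates directory and _ma_entry_associated_files.file_url in both examples amounts to a plain path join. A minimal sketch, with associated_file_path as a hypothetical helper that is not taken from the validation tool:

```python
from pathlib import PurePosixPath


def associated_file_path(associates_dir, file_url):
    """Join the --associates-dir value with a relative file_url."""
    return PurePosixPath(associates_dir) / file_url


print(associated_file_path("extra", "model_pae.cif"))      # extra/model_pae.cif
print(associated_file_path("extra", "pae/model_pae.cif"))  # extra/pae/model_pae.cif
```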

Misc. arguments

--help (-h) Print a help/usage page for the validation tool.

--dict-sdb <SDB FILE> (-d) Format dictionary in (binary) SDB format used for validating a ModelCIF file. The Docker container comes with an SDB for ModelCIF (/usr/local/share/mmcif-dict-suite/mmcif_ma.sdb) and one for the original PDBx/mmCIF (/usr/local/share/mmcif-dict-suite/mmcif_pdbx_v50.dic.sdb) format.

--out-file <JSON FILE> (-o) Instead of printing the output to stdout, store it in a JSON formatted file.

--verbose (-v) Be more talkative.

How to run the Docker container

Calling the validation tool stays (almost) the same; it just needs instructions to start the Docker container as a prefix:

$ docker run --rm -v /home/user/models:/data registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest validate-mmcif-file /data/model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$ 
  • docker run starts a new Docker container from image registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest and executes the validate-mmcif-file command inside the container.
  • --rm makes sure that the container is removed from the system once the job has completed.
  • -v mounts directory /home/user/models from the local host computer to /data inside the Docker container, otherwise the validate-mmcif-file command has no access to local files. The bind mount makes the ModelCIF file /home/user/models/model.cif available as /data/model.cif to commands executed by docker run. Keep in mind, validate-mmcif-file -e and validate-mmcif-file -a also need to refer to /data (or any other local directory mounted in the Docker container).
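
When validation is driven from a script, the prefix described in the list above can be assembled programmatically. This is a sketch under the assumption that all input files live in one host directory mounted as /data; the helper docker_validate_command is hypothetical:

```python
import shlex

IMAGE = (
    "registry.scicore.unibas.ch/schwede/modelcif-converters/"
    "mmcif-dict-suite:latest"
)


def docker_validate_command(host_dir, cif_name, tool_args=()):
    """Build the docker run argument list for validating host_dir/cif_name."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data",  # bind mount host_dir as /data
        IMAGE,
        "validate-mmcif-file", *tool_args, f"/data/{cif_name}",
    ]


cmd = docker_validate_command("/home/user/models", "model.cif")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, capture_output=True, text=True) to collect the JSON report from stdout. Paths passed via tool_args, e.g. for -a, have to refer to /data as well.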

How to get the Docker container

Before running the Docker container, you need a local copy of its image. There are three ways to get it:

  • docker run will pull it automatically upon first call
  • docker pull the Docker image yourself before running it
  • docker build the Docker image from scratch

How to pull a copy of the Docker container from our registry

With docker pull, the ready-made Docker image can be fetched from our Docker registry. Two kinds of Docker images are available, differentiated by tags. The latest tag refers to the Docker image with the most recent ModelCIF dictionary. This should be the default choice. For specific use cases, e.g. debugging, we also provide Docker images for older versions of the ModelCIF dictionary. Those are tagged with the version number of the dictionary. The latest image is pulled like this:

$ docker pull registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest

How to build the Docker container from scratch

Here is the command we use to generate the Docker image. It works when executed from within the validation/ subdirectory of the Git repository:

docker build -t registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest .

When developing your own tools using the Docker image, there is one build argument that adds an editor, Black, Pylint and bash to ease working in interactive sessions inside the Docker container:

docker build --build-arg ADD_DEV=yes -t registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:dev .

The pyproject.toml we use can be found in the Git repository root.

Files in this directory

| Path | Content |
|---|---|
| Dockerfile | Build instructions for the Docker image |
| README.md | This README |
| entrypoint.sh | Script executed on Docker container start |
| get-mmcif-dict-versions.py | Extract versions of mmCIF dictionaries, used for building the Docker image. Copied into the image as get-mmcif-dict-versions.py. |
| validate-mmcif-file.py | Validation tool, copied into the image as validate-mmcif-file. |