ModelCIF validation tool
This is a tool to check that the formatting of ModelCIF files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option --extend-validated-file). For more basic mmCIF validation, the dictionary of the underlying PDBx/mmCIF format is also available.
The easiest way to run validation is from a Docker container.
The tool itself is a wrapper around the CifCheck tool by RCSB.
If you have questions about ModelCIF validation, feel free to contact the MA team.
How to run the validation tool
This is just a description of the validation tool itself. When running it from inside a container, the command needs to be prefixed with the instructions to start the container. Find information for running the validation Docker container in "How to run the Docker container".
Upon completion, if there hasn't been any error running the command, the validation tool returns a concise report in JSON format. That output is meant to be input to a website or any kind of nicely formatted report. Output can also be stored as a JSON formatted file. If the tested ModelCIF file is fully compliant with the ModelCIF format, the JSON output has
- status "completed"
- no messages in the cifcheck-errors list
- no messages in the diagnosis list
- versions of the dictionaries the file was tested against
Format violations will be listed in diagnosis.
cifcheck-errors gathers errors from the CifCheck command. This has nothing to do with wrong formatting: messages in this list mean that CifCheck has "crashed". This should not happen, since possible issues with CifCheck should be caught by the validation tool. Feel free to report them to the MA team.
The most basic way to invoke the validation tool is just with a ModelCIF file (example shows the command plus possible output):
$ validate-mmcif-file model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
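The compliance criteria listed above can be checked programmatically. The following is a minimal sketch, not part of the validation tool: the helper names is_compliant and dictionary_versions are hypothetical, and the report string is a shortened copy of the example output shown above.

```python
import json


def is_compliant(report: dict) -> bool:
    """Return True if the report matches the "fully compliant" criteria."""
    return (
        report.get("status") == "completed"
        and not report.get("cifcheck-errors")  # CifCheck itself ran cleanly
        and not report.get("diagnosis")        # no format violations listed
        and bool(report.get("versions"))       # dictionaries were recorded
    )


def dictionary_versions(report: dict) -> dict:
    """Map each dictionary title to the version the file was tested against."""
    return {entry["title"]: entry["version"] for entry in report.get("versions", [])}


# Shortened copy of the example output (only the ModelCIF dictionary is kept).
raw = (
    '{"cifcheck-errors":[],"status":"completed","diagnosis":[],'
    '"versions":[{"title":"mmcif_ma.dic","version":"1.4.3",'
    '"location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}'
)
report = json.loads(raw)
print(is_compliant(report))
print(dictionary_versions(report))
```

The same functions would work on a report read from a file written with --out-file, since that file holds the same JSON.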
Add dictionary information used for validation to ModelCIF file
Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. When they are extended, a lot of thought goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary version 1.3 is still valid with dictionary version 1.4. But the version number also tells you which features to expect in a ModelCIF file, so it seems like a good idea to keep the version inside the file.
The validation tool can add the version upon positive validation, enabled by --extend-validated-file (-e).
-e can take an alternative file name to write the validated ModelCIF file to, e.g. if one wants to keep the original ModelCIF file unaltered:
$ validate-mmcif-file -e validated_model.cif model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
The last command will generate a new file validated_model.cif upon positive validation (diagnosis points to an empty list), with the versions added to the _audit_conform list inside the file.
To add the validation dictionaries to _audit_conform in the original ModelCIF file, just invoke -e without an alternative file name... well, almost. Because of the way Python handles this kind of command line argument, -e consumes what follows it as a file name, unless that starts with a -. So validate-mmcif-file -e model.cif would mean that -e takes model.cif as its file name, and the command then fails because the ModelCIF file to be validated is missing. The solution is to put -e either at the beginning of the argument list or, if there are no other command line arguments, at the very end after the ModelCIF file name:
$ validate-mmcif-file model.cif -e
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
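The behaviour of -e described above can be reproduced with Python's argparse module. This is a sketch under assumptions, not the tool's actual source: it assumes the option is declared with nargs="?", and both the const marker and the positional name model_cif are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mimics the documented -e behaviour (assumed setup)."""
    parser = argparse.ArgumentParser(prog="validate-mmcif-file")
    parser.add_argument(
        "-e",
        "--extend-validated-file",
        nargs="?",     # zero or one file name may follow the option
        const=True,    # hypothetical marker: extend the input file itself
        default=None,  # option not given at all
    )
    parser.add_argument("model_cif")  # the ModelCIF file to validate
    return parser


parser = build_parser()

# -e at the very end: nothing is left for it to consume.
at_end = parser.parse_args(["model.cif", "-e"])

# -e with an alternative output file, followed by the input file.
with_name = parser.parse_args(["-e", "validated_model.cif", "model.cif"])

# -e directly before the only file name: the name is taken by -e and the
# required positional argument is missing, so argparse exits with an error.
try:
    parser.parse_args(["-e", "model.cif"])
    swallowed = False
except SystemExit:
    swallowed = True

print(at_end.extend_validated_file, with_name.extend_validated_file, swallowed)
```

In this sketch the third call fails for the reason given in the text: the option takes the file name, leaving no ModelCIF file to validate.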
Base directory for associated files
For a ModelCIF file using the _ma_entry_associated_files category, the validation tool tries to merge associated data into the ModelCIF file if _ma_entry_associated_files.file_format is cif and _ma_entry_associated_files.file_content is local pairwise QA scores. That way the outsourced data is validated, too.
Command line argument --associates-dir (-a) is used to declare the base directory associated files are stored in. Inside the directory, the path must follow what is defined in _ma_entry_associated_files.file_url. If the URL is just the file name, the file must be stored right in the associates directory. The following example works for _ma_entry_associated_files.file_url model_pae.cif (grep and ls are just used to illustrate the data situation):
$ grep _ma_entry_associated_files.file_url model.cif
_ma_entry_associated_files.file_url model_pae.cif
$ ls extra
model_pae.cif
$ validate-mmcif-file -a extra model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
If the URL points to a subdirectory, this must be reflected by the associates directory tree declared to the validation tool. The following example illustrates that the extra directory needs a pae directory storing the associated file as expected by _ma_entry_associated_files.file_url:
$ grep _ma_entry_associated_files.file_url model.cif
_ma_entry_associated_files.file_url pae/model_pae.cif
$ ls extra
pae
$ ls extra/pae
model_pae.cif
$ validate-mmcif-file -a extra model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
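Both examples follow the same rule: the expected location of an associated file is the associates directory joined with the value of _ma_entry_associated_files.file_url. A minimal sketch of that rule, where resolve_associated_file is a hypothetical helper and the tool's own lookup may differ in detail:

```python
from pathlib import Path


def resolve_associated_file(associates_dir: str, file_url: str) -> Path:
    """Join the --associates-dir value with the relative file_url."""
    return Path(associates_dir) / file_url


# file_url is just a file name: the file sits right in the associates directory.
flat = resolve_associated_file("extra", "model_pae.cif")

# file_url contains a subdirectory: the directory tree must mirror it.
nested = resolve_associated_file("extra", "pae/model_pae.cif")

print(flat)
print(nested)
```

A check like nested.is_file() before calling the validation tool could confirm that the directory layout matches the URL in the ModelCIF file.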
Misc. arguments
--help (-h): Print a help/usage page for the validation tool.
--dict-sdb <SDB FILE> (-d): Format dictionary in (binary) SDB format used for validating a ModelCIF file. The Docker container comes with an SDB for ModelCIF (/usr/local/share/mmcif-dict-suite/mmcif_ma.sdb) and one for the original PDBx/mmCIF (/usr/local/share/mmcif-dict-suite/mmcif_pdbx_v50.dic.sdb) format.
--out-file <JSON FILE> (-o): Instead of printing the output to stdout, store it in a JSON formatted file.
--verbose (-v): Be more talkative.
How to run the Docker container
Calling the validation tool stays (almost) the same; it just needs the instructions to start the Docker container as a prefix:
$ docker run --rm -v /home/user/models:/data registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest validate-mmcif-file /data/model.cif
{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]}
$
- docker run starts a new Docker container from image registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest and executes the validate-mmcif-file command inside the container.
- --rm makes sure that the container is removed from the system once the job has completed.
- -v mounts directory /home/user/models from the local host computer to /data inside the Docker container; otherwise the validate-mmcif-file command has no access to local files. The bind mount makes the ModelCIF file /home/user/models/model.cif available as /data/model.cif to commands executed by docker run. Keep in mind that validate-mmcif-file -e and validate-mmcif-file -a also need to refer to /data (or any other local directory mounted in the Docker container).
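When the container is started from a script, the mapping between host paths and container paths is easy to get wrong. The sketch below assembles the docker run command from the pieces explained above. The helper docker_validate_command and its parameters are hypothetical, and the command is only built and printed, not executed.

```python
from pathlib import PurePosixPath

# Image name taken from the example above.
IMAGE = "registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest"


def docker_validate_command(host_dir, model_name, extra_args=(), mount_point="/data"):
    """Build the argument list for validating host_dir/model_name in a container."""
    # Path of the model as seen from inside the container.
    container_path = PurePosixPath(mount_point) / model_name
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",  # bind mount host_dir to mount_point
        IMAGE,
        "validate-mmcif-file", *extra_args, str(container_path),
    ]


cmd = docker_validate_command("/home/user/models", "model.cif")
print(" ".join(cmd))

# Paths passed to -a (and -e) must also point into the mounted directory.
cmd_assoc = docker_validate_command(
    "/home/user/models", "model.cif", extra_args=["-a", "/data/extra"]
)
print(" ".join(cmd_assoc))
```

The resulting list could be handed to subprocess.run to actually start the container.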
How to get the Docker container
Before running the Docker container, you need a local copy of its image. There are three ways to get it:
- docker run will pull it automatically upon first call
- docker pull the Docker image yourself before running it
- docker build the Docker image from scratch
How to pull a copy of the Docker container from our registry
With docker pull, the ready-made Docker image can be fetched from our Docker registry. Two kinds of Docker images are available, differentiated by tags. The latest tag refers to the Docker image with the most recent ModelCIF dictionary. This should be the default choice. For specific use cases, e.g. debugging, we also provide Docker images for older versions of the ModelCIF dictionary; those are tagged with the version number of the dictionary. The latest image is pulled like this:
$ docker pull registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest
How to build the Docker container from scratch
Here is the command we use to generate the Docker image. It works when executed from within the validation/ subdirectory of the Git repository:
$ docker build -t registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:latest .
When developing your own tools using the Docker image, there is one build argument that adds an editor, Black, Pylint and bash to ease working in interactive sessions inside the Docker container:
$ docker build --build-arg ADD_DEV=yes -t registry.scicore.unibas.ch/schwede/modelcif-converters/mmcif-dict-suite:dev .
The pyproject.toml we use can be found in the Git repository root.
Files in this directory
Path | Content |
---|---|
Dockerfile | Build instructions for the Docker image |
README.md | This README |
entrypoint.sh | Script executed on Docker container start |
get-mmcif-dict-versions.py | Extract versions of mmCIF dictionaries, used for building the Docker image. Copied into the image as get-mmcif-dict-versions.py. |
validate-mmcif-file.py | Validation tool, copied into the image as validate-mmcif-file. |