This is a tool to check that the formatting of [ModelCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/) files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option [`--extend-validated-file`](#add-dictionary-information-used-for-validation-to-modelcIF-file)). For more basic [mmCIF](https://mmcif.wwpdb.org) validation, the dictionary of the underlying [PDBx/mmCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/) format is also available.
This is a tool to check that the formatting of [ModelCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/) files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option [`--extend-validated-file`](#add-dictionary-information-used-for-validation-to-modelcif-file)). For more basic [mmCIF](https://mmcif.wwpdb.org) validation, the dictionary of the underlying [PDBx/mmCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/) format is also available.
The easiest way to run validation is from a [Docker](https://www.docker.com) container.
...
...
@@ -16,46 +16,51 @@ This is just a description of the [validation tool](./validate-mmcif-file.py) it
Upon completion, if there hasn't been any error running the command, the validation tool returns a concise report in [JSON](https://www.json.org/json-en.html) format. That output is meant to be input to a website or any kind of nicely formatted report. Output can also be stored as a JSON formatted file. If the tested ModelCIF file is fully compliant with the ModelCIF format, the JSON output has
- `status` "completed"
- no messages in the `cifcheck-errors` list
- no messages in the `diagnosis` list
- `versions` of the dictionaries the file was tested against
Format violations will be listed in `diagnosis`.
`cifcheck-errors` gathers errors from the `CifCheck` command. This has nothing to do with wrong formatting; messages in this list mean that `CifCheck` has "crashed". This should not happen, since possible issues with `CifCheck` should be caught by the validation tool. Feel free to report them.
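The compliance criteria above can be checked programmatically. The following is a minimal sketch, not part of the validation tool: the function name `is_compliant` is hypothetical, and the example report only assumes the four keys named above (the layout of `versions` is not specified here, so it is left as an empty placeholder).

```python
import json


def is_compliant(report: dict) -> bool:
    """Return True if a validation report describes a fully compliant file.

    Assumes the report uses the keys described above: `status`,
    `cifcheck-errors` and `diagnosis`.
    """
    return (
        report.get("status") == "completed"
        and not report.get("cifcheck-errors")  # empty list: CifCheck ran fine
        and not report.get("diagnosis")  # empty list: no format violations
    )


# Hypothetical report; a real one is printed by the validation tool.
example = json.loads(
    '{"status": "completed", "cifcheck-errors": [], '
    '"diagnosis": [], "versions": []}'
)
print(is_compliant(example))
```

A report with any message in `diagnosis` or `cifcheck-errors` makes the function return `False`.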
The most basic way to invoke the validation tool is just with a ModelCIF file (example shows the command plus possible output):
### Add dictionary information used for validation to ModelCIF file
Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. While extending them, quite some thinking goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary 1.3 is still valid with version 1.4. But the version number also tells you which feature to expect from a ModelCIF file, so it seems like a good idea to keep the version inside the file.
Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. While extending them, quite some thinking goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary 1.3 is still valid with version 1.4. But the version number also tells you which features to expect in a ModelCIF file, so it seems like a good idea to keep the version inside the file.
The validation tool can add the version upon positive validation, enabled by the `--extend-validated-file` (`-e`) option.
`-e` can take an alternative file name to write the validated ModelCIF file to, e.g. if one wants to keep the original ModelCIF file unaltered:
The last command will generate a new file `validated_model.cf` upon positive validation (`diagnosis` points to an empty list), with the `versions` added to the `_audit_conform` list inside the file.
The last command will generate a new file `validated_model.cf` upon positive validation (`diagnosis` points to an empty list), with the `versions` added to the [`_audit_conform`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/audit_conform.html) list inside the file.
To add the validation dictionaries to `_audit_conform` in the original ModelCIF file, just invoke `-e` without an alternative file name... well, almost. Because of the way Python handles this kind of command line argument, `-e` consumes everything after it that does not start with a `-` as a file name. So `validate-mmcif-file -e model.cif` would mean that `-e` takes `model.cif` as its file name, but then the command fails because the ModelCIF file to be validated is missing. The solution is to put `-e` either at the beginning of the argument list or, if there are no other command line arguments, at the very end after the ModelCIF file name:
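The consuming behaviour described above can be reproduced with a small stand-alone parser. This is only a sketch: it assumes the tool uses Python's `argparse` with an optional value (`nargs="?"`) for `-e`, which is not confirmed here. The parser below, the `IN_PLACE` marker and `try_parse` are hypothetical and only mirror the options discussed in this section.

```python
import argparse

# Hypothetical marker meaning "extend the validated file itself".
IN_PLACE = "<in place>"


def build_parser() -> argparse.ArgumentParser:
    # Illustration only, not the parser of validate-mmcif-file.
    parser = argparse.ArgumentParser(prog="validate-mmcif-file")
    parser.add_argument(
        "-e", "--extend-validated-file",
        nargs="?",       # the file name after -e is optional
        const=IN_PLACE,  # value used when -e is given without a file name
        default=None,    # value used when -e is absent
    )
    parser.add_argument("-o", "--out-file")
    parser.add_argument("model_cif")
    return parser


def try_parse(argv):
    """Parse argv; return None where argparse would exit with a usage error."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# -e swallows model.cif, so the required ModelCIF file is missing.
print(try_parse(["-e", "model.cif"]))
# -e at the very end: nothing left to consume.
print(try_parse(["model.cif", "-e"]))
# -e at the beginning, followed by another option starting with "-".
print(try_parse(["-e", "-o", "report.json", "model.cif"]))
```

The first call fails with a usage message on `stderr`, while the other two parse with `-e` carrying no file name.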
For a ModelCIF file using the `_ma_entry_associated_files` category, the validation tool tries to merge associated data into the ModelCIF file, if `_ma_entry_associated_files.file_format` is `cif`. That way the outsourced data is validated, too.
For a ModelCIF file using the [`_ma_entry_associated_files`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/ma_entry_associated_files.html) category, the validation tool tries to merge associated data into the ModelCIF file, if [`_ma_entry_associated_files.file_format`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_format.html) is `cif` and [`_ma_entry_associated_files.file_content`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_content.html) is `local pairwise QA scores`. That way the outsourced data is validated, too.
Command line argument `--associates-dir` (`-a`) is used to declare the base directory associated files are stored in. Inside the directory, the path must follow what is defined in `_ma_entry_associated_files.file_url`. If the URL is just the file name, the file must be stored right in the associates directory. The following example works for `_ma_entry_associated_files.file_url model_pae.cif`
Command line argument `--associates-dir` (`-a`) is used to declare the base directory associated files are stored in. Inside the directory, the path must follow what is defined in [`_ma_entry_associated_files.file_url`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_url.html). If the URL is just the file name, the file must be stored right in the associates directory. The following example works for `_ma_entry_associated_files.file_url model_pae.cif` (`grep` and `ls` are just used to illustrate the data situation)
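How the associates directory and `_ma_entry_associated_files.file_url` combine into a file location can be sketched as follows. This is an illustration under assumptions, not the tool's actual code: the helper `associated_file_path` and the directory `/data/associates` are hypothetical, and the sketch only covers URLs that are plain relative paths.

```python
from pathlib import PurePosixPath


def associated_file_path(associates_dir: str, file_url: str) -> PurePosixPath:
    """Join the base directory with the relative path given in file_url."""
    return PurePosixPath(associates_dir) / file_url


# file_url is just a file name: the file sits right in the directory.
print(associated_file_path("/data/associates", "model_pae.cif"))
# file_url contains sub-directories: they must exist below the directory.
print(associated_file_path("/data/associates", "scores/model_pae.cif"))
```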
**`--help`** (**`-h`**) Print a help/usage page for the validation tool.
**`--dict-sdb <SDB FILE>`** (**`-d`**) Format dictionary in (binary) SDB format used for validating a ModelCIF file. The container comes with an SDB for ModelCIF and one for the original PDBx/mmCIF format.
**`--dict-sdb <SDB FILE>`** (**`-d`**) Format dictionary in (binary) SDB format used for validating a ModelCIF file. The container comes with an SDB for ModelCIF (`/usr/local/share/mmcif-dict-suite/mmcif_ma.sdb`) and one for the original PDBx/mmCIF (`/usr/local/share/mmcif-dict-suite/mmcif_pdbx_v50.dic.sdb`) format.
**`--out-file <JSON FILE>`** (**`-o`**) Instead of printing the output to `stdout`, store it in a JSON file.
**`--verbose`** (**`-v`**) Write information from intermediate steps to `stdout`. This includes the raw output of `CifCheck`.
**`--verbose`** (**`-v`**) Be more talkative.
## How to run the container
## How to run the Docker container
The call to the validation tool (almost) stays the same, it just needs instructions to start the Docker container as a prefix:
Calling the validation tool (almost) stays the same, it just needs instructions to start the Docker container as a prefix:
```bash
$ docker run --rm -v /home/user/models:/data registry.scicore.unibas.ch/schwede/mabakerimport/mmcif-dict-suite:dev validate-mmcif-file /data/model.cif
```
`docker run` is the call to execute a certain command inside a container. `--rm` makes sure that the container is removed from the system once the job has completed.
- [`docker run`](https://docs.docker.com/engine/reference/commandline/run/) starts a new Docker container from image `registry.scicore.unibas.ch/schwede/mabakerimport/mmcif-dict-suite:latest` and executes the `validate-mmcif-file` command inside the container.
- `--rm` makes sure that the container is removed from the system once the job has completed.
- `-v` mounts directory `/home/user/models` from the local host computer to `/data` inside the Docker container, otherwise the `validate-mmcif-file` command has no access to local files. This makes the ModelCIF file `/home/user/models/model.cif` available as `/data/model.cif` to commands executed by `docker run`. Keep in mind, `validate-mmcif-file -e` and `validate-mmcif-file -a` also need to refer to `/data` (or any other local directory mounted in the Docker container).
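
The path translation implied by the `-v` mount can be sketched in a few lines. The helper `to_container_path` is hypothetical and not part of the tool; its defaults simply reuse the `/home/user/models` and `/data` directories from the command above.

```python
from pathlib import PurePosixPath


def to_container_path(
    host_path: str,
    host_dir: str = "/home/user/models",
    mount_point: str = "/data",
) -> str:
    """Translate a path on the host to the path seen inside the container.

    Raises ValueError if host_path is not located below host_dir, i.e. the
    file would not be visible inside the container.
    """
    relative = PurePosixPath(host_path).relative_to(host_dir)
    return str(PurePosixPath(mount_point) / relative)


print(to_container_path("/home/user/models/model.cif"))
```

The same translation applies to file names handed to `-e` and directories handed to `-a`.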
Since the container has its own internal file system, separate from the host, `-v` is used to mount a directory from the host into the container.
- explain what command does
- explain volumes/ external mounts
- explain for -e
- explain for -a
## How to get the Docker container
## How to pull a copy of the container from our registry
### How to pull a copy of the Docker container from our registry
- since we use it ourselves and are involved in the development, we usually notice when a new dictionary comes out
## How to build the container from scratch
### How to build the Docker container from scratch