diff --git a/validation/README.md b/validation/README.md index 957d0c73fe03cf62c9b2aef0552643f7bbc61732..45ed17d23ac357a05deeacfd69717c7417e1b4b4 100644 --- a/validation/README.md +++ b/validation/README.md @@ -1,6 +1,6 @@ # ModelCIF validation tool -This is a tool to check that the formatting of [ModelCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/) files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option [`--extend-validated-file`](#add-dictionary-information-used-for-validation-to-modelcIF-file)). For more basic [mmCIF](https://mmcif.wwpdb.org) validation, the dictionary of the underlying [PDBx/mmCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/) format is also available. +This is a tool to check that the formatting of [ModelCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/) files complies with the ModelCIF format declaration (aka "dictionary"). Upon successful validation, a ModelCIF file can be extended with the dictionary version the file was compared to (option [`--extend-validated-file`](#add-dictionary-information-used-for-validation-to-modelcif-file)). For more basic [mmCIF](https://mmcif.wwpdb.org) validation, the dictionary of the underlying [PDBx/mmCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/) format is also available. The easiest way to run validation is from [Docker](https://www.docker.com) container. @@ -16,46 +16,51 @@ This is just a description of the [validation tool](./validate-mmcif-file.py) it Upon completion, if there hasn't been any error running the command, the validation tool returns a concise report in [JSON](https://www.json.org/json-en.html) format. That output is meant to be input to a website or any kind of nicely formatted report. Output can also be stored as a JSON formatted file. If the tested ModelCIF file is fully compliant with the ModelCIF format, the JSON output has - `status` "completed" +- no messages in the `cifcheck-errors` list - no messages in the `diagnosis` list - `versions` of the dictionaries the file was tested against Format violations will be listed in `diagnosis`. +`cifcheck-errors` gathers errors from the `CifCheck` command. This has nothing to do with wrong formatting - messages in this list mean that `CifCheck` has "crashed". This should not happen, possible issues with `CifCheck` should be caught by the validation tool. Feel free to report them. + The most basic way to invoke the validation tool is just with a ModelCIF file (example shows the command plus possible output): ```bash $ validate-mmcif-file model.cif -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` + ### Add dictionary information used for validation to ModelCIF file -Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. While extending them, quite some thinking goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary 1.3, is still valid with version 1.4. But the version number also tells you which feature to expect from a ModelCIF file, so it seems like a good idea to keep the version inside the file. +Since both dictionaries, ModelCIF and PDBx/mmCIF, represent actively developed file formats, different versions exist. While extending them, quite some thinking goes into making only non-breaking changes. The idea is that a ModelCIF file formatted following dictionary 1.3, is still valid with version 1.4. But the version number also tells you which features to expect in a ModelCIF file, so it seems like a good idea to keep the version inside the file. The validation tool can add the version upon positive validation, enabled by the `--extend-validated-file` (`-e`). `-e` can take an alternative file name to write the validated ModelCIF file to, e.g. if one wants to keep the original ModelCIF file unaltered: ```bash $ validate-mmcif-file -e validated_model.cif model.cif -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` -The last command will generate a new file `validated_model.cf` upon positive validation (`diagnosis` points to an empty list), with the `versions` added to the `_audit_conform` list inside the file. +The last command will generate a new file `validated_model.cf` upon positive validation (`diagnosis` points to an empty list), with the `versions` added to the [`_audit_conform`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/audit_conform.html) list inside the file. To add the validation dictionaries to `_audit_conform` in the original ModelCIF file, just invoke `-e` without an alternative file name... well almost. By the way Python handles this kind of command line arguments, `-e` consumes everything after it, that does not start with a `-`, as a file name. So `validate-mmcif-file -e model.cif` would mean that `-e` assumes `model.cif` as its file name but then the command fails because it is missing the ModelCIF file to be validated. The solution is either putting `-e` at the beginning of the arguments list or after the ModelCIF file name at the very end, if there are no other command line arguments: ```bash $ validate-mmcif-file model.cif -e -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` + ### Base directory for associated files -For a ModelCIF file using the `_ma_entry_associated_files` category, the validation tool tries to merge associated data into the ModelCIF file, if `_ma_entry_associated_files.file_format` is `cif`. That way the outsourced data is validated, too. +For a ModelCIF file using the [`_ma_entry_associated_files`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Categories/ma_entry_associated_files.html) category, the validation tool tries to merge associated data into the ModelCIF file, if [`_ma_entry_associated_files.file_format`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_format.html) is `cif` and [`_ma_entry_associated_files.file_content`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_content.html) is `local pairwise QA scores`. That way the outsourced data is validated, too. -Command line argument `--associates-dir` (`-a`) is used to declare the base directory associated files are stored in. Inside the directory, the path must follow what is defined in `_ma_entry_associated_files.file_url`. If the URL is just the file name, the file must be stored right in the associates directory. The following example works for `_ma_entry_associated_files.file_url model_pae.cif` +Command line argument `--associates-dir` (`-a`) is used to declare the base directory associated files are stored in. Inside the directory, the path must follow what is defined in [`_ma_entry_associated_files.file_url`](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_ma_entry_associated_files.file_url.html). If the URL is just the file name, the file must be stored right in the associates directory. The following example works for `_ma_entry_associated_files.file_url model_pae.cif` (`grep` and `ls` are just used to illustrate the data situation) ```bash $ grep _ma_entry_associated_files.file_url model.cif @@ -63,7 +68,7 @@ _ma_entry_associated_files.file_url model_pae.cif $ ls extra model_pae.cif $ validate-mmcif-file -a extra model.cif -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` @@ -77,49 +82,48 @@ pae $ ls extra/pae model_pae.cif $ validate-mmcif-file -a extra model.cif -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` + ### Misc. arguments **`--help`** (**`-h`**) Print a help/ usage page for the validation tool. -**`--dict-sdb <SDB FILE>`** (**`-d`**) Format dictionary in (binary) SDB format used for validating a ModelCIF file. The container comes with a SDB for ModelCIF and one for the original PDBx/mmCIF format. +**`--dict-sdb <SDB FILE>`** (**`-d`**) Format dictionary in (binary) SDB format used for validating a ModelCIF file. The container comes with a SDB for ModelCIF (`/usr/local/share/mmcif-dict-suite/mmcif_ma.sdb`) and one for the original PDBx/mmCIF (`/usr/local/share/mmcif-dict-suite/mmcif_pdbx_v50.dic.sdb`) format. **`--out-file <JSON FILE>`** (**`-o`**) Instead of printing the output to `stdout`, store it in a JSON file. -**`--verbose`** (**`-v`**) Write information from intermediate steps to `stdout`. This includes the raw output of `CifCheck`. +**`--verbose`** (**`-v`**) Be more talkative. -## How to run the container +## How to run the Docker container -The call to the validation tool (almost) stays the same, it just needs instructions to start the Docker container as a prefix: +Calling the validation tool (almost) stays the same, it just needs instructions to start the Docker container as a prefix: ```bash -$ docker run --rm -v /home/user/models:/data registry.scicore.unibas.ch/schwede/mabakerimport/mmcif-dict-suite:dev validate-mmcif-file /data/model.cif -{"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.358","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.0","location":"https://raw.githubusercontent.com/ihmwg/ModelCIF/a24fcfa8d6c3ceb4b6f0676bcc341ac0cd24ff1f/dist/mmcif_ma.dic"}]} +$ docker run --rm -v /home/user/models:/data registry.scicore.unibas.ch/schwede/mabakerimport/mmcif-dict-suite:latest validate-mmcif-file /data/model.cif +{"cifcheck-errors":[],"status":"completed","diagnosis":[],"versions":[{"title":"mmcif_pdbx_v50.dic","version":"5.361","location":"https://raw.github.com/ihmwg/ModelCIF/master/base/mmcif_pdbx_v50.dic"},{"title":"mmcif_ma.dic","version":"1.4.3","location":"https://raw.github.com/ihmwg/ModelCIF/master/archive/mmcif_ma-v1.4.3.dic"}]} $ ``` -`docker run` is the call to execute a certain command inside a container. `--rm` makes sure that the container is removed from the system once the job completed. +- [`docker run`](https://docs.docker.com/engine/reference/commandline/run/) starts a new Docker container from image `registry.scicore.unibas.ch/schwede/mabakerimport/mmcif-dict-suite:latest` and executes the `validate-mmcif-file` command inside the container. +- `--rm` makes sure that the container is removed from the system once the job completed. +- `-v` mounts directory `/home/user/models` from the local host computer to `/data` inside the Docker container, otherwise the `validate-mmcif-file` command has no access to local files. This makes the ModelCIF file `/home/user/models/model.cif` available as `/data/model.cif` to commands executed by `docker run`. Keep in mind, `validate-mmcif-file -e` and `validate-mmcif-file -a` also need to refer to `/data` (or any other local directory mounted in the Docker container). -Since the container has its own internal file system separated, `-v` is utilised to mount a directory from the host into the container. -- explain what command does -- explain volumes/ external mounts -- explain for -e -- explain for -a +## How to get the Docker container -## How to pull a copy of the container from our registry +### How to pull a copy of the Docker container from our registry - since we use it ourselves and are involved in the development, we usually notice when a new dictionary comes out -## How to build the container from scratch +### How to build the Docker container from scratch # Files in this directory <!-- LocalWords: PDBx ModelCIF TOC JSON CifCheck RCSB mmcif cif pdbx dic dir - LocalWords: url pae sdb SDB stdout + LocalWords: url pae sdb SDB stdout modelcif cifcheck -->