Translate ColabFold PDB files into ModelCIF
This Git repository holds a (containerised) application that formats PDB files, together with some extra data, into ModelCIF files. Its primary target is ColabFold projects, so in addition to the input PDB file, the ColabFold configuration and the scores are needed.
The actual translation script uses OpenStructure to deal with some of the biological/chemical data. To spare you the installation process, we provide a Docker container ready to convert your data.
Get the translation app
There are two ways to get the Docker container: pull it from our GitLab registry or build it from the Dockerfile in this Git repository.
Pull the Docker container from the GitLab registry
Our GitLab registry keeps a copy of the container with the latest code, ready to be downloaded and used right away. With Docker installed, issue the following command in a terminal:
$ docker pull registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest
Output of the command above will be similar to this:
$ docker pull registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest
latest: Pulling from schwede/ma-wilkins-import/converter
c549ccf8d472: Already exists
...
4d4a9d119f0d: Already exists
15d67c338561: Pull complete
...
a1cbfcc89f24: Pull complete
Digest: sha256:f67507f6c84a0090b06ceadc2aaf927885a5889e924b6578453ae9ddfc0c346d
Status: Downloaded newer image for registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest
registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest
$
The hash values may differ, but you now have a local copy of the Docker container and can proceed to Run the translation app.
Build the Docker container from scratch
If you want to build the container yourself, first clone this Git repository:
$ git clone https://git.scicore.unibas.ch/schwede/ma-wilkins-import.git ma-wilkins-import.git
Then change into it, so that you are in the same directory as the Dockerfile:
$ cd ma-wilkins-import.git
Now you can build the Docker container with the following command:
$ DOCKER_BUILDKIT=1 docker build -t registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest .
DOCKER_BUILDKIT=1 is only needed for some older versions of Docker.
The Dockerfile accepts two build-time arguments (--build-arg): MMCIF_USER_ID and ADD_DEV. The latter is only used for developing the app. MMCIF_USER_ID sets the ID of the user that runs the translation script inside the Docker container, so files written by the container belong to the ID of that internal user. If you run into file permission issues with the produced ModelCIF files, build the Docker container using your own user ID:
$ DOCKER_BUILDKIT=1 docker build --build-arg MMCIF_USER_ID=<YOUR USER ID> -t registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest .
Replace <YOUR USER ID> with your own ID, e.g. after checking it with the id command (look for uid in the output).
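The lookup and the build can be combined in one step. The following is a minimal sketch, assuming a POSIX shell and that it is run from the directory containing the Dockerfile. The function name build_converter_for_current_user is made up for this example; id -u prints only the numeric user ID, and MMCIF_USER_ID is the build argument described above.

```shell
# Image tag used throughout this README.
IMAGE=registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest

# Hypothetical helper (not part of this repository): build the image so that
# the user inside the container has the same numeric ID as the caller.
# Run it from the directory that holds the Dockerfile.
build_converter_for_current_user() {
    # `id -u` prints only the numeric user ID, so no parsing is needed.
    DOCKER_BUILDKIT=1 docker build \
        --build-arg MMCIF_USER_ID="$(id -u)" \
        -t "$IMAGE" .
}
```

After defining the function, calling build_converter_for_current_user without arguments starts the build.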
Run the translation app
If you just run the Docker container, it prints a little usage description:
$ docker run --rm registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest
ModelCIF file formatting tool.
------------------------------------------
Provided by SWISS-MODEL / Schwede group
(swissmodel.expasy.org / schwedelab.org)
This container takes a directory of
ColabFold models and turns them into
ModelCIF files.
usage: translate2modelcif [-h] [--selected_rank SELECTED_RANK]
                          [--out_dir <OUTPUT DIR>] [--compress]
                          <MODEL DIR>
Translate models from Tara/ Xabi from PDB + extra data into ModelCIF.
positional arguments:
  <MODEL DIR>           Directory with model(s) to be translated. Must be of
                        form '<UniProtKB AC>-<UniProtKB AC>'
optional arguments:
  -h, --help            show this help message and exit
  --selected_rank SELECTED_RANK
                        If a certain model of a modelling project is selected
                        by rank, the other models are still translated to
                        ModelCIF but stored as accompanying files to the
                        selected model.
  --out_dir <OUTPUT DIR>
                        Path to separate path to store results (<MODEL DIR>
                        used, if none given).
  --compress            Compress ModelCIF file with gzip (note that QA file is
                        zipped either way).
To actually run the conversion, we assume the ColabFold projects are separated into individual directories. Project directories have to look like this:
<UniProtKB AC>-<UniProtKB AC>/
├── <UniProtKB AC>-<UniProtKB AC>_unrelaxed_rank_1_model_1.pdb
├── <UniProtKB AC>-<UniProtKB AC>_rank_1_model_1_scores.json
├── config.json
└── ...
A directory may contain more models for the same combination of the two UniProtKB ACs; the example above just shows the minimum set of required files.
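Before starting a conversion, it can help to check that a project directory actually contains these files. The following is a rough sketch, assuming a POSIX shell and file names that follow exactly the pattern in the listing above. The function check_project_dir is hypothetical and not part of the converter.

```shell
# Hypothetical helper (not part of the converter): report whether a project
# directory contains the minimum set of files from the listing above.
# Prints one "missing: ..." line per problem; exit status 0 means complete.
check_project_dir() (
    dir=${1%/}
    name=$(basename "$dir")
    status=0

    [ -f "$dir/config.json" ] || { echo "missing: config.json"; status=1; }

    found=0
    for pdb in "$dir/${name}"_unrelaxed_rank_*_model_*.pdb; do
        [ -f "$pdb" ] || continue   # the glob matched nothing
        found=1
        # Derive the scores file name: drop "_unrelaxed", swap the suffix.
        base=$(basename "$pdb" .pdb)
        scores=$(printf '%s' "$base" | sed 's/_unrelaxed_rank_/_rank_/')_scores.json
        [ -f "$dir/$scores" ] || { echo "missing: $scores"; status=1; }
    done
    [ "$found" -eq 1 ] || { echo "missing: ${name}_unrelaxed_rank_*_model_*.pdb"; status=1; }

    exit "$status"
)
```

Calling check_project_dir with the path to a directory named <UniProtKB AC>-<UniProtKB AC> prints nothing and returns 0 if the PDB file, its scores file and config.json are all present.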
Since the Docker container does not see the file system of the computer it is running on, the top-level directory of the projects needs to be mounted when running the container. This is done with the -v option, which takes the absolute path to the projects directory and the mount point inside the Docker container, e.g. -v /path/to/projects:/data makes the directory /path/to/projects on your computer available as /data to the Docker container.
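If you only have a relative path at hand, it can be turned into the absolute form that -v expects. The following is a small sketch, assuming a POSIX shell. The function data_mount is a made-up helper, and /data is the mount point used in the example above.

```shell
# Hypothetical helper: print the argument for -v that maps a projects
# directory on the host to /data inside the container.
data_mount() (
    # cd + pwd -P yields the absolute, symlink-free path of the directory;
    # a non-existing directory makes the helper fail with status 1.
    host_dir=$(cd "$1" && pwd -P) || exit 1
    printf '%s:/data\n' "$host_dir"
)
```

It can then be used as docker run --rm -v "$(data_mount ./projects)" followed by the image name and the translate2modelcif arguments, where ./projects stands for your own projects directory.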
Running the converter app to write the ModelCIF file for the model of rank 1 into a separate directory modelcif looks like this:
$ mkdir /<PROJECTS PARENT DIR>/modelcif
$ docker run --rm -v /<PROJECTS PARENT DIR>:/data registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest translate2modelcif --selected_rank 1 --out_dir /data/modelcif /data/<UniProtKB AC>-<UniProtKB AC>
Let's go to the CIF site of life!
Working on <UniProtKB AC>-<UniProtKB AC>...
translating <UniProtKB AC>-<UniProtKB AC>_unrelaxed_rank_1_model_1.pdb...
preparing data... (0.77s)
generating ModelCIF objects... (0.00s)
processing QA scores... (11.68s)
write to disk... (42.85s)
... done with /data/<UniProtKB AC>-<UniProtKB AC>/<UniProtKB AC>-<UniProtKB AC>_unrelaxed_rank_1_model_1.pdb (55.61s).
... done with /data/<UniProtKB AC>-<UniProtKB AC>.
$
After this, you will see two new files in the modelcif directory: the ModelCIF file (ending with .cif) and an archive with the pairwise alignment errors (ending with .zip):
modelcif/
├── <UniProtKB AC>-<UniProtKB AC>_unrelaxed_rank_1_model_1.cif
├── <UniProtKB AC>-<UniProtKB AC>_unrelaxed_rank_1_model_1.zip
└── ...
With --out_dir, the converted files of all projects can easily be gathered in one place and then handed over to the ModelArchive team for loading.
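Gathering the results of several projects could be scripted along these lines. This is only a sketch, assuming a POSIX shell, that all project directories sit directly below one parent directory, and that their names contain a hyphen as in <UniProtKB AC>-<UniProtKB AC>. The function convert_all_projects is hypothetical; the docker run call mirrors the one shown above.

```shell
# Image tag used throughout this README.
IMAGE=registry.scicore.unibas.ch/schwede/ma-wilkins-import/converter:latest

# Hypothetical helper: convert every project directory below a parent
# directory and collect the results in <parent>/modelcif.
# $1: absolute path to the parent directory
# $2: rank of the model to select (defaults to 1)
convert_all_projects() (
    parent=${1%/}
    rank=${2:-1}
    mkdir -p "$parent/modelcif"
    # Only directories with a hyphen in their name are treated as projects,
    # which also skips the modelcif output directory itself.
    for project in "$parent"/*-*/; do
        [ -d "$project" ] || continue   # the glob matched nothing
        name=$(basename "$project")
        docker run --rm -v "$parent:/data" "$IMAGE" \
            translate2modelcif --selected_rank "$rank" \
            --out_dir /data/modelcif "/data/$name" \
            || echo "conversion failed for $name" >&2
    done
)
```

A failed conversion is reported on standard error and the loop continues with the next project, so one broken project does not block the rest.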
If --selected_rank is omitted, all models in the project's directory are converted. Otherwise, all models are still converted, but the models that were not selected are added to the ZIP archive of the selected model.
Troubleshooting
File permission problems
Depending on your local user/permissions setup, the Docker container may be unable to read or write the project directories. Alternatively, files written by the app may not be readable by you on your local file system. This can be solved by building the Docker container with the same user ID as the local user executing the app, as described in Build the Docker container from scratch.
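To see whether a mismatch of user IDs is the likely cause, the numeric owner of a produced file can be compared with your own ID. The following is a minimal sketch, assuming a POSIX shell. The function owned_by_me is a made-up helper; it relies on ls -n printing the numeric owner ID in the third column.

```shell
# Hypothetical helper: succeed if the given file or directory is owned by
# the user running this shell, fail otherwise (also if it does not exist).
owned_by_me() (
    # `ls -ldn` lists the entry itself with numeric IDs; field 3 is the owner.
    owner=$(ls -ldn -- "$1" | awk '{print $3}')
    [ "$owner" = "$(id -u)" ]
)
```

If owned_by_me fails for a file in the output directory, the file was written under a different user ID, and rebuilding the container with your own ID is the fix suggested above.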