diff --git a/projects/human-heterodimers-w-crosslinks/README.md b/projects/human-heterodimers-w-crosslinks/README.md
index 137f00a256dfea6b52b484b80d12d2e3f7003c62..5f4fd88137dc60efac48571c8833b933846189a6 100644
--- a/projects/human-heterodimers-w-crosslinks/README.md
+++ b/projects/human-heterodimers-w-crosslinks/README.md
@@ -6,12 +6,49 @@ This project consists of around 800 dimer models (vast majority heteros) for the human reference proteome.
 
 Modelling was done with [ColabFold](https://colabfold.mmseqs.com)/ [LocalColabFold](https://github.com/YoshitakaMo/localcolabfold). Model selection is special in a sense that for some heterodimers experimental crosslinking data is available guiding the choice, otherwise top-ranking models are used.
 
-Since some of the models were build for UniProtKB entries whose sequence were updated in the meantime, the conversion script goes down entry history until it finds a matching sequence. So the ModelCIF file will reference a version of the UniProtKB entry with the sequence used during modelling.
+These models qualify as "de novo modelling".
 
-<how are the ModelCIF files created using this software>
-These models qualify as "de novo modelling".
+### Project setup
+
+- Modelling done with [ColabFold](https://colabfold.mmseqs.com)/[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold)
+- Models are dimers, the vast majority of them heterodimers
+
+
+### Input
+
+- One directory per modelling target
+- Model coordinates as PDB files
+- ColabFold configuration
+- ColabFold scores as JSON
+
+
+### Output
+
+- Without `--selected_rank`, each PDB file in a target directory is converted into a ModelCIF file
+- Each ModelCIF file gets an accompanying Zip archive with its predicted aligned error (PAE)
+- With `--selected_rank`, each PDB file in a target directory is converted into ModelCIF as well
+- All ModelCIF files except the selected one are stored in the Zip archive accompanying the selected model
+- The PAE files also go into the Zip archive of the selected model
+
+
+### Special features
+
+- Handles UniProtKB entries whose sequence was changed by a UniProtKB update after modelling
+- The history of such an entry is searched for a version with a matching sequence
+- The ModelCIF file references the latest entry version with the matching sequence
+- Please note: this mechanism is only meant for differences between UniProtKB sequence versions; it does not work with user-modified sequences, which will make the conversion script crash
+
+
+### Usage
+
+- The [conversion script](./translate2modelcif.py) processes a single target (modelling project) directory; for multiple targets, as in a whole proteome, loop over the target directories and call the script for each one separately
+- Output is written either to the model directory or to a separate directory (`--out`)
+- In this project, only one model per dimer is stored in [ModelArchive](https://modelarchive.org/) (MA); the other models of the same dimer go into a Zip archive that is deposited in MA together with the selected model (`--selected_rank`)
+- Following our [Docker README](../docker/README.md), the conversion can be called like this:
+  ```terminal
+  $ docker run --rm -v /home/user/models:/data -t converter:latest convert2modelcif --selected_rank 1 Q9Y5J9-Q9Y5L4/
+  ```
+  for the target directory `Q9Y5J9-Q9Y5L4`
 
-<!-- LocalWords: crosslinking ModelArchive heterodimer ColabFold de novo
- LocalWords: LocalColabFold heterodimers
- -->
+### Content