Commit b0270985 authored by Bienchen
Update README.md

parent 4108df88
@@ -6,12 +6,49 @@
This project consists of around 800 dimer models (the vast majority heterodimers) for the human reference proteome. Modelling was done with [ColabFold](https://colabfold.mmseqs.com)/[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold). Model selection is unusual in that, for some heterodimers, experimental crosslinking data is available to guide the choice; for all others, the top-ranking model is used.
Since some of the models were built for UniProtKB entries whose sequences have since been updated, the conversion script walks back through the entry history until it finds a matching sequence. The ModelCIF file therefore references the version of the UniProtKB entry that carries the sequence used during modelling.
These models qualify as "de novo modelling".
### Project setup
- Modelling used [ColabFold](https://colabfold.mmseqs.com)/[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold)
- Produces dimers
### Input
- One directory per modelling target
  - PDB files of models
  - ColabFold configuration
  - ColabFold scores as JSON
### Output
- Without `--selected_rank`, each PDB file in a directory is converted to a ModelCIF file
  - Each model gets an accompanying Zip archive with its predicted aligned error (PAE)
- With `--selected_rank`, each PDB file in a directory is likewise converted to a ModelCIF file
  - All ModelCIF files except the selected one are stored in the accompanying Zip archive of the selected model
  - PAE files also go into the Zip archive of the selected model
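
The packaging rule for `--selected_rank` can be sketched in a few lines of Python. This is only an illustration and not the conversion script's code: the helper name `bundle_unselected`, the file names and the flat archive layout are assumptions made up for the example.

```python
import zipfile
from pathlib import Path


def bundle_unselected(cif_files, pae_files, selected, archive_path):
    """Zip every ModelCIF file except `selected`, plus all PAE files.

    Hypothetical helper: names and archive layout are assumptions,
    not the behaviour of translate2modelcif.py.
    """
    selected = Path(selected)
    members = [Path(f) for f in cif_files if Path(f) != selected]
    members += [Path(f) for f in pae_files]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store files flat, under their base name only.
            archive.write(member, arcname=member.name)
    # Return the archived names so a caller can log or check them.
    return sorted(member.name for member in members)
```

The selected model stays outside the archive as the file deposited on its own, while everything else travels with it in a single Zip file.
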
### Special features
- UniProtKB entries whose sequence changed in a UniProtKB update
  - The history of the UniProtKB entry is searched for a matching sequence
  - The ModelCIF file references the latest entry version with a matching sequence
  - Please note: this mechanism is meant solely for different versions of UniProtKB sequences. It does not work with user-modified sequences, which will make the conversion script crash
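
The history search can be illustrated with a minimal Python sketch. It assumes the entry history has already been fetched into a mapping from entry version to sequence; how the conversion script actually retrieves and stores the history is not shown here, and the function name and example sequences are hypothetical.

```python
def latest_matching_version(history, model_sequence):
    """Return the newest entry version whose sequence equals `model_sequence`.

    `history` maps entry version numbers to sequences (an assumed data
    layout; retrieving it from UniProtKB is left out of this sketch).
    """
    # Walk from the newest version back to the oldest one.
    for version in sorted(history, reverse=True):
        if history[version] == model_sequence:
            return version
    raise ValueError("no UniProtKB entry version matches the modelled sequence")


# Made-up example data: the sequence changed in entry version 3.
history = {1: "MKTAYIAK", 2: "MKTAYIAK", 3: "MKTAYLAK"}
print(latest_matching_version(history, "MKTAYIAK"))  # prints 2
```

A user-modified sequence never appears in the history, so a search like this finds nothing. The sketch raises an explicit error in that case.
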
### Usage
- The [conversion script](./translate2modelcif.py) runs on a single target (modelling project) directory. If you have multiple targets, as in a whole proteome, loop over the directories and call the script for each one separately
- Output can be written either to the model directory or to a separate directory (`--out`)
- In this project, only one model per dimer is stored at [ModelArchive](https://modelarchive.org/) (MA); the other models for the same dimer are stored in a Zip archive that goes into MA with the selected model (`--selected_rank`)
- Following our [Docker README](../docker/README.md), the conversion can be called like this:
```terminal
$ docker run --rm -v /home/user/models:/data -t converter:latest convert2modelcif --selected_rank 1 Q9Y5J9-Q9Y5L4/
```
for a target directory `Q9Y5J9-Q9Y5L4`
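
For a whole proteome, the per-target loop mentioned in the usage notes could look like the following Python sketch. It assumes the script is run directly with the Python interpreter rather than through Docker, and that the target directory is passed as the last argument, as in the example above. The helper names `build_commands` and `convert_all` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(targets_root, selected_rank=None, script="translate2modelcif.py"):
    """Build one command line per target directory under `targets_root`."""
    commands = []
    for target in sorted(Path(targets_root).iterdir()):
        if not target.is_dir():
            continue  # skip stray files next to the target directories
        command = [sys.executable, script]
        if selected_rank is not None:
            command += ["--selected_rank", str(selected_rank)]
        command.append(str(target))
        commands.append(command)
    return commands


def convert_all(targets_root, selected_rank=None):
    """Run the conversion for every target; stop at the first failure."""
    for command in build_commands(targets_root, selected_rank):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print the commands for a dry run before converting hundreds of targets.
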
<!-- LocalWords: crosslinking ModelArchive heterodimer ColabFold de novo
LocalWords: LocalColabFold heterodimers
-->
### Content