Update README.md

Commit b0270985 authored 2 years ago by Bienchen
--- a/projects/human-heterodimers-w-crosslinks/README.md
+++ b/projects/human-heterodimers-w-crosslinks/README.md
@@ -6,12 +6,49 @@

 This project consists of around 800 dimer models (vast majority heteros) for the human reference proteome. Modelling was done with [ColabFold](https://colabfold.mmseqs.com)/ [LocalColabFold](https://github.com/YoshitakaMo/localcolabfold). Model selection is special in a sense that for some heterodimers experimental crosslinking data is available guiding the choice, otherwise top-ranking models are used.

-Since some of the models were build for UniProtKB entries whose sequence were updated in the meantime, the conversion script goes down entry history until it finds a matching sequence. So the ModelCIF file will reference a version of the UniProtKB entry with the sequence used during modelling.
+These models qualify as "de novo modelling".

-<how are the ModelCIF files created using this software>

-These models qualify as "de novo modelling".
+### Project setup
+
+- Models were built with [ColabFold](https://colabfold.mmseqs.com)/[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold)
+- The modelling produces dimers
+
+
+### Input
+
+- One directory per modelling target
+- PDB files of models
+- ColabFold configuration
+- ColabFold scores as JSON
+
+
+### Output
+
+- Without `--selected_rank`, each PDB file in a directory is turned into a ModelCIF file
+- Each of these models gets an accompanying Zip archive with its predicted aligned errors (PAE)
+- With `--selected_rank`, each PDB file in a directory is also turned into a ModelCIF file
+- In that case, all ModelCIF files except the selected one are stored in the accompanying Zip archive of the selected model
+- The PAE files also go into the Zip archive of the selected model
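
The two output modes listed above can be sketched as a small planning function. This is only an illustration of how the list reads, not the conversion script's actual code: the function name `plan_outputs`, the file names, and the choice to put every PAE file into the selected model's archive are assumptions.

```python
from typing import Dict, List, Optional


def plan_outputs(
    cif_by_rank: Dict[int, str],
    pae_by_rank: Dict[int, str],
    selected_rank: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Map each top-level ModelCIF file to the members of its Zip archive.

    Hypothetical helper: both arguments map a model rank to a file name.
    """
    if selected_rank is None:
        # Every model is kept on its own; its archive only holds its PAE file.
        return {cif: [pae_by_rank[rank]] for rank, cif in cif_by_rank.items()}

    if selected_rank not in cif_by_rank:
        raise KeyError(f"no model with rank {selected_rank}")

    # Only the selected model stays at top level. All other ModelCIF files
    # and (by assumption) all PAE files go into its archive.
    others = [cif for rank, cif in sorted(cif_by_rank.items()) if rank != selected_rank]
    paes = [pae for _, pae in sorted(pae_by_rank.items())]
    return {cif_by_rank[selected_rank]: others + paes}
```

Calling `plan_outputs(cifs, paes)` yields one archive per model, while `plan_outputs(cifs, paes, selected_rank=1)` yields a single entry for the rank 1 model.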
+
+
+### Special features
+
+- Handles UniProtKB entries whose sequence was changed by a UniProtKB update after modelling
+- The history of a UniProtKB entry is searched for a matching sequence
+- The ModelCIF file references the latest entry version with a matching sequence
+- Please note: this mechanism is meant solely for different versions of UniProtKB sequences. It does not work with user-modified sequences, which will make the conversion script crash
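
The history search described above can be illustrated with a minimal sketch. It assumes the entry history is already available as `(version, sequence)` pairs. The function name `find_matching_version` and the toy sequences are hypothetical, and how the real script retrieves the history from UniProtKB is not shown here.

```python
from typing import Iterable, Tuple


def find_matching_version(history: Iterable[Tuple[int, str]], model_sequence: str) -> int:
    """Return the newest entry version whose sequence equals the modelled one."""
    # Walk the history from the newest version down to the oldest.
    for version, sequence in sorted(history, key=lambda item: item[0], reverse=True):
        if sequence == model_sequence:
            return version
    # A user-modified sequence never appears in the history, so nothing matches.
    raise ValueError("no version of the entry matches the modelled sequence")


# Toy history: the sequence changed in version 2 and again in version 4.
history = [(1, "MKTAYIAK"), (2, "MKTAYIAKQR"), (3, "MKTAYIAKQR"), (4, "MKTAYIVKQR")]
print(find_matching_version(history, "MKTAYIAKQR"))  # prints 3
```

The sketch raises an error for sequences that never occur in the history, which mirrors the note that user-modified sequences are not supported.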
+
+
+### Usage
+
+- The [conversion script](./translate2modelcif.py) runs on a single target (modelling project) directory. If you have multiple targets, as in a whole proteome, you need to loop over the directories and call the script for each one separately
+- Output can be written either to the model directory or to a separate directory (`--out`)
+- In this project, only one model per dimer is stored at [ModelArchive](https://modelarchive.org/) (MA); the other models for the same dimer are stored in a Zip archive that goes into MA with the selected model (`--selected_rank`)
+- Following our [Docker README](../docker/README.md), the conversion can be called like this:
+  ```terminal
+  $ docker run --rm -v /home/user/models:/data -t converter:latest convert2modelcif --selected_rank 1 Q9Y5J9-Q9Y5L4/
+  ```
+  for a target directory `Q9Y5J9-Q9Y5L4`
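
Looping over many targets, as needed for a whole proteome, could look like the following sketch. It reuses the image name, mount point and command from the Docker call above and assumes that the rank and the target directory are passed as separate arguments. The helper names `build_commands` and `convert_all` are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(models_root: str, selected_rank: int = 1,
                   image: str = "converter:latest") -> List[List[str]]:
    """Build one `docker run` call per target directory below models_root."""
    root = Path(models_root).resolve()
    commands = []
    # Each subdirectory is treated as one modelling target; plain files are skipped.
    for target in sorted(p for p in root.iterdir() if p.is_dir()):
        commands.append([
            "docker", "run", "--rm",
            "-v", f"{root}:/data",
            "-t", image,
            "convert2modelcif",
            "--selected_rank", str(selected_rank),
            f"{target.name}/",
        ])
    return commands


def convert_all(models_root: str, selected_rank: int = 1, dry_run: bool = True) -> None:
    """Print the commands, or run them one after another if dry_run is False."""
    for command in build_commands(models_root, selected_rank):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop at the first failing conversion.
            subprocess.run(command, check=True)
```

For example, `convert_all("/home/user/models")` prints one command per target directory, and passing `dry_run=False` executes them.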

-<!--  LocalWords:  crosslinking ModelArchive heterodimer ColabFold de novo
-      LocalWords:  LocalColabFold heterodimers
- -->
+### Content