Modelling of Spongilla lacustris proteome with functional annotations
Link to project in ModelArchive (incl. background on project itself)
Setup:

Domains of interacting proteins extracted from full-length proteins (sequences from UniProtKB)
Models generated from the domain sequences, which can map discontinuously to the full-length sequence
Same protocol used as in model set for core eukaryotic protein complexes
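
To illustrate what a discontinuous mapping means in practice, here is a minimal sketch. The function name `extract_domain`, the example sequence and the ranges are hypothetical and not taken from the project data; ranges are assumed to be 1-based and inclusive.

```python
def extract_domain(full_seq, ranges):
    """Build a domain sequence from 1-based inclusive ranges of full_seq.

    Returns the domain sequence and, for every domain residue, its
    position in the full-length sequence.
    """
    pieces = []
    numbers = []
    for start, end in ranges:
        if not (1 <= start <= end <= len(full_seq)):
            raise ValueError(f"invalid range {start}-{end}")
        pieces.append(full_seq[start - 1:end])
        numbers.extend(range(start, end + 1))
    return "".join(pieces), numbers


# Hypothetical full-length sequence and two separate segments.
full_seq = "MKTAYIAKQRQISFVKSHFSRQ"
domain_seq, numbers = extract_domain(full_seq, [(3, 6), (11, 14)])
print(domain_seq)  # TAYIQISF
print(numbers)     # [3, 4, 5, 6, 11, 12, 13, 14]
```

The jump from 6 to 11 in the residue numbers is the kind of discontinuity that the model files have to preserve.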

Paired multiple sequence alignment (MSA) generated for each dimer
Modelled using AlphaFold ("model 3" parameters; pTM monomer version) with a 200-residue gap between the two chains, without templates and without model relaxation
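
A common way to model a dimer with the monomer version of AlphaFold is to concatenate both chains and shift the residue index of the second chain. The sketch below shows that idea only; the function name is hypothetical, and the exact offset convention used for this model set is an assumption.

```python
def gapped_residue_index(len_a, len_b, gap=200):
    """Residue index for two concatenated chains with a gap in between.

    Assumption: the index of every residue in the second chain is
    shifted by `gap`, so the model treats the chains as far apart
    in sequence.
    """
    index_a = list(range(len_a))
    index_b = [len_a + gap + i for i in range(len_b)]
    return index_a + index_b


# Chain A with 3 residues, chain B with 2 residues.
print(gapped_residue_index(3, 2))  # [0, 1, 2, 203, 204]
```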


Input from the authors:

one zip file with all the PDB files (no B-factor values, residue numbers matching position in UniProtKB sequence)
one zip file with all the extra files (1 FASTA file for alignment, 1 npz file with pLDDT, PAE and contact probabilities)
a CSV file with description and UniProtKB links for each protein
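
Since the residue numbers in the PDB files are expected to match positions in the UniProtKB sequence, a simple consistency check is possible. The following is a rough sketch and not the check used in translate2modelcif.py; `check_residue_numbers` and `ca_line` are hypothetical helpers and the example sequence is made up.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def check_residue_numbers(pdb_lines, chain_id, uniprot_seq):
    """Return (res_num, observed, expected) for every residue of one
    chain whose type differs from the sequence at that position."""
    mismatches = []
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        res_num = int(line[22:26])  # PDB columns 23-26
        if res_num in seen:
            continue  # only look at each residue once
        seen.add(res_num)
        observed = THREE_TO_ONE.get(line[17:20], "X")  # columns 18-20
        in_range = 1 <= res_num <= len(uniprot_seq)
        expected = uniprot_seq[res_num - 1] if in_range else None
        if observed != expected:
            mismatches.append((res_num, observed, expected))
    return mismatches


def ca_line(chain, res_name, res_num):
    """Build a minimal fixed-column ATOM record for the example."""
    return (f"ATOM  {1:5d}  CA  {res_name:3s} {chain}{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


seq = "MKTAYIAK"  # made-up sequence
good = [ca_line("A", "THR", 3), ca_line("A", "ALA", 4), ca_line("A", "ALA", 7)]
bad = good + [ca_line("A", "GLY", 5)]
print(check_residue_numbers(good, "A", seq))  # []
print(check_residue_numbers(bad, "A", seq))   # [(5, 'G', 'Y')]
```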


Special features here:

Custom MSA generation with intermediate result in accompanying data
PAE and contact probabilities only kept for inter-chain residue-pairs
Author provided residue numbers kept as auth_seq_num
Mapping to the most recent UniProtKB sequence generated, checked and stored as FASTA files (the ModelCIF file only has the covered range with respect to the originally used sequence)
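
Keeping only inter-chain residue pairs amounts to taking the two off-diagonal blocks of the full pairwise matrix. The sketch below uses plain nested lists and a hypothetical function name; how the values are actually stored in the accompanying data is not shown here.

```python
def inter_chain_blocks(matrix, len_a):
    """Split a square residue-pair matrix into its inter-chain blocks.

    The first len_a rows/columns belong to chain A, the rest to chain B.
    Both blocks are returned because PAE is not symmetric.
    """
    a_to_b = [row[len_a:] for row in matrix[:len_a]]
    b_to_a = [row[:len_a] for row in matrix[len_a:]]
    return a_to_b, b_to_a


# Toy 3x3 matrix: chain A has 2 residues, chain B has 1.
pae = [
    [0, 1, 9],
    [1, 0, 8],
    [7, 6, 0],
]
a_to_b, b_to_a = inter_chain_blocks(pae, len_a=2)
print(a_to_b)  # [[9], [8]]
print(b_to_a)  # [[7, 6]]
```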

Content:

translate2modelcif.py: script to do the conversion; compatible with the Docker setup from ma-wilkins-import (the script is based on code from there)