Modelling of Spongilla lacustris proteome with functional annotations

Link to project in ModelArchive (incl. background on project itself)

Setup:

  • Domains of interacting proteins extracted from full-length proteins (sequences from UniProtKB)
  • Models generated from the domain sequences, which can map discontinuously onto the full-length sequence
  • Same protocol as used for the model set of core eukaryotic protein complexes:
    • Paired multiple sequence alignment (MSA) generated for each dimer
    • Modelled with AlphaFold ("model 3" parameters; pTM monomer version) with a 200-residue gap between the two chains, without templates and without model relaxation
  • Input from the authors:
    • one zip file with all the PDB files (no B-factor values, residue numbers matching position in UniProtKB sequence)
    • one zip file with all the extra files (1 FASTA file for alignment, 1 npz file with pLDDT, PAE and contact probabilities)
    • a CSV file with description and UniProtKB links for each protein
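
The 200-residue gap mentioned above is a common way of modelling a dimer with the monomer version of AlphaFold: both chains are concatenated and the residue index of the second chain is shifted. A minimal sketch of that indexing, assuming the shift is applied as a plain offset (the function name `gapped_residue_index` is hypothetical and not taken from the modelling code):

```python
def gapped_residue_index(len_a, len_b, gap=200):
    """Residue indices for two chains concatenated into one pseudo-monomer.

    Assumption: chain A is numbered 0..len_a-1 and chain B continues after
    skipping `gap` index positions, so the model sees a chain break.
    """
    idx_a = list(range(len_a))
    idx_b = [len_a + gap + i for i in range(len_b)]
    return idx_a + idx_b


# Chain A has 3 residues, chain B has 2 residues.
print(gapped_residue_index(3, 2))
```

With a gap of 200, the indices 3 to 202 are skipped in this toy case, so the first residue of chain B gets index 203.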

Special features here:

  • Custom MSA generation with intermediate result in accompanying data
  • PAE and contact probabilities only kept for inter-chain residue-pairs
  • Author-provided residue numbers kept as auth_seq_num
  • Mapping to most recent UniProtKB sequence generated, checked and stored as FASTA files (ModelCIF file only has covered range with respect to the originally used sequence)
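
Keeping PAE and contact probabilities only for inter-chain residue pairs amounts to dropping the two diagonal (intra-chain) blocks of the full matrix. A minimal sketch with plain nested lists, assuming a square matrix ordered as chain A followed by chain B (the function name `inter_chain_blocks` is hypothetical and is not taken from translate2modelcif.py):

```python
def inter_chain_blocks(matrix, len_a):
    """Split off the inter-chain blocks of a square per-residue-pair matrix.

    Assumption: the first `len_a` rows/columns belong to chain A and the
    remaining ones to chain B. Both off-diagonal blocks are returned
    because PAE is not symmetric.
    """
    a_to_b = [row[len_a:] for row in matrix[:len_a]]
    b_to_a = [row[:len_a] for row in matrix[len_a:]]
    return a_to_b, b_to_a


# Toy 3x3 matrix: chain A has 2 residues, chain B has 1 residue.
pae = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]
a_to_b, b_to_a = inter_chain_blocks(pae, len_a=2)
print(a_to_b)  # rows of chain A against columns of chain B
print(b_to_a)  # rows of chain B against columns of chain A
```

For a dimer with chain lengths n and m, this keeps 2·n·m values instead of (n + m)² values.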

Content:

  • translate2modelcif.py : script that performs the conversion to ModelCIF; compatible with the Docker setup from ma-wilkins-import (the script is based on code from there)