# Modelling of Spongilla lacustris proteome with functional annotations
[Link to project in ModelArchive](https://modelarchive.org/doi/10.5452/ma-t3vr3) (includes background on the project itself)
Setup:
- Domains of interacting proteins extracted from full-length proteins (sequences from UniProtKB)
- Models generated from domain sequences, which can map discontinuously to the full-length sequence
- Same protocol used as in [model set for core eukaryotic protein complexes](https://www.modelarchive.org/doi/10.5452/ma-bak-cepc)
- Paired multiple sequence alignment (MSA) generated for each dimer
- Modelled with AlphaFold ("model 3" parameters; pTM monomer version) with a 200-residue gap between the two chains, without templates and without model relaxation
- Input from the authors:
  - one zip file with all the PDB files (no B-factor values; residue numbers match positions in the UniProtKB sequence)
  - one zip file with all the extra files (1 FASTA file for the alignment, 1 npz file with pLDDT, PAE and contact probabilities)
  - a CSV file with a description and UniProtKB links for each protein
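
A residue gap between two chains is commonly implemented by offsetting the residue index of the second chain, so that a single-chain AlphaFold model treats the two domains as far apart in sequence. The following is a minimal sketch of that idea, assuming this is how the 200-residue gap was applied here; the function name `residue_index_with_gap` and the 0-based indexing are illustrative and not taken from the project's code.

```python
def residue_index_with_gap(len_a, len_b, gap=200):
    """Return residue indices for two concatenated chains.

    Chain A gets indices 0..len_a-1. Chain B continues the numbering
    but is shifted by `gap` (assumption: offset added to every
    residue of the second chain).
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("both chains must contain at least one residue")
    idx_a = list(range(len_a))
    idx_b = [len_a + gap + i for i in range(len_b)]
    return idx_a + idx_b


# Example: a 3-residue chain followed by a 2-residue chain
print(residue_index_with_gap(3, 2))
```

The jump in numbering is the only signal of the chain break in such a setup, which is why the original per-chain residue numbers have to be restored afterwards.
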
Special features here:
- Custom MSA generation with intermediate result in accompanying data
- PAE and contact probabilities only kept for inter-chain residue-pairs
- Author-provided residue numbers kept as auth_seq_num
- Mapping to the most recent UniProtKB sequence generated, checked and stored as FASTA files (the ModelCIF file only stores the covered range with respect to the originally used sequence)
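
Keeping PAE and contact probabilities only for inter-chain residue pairs amounts to keeping the two off-diagonal blocks of the full matrix. The following is a minimal sketch in plain Python, assuming a square matrix whose rows and columns are ordered as chain A followed by chain B; `inter_chain_blocks` is a hypothetical helper and not part of translate2modelcif.py.

```python
def inter_chain_blocks(matrix, len_a):
    """Split a square (len_a + len_b) matrix into its inter-chain blocks.

    Returns (a_to_b, b_to_a). Both are returned because PAE is not
    symmetric in general.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    if not 0 < len_a < n:
        raise ValueError("len_a must split the matrix into two chains")
    # Rows of chain A, columns of chain B
    a_to_b = [row[len_a:] for row in matrix[:len_a]]
    # Rows of chain B, columns of chain A
    b_to_a = [row[:len_a] for row in matrix[len_a:]]
    return a_to_b, b_to_a


# Example: chain A has 1 residue, chain B has 2 residues
pae = [[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]
a_to_b, b_to_a = inter_chain_blocks(pae, 1)
```

The intra-chain blocks (the top-left and bottom-right squares) are simply dropped in this sketch.
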
Content:
- translate2modelcif.py: script to do the conversion; compatible with the Docker setup from [ma-wilkins-import](https://git.scicore.unibas.ch/schwede/ma-wilkins-import/-/tree/6bbd6fa7ec53e1a0971fba40c96fa971d1022f74) (the script is based on code there)