Gerardo Tauriello authored
Modelling of Spongilla lacustris proteome with functional annotations
Link to project in ModelArchive (incl. background on project itself)
Setup:
- Domains of interacting proteins extracted from full-length proteins (sequences from UniProtKB)
- Models generated from the domain sequences, which can map discontinuously to the full-length sequence
- Same protocol used as in model set for core eukaryotic protein complexes
- Paired multiple sequence alignment (MSA) generated for each dimer
- Modelled using AlphaFold ("model 3" parameters; pTM monomer version) with a 200-residue gap between the two chains, without templates and without model relaxation
- Input provided by the authors:
  - one zip file with all the PDB files (no B-factor values; residue numbers match the positions in the UniProtKB sequence)
  - one zip file with all the extra files (one FASTA file for the alignment, one NPZ file with pLDDT, PAE and contact probabilities)
  - a CSV file with description and UniProtKB links for each protein
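
The "200 residue gap" mentioned in the setup is commonly realised by shifting the residue index of the second chain when two chains are fed to a monomer model as one sequence. The following is only a minimal sketch of that idea, not the code used for this project; the function name `gapped_residue_index` and the zero-based indexing are assumptions made for illustration.

```python
def gapped_residue_index(len_a, len_b, gap=200):
    """Return residue indices for two chains joined into one pseudo-monomer.

    Chain A keeps indices 0..len_a-1. Every residue of chain B is shifted
    by `gap`, so the model sees the two chains as far apart in sequence.
    Hypothetical helper; zero-based indexing is an assumption.
    """
    if len_a < 0 or len_b < 0 or gap < 0:
        raise ValueError("lengths and gap must be non-negative")
    return [i + gap if i >= len_a else i for i in range(len_a + len_b)]


if __name__ == "__main__":
    # Chain A with 3 residues, chain B with 2 residues.
    print(gapped_residue_index(3, 2))
```

With the default gap, the first residue of chain B gets an index 201 higher than the last residue of chain A (a jump of 1 plus the 200-residue shift).
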
Special features here:
- Custom MSA generation, with the intermediate result stored in the accompanying data
- PAE and contact probabilities only kept for inter-chain residue-pairs
- Author-provided residue numbers kept as auth_seq_num
- Mapping to the most recent UniProtKB sequence generated, checked and stored as FASTA files (the ModelCIF file only has the covered range with respect to the originally used sequence)
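
Keeping PAE and contact probabilities only for inter-chain residue pairs amounts to dropping the two intra-chain blocks of the full square matrix. The sketch below illustrates that selection under stated assumptions; it is not taken from `translate2modelcif.py`. The function name `inter_chain_pairs`, the nested-list matrix layout and the assumption that chain A occupies the first `len_a` rows and columns are all hypothetical.

```python
def inter_chain_pairs(matrix, len_a):
    """Keep only entries whose two residues lie in different chains.

    `matrix` is a square nested list (e.g. PAE values) covering chain A
    followed by chain B; `len_a` is the length of chain A.
    Returns a list of (i, j, value) tuples with zero-based indices.
    Hypothetical helper for illustration only.
    """
    n = len(matrix)
    if not 0 <= len_a <= n:
        raise ValueError("len_a must be between 0 and the matrix size")
    kept = []
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        for j, value in enumerate(row):
            # True exactly when one residue is in chain A and the other in chain B.
            if (i < len_a) != (j < len_a):
                kept.append((i, j, value))
    return kept


if __name__ == "__main__":
    # Toy 3x3 matrix: chain A has 2 residues, chain B has 1.
    toy = [[10 * i + j for j in range(3)] for i in range(3)]
    print(inter_chain_pairs(toy, 2))
```

For chains of length `len_a` and `len_b`, this keeps `2 * len_a * len_b` of the `(len_a + len_b) ** 2` entries.
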
Content:
- translate2modelcif.py : script that performs the conversion to ModelCIF; compatible with the Docker setup from ma-wilkins-import (the script is based on code there)