Modelling of African Swine Fever proteome from USDA
Main links:
- Link Project in MA (incl. background on project itself)
- Jira-story
Setup:
- Using AlphaFold for monomer predictions with default CASP14 setup (no PAE, no pTM, templates used and relaxation enabled)
- 196 models done with the default setup; 1 model (QP509L) done with the AlphaFold Colab notebook and a separate GROMACS relaxation step
- Input received from USDA:
  - PDB files for the top-ranked relaxed model
  - CSV file with crosslinks (UniProt and NCBI), title, description and original filename
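As a rough illustration of how the CSV input can be paired with the PDB files, the sketch below indexes the metadata rows by original filename. The column name `filename` is a hypothetical placeholder; the actual header names in the CSV file are not documented here and may differ.

```python
import csv
import io


def load_metadata(csv_text, key_column="filename"):
    """Index CSV rows by the original model filename.

    `key_column` is an assumed header name; adjust it to the real CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    by_file = {}
    for row in reader:
        fname = row[key_column]
        # Each model file should be described exactly once.
        if fname in by_file:
            raise ValueError(f"duplicate CSV entry for {fname}")
        by_file[fname] = row
    return by_file
```

Raising on duplicate filenames is a design choice for this sketch: a silent overwrite would attach the wrong title or crosslinks to a model.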
Special features here:
- Somewhat generic code for AlphaFold modeling step and sequence DBs used (can distinguish full_dbs and reduced_dbs and template search)
- pLDDT extracted from B-factors (simplest setup, since no other QA scores are available anyway)
- Model file names did not contain information on AlphaFold model number (hence info in CSV file)
- Crosslinks to UniProt and NCBI (with sanity checks on both)
- Dealing with entries which cover only a subset of the reference sequence (CP2475L.. for UniProt A0A2X0THU5)
- Special case (QP509L) with GROMACS model relaxation step (pLDDT fetched from separate file)
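The pLDDT extraction mentioned above can be sketched as follows. This is a minimal illustration and not the code in translate2modelcif.py: it assumes standard fixed-column PDB `ATOM` records and relies on AlphaFold writing the per-residue pLDDT into the B-factor column of every atom, so reading the CA atoms is enough. The function names are hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain, residue number): pLDDT} from CA-atom B-factors."""
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Atom name is in columns 13-16 of the PDB format.
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]            # column 22
        resnum = int(line[22:26])   # columns 23-26
        # B-factor (here: pLDDT) is in columns 61-66.
        scores[(chain, resnum)] = float(line[60:66])
    return scores


def mean_plddt(scores):
    """Average the per-residue values into one global score."""
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores.values()) / len(scores)
```

For the QP509L special case this approach would not apply as-is, since the pLDDT values there come from a separate file rather than from the relaxed model.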
Content:
- translate2modelcif.py : script to do conversion based on CoFFE-sponge-proteins project (identical Docker setup used)
- tests folder with:
  - test_modelCIF_MA.py to convert ModelCIF to content displayed in ModelArchive (needs gemmi library)
  - test.ipynb and .html for tests performed during development