diff --git a/projects/CoFFE-sponge-proteins/README.md b/projects/2022-08-ma-coffe-slac/README.md
similarity index 100%
rename from projects/CoFFE-sponge-proteins/README.md
rename to projects/2022-08-ma-coffe-slac/README.md
diff --git a/projects/CoFFE-sponge-proteins/translate2modelcif.py b/projects/2022-08-ma-coffe-slac/translate2modelcif.py
similarity index 100%
rename from projects/CoFFE-sponge-proteins/translate2modelcif.py
rename to projects/2022-08-ma-coffe-slac/translate2modelcif.py
diff --git a/projects/USDA-ASFVG/README.md b/projects/2022-10-ma-asfv-asfvg/README.md
similarity index 100%
rename from projects/USDA-ASFVG/README.md
rename to projects/2022-10-ma-asfv-asfvg/README.md
diff --git a/projects/USDA-ASFVG/translate2modelcif.py b/projects/2022-10-ma-asfv-asfvg/translate2modelcif.py
similarity index 100%
rename from projects/USDA-ASFVG/translate2modelcif.py
rename to projects/2022-10-ma-asfv-asfvg/translate2modelcif.py
diff --git a/projects/human-heterodimers-w-crosslinks/README.md b/projects/2022-11-ma-low-csi/README.md
similarity index 97%
rename from projects/human-heterodimers-w-crosslinks/README.md
rename to projects/2022-11-ma-low-csi/README.md
index 73a40bdc99f0a75ae26e03116423f825c85908ce..8f05de1384becf792855efe2b48a2e7f01a65aad 100644
--- a/projects/human-heterodimers-w-crosslinks/README.md
+++ b/projects/2022-11-ma-low-csi/README.md
@@ -1,8 +1,6 @@
 # Modelling protein-interactions of human proteins with experimental crosslinking information
 
-[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-low-csi) (incl. background on project itself)
-
-**I.O.U. the link to the publication**
+[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-low-csi) (incl. background on project itself and link to the publication)
 
 This project consists of around 800 dimer models (vast majority heteros) for the human reference proteome.
 Modelling was done with [ColabFold](https://colabfold.mmseqs.com)/[LocalColabFold](https://github.com/YoshitakaMo/localcolabfold).
 Model selection is special in the sense that, for some heterodimers, experimental crosslinking data is available to guide the choice; otherwise, top-ranking models are used.
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5.pdb b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5.pdb
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5.pdb
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5.pdb
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5_scores.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5_scores.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5_scores.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_1_model_5_scores.json
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4.pdb b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4.pdb
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4.pdb
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4.pdb
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4_scores.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4_scores.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4_scores.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_2_model_4_scores.json
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3.pdb b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3.pdb
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3.pdb
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3.pdb
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3_scores.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3_scores.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3_scores.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_3_model_3_scores.json
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2.pdb b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2.pdb
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2.pdb
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2.pdb
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2_scores.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2_scores.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2_scores.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_4_model_2_scores.json
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1.pdb b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1.pdb
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1.pdb
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1.pdb
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1_scores.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1_scores.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1_scores.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/Q9Y5J9-Q9Y5L4_unrelaxed_rank_5_model_1_scores.json
diff --git a/projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/config.json b/projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/config.json
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/example/Q9Y5J9-Q9Y5L4/config.json
rename to projects/2022-11-ma-low-csi/example/Q9Y5J9-Q9Y5L4/config.json
diff --git a/projects/human-heterodimers-w-crosslinks/translate2modelcif.py b/projects/2022-11-ma-low-csi/translate2modelcif.py
similarity index 100%
rename from projects/human-heterodimers-w-crosslinks/translate2modelcif.py
rename to projects/2022-11-ma-low-csi/translate2modelcif.py
diff --git a/projects/cancer-PPI-domains/README.md b/projects/2022-12-ma-t3vr3/README.md
similarity index 100%
rename from projects/cancer-PPI-domains/README.md
rename to projects/2022-12-ma-t3vr3/README.md
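The renames above all follow one convention: descriptive project directories are renamed to `YYYY-MM-ma-<ModelArchive accession>` (date of the deposition plus the ma-* accession). A quick sanity check of that inferred pattern (hypothetical helper, not part of the repository):

```python
import re

# Directory scheme inferred from the renames in this diff:
# <4-digit year>-<2-digit month>-ma-<lowercase accession>
PROJECT_DIR = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-ma-[a-z0-9-]+$")

for name in ("2022-11-ma-low-csi", "2024-04-ma-dm-prc",
             "human-heterodimers-w-crosslinks"):
    # Old-style descriptive names do not match the new scheme.
    print(name, bool(PROJECT_DIR.match(name)))
```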
diff --git a/projects/cancer-PPI-domains/translate2modelcif.py b/projects/2022-12-ma-t3vr3/translate2modelcif.py
similarity index 100%
rename from projects/cancer-PPI-domains/translate2modelcif.py
rename to projects/2022-12-ma-t3vr3/translate2modelcif.py
diff --git a/projects/dark-matter-metagenomics/README.md b/projects/2023-10-ma-nmpfamsdb/README.md
similarity index 100%
rename from projects/dark-matter-metagenomics/README.md
rename to projects/2023-10-ma-nmpfamsdb/README.md
diff --git a/projects/dark-matter-metagenomics/translate2modelcif.py b/projects/2023-10-ma-nmpfamsdb/translate2modelcif.py
similarity index 100%
rename from projects/dark-matter-metagenomics/translate2modelcif.py
rename to projects/2023-10-ma-nmpfamsdb/translate2modelcif.py
diff --git a/projects/phytoplasma-effectors/README.md b/projects/2023-10-ma-saps/README.md
similarity index 100%
rename from projects/phytoplasma-effectors/README.md
rename to projects/2023-10-ma-saps/README.md
diff --git a/projects/phytoplasma-effectors/input_data/accessions.csv b/projects/2023-10-ma-saps/input_data/accessions.csv
similarity index 100%
rename from projects/phytoplasma-effectors/input_data/accessions.csv
rename to projects/2023-10-ma-saps/input_data/accessions.csv
diff --git a/projects/phytoplasma-effectors/input_data/structures.zip b/projects/2023-10-ma-saps/input_data/structures.zip
similarity index 100%
rename from projects/phytoplasma-effectors/input_data/structures.zip
rename to projects/2023-10-ma-saps/input_data/structures.zip
diff --git a/projects/phytoplasma-effectors/translate2modelcif.py b/projects/2023-10-ma-saps/translate2modelcif.py
similarity index 99%
rename from projects/phytoplasma-effectors/translate2modelcif.py
rename to projects/2023-10-ma-saps/translate2modelcif.py
index e26ae244f202dc4d3ed8bfd173e4b19163fe916f..cc02a763181608bbb72f2d87f4c9c83185fbb351 100644
--- a/projects/phytoplasma-effectors/translate2modelcif.py
+++ b/projects/2023-10-ma-saps/translate2modelcif.py
@@ -23,7 +23,6 @@
 from timeit import default_timer as timer
 import numpy as np
 import requests
 import ujson as json
-import gemmi
 import pandas as pd
 import ihm
diff --git a/projects/2024-04-ma-dm-prc/README.md b/projects/2024-04-ma-dm-prc/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a0ca19b4d55a0181c5955d172ea7d0867b3ff226
--- /dev/null
+++ b/projects/2024-04-ma-dm-prc/README.md
@@ -0,0 +1,32 @@
+# Modelling of PRC1 and 2 complexes (structural prediction screen)
+
+[Link to project in ModelArchive](https://modelarchive.org/doi/10.5452/ma-dm-prc) (incl. background on project itself)
+
+Input files for conversion:
+- Annotations.csv and Annotations.json with metadata (incl. per-chain mapping to entity description and UniProtKB AC)
+- Config_Files directory with config_[X].json files for all X listed in the Config column of Annotations.csv
+- Zip_Files directory with files named [X]-[NAME].zip for each X listed in the metadata files
+- ZIP files are expected to contain the 3 top-ranked models (.pdb) with their respective scores (.json) and .png files for coverage-, pLDDT-, and PAE-plots as produced by ColabFold 1.5.2 (all with their original file names)
+
+Modelling setup:
+- Screening exercise to identify protein pairs which may interact
+- 742 structural predictions deposited: 740 dimers (homomers and heteromers) and 2 larger complexes
+- All done with ColabFold 1.5.2 using 2 different setups (distinguished by the 2 config.json files created by ColabFold)
+- Sequences all taken from UniProtKB
+
+Special features here:
+- Uses ColabFold data directly from zipped project folders (convenient to reduce the data to exchange with depositors)
+- UniProtKB entry parser which works with both remote URLs and local TXT files
+- Generic sequence aligner which can fill gaps for non-default AA (i.e. "XOUBJZ") using the reference sequence and returns covered ranges, sequence mismatches and the alignment (with the option to align to the atom sequence). Note: auto-filling of gaps cannot be used here since pLDDT and PAE scores are parsed per modelled residue.
+- Entity generation automatically groups chains for homomers
+- Automatic processing of the ColabFold setup using config.json and DB versions provided by the user (valid up to v1.5.5 as of March 2024)
+- Compatible with recently added python-modelcif changes: software parameters are added per software item (instead of per software group) and "struct_ref" is written in addition to "ma_target_ref_db_details" (here restricted to full models covered by the reference sequence without mismatches)
+- Extended LPeptideAlphabet which can handle all 26 one-letter codes
+- Accompanying data includes PAE, extra models (described with global scores), and PNG files for the ColabFold plots
+- System to save identified issues in a JSON file (here distinguishing minor and major mismatches between the modelled and reference sequences, and checking for score mismatches between the JSON files, the B-factors in the PDB files and the scores in Annotations.csv)
+- Saves entries to be exported to 3D-Beacons in a separate JSON file (most interactions were only meant for checking and are not expected to be real interactions; only models for known interactions are exported to 3D-Beacons)
+
+Content:
+- translate2modelcif.py : script to do the conversion (run in a virtual environment with the same setup as the Docker container here, but with OST 2.7 and the latest main branches of python-modelcif and python-ihm as of 23.4.2024)
+- minimal_example: example input to convert a single complex from this set
+- minimal_example_modelcif: output from running the conversion of minimal_example
diff --git a/projects/2024-04-ma-dm-prc/minimal_example/Annotations.csv b/projects/2024-04-ma-dm-prc/minimal_example/Annotations.csv
new file mode 100644
index 0000000000000000000000000000000000000000..30e7f74aa9f9ffc4d3ccebec25c3040ee5fb301a
--- /dev/null
+++ b/projects/2024-04-ma-dm-prc/minimal_example/Annotations.csv
@@ -0,0 +1,2 @@
+Number,Component_1,Component_2,Component_1_UniProtKB ID,Component_2_UniProtKB
ID,Interaction_note,pLDDT_Avg,pTM_Avg,ipTM_Avg,pLDDT_1,pTM_1,ipTM_1,pLDDT_2,pTM_2,ipTM_2,pLDDT_3,pTM_3,ipTM_3,Config,Recycle_number +692,Ubiquitin,SKP1,P0CG48,P63208,low confidence interaction,70.96,0.5212,0.13578,74.1,0.544,0.163,69.9,0.531,0.144,69.4,0.495,0.14,2,5 \ No newline at end of file diff --git a/projects/2024-04-ma-dm-prc/minimal_example/Annotations.json b/projects/2024-04-ma-dm-prc/minimal_example/Annotations.json new file mode 100644 index 0000000000000000000000000000000000000000..d284b73811d3f5b811ad32975ebe659e19c555f4 --- /dev/null +++ b/projects/2024-04-ma-dm-prc/minimal_example/Annotations.json @@ -0,0 +1,18 @@ +{ + "692": { + "title": "AlphaFold2-Multimer predicted structure of Homo sapiens Ubiquitin and SKP1", + "abstract": "Unrelaxed predicted structure of Homo sapiens Ubiquitin (UniProtKB ID: P0CG48) and SKP1 (UniProtKB ID: P63208) by AlphaFold-Multimer v3. The pLDDT score of the rank 1 model is 74.1, the pTM score of the rank 1 model is 0.544, and the ipTM score of the rank 1 model is 0.163. The average pLDDT score of all five models is 70.96, the average pTM score of all five models is 0.5212, and the average ipTM score of all five models is 0.13578. This structural prediction pair is a low confidence interaction. 
", + "chains": [ + { + "chain": "A", + "name": "Human Ubiquitin", + "up_ac": "P0CG48" + }, + { + "chain": "B", + "name": "Human SKP1", + "up_ac": "P63208" + } + ] + } +} \ No newline at end of file diff --git a/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_1.json b/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_1.json new file mode 100755 index 0000000000000000000000000000000000000000..d9a319e7c98695732355be45f0fa262dec74356a --- /dev/null +++ b/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_1.json @@ -0,0 +1,34 @@ +{ + "num_queries": 82, + "use_templates": false, + "num_relax": 0, + "msa_mode": "mmseqs2_uniref_env", + "model_type": "alphafold2_multimer_v3", + "num_models": 5, + "num_recycles": 5, + "recycle_early_stop_tolerance": null, + "num_ensemble": 1, + "model_order": [ + 1, + 2, + 3, + 4, + 5 + ], + "keep_existing_results": true, + "rank_by": "multimer", + "max_seq": 508, + "max_extra_seq": 2048, + "pair_mode": "unpaired_paired", + "host_url": "https://api.colabfold.com", + "stop_at_score": 100, + "random_seed": 0, + "num_seeds": 1, + "recompile_padding": 10, + "commit": null, + "use_dropout": false, + "use_cluster_profile": true, + "use_fuse": true, + "use_bfloat16": true, + "version": "1.5.2" +} \ No newline at end of file diff --git a/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_2.json b/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_2.json new file mode 100755 index 0000000000000000000000000000000000000000..64d657403d6543372e2048deff55cd2a7a07d11c --- /dev/null +++ b/projects/2024-04-ma-dm-prc/minimal_example/Config_Files/config_2.json @@ -0,0 +1,35 @@ +{ + "num_queries": 779, + "use_templates": false, + "num_relax": 0, + "msa_mode": "mmseqs2_uniref_env", + "model_type": "alphafold2_multimer_v3", + "num_models": 5, + "num_recycles": 5, + "recycle_early_stop_tolerance": null, + "num_ensemble": 1, + "model_order": [ + 1, + 2, + 3, + 4, + 5 + ], + "keep_existing_results": 
true, + "rank_by": "multimer", + "max_seq": 508, + "max_extra_seq": 2048, + "pair_mode": "unpaired_paired", + "pairing_strategy": "greedy", + "host_url": "https://api.colabfold.com", + "stop_at_score": 100, + "random_seed": 0, + "num_seeds": 1, + "recompile_padding": 10, + "commit": "306d13cc38ccddd7486592b6ccb467261071cb4a", + "use_dropout": false, + "use_cluster_profile": true, + "use_fuse": true, + "use_bfloat16": true, + "version": "1.5.2" +} \ No newline at end of file diff --git a/projects/2024-04-ma-dm-prc/minimal_example/Zip_Files/692-Ubiquitin_SKP1.zip b/projects/2024-04-ma-dm-prc/minimal_example/Zip_Files/692-Ubiquitin_SKP1.zip new file mode 100755 index 0000000000000000000000000000000000000000..ef99f165b653871cf30d9f959dea6e9c8b9b57f4 Binary files /dev/null and b/projects/2024-04-ma-dm-prc/minimal_example/Zip_Files/692-Ubiquitin_SKP1.zip differ diff --git a/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.cif b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.cif new file mode 100644 index 0000000000000000000000000000000000000000..989bd9b5a459434a1d32b32a9b015dfd020d2322 --- /dev/null +++ b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.cif @@ -0,0 +1,3110 @@ +data_692-UBIQUITIN_SKP1 +_entry.id 692-UBIQUITIN_SKP1 +_struct.entry_id 692-UBIQUITIN_SKP1 +_struct.pdbx_model_details 'Unrelaxed predicted structure of Homo sapiens Ubiquitin (UniProtKB ID: P0CG48) and SKP1 (UniProtKB ID: P63208) by AlphaFold-Multimer v3. The pLDDT score of the rank 1 model is 74.1, the pTM score of the rank 1 model is 0.544, and the ipTM score of the rank 1 model is 0.163. The average pLDDT score of all five models is 70.96, the average pTM score of all five models is 0.5212, and the average ipTM score of all five models is 0.13578. This structural prediction pair is a low confidence interaction.' 
+_struct.pdbx_structure_determination_methodology computational +_struct.title 'AlphaFold2-Multimer predicted structure of Homo sapiens Ubiquitin and SKP1' +_audit_conform.dict_location https://raw.githubusercontent.com/ihmwg/ModelCIF/ba728c4/base/mmcif_ma-core.dic +_audit_conform.dict_name mmcif_ma.dic +_audit_conform.dict_version 1.4.5 +# +loop_ +_citation.id +_citation.title +_citation.journal_abbrev +_citation.journal_volume +_citation.page_first +_citation.page_last +_citation.year +_citation.pdbx_database_id_PubMed +_citation.pdbx_database_id_DOI +1 'ColabFold: making protein folding accessible to all.' 'Nature Methods' 19 679 +682 2022 35637307 10.1038/s41592-022-01488-1 +2 +'MMseqs2 desktop and local web server app for fast, interactive sequence searches.' +Bioinformatics 35 2856 2858 2019 30615063 10.1093/bioinformatics/bty1057 +3 'Protein complex prediction with AlphaFold-Multimer.' bioRxiv . . . 2021 . +10.1101/2021.10.04.463034 +# +# +loop_ +_citation_author.citation_id +_citation_author.name +_citation_author.ordinal +1 'Mirdita, M.' 1 +1 'Schuetze, K.' 2 +1 'Moriwaki, Y.' 3 +1 'Heo, L.' 4 +1 'Ovchinnikov, S.' 5 +1 'Steinegger, M.' 6 +2 'Mirdita, M.' 7 +2 'Steinegger, M.' 8 +2 'Soeding, J.' 9 +3 'Evans, R.' 10 +3 "O'Neill, M." 11 +3 'Pritzel, A.' 12 +3 'Antropova, N.' 13 +3 'Senior, A.' 14 +3 'Green, T.' 15 +3 'Zidek, A.' 16 +3 'Bates, R.' 17 +3 'Blackwell, S.' 18 +3 'Yim, J.' 19 +3 'Ronneberger, O.' 20 +3 'Bodenstein, S.' 21 +3 'Zielinski, M.' 22 +3 'Bridgland, A.' 23 +3 'Potapenko, A.' 24 +3 'Cowie, A.' 25 +3 'Tunyasuvunakool, K.' 26 +3 'Jain, R.' 27 +3 'Clancy, E.' 28 +3 'Kohli, P.' 29 +3 'Jumper, J.' 30 +3 'Hassabis, D.' 
31 +# +# +loop_ +_software.pdbx_ordinal +_software.name +_software.classification +_software.description +_software.version +_software.type +_software.location +_software.citation_id +1 ColabFold 'model building' 'Structure prediction' '1.5.2 (306d13c)' package +https://github.com/sokrypton/ColabFold 1 +2 MMseqs2 'data collection' 'Many-against-Many sequence searching' . package +https://github.com/soedinglab/mmseqs2 2 +3 AlphaFold-Multimer 'model building' 'Structure prediction' . package +https://github.com/deepmind/alphafold 3 +# +# +loop_ +_ma_software_parameter.parameter_id +_ma_software_parameter.group_id +_ma_software_parameter.data_type +_ma_software_parameter.name +_ma_software_parameter.value +_ma_software_parameter.description +1 1 boolean use_templates NO . +2 1 integer num_relax 0 . +3 1 string msa_mode mmseqs2_uniref_env . +4 1 string model_type alphafold2_multimer_v3 . +5 1 integer num_models 5 . +6 1 integer num_recycles 5 . +7 1 float recycle_early_stop_tolerance 0.500 . +8 1 integer num_ensemble 1 . +9 1 integer-csv model_order 1,2,3,4,5 . +10 1 boolean keep_existing_results YES . +11 1 string rank_by multimer . +12 1 integer max_seq 508 . +13 1 integer max_extra_seq 2048 . +14 1 string pair_mode unpaired_paired . +15 1 string pairing_strategy greedy . +16 1 string host_url https://api.colabfold.com . +17 1 integer 'stop_at_score' 100 . +18 1 integer random_seed 0 . +19 1 integer num_seeds 1 . +20 1 integer recompile_padding 10 . +21 1 boolean use_dropout NO . +22 1 boolean use_cluster_profile YES . +23 1 boolean use_fuse YES . +24 1 boolean use_bfloat16 YES . +# +# +loop_ +_ma_software_group.ordinal_id +_ma_software_group.group_id +_ma_software_group.software_id +_ma_software_group.parameter_group_id +1 1 1 1 +2 1 2 . +3 1 3 . +# +# +loop_ +_audit_author.name +_audit_author.pdbx_ordinal +'Shafiq, Tiasha A.' 1 +'Yu, Juntao' 2 +'Feng, Wenzhi' 3 +'Zhang, Yizhe' 4 +'Zhou, Haining' 5 +'Paulo, Joao A.' 6 +'Gygi, Steven P.' 
7 +'Moazed, Danesh' 8 +# +# +loop_ +_chem_comp.id +_chem_comp.type +_chem_comp.name +_chem_comp.formula +_chem_comp.formula_weight +_chem_comp.ma_provenance +ALA 'L-peptide linking' ALANINE 'C3 H7 N O2' 89.094 'CCD Core' +ARG 'L-peptide linking' ARGININE 'C6 H15 N4 O2 1' 175.212 'CCD Core' +ASN 'L-peptide linking' ASPARAGINE 'C4 H8 N2 O3' 132.119 'CCD Core' +ASP 'L-peptide linking' 'ASPARTIC ACID' 'C4 H7 N O4' 133.103 'CCD Core' +CYS 'L-peptide linking' CYSTEINE 'C3 H7 N O2 S' 121.154 'CCD Core' +GLN 'L-peptide linking' GLUTAMINE 'C5 H10 N2 O3' 146.146 'CCD Core' +GLU 'L-peptide linking' 'GLUTAMIC ACID' 'C5 H9 N O4' 147.130 'CCD Core' +GLY 'peptide linking' GLYCINE 'C2 H5 N O2' 75.067 'CCD Core' +HIS 'L-peptide linking' HISTIDINE 'C6 H10 N3 O2 1' 156.165 'CCD Core' +ILE 'L-peptide linking' ISOLEUCINE 'C6 H13 N O2' 131.175 'CCD Core' +LEU 'L-peptide linking' LEUCINE 'C6 H13 N O2' 131.175 'CCD Core' +LYS 'L-peptide linking' LYSINE 'C6 H15 N2 O2 1' 147.198 'CCD Core' +MET 'L-peptide linking' METHIONINE 'C5 H11 N O2 S' 149.208 'CCD Core' +PHE 'L-peptide linking' PHENYLALANINE 'C9 H11 N O2' 165.192 'CCD Core' +PRO 'L-peptide linking' PROLINE 'C5 H9 N O2' 115.132 'CCD Core' +SER 'L-peptide linking' SERINE 'C3 H7 N O3' 105.093 'CCD Core' +THR 'L-peptide linking' THREONINE 'C4 H9 N O3' 119.120 'CCD Core' +TRP 'L-peptide linking' TRYPTOPHAN 'C11 H12 N2 O2' 204.229 'CCD Core' +TYR 'L-peptide linking' TYROSINE 'C9 H11 N O3' 181.191 'CCD Core' +VAL 'L-peptide linking' VALINE 'C5 H11 N O2' 117.148 'CCD Core' +# +# +loop_ +_entity.id +_entity.type +_entity.src_method +_entity.pdbx_description +_entity.formula_weight +_entity.pdbx_number_of_molecules +_entity.details +1 polymer nat 'Human Ubiquitin' 9928.088 1 . +2 polymer nat 'Human SKP1' 21598.645 1 . 
+# +# +loop_ +_entity_src_nat.entity_id +_entity_src_nat.pdbx_src_id +_entity_src_nat.pdbx_ncbi_taxonomy_id +_entity_src_nat.pdbx_organism_scientific +_entity_src_nat.common_name +_entity_src_nat.strain +1 1 9606 'Homo sapiens (Human)' . . +2 2 9606 'Homo sapiens (Human)' . . +# +# +loop_ +_struct_ref.id +_struct_ref.entity_id +_struct_ref.db_name +_struct_ref.db_code +_struct_ref.pdbx_db_accession +_struct_ref.pdbx_align_begin +_struct_ref.pdbx_seq_one_letter_code +_struct_ref.details +1 1 UNP UBC_HUMAN P0CG48 1 +;MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLV +LRLRGG +; +. +2 2 UNP SKP1_HUMAN P63208 1 +;MPSIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMDDEGDDDPVPLPNVNAAILKKVIQWCTHHKDDPP +PPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTCKTVANMIKGKTPEEIRKTFN +IKNDFTEEEEAQVRKENQWCEEK +; +. +# +# +loop_ +_struct_ref_seq.align_id +_struct_ref_seq.ref_id +_struct_ref_seq.seq_align_beg +_struct_ref_seq.seq_align_end +_struct_ref_seq.db_align_beg +_struct_ref_seq.db_align_end +1 1 1 76 1 76 +2 2 1 163 1 163 +# +# +loop_ +_ma_target_ref_db_details.target_entity_id +_ma_target_ref_db_details.db_name +_ma_target_ref_db_details.db_name_other_details +_ma_target_ref_db_details.db_code +_ma_target_ref_db_details.db_accession +_ma_target_ref_db_details.seq_db_isoform +_ma_target_ref_db_details.seq_db_align_begin +_ma_target_ref_db_details.seq_db_align_end +_ma_target_ref_db_details.ncbi_taxonomy_id +_ma_target_ref_db_details.organism_scientific +_ma_target_ref_db_details.seq_db_sequence_version_date +_ma_target_ref_db_details.seq_db_sequence_checksum +1 UNP . UBC_HUMAN P0CG48 . 1 76 9606 'Homo sapiens (Human)' 2012-06-13 +B6E7BC06FEE77196 +2 UNP . SKP1_HUMAN P63208 . 
1 163 9606 'Homo sapiens (Human)' 2007-01-23 +C794D62AFB75528A +# +# +loop_ +_entity_poly.entity_id +_entity_poly.type +_entity_poly.nstd_linkage +_entity_poly.nstd_monomer +_entity_poly.pdbx_strand_id +_entity_poly.pdbx_seq_one_letter_code +_entity_poly.pdbx_seq_one_letter_code_can +1 polypeptide(L) no no A +;MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLV +LRLRGG +; + +;MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLV +LRLRGG +; + +2 polypeptide(L) no no B +;MPSIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMDDEGDDDPVPLPNVNAAILKKVIQWCTHHKDDPP +PPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTCKTVANMIKGKTPEEIRKTFN +IKNDFTEEEEAQVRKENQWCEEK +; + +;MPSIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMDDEGDDDPVPLPNVNAAILKKVIQWCTHHKDDPP +PPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTCKTVANMIKGKTPEEIRKTFN +IKNDFTEEEEAQVRKENQWCEEK +; + +# +# +loop_ +_entity_poly_seq.entity_id +_entity_poly_seq.num +_entity_poly_seq.mon_id +_entity_poly_seq.hetero +1 1 MET . +1 2 GLN . +1 3 ILE . +1 4 PHE . +1 5 VAL . +1 6 LYS . +1 7 THR . +1 8 LEU . +1 9 THR . +1 10 GLY . +1 11 LYS . +1 12 THR . +1 13 ILE . +1 14 THR . +1 15 LEU . +1 16 GLU . +1 17 VAL . +1 18 GLU . +1 19 PRO . +1 20 SER . +1 21 ASP . +1 22 THR . +1 23 ILE . +1 24 GLU . +1 25 ASN . +1 26 VAL . +1 27 LYS . +1 28 ALA . +1 29 LYS . +1 30 ILE . +1 31 GLN . +1 32 ASP . +1 33 LYS . +1 34 GLU . +1 35 GLY . +1 36 ILE . +1 37 PRO . +1 38 PRO . +1 39 ASP . +1 40 GLN . +1 41 GLN . +1 42 ARG . +1 43 LEU . +1 44 ILE . +1 45 PHE . +1 46 ALA . +1 47 GLY . +1 48 LYS . +1 49 GLN . +1 50 LEU . +1 51 GLU . +1 52 ASP . +1 53 GLY . +1 54 ARG . +1 55 THR . +1 56 LEU . +1 57 SER . +1 58 ASP . +1 59 TYR . +1 60 ASN . +1 61 ILE . +1 62 GLN . +1 63 LYS . +1 64 GLU . +1 65 SER . +1 66 THR . +1 67 LEU . +1 68 HIS . +1 69 LEU . +1 70 VAL . +1 71 LEU . +1 72 ARG . +1 73 LEU . +1 74 ARG . +1 75 GLY . +1 76 GLY . +2 1 MET . +2 2 PRO . +2 3 SER . +2 4 ILE . +2 5 LYS . +2 6 LEU . +2 7 GLN . +2 8 SER . +2 9 SER . +2 10 ASP . +2 11 GLY . 
+2 12 GLU . +2 13 ILE . +2 14 PHE . +2 15 GLU . +2 16 VAL . +2 17 ASP . +2 18 VAL . +2 19 GLU . +2 20 ILE . +2 21 ALA . +2 22 LYS . +2 23 GLN . +2 24 SER . +2 25 VAL . +2 26 THR . +2 27 ILE . +2 28 LYS . +2 29 THR . +2 30 MET . +2 31 LEU . +2 32 GLU . +2 33 ASP . +2 34 LEU . +2 35 GLY . +2 36 MET . +2 37 ASP . +2 38 ASP . +2 39 GLU . +2 40 GLY . +2 41 ASP . +2 42 ASP . +2 43 ASP . +2 44 PRO . +2 45 VAL . +2 46 PRO . +2 47 LEU . +2 48 PRO . +2 49 ASN . +2 50 VAL . +2 51 ASN . +2 52 ALA . +2 53 ALA . +2 54 ILE . +2 55 LEU . +2 56 LYS . +2 57 LYS . +2 58 VAL . +2 59 ILE . +2 60 GLN . +2 61 TRP . +2 62 CYS . +2 63 THR . +2 64 HIS . +2 65 HIS . +2 66 LYS . +2 67 ASP . +2 68 ASP . +2 69 PRO . +2 70 PRO . +2 71 PRO . +2 72 PRO . +2 73 GLU . +2 74 ASP . +2 75 ASP . +2 76 GLU . +2 77 ASN . +2 78 LYS . +2 79 GLU . +2 80 LYS . +2 81 ARG . +2 82 THR . +2 83 ASP . +2 84 ASP . +2 85 ILE . +2 86 PRO . +2 87 VAL . +2 88 TRP . +2 89 ASP . +2 90 GLN . +2 91 GLU . +2 92 PHE . +2 93 LEU . +2 94 LYS . +2 95 VAL . +2 96 ASP . +2 97 GLN . +2 98 GLY . +2 99 THR . +2 100 LEU . +2 101 PHE . +2 102 GLU . +2 103 LEU . +2 104 ILE . +2 105 LEU . +2 106 ALA . +2 107 ALA . +2 108 ASN . +2 109 TYR . +2 110 LEU . +2 111 ASP . +2 112 ILE . +2 113 LYS . +2 114 GLY . +2 115 LEU . +2 116 LEU . +2 117 ASP . +2 118 VAL . +2 119 THR . +2 120 CYS . +2 121 LYS . +2 122 THR . +2 123 VAL . +2 124 ALA . +2 125 ASN . +2 126 MET . +2 127 ILE . +2 128 LYS . +2 129 GLY . +2 130 LYS . +2 131 THR . +2 132 PRO . +2 133 GLU . +2 134 GLU . +2 135 ILE . +2 136 ARG . +2 137 LYS . +2 138 THR . +2 139 PHE . +2 140 ASN . +2 141 ILE . +2 142 LYS . +2 143 ASN . +2 144 ASP . +2 145 PHE . +2 146 THR . +2 147 GLU . +2 148 GLU . +2 149 GLU . +2 150 GLU . +2 151 ALA . +2 152 GLN . +2 153 VAL . +2 154 ARG . +2 155 LYS . +2 156 GLU . +2 157 ASN . +2 158 GLN . +2 159 TRP . +2 160 CYS . +2 161 GLU . +2 162 GLU . +2 163 LYS . +# +# +loop_ +_struct_asym.id +_struct_asym.entity_id +_struct_asym.details +A 1 . +B 2 . 
+# +# +loop_ +_pdbx_poly_seq_scheme.asym_id +_pdbx_poly_seq_scheme.entity_id +_pdbx_poly_seq_scheme.seq_id +_pdbx_poly_seq_scheme.mon_id +_pdbx_poly_seq_scheme.pdb_seq_num +_pdbx_poly_seq_scheme.auth_seq_num +_pdbx_poly_seq_scheme.pdb_mon_id +_pdbx_poly_seq_scheme.auth_mon_id +_pdbx_poly_seq_scheme.pdb_strand_id +_pdbx_poly_seq_scheme.pdb_ins_code +A 1 1 MET 1 1 MET MET A . +A 1 2 GLN 2 2 GLN GLN A . +A 1 3 ILE 3 3 ILE ILE A . +A 1 4 PHE 4 4 PHE PHE A . +A 1 5 VAL 5 5 VAL VAL A . +A 1 6 LYS 6 6 LYS LYS A . +A 1 7 THR 7 7 THR THR A . +A 1 8 LEU 8 8 LEU LEU A . +A 1 9 THR 9 9 THR THR A . +A 1 10 GLY 10 10 GLY GLY A . +A 1 11 LYS 11 11 LYS LYS A . +A 1 12 THR 12 12 THR THR A . +A 1 13 ILE 13 13 ILE ILE A . +A 1 14 THR 14 14 THR THR A . +A 1 15 LEU 15 15 LEU LEU A . +A 1 16 GLU 16 16 GLU GLU A . +A 1 17 VAL 17 17 VAL VAL A . +A 1 18 GLU 18 18 GLU GLU A . +A 1 19 PRO 19 19 PRO PRO A . +A 1 20 SER 20 20 SER SER A . +A 1 21 ASP 21 21 ASP ASP A . +A 1 22 THR 22 22 THR THR A . +A 1 23 ILE 23 23 ILE ILE A . +A 1 24 GLU 24 24 GLU GLU A . +A 1 25 ASN 25 25 ASN ASN A . +A 1 26 VAL 26 26 VAL VAL A . +A 1 27 LYS 27 27 LYS LYS A . +A 1 28 ALA 28 28 ALA ALA A . +A 1 29 LYS 29 29 LYS LYS A . +A 1 30 ILE 30 30 ILE ILE A . +A 1 31 GLN 31 31 GLN GLN A . +A 1 32 ASP 32 32 ASP ASP A . +A 1 33 LYS 33 33 LYS LYS A . +A 1 34 GLU 34 34 GLU GLU A . +A 1 35 GLY 35 35 GLY GLY A . +A 1 36 ILE 36 36 ILE ILE A . +A 1 37 PRO 37 37 PRO PRO A . +A 1 38 PRO 38 38 PRO PRO A . +A 1 39 ASP 39 39 ASP ASP A . +A 1 40 GLN 40 40 GLN GLN A . +A 1 41 GLN 41 41 GLN GLN A . +A 1 42 ARG 42 42 ARG ARG A . +A 1 43 LEU 43 43 LEU LEU A . +A 1 44 ILE 44 44 ILE ILE A . +A 1 45 PHE 45 45 PHE PHE A . +A 1 46 ALA 46 46 ALA ALA A . +A 1 47 GLY 47 47 GLY GLY A . +A 1 48 LYS 48 48 LYS LYS A . +A 1 49 GLN 49 49 GLN GLN A . +A 1 50 LEU 50 50 LEU LEU A . +A 1 51 GLU 51 51 GLU GLU A . +A 1 52 ASP 52 52 ASP ASP A . +A 1 53 GLY 53 53 GLY GLY A . +A 1 54 ARG 54 54 ARG ARG A . +A 1 55 THR 55 55 THR THR A . 
+A 1 56 LEU 56 56 LEU LEU A . +A 1 57 SER 57 57 SER SER A . +A 1 58 ASP 58 58 ASP ASP A . +A 1 59 TYR 59 59 TYR TYR A . +A 1 60 ASN 60 60 ASN ASN A . +A 1 61 ILE 61 61 ILE ILE A . +A 1 62 GLN 62 62 GLN GLN A . +A 1 63 LYS 63 63 LYS LYS A . +A 1 64 GLU 64 64 GLU GLU A . +A 1 65 SER 65 65 SER SER A . +A 1 66 THR 66 66 THR THR A . +A 1 67 LEU 67 67 LEU LEU A . +A 1 68 HIS 68 68 HIS HIS A . +A 1 69 LEU 69 69 LEU LEU A . +A 1 70 VAL 70 70 VAL VAL A . +A 1 71 LEU 71 71 LEU LEU A . +A 1 72 ARG 72 72 ARG ARG A . +A 1 73 LEU 73 73 LEU LEU A . +A 1 74 ARG 74 74 ARG ARG A . +A 1 75 GLY 75 75 GLY GLY A . +A 1 76 GLY 76 76 GLY GLY A . +B 2 1 MET 1 1 MET MET B . +B 2 2 PRO 2 2 PRO PRO B . +B 2 3 SER 3 3 SER SER B . +B 2 4 ILE 4 4 ILE ILE B . +B 2 5 LYS 5 5 LYS LYS B . +B 2 6 LEU 6 6 LEU LEU B . +B 2 7 GLN 7 7 GLN GLN B . +B 2 8 SER 8 8 SER SER B . +B 2 9 SER 9 9 SER SER B . +B 2 10 ASP 10 10 ASP ASP B . +B 2 11 GLY 11 11 GLY GLY B . +B 2 12 GLU 12 12 GLU GLU B . +B 2 13 ILE 13 13 ILE ILE B . +B 2 14 PHE 14 14 PHE PHE B . +B 2 15 GLU 15 15 GLU GLU B . +B 2 16 VAL 16 16 VAL VAL B . +B 2 17 ASP 17 17 ASP ASP B . +B 2 18 VAL 18 18 VAL VAL B . +B 2 19 GLU 19 19 GLU GLU B . +B 2 20 ILE 20 20 ILE ILE B . +B 2 21 ALA 21 21 ALA ALA B . +B 2 22 LYS 22 22 LYS LYS B . +B 2 23 GLN 23 23 GLN GLN B . +B 2 24 SER 24 24 SER SER B . +B 2 25 VAL 25 25 VAL VAL B . +B 2 26 THR 26 26 THR THR B . +B 2 27 ILE 27 27 ILE ILE B . +B 2 28 LYS 28 28 LYS LYS B . +B 2 29 THR 29 29 THR THR B . +B 2 30 MET 30 30 MET MET B . +B 2 31 LEU 31 31 LEU LEU B . +B 2 32 GLU 32 32 GLU GLU B . +B 2 33 ASP 33 33 ASP ASP B . +B 2 34 LEU 34 34 LEU LEU B . +B 2 35 GLY 35 35 GLY GLY B . +B 2 36 MET 36 36 MET MET B . +B 2 37 ASP 37 37 ASP ASP B . +B 2 38 ASP 38 38 ASP ASP B . +B 2 39 GLU 39 39 GLU GLU B . +B 2 40 GLY 40 40 GLY GLY B . +B 2 41 ASP 41 41 ASP ASP B . +B 2 42 ASP 42 42 ASP ASP B . +B 2 43 ASP 43 43 ASP ASP B . +B 2 44 PRO 44 44 PRO PRO B . +B 2 45 VAL 45 45 VAL VAL B . +B 2 46 PRO 46 46 PRO PRO B . 
+B 2 47 LEU 47 47 LEU LEU B .
+B 2 48 PRO 48 48 PRO PRO B .
+B 2 49 ASN 49 49 ASN ASN B .
+B 2 50 VAL 50 50 VAL VAL B .
+B 2 51 ASN 51 51 ASN ASN B .
+B 2 52 ALA 52 52 ALA ALA B .
+B 2 53 ALA 53 53 ALA ALA B .
+B 2 54 ILE 54 54 ILE ILE B .
+B 2 55 LEU 55 55 LEU LEU B .
+B 2 56 LYS 56 56 LYS LYS B .
+B 2 57 LYS 57 57 LYS LYS B .
+B 2 58 VAL 58 58 VAL VAL B .
+B 2 59 ILE 59 59 ILE ILE B .
+B 2 60 GLN 60 60 GLN GLN B .
+B 2 61 TRP 61 61 TRP TRP B .
+B 2 62 CYS 62 62 CYS CYS B .
+B 2 63 THR 63 63 THR THR B .
+B 2 64 HIS 64 64 HIS HIS B .
+B 2 65 HIS 65 65 HIS HIS B .
+B 2 66 LYS 66 66 LYS LYS B .
+B 2 67 ASP 67 67 ASP ASP B .
+B 2 68 ASP 68 68 ASP ASP B .
+B 2 69 PRO 69 69 PRO PRO B .
+B 2 70 PRO 70 70 PRO PRO B .
+B 2 71 PRO 71 71 PRO PRO B .
+B 2 72 PRO 72 72 PRO PRO B .
+B 2 73 GLU 73 73 GLU GLU B .
+B 2 74 ASP 74 74 ASP ASP B .
+B 2 75 ASP 75 75 ASP ASP B .
+B 2 76 GLU 76 76 GLU GLU B .
+B 2 77 ASN 77 77 ASN ASN B .
+B 2 78 LYS 78 78 LYS LYS B .
+B 2 79 GLU 79 79 GLU GLU B .
+B 2 80 LYS 80 80 LYS LYS B .
+B 2 81 ARG 81 81 ARG ARG B .
+B 2 82 THR 82 82 THR THR B .
+B 2 83 ASP 83 83 ASP ASP B .
+B 2 84 ASP 84 84 ASP ASP B .
+B 2 85 ILE 85 85 ILE ILE B .
+B 2 86 PRO 86 86 PRO PRO B .
+B 2 87 VAL 87 87 VAL VAL B .
+B 2 88 TRP 88 88 TRP TRP B .
+B 2 89 ASP 89 89 ASP ASP B .
+B 2 90 GLN 90 90 GLN GLN B .
+B 2 91 GLU 91 91 GLU GLU B .
+B 2 92 PHE 92 92 PHE PHE B .
+B 2 93 LEU 93 93 LEU LEU B .
+B 2 94 LYS 94 94 LYS LYS B .
+B 2 95 VAL 95 95 VAL VAL B .
+B 2 96 ASP 96 96 ASP ASP B .
+B 2 97 GLN 97 97 GLN GLN B .
+B 2 98 GLY 98 98 GLY GLY B .
+B 2 99 THR 99 99 THR THR B .
+B 2 100 LEU 100 100 LEU LEU B .
+B 2 101 PHE 101 101 PHE PHE B .
+B 2 102 GLU 102 102 GLU GLU B .
+B 2 103 LEU 103 103 LEU LEU B .
+B 2 104 ILE 104 104 ILE ILE B .
+B 2 105 LEU 105 105 LEU LEU B .
+B 2 106 ALA 106 106 ALA ALA B .
+B 2 107 ALA 107 107 ALA ALA B .
+B 2 108 ASN 108 108 ASN ASN B .
+B 2 109 TYR 109 109 TYR TYR B .
+B 2 110 LEU 110 110 LEU LEU B .
+B 2 111 ASP 111 111 ASP ASP B .
+B 2 112 ILE 112 112 ILE ILE B .
+B 2 113 LYS 113 113 LYS LYS B .
+B 2 114 GLY 114 114 GLY GLY B .
+B 2 115 LEU 115 115 LEU LEU B .
+B 2 116 LEU 116 116 LEU LEU B .
+B 2 117 ASP 117 117 ASP ASP B .
+B 2 118 VAL 118 118 VAL VAL B .
+B 2 119 THR 119 119 THR THR B .
+B 2 120 CYS 120 120 CYS CYS B .
+B 2 121 LYS 121 121 LYS LYS B .
+B 2 122 THR 122 122 THR THR B .
+B 2 123 VAL 123 123 VAL VAL B .
+B 2 124 ALA 124 124 ALA ALA B .
+B 2 125 ASN 125 125 ASN ASN B .
+B 2 126 MET 126 126 MET MET B .
+B 2 127 ILE 127 127 ILE ILE B .
+B 2 128 LYS 128 128 LYS LYS B .
+B 2 129 GLY 129 129 GLY GLY B .
+B 2 130 LYS 130 130 LYS LYS B .
+B 2 131 THR 131 131 THR THR B .
+B 2 132 PRO 132 132 PRO PRO B .
+B 2 133 GLU 133 133 GLU GLU B .
+B 2 134 GLU 134 134 GLU GLU B .
+B 2 135 ILE 135 135 ILE ILE B .
+B 2 136 ARG 136 136 ARG ARG B .
+B 2 137 LYS 137 137 LYS LYS B .
+B 2 138 THR 138 138 THR THR B .
+B 2 139 PHE 139 139 PHE PHE B .
+B 2 140 ASN 140 140 ASN ASN B .
+B 2 141 ILE 141 141 ILE ILE B .
+B 2 142 LYS 142 142 LYS LYS B .
+B 2 143 ASN 143 143 ASN ASN B .
+B 2 144 ASP 144 144 ASP ASP B .
+B 2 145 PHE 145 145 PHE PHE B .
+B 2 146 THR 146 146 THR THR B .
+B 2 147 GLU 147 147 GLU GLU B .
+B 2 148 GLU 148 148 GLU GLU B .
+B 2 149 GLU 149 149 GLU GLU B .
+B 2 150 GLU 150 150 GLU GLU B .
+B 2 151 ALA 151 151 ALA ALA B .
+B 2 152 GLN 152 152 GLN GLN B .
+B 2 153 VAL 153 153 VAL VAL B .
+B 2 154 ARG 154 154 ARG ARG B .
+B 2 155 LYS 155 155 LYS LYS B .
+B 2 156 GLU 156 156 GLU GLU B .
+B 2 157 ASN 157 157 ASN ASN B .
+B 2 158 GLN 158 158 GLN GLN B .
+B 2 159 TRP 159 159 TRP TRP B .
+B 2 160 CYS 160 160 CYS CYS B .
+B 2 161 GLU 161 161 GLU GLU B .
+B 2 162 GLU 162 162 GLU GLU B .
+B 2 163 LYS 163 163 LYS LYS B .
+#
+#
+loop_
+_ma_data.id
+_ma_data.name
+_ma_data.content_type
+_ma_data.content_type_other_details
+1 'Human Ubiquitin' target .
+2 'Human SKP1' target .
+3 'Top ranked model (model_4_seed_000)' 'model coordinates' .
+4 UniRef30 'reference database' .
+5 'ColabFold DB' 'reference database' .
+#
+#
+loop_
+_ma_data_group.ordinal_id
+_ma_data_group.group_id
+_ma_data_group.data_id
+1 1 1
+2 1 2
+3 1 4
+4 1 5
+5 2 3
+#
+#
+loop_
+_ma_data_ref_db.data_id
+_ma_data_ref_db.name
+_ma_data_ref_db.location_url
+_ma_data_ref_db.version
+_ma_data_ref_db.release_date
+4 UniRef30 https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2302.tar.gz
+2023_02 .
+5 'ColabFold DB'
+https://wwwuser.gwdg.de/~compbiol/colabfold/colabfold_envdb_202108.tar.gz
+2021_08 .
+#
+#
+loop_
+_ma_target_entity.entity_id
+_ma_target_entity.data_id
+_ma_target_entity.origin
+1 1 'reference database'
+2 2 'reference database'
+#
+#
+loop_
+_ma_target_entity_instance.asym_id
+_ma_target_entity_instance.entity_id
+_ma_target_entity_instance.details
+A 1 .
+B 2 .
+#
+#
+loop_
+_ma_protocol_step.ordinal_id
+_ma_protocol_step.protocol_id
+_ma_protocol_step.step_id
+_ma_protocol_step.method_type
+_ma_protocol_step.step_name
+_ma_protocol_step.details
+_ma_protocol_step.software_group_id
+_ma_protocol_step.input_data_group_id
+_ma_protocol_step.output_data_group_id
+1 1 1 modeling .
+'Model generated using ColabFold v1.5.2 (306d13c) with AlphaFold-Multimer (v3) producing 5 models with up to 5 recycles each, without model relaxation, without templates, ranked by 80*ipTM+20*pTM, starting from paired and unpaired MSAs from MMseqs2 (UniRef+Environmental).'
+1 1 2
+#
+#
+loop_
+_ma_model_list.ordinal_id
+_ma_model_list.model_id
+_ma_model_list.model_group_id
+_ma_model_list.model_name
+_ma_model_list.model_group_name
+_ma_model_list.data_id
+_ma_model_list.model_type
+_ma_model_list.model_type_other_details
+1 1 1 'Top ranked model (model_4_seed_000)' . 3 'Ab initio model' .
+# +# +loop_ +_atom_site.group_PDB +_atom_site.id +_atom_site.type_symbol +_atom_site.label_atom_id +_atom_site.label_alt_id +_atom_site.label_comp_id +_atom_site.label_seq_id +_atom_site.auth_seq_id +_atom_site.pdbx_PDB_ins_code +_atom_site.label_asym_id +_atom_site.Cartn_x +_atom_site.Cartn_y +_atom_site.Cartn_z +_atom_site.occupancy +_atom_site.label_entity_id +_atom_site.auth_asym_id +_atom_site.auth_comp_id +_atom_site.B_iso_or_equiv +_atom_site.pdbx_PDB_model_num +ATOM 1 N N . MET 1 1 ? A 12.594 -8.359 -7.895 1.000 1 A MET 73.940 1 +ATOM 2 C CA . MET 1 1 ? A 12.203 -7.594 -6.715 1.000 1 A MET 73.940 1 +ATOM 3 C C . MET 1 1 ? A 12.086 -8.500 -5.492 1.000 1 A MET 73.940 1 +ATOM 4 O O . MET 1 1 ? A 11.945 -9.719 -5.633 1.000 1 A MET 73.940 1 +ATOM 5 C CB . MET 1 1 ? A 10.875 -6.871 -6.953 1.000 1 A MET 73.940 1 +ATOM 6 C CG . MET 1 1 ? A 9.672 -7.797 -7.023 1.000 1 A MET 73.940 1 +ATOM 7 S SD . MET 1 1 ? A 8.102 -6.891 -7.301 1.000 1 A MET 73.940 1 +ATOM 8 C CE . MET 1 1 ? A 6.934 -8.273 -7.336 1.000 1 A MET 73.940 1 +ATOM 9 N N . GLN 2 2 ? A 12.383 -7.824 -4.391 1.000 1 A GLN 80.620 1 +ATOM 10 C CA . GLN 2 2 ? A 12.297 -8.594 -3.156 1.000 1 A GLN 80.620 1 +ATOM 11 C C . GLN 2 2 ? A 10.906 -8.477 -2.529 1.000 1 A GLN 80.620 1 +ATOM 12 O O . GLN 2 2 ? A 10.320 -7.395 -2.508 1.000 1 A GLN 80.620 1 +ATOM 13 C CB . GLN 2 2 ? A 13.359 -8.133 -2.162 1.000 1 A GLN 80.620 1 +ATOM 14 C CG . GLN 2 2 ? A 13.664 -9.156 -1.069 1.000 1 A GLN 80.620 1 +ATOM 15 C CD . GLN 2 2 ? A 14.820 -8.734 -0.178 1.000 1 A GLN 80.620 1 +ATOM 16 O OE1 . GLN 2 2 ? A 15.508 -7.750 -0.458 1.000 1 A GLN 80.620 1 +ATOM 17 N NE2 . GLN 2 2 ? A 15.031 -9.477 0.900 1.000 1 A GLN 80.620 1 +ATOM 18 N N . ILE 3 3 ? A 10.375 -9.688 -2.213 1.000 1 A ILE 82.810 1 +ATOM 19 C CA . ILE 3 3 ? A 9.156 -9.719 -1.411 1.000 1 A ILE 82.810 1 +ATOM 20 C C . ILE 3 3 ? A 9.391 -10.547 -0.149 1.000 1 A ILE 82.810 1 +ATOM 21 O O . ILE 3 3 ? A 10.367 -11.281 -0.055 1.000 1 A ILE 82.810 1 +ATOM 22 C CB . ILE 3 3 ? 
A 7.961 -10.281 -2.213 1.000 1 A ILE 82.810 1 +ATOM 23 C CG1 . ILE 3 3 ? A 8.242 -11.734 -2.627 1.000 1 A ILE 82.810 1 +ATOM 24 C CG2 . ILE 3 3 ? A 7.668 -9.414 -3.436 1.000 1 A ILE 82.810 1 +ATOM 25 C CD1 . ILE 3 3 ? A 7.020 -12.469 -3.164 1.000 1 A ILE 82.810 1 +ATOM 26 N N . PHE 4 4 ? A 8.609 -10.328 0.827 1.000 1 A PHE 79.250 1 +ATOM 27 C CA . PHE 4 4 ? A 8.680 -11.070 2.080 1.000 1 A PHE 79.250 1 +ATOM 28 C C . PHE 4 4 ? A 7.469 -11.992 2.230 1.000 1 A PHE 79.250 1 +ATOM 29 O O . PHE 4 4 ? A 6.367 -11.656 1.789 1.000 1 A PHE 79.250 1 +ATOM 30 C CB . PHE 4 4 ? A 8.758 -10.117 3.271 1.000 1 A PHE 79.250 1 +ATOM 31 C CG . PHE 4 4 ? A 9.898 -9.133 3.186 1.000 1 A PHE 79.250 1 +ATOM 32 C CD1 . PHE 4 4 ? A 11.203 -9.555 3.400 1.000 1 A PHE 79.250 1 +ATOM 33 C CD2 . PHE 4 4 ? A 9.664 -7.801 2.893 1.000 1 A PHE 79.250 1 +ATOM 34 C CE1 . PHE 4 4 ? A 12.266 -8.648 3.322 1.000 1 A PHE 79.250 1 +ATOM 35 C CE2 . PHE 4 4 ? A 10.719 -6.891 2.812 1.000 1 A PHE 79.250 1 +ATOM 36 C CZ . PHE 4 4 ? A 12.016 -7.320 3.029 1.000 1 A PHE 79.250 1 +ATOM 37 N N . VAL 5 5 ? A 7.793 -13.172 2.578 1.000 1 A VAL 84.380 1 +ATOM 38 C CA . VAL 5 5 ? A 6.738 -14.133 2.896 1.000 1 A VAL 84.380 1 +ATOM 39 C C . VAL 5 5 ? A 6.777 -14.461 4.387 1.000 1 A VAL 84.380 1 +ATOM 40 O O . VAL 5 5 ? A 7.785 -14.961 4.891 1.000 1 A VAL 84.380 1 +ATOM 41 C CB . VAL 5 5 ? A 6.875 -15.422 2.059 1.000 1 A VAL 84.380 1 +ATOM 42 C CG1 . VAL 5 5 ? A 5.777 -16.422 2.420 1.000 1 A VAL 84.380 1 +ATOM 43 C CG2 . VAL 5 5 ? A 6.836 -15.094 0.566 1.000 1 A VAL 84.380 1 +ATOM 44 N N . LYS 6 6 ? A 5.730 -14.094 5.086 1.000 1 A LYS 77.310 1 +ATOM 45 C CA . LYS 6 6 ? A 5.645 -14.383 6.512 1.000 1 A LYS 77.310 1 +ATOM 46 C C . LYS 6 6 ? A 4.824 -15.648 6.770 1.000 1 A LYS 77.310 1 +ATOM 47 O O . LYS 6 6 ? A 3.721 -15.789 6.242 1.000 1 A LYS 77.310 1 +ATOM 48 C CB . LYS 6 6 ? A 5.035 -13.203 7.266 1.000 1 A LYS 77.310 1 +ATOM 49 C CG . LYS 6 6 ? A 5.559 -13.039 8.688 1.000 1 A LYS 77.310 1 +ATOM 50 C CD . LYS 6 6 ? 
A 5.047 -11.750 9.328 1.000 1 A LYS 77.310 1 +ATOM 51 C CE . LYS 6 6 ? A 5.570 -11.586 10.742 1.000 1 A LYS 77.310 1 +ATOM 52 N NZ . LYS 6 6 ? A 5.109 -10.305 11.359 1.000 1 A LYS 77.310 1 +ATOM 53 N N . THR 7 7 ? A 5.434 -16.500 7.453 1.000 1 A THR 73.000 1 +ATOM 54 C CA . THR 7 7 ? A 4.746 -17.734 7.820 1.000 1 A THR 73.000 1 +ATOM 55 C C . THR 7 7 ? A 3.928 -17.531 9.094 1.000 1 A THR 73.000 1 +ATOM 56 O O . THR 7 7 ? A 4.062 -16.516 9.773 1.000 1 A THR 73.000 1 +ATOM 57 C CB . THR 7 7 ? A 5.742 -18.891 8.016 1.000 1 A THR 73.000 1 +ATOM 58 O OG1 . THR 7 7 ? A 6.547 -18.625 9.172 1.000 1 A THR 73.000 1 +ATOM 59 C CG2 . THR 7 7 ? A 6.648 -19.047 6.805 1.000 1 A THR 73.000 1 +ATOM 60 N N . LEU 8 8 ? A 3.123 -18.516 9.383 1.000 1 A LEU 63.970 1 +ATOM 61 C CA . LEU 8 8 ? A 2.285 -18.438 10.578 1.000 1 A LEU 63.970 1 +ATOM 62 C C . LEU 8 8 ? A 3.135 -18.484 11.836 1.000 1 A LEU 63.970 1 +ATOM 63 O O . LEU 8 8 ? A 2.740 -17.953 12.875 1.000 1 A LEU 63.970 1 +ATOM 64 C CB . LEU 8 8 ? A 1.270 -19.594 10.586 1.000 1 A LEU 63.970 1 +ATOM 65 C CG . LEU 8 8 ? A 0.066 -19.438 9.656 1.000 1 A LEU 63.970 1 +ATOM 66 C CD1 . LEU 8 8 ? A -0.695 -20.750 9.547 1.000 1 A LEU 63.970 1 +ATOM 67 C CD2 . LEU 8 8 ? A -0.851 -18.312 10.141 1.000 1 A LEU 63.970 1 +ATOM 68 N N . THR 9 9 ? A 4.293 -19.109 11.664 1.000 1 A THR 63.780 1 +ATOM 69 C CA . THR 9 9 ? A 5.172 -19.234 12.820 1.000 1 A THR 63.780 1 +ATOM 70 C C . THR 9 9 ? A 5.977 -17.969 13.031 1.000 1 A THR 63.780 1 +ATOM 71 O O . THR 9 9 ? A 6.730 -17.844 14.008 1.000 1 A THR 63.780 1 +ATOM 72 C CB . THR 9 9 ? A 6.129 -20.438 12.672 1.000 1 A THR 63.780 1 +ATOM 73 O OG1 . THR 9 9 ? A 6.918 -20.266 11.484 1.000 1 A THR 63.780 1 +ATOM 74 C CG2 . THR 9 9 ? A 5.359 -21.750 12.555 1.000 1 A THR 63.780 1 +ATOM 75 N N . GLY 10 10 ? A 5.867 -17.016 12.164 1.000 1 A GLY 62.690 1 +ATOM 76 C CA . GLY 10 10 ? A 6.562 -15.742 12.297 1.000 1 A GLY 62.690 1 +ATOM 77 C C . GLY 10 10 ? A 7.848 -15.672 11.492 1.000 1 A GLY 62.690 1 +ATOM 78 O O . 
GLY 10 10 ? A 8.555 -14.664 11.531 1.000 1 A GLY 62.690 1 +ATOM 79 N N . LYS 11 11 ? A 8.109 -16.766 10.906 1.000 1 A LYS 71.190 1 +ATOM 80 C CA . LYS 11 11 ? A 9.297 -16.750 10.047 1.000 1 A LYS 71.190 1 +ATOM 81 C C . LYS 11 11 ? A 9.062 -15.914 8.797 1.000 1 A LYS 71.190 1 +ATOM 82 O O . LYS 11 11 ? A 7.984 -15.969 8.203 1.000 1 A LYS 71.190 1 +ATOM 83 C CB . LYS 11 11 ? A 9.695 -18.172 9.664 1.000 1 A LYS 71.190 1 +ATOM 84 C CG . LYS 11 11 ? A 11.023 -18.281 8.930 1.000 1 A LYS 71.190 1 +ATOM 85 C CD . LYS 11 11 ? A 11.406 -19.719 8.656 1.000 1 A LYS 71.190 1 +ATOM 86 C CE . LYS 11 11 ? A 12.734 -19.828 7.914 1.000 1 A LYS 71.190 1 +ATOM 87 N NZ . LYS 11 11 ? A 13.117 -21.234 7.656 1.000 1 A LYS 71.190 1 +ATOM 88 N N . THR 12 12 ? A 10.016 -14.984 8.570 1.000 1 A THR 73.000 1 +ATOM 89 C CA . THR 12 12 ? A 9.953 -14.195 7.348 1.000 1 A THR 73.000 1 +ATOM 90 C C . THR 12 12 ? A 10.945 -14.719 6.312 1.000 1 A THR 73.000 1 +ATOM 91 O O . THR 12 12 ? A 12.133 -14.859 6.602 1.000 1 A THR 73.000 1 +ATOM 92 C CB . THR 12 12 ? A 10.242 -12.711 7.633 1.000 1 A THR 73.000 1 +ATOM 93 O OG1 . THR 12 12 ? A 9.328 -12.234 8.625 1.000 1 A THR 73.000 1 +ATOM 94 C CG2 . THR 12 12 ? A 10.086 -11.867 6.371 1.000 1 A THR 73.000 1 +ATOM 95 N N . ILE 13 13 ? A 10.414 -15.109 5.145 1.000 1 A ILE 80.880 1 +ATOM 96 C CA . ILE 13 13 ? A 11.219 -15.586 4.027 1.000 1 A ILE 80.880 1 +ATOM 97 C C . ILE 13 13 ? A 11.383 -14.477 2.994 1.000 1 A ILE 80.880 1 +ATOM 98 O O . ILE 13 13 ? A 10.414 -13.805 2.629 1.000 1 A ILE 80.880 1 +ATOM 99 C CB . ILE 13 13 ? A 10.594 -16.844 3.371 1.000 1 A ILE 80.880 1 +ATOM 100 C CG1 . ILE 13 13 ? A 10.297 -17.906 4.434 1.000 1 A ILE 80.880 1 +ATOM 101 C CG2 . ILE 13 13 ? A 11.508 -17.391 2.279 1.000 1 A ILE 80.880 1 +ATOM 102 C CD1 . ILE 13 13 ? A 9.352 -19.000 3.959 1.000 1 A ILE 80.880 1 +ATOM 103 N N . THR 14 14 ? A 12.594 -14.219 2.660 1.000 1 A THR 82.310 1 +ATOM 104 C CA . THR 14 14 ? A 12.852 -13.234 1.618 1.000 1 A THR 82.310 1 +ATOM 105 C C . 
THR 14 14 ? A 13.016 -13.914 0.260 1.000 1 A THR 82.310 1 +ATOM 106 O O . THR 14 14 ? A 13.805 -14.844 0.117 1.000 1 A THR 82.310 1 +ATOM 107 C CB . THR 14 14 ? A 14.109 -12.406 1.935 1.000 1 A THR 82.310 1 +ATOM 108 O OG1 . THR 14 14 ? A 14.008 -11.883 3.268 1.000 1 A THR 82.310 1 +ATOM 109 C CG2 . THR 14 14 ? A 14.258 -11.242 0.959 1.000 1 A THR 82.310 1 +ATOM 110 N N . LEU 15 15 ? A 12.180 -13.445 -0.650 1.000 1 A LEU 87.060 1 +ATOM 111 C CA . LEU 15 15 ? A 12.227 -14.047 -1.980 1.000 1 A LEU 87.060 1 +ATOM 112 C C . LEU 15 15 ? A 12.609 -13.008 -3.029 1.000 1 A LEU 87.060 1 +ATOM 113 O O . LEU 15 15 ? A 12.195 -11.852 -2.939 1.000 1 A LEU 87.060 1 +ATOM 114 C CB . LEU 15 15 ? A 10.875 -14.672 -2.332 1.000 1 A LEU 87.060 1 +ATOM 115 C CG . LEU 15 15 ? A 10.406 -15.820 -1.438 1.000 1 A LEU 87.060 1 +ATOM 116 C CD1 . LEU 15 15 ? A 8.977 -16.219 -1.783 1.000 1 A LEU 87.060 1 +ATOM 117 C CD2 . LEU 15 15 ? A 11.344 -17.016 -1.574 1.000 1 A LEU 87.060 1 +ATOM 118 N N . GLU 16 16 ? A 13.461 -13.328 -3.930 1.000 1 A GLU 86.250 1 +ATOM 119 C CA . GLU 16 16 ? A 13.711 -12.539 -5.133 1.000 1 A GLU 86.250 1 +ATOM 120 C C . GLU 16 16 ? A 12.773 -12.953 -6.266 1.000 1 A GLU 86.250 1 +ATOM 121 O O . GLU 16 16 ? A 12.836 -14.094 -6.742 1.000 1 A GLU 86.250 1 +ATOM 122 C CB . GLU 16 16 ? A 15.164 -12.680 -5.578 1.000 1 A GLU 86.250 1 +ATOM 123 C CG . GLU 16 16 ? A 15.688 -11.477 -6.352 1.000 1 A GLU 86.250 1 +ATOM 124 C CD . GLU 16 16 ? A 15.773 -10.211 -5.508 1.000 1 A GLU 86.250 1 +ATOM 125 O OE1 . GLU 16 16 ? A 15.898 -9.109 -6.078 1.000 1 A GLU 86.250 1 +ATOM 126 O OE2 . GLU 16 16 ? A 15.719 -10.328 -4.262 1.000 1 A GLU 86.250 1 +ATOM 127 N N . VAL 17 17 ? A 11.828 -12.055 -6.461 1.000 1 A VAL 86.060 1 +ATOM 128 C CA . VAL 17 17 ? A 10.820 -12.398 -7.453 1.000 1 A VAL 86.060 1 +ATOM 129 C C . VAL 17 17 ? A 10.750 -11.305 -8.516 1.000 1 A VAL 86.060 1 +ATOM 130 O O . VAL 17 17 ? A 11.391 -10.258 -8.383 1.000 1 A VAL 86.060 1 +ATOM 131 C CB . VAL 17 17 ? 
A 9.430 -12.594 -6.805 1.000 1 A VAL 86.060 1 +ATOM 132 C CG1 . VAL 17 17 ? A 9.445 -13.797 -5.859 1.000 1 A VAL 86.060 1 +ATOM 133 C CG2 . VAL 17 17 ? A 9.000 -11.336 -6.062 1.000 1 A VAL 86.060 1 +ATOM 134 N N . GLU 18 18 ? A 10.180 -11.672 -9.648 1.000 1 A GLU 83.250 1 +ATOM 135 C CA . GLU 18 18 ? A 9.812 -10.711 -10.688 1.000 1 A GLU 83.250 1 +ATOM 136 C C . GLU 18 18 ? A 8.305 -10.477 -10.719 1.000 1 A GLU 83.250 1 +ATOM 137 O O . GLU 18 18 ? A 7.527 -11.367 -10.375 1.000 1 A GLU 83.250 1 +ATOM 138 C CB . GLU 18 18 ? A 10.297 -11.195 -12.055 1.000 1 A GLU 83.250 1 +ATOM 139 C CG . GLU 18 18 ? A 11.797 -11.430 -12.133 1.000 1 A GLU 83.250 1 +ATOM 140 C CD . GLU 18 18 ? A 12.617 -10.148 -12.055 1.000 1 A GLU 83.250 1 +ATOM 141 O OE1 . GLU 18 18 ? A 13.625 -10.117 -11.312 1.000 1 A GLU 83.250 1 +ATOM 142 O OE2 . GLU 18 18 ? A 12.242 -9.172 -12.750 1.000 1 A GLU 83.250 1 +ATOM 143 N N . PRO 19 19 ? A 7.820 -9.305 -10.984 1.000 1 A PRO 79.440 1 +ATOM 144 C CA . PRO 19 19 ? A 6.383 -9.031 -11.055 1.000 1 A PRO 79.440 1 +ATOM 145 C C . PRO 19 19 ? A 5.645 -10.008 -11.977 1.000 1 A PRO 79.440 1 +ATOM 146 O O . PRO 19 19 ? A 4.453 -10.258 -11.781 1.000 1 A PRO 79.440 1 +ATOM 147 C CB . PRO 19 19 ? A 6.316 -7.609 -11.609 1.000 1 A PRO 79.440 1 +ATOM 148 C CG . PRO 19 19 ? A 7.652 -7.016 -11.297 1.000 1 A PRO 79.440 1 +ATOM 149 C CD . PRO 19 19 ? A 8.664 -8.125 -11.242 1.000 1 A PRO 79.440 1 +ATOM 150 N N . SER 20 20 ? A 6.410 -10.484 -12.875 1.000 1 A SER 83.810 1 +ATOM 151 C CA . SER 20 20 ? A 5.805 -11.414 -13.828 1.000 1 A SER 83.810 1 +ATOM 152 C C . SER 20 20 ? A 5.754 -12.828 -13.266 1.000 1 A SER 83.810 1 +ATOM 153 O O . SER 20 20 ? A 5.184 -13.727 -13.891 1.000 1 A SER 83.810 1 +ATOM 154 C CB . SER 20 20 ? A 6.582 -11.406 -15.148 1.000 1 A SER 83.810 1 +ATOM 155 O OG . SER 20 20 ? A 7.949 -11.703 -14.930 1.000 1 A SER 83.810 1 +ATOM 156 N N . ASP 21 21 ? A 6.395 -12.914 -12.141 1.000 1 A ASP 87.880 1 +ATOM 157 C CA . ASP 21 21 ? 
A 6.348 -14.250 -11.539 1.000 1 A ASP 87.880 1 +ATOM 158 C C . ASP 21 21 ? A 4.914 -14.633 -11.172 1.000 1 A ASP 87.880 1 +ATOM 159 O O . ASP 21 21 ? A 4.168 -13.812 -10.625 1.000 1 A ASP 87.880 1 +ATOM 160 C CB . ASP 21 21 ? A 7.246 -14.312 -10.305 1.000 1 A ASP 87.880 1 +ATOM 161 C CG . ASP 21 21 ? A 8.703 -14.570 -10.648 1.000 1 A ASP 87.880 1 +ATOM 162 O OD1 . ASP 21 21 ? A 8.992 -15.023 -11.773 1.000 1 A ASP 87.880 1 +ATOM 163 O OD2 . ASP 21 21 ? A 9.570 -14.320 -9.781 1.000 1 A ASP 87.880 1 +ATOM 164 N N . THR 22 22 ? A 4.523 -15.836 -11.594 1.000 1 A THR 88.690 1 +ATOM 165 C CA . THR 22 22 ? A 3.234 -16.344 -11.148 1.000 1 A THR 88.690 1 +ATOM 166 C C . THR 22 22 ? A 3.289 -16.750 -9.680 1.000 1 A THR 88.690 1 +ATOM 167 O O . THR 22 22 ? A 4.375 -16.906 -9.117 1.000 1 A THR 88.690 1 +ATOM 168 C CB . THR 22 22 ? A 2.793 -17.562 -11.992 1.000 1 A THR 88.690 1 +ATOM 169 O OG1 . THR 22 22 ? A 3.736 -18.625 -11.828 1.000 1 A THR 88.690 1 +ATOM 170 C CG2 . THR 22 22 ? A 2.713 -17.203 -13.477 1.000 1 A THR 88.690 1 +ATOM 171 N N . ILE 23 23 ? A 2.170 -16.812 -9.094 1.000 1 A ILE 91.690 1 +ATOM 172 C CA . ILE 23 23 ? A 2.096 -17.281 -7.711 1.000 1 A ILE 91.690 1 +ATOM 173 C C . ILE 23 23 ? A 2.695 -18.672 -7.605 1.000 1 A ILE 91.690 1 +ATOM 174 O O . ILE 23 23 ? A 3.354 -19.000 -6.613 1.000 1 A ILE 91.690 1 +ATOM 175 C CB . ILE 23 23 ? A 0.640 -17.281 -7.195 1.000 1 A ILE 91.690 1 +ATOM 176 C CG1 . ILE 23 23 ? A 0.057 -15.859 -7.238 1.000 1 A ILE 91.690 1 +ATOM 177 C CG2 . ILE 23 23 ? A 0.571 -17.859 -5.777 1.000 1 A ILE 91.690 1 +ATOM 178 C CD1 . ILE 23 23 ? A 0.895 -14.820 -6.504 1.000 1 A ILE 91.690 1 +ATOM 179 N N . GLU 24 24 ? A 2.520 -19.469 -8.672 1.000 1 A GLU 91.810 1 +ATOM 180 C CA . GLU 24 24 ? A 3.131 -20.781 -8.703 1.000 1 A GLU 91.810 1 +ATOM 181 C C . GLU 24 24 ? A 4.652 -20.688 -8.625 1.000 1 A GLU 91.810 1 +ATOM 182 O O . GLU 24 24 ? A 5.293 -21.453 -7.898 1.000 1 A GLU 91.810 1 +ATOM 183 C CB . GLU 24 24 ? 
A 2.713 -21.547 -9.969 1.000 1 A GLU 91.810 1 +ATOM 184 C CG . GLU 24 24 ? A 3.166 -23.000 -10.000 1.000 1 A GLU 91.810 1 +ATOM 185 C CD . GLU 24 24 ? A 2.676 -23.750 -11.219 1.000 1 A GLU 91.810 1 +ATOM 186 O OE1 . GLU 24 24 ? A 2.639 -25.000 -11.188 1.000 1 A GLU 91.810 1 +ATOM 187 O OE2 . GLU 24 24 ? A 2.330 -23.094 -12.227 1.000 1 A GLU 91.810 1 +ATOM 188 N N . ASN 25 25 ? A 5.137 -19.766 -9.281 1.000 1 A ASN 91.750 1 +ATOM 189 C CA . ASN 25 25 ? A 6.578 -19.578 -9.234 1.000 1 A ASN 91.750 1 +ATOM 190 C C . ASN 25 25 ? A 7.043 -19.156 -7.844 1.000 1 A ASN 91.750 1 +ATOM 191 O O . ASN 25 25 ? A 8.062 -19.641 -7.355 1.000 1 A ASN 91.750 1 +ATOM 192 C CB . ASN 25 25 ? A 7.008 -18.531 -10.266 1.000 1 A ASN 91.750 1 +ATOM 193 C CG . ASN 25 25 ? A 7.070 -19.094 -11.680 1.000 1 A ASN 91.750 1 +ATOM 194 O OD1 . ASN 25 25 ? A 7.102 -20.312 -11.875 1.000 1 A ASN 91.750 1 +ATOM 195 N ND2 . ASN 25 25 ? A 7.086 -18.203 -12.664 1.000 1 A ASN 91.750 1 +ATOM 196 N N . VAL 26 26 ? A 6.273 -18.328 -7.277 1.000 1 A VAL 93.190 1 +ATOM 197 C CA . VAL 26 26 ? A 6.605 -17.875 -5.934 1.000 1 A VAL 93.190 1 +ATOM 198 C C . VAL 26 26 ? A 6.586 -19.047 -4.969 1.000 1 A VAL 93.190 1 +ATOM 199 O O . VAL 26 26 ? A 7.504 -19.219 -4.168 1.000 1 A VAL 93.190 1 +ATOM 200 C CB . VAL 26 26 ? A 5.633 -16.766 -5.445 1.000 1 A VAL 93.190 1 +ATOM 201 C CG1 . VAL 26 26 ? A 5.902 -16.422 -3.984 1.000 1 A VAL 93.190 1 +ATOM 202 C CG2 . VAL 26 26 ? A 5.750 -15.523 -6.328 1.000 1 A VAL 93.190 1 +ATOM 203 N N . LYS 27 27 ? A 5.574 -19.844 -5.148 1.000 1 A LYS 91.190 1 +ATOM 204 C CA . LYS 27 27 ? A 5.477 -21.031 -4.293 1.000 1 A LYS 91.190 1 +ATOM 205 C C . LYS 27 27 ? A 6.656 -21.969 -4.516 1.000 1 A LYS 91.190 1 +ATOM 206 O O . LYS 27 27 ? A 7.156 -22.578 -3.568 1.000 1 A LYS 91.190 1 +ATOM 207 C CB . LYS 27 27 ? A 4.160 -21.766 -4.547 1.000 1 A LYS 91.190 1 +ATOM 208 C CG . LYS 27 27 ? A 2.930 -21.031 -4.020 1.000 1 A LYS 91.190 1 +ATOM 209 C CD . LYS 27 27 ? 
A 1.661 -21.844 -4.258 1.000 1 A LYS 91.190 1 +ATOM 210 C CE . LYS 27 27 ? A 0.426 -21.109 -3.760 1.000 1 A LYS 91.190 1 +ATOM 211 N NZ . LYS 27 27 ? A -0.821 -21.891 -3.986 1.000 1 A LYS 91.190 1 +ATOM 212 N N . ALA 28 28 ? A 7.023 -22.062 -5.742 1.000 1 A ALA 93.560 1 +ATOM 213 C CA . ALA 28 28 ? A 8.180 -22.906 -6.043 1.000 1 A ALA 93.560 1 +ATOM 214 C C . ALA 28 28 ? A 9.438 -22.375 -5.355 1.000 1 A ALA 93.560 1 +ATOM 215 O O . ALA 28 28 ? A 10.234 -23.141 -4.820 1.000 1 A ALA 93.560 1 +ATOM 216 C CB . ALA 28 28 ? A 8.398 -22.984 -7.551 1.000 1 A ALA 93.560 1 +ATOM 217 N N . LYS 29 29 ? A 9.586 -21.094 -5.375 1.000 1 A LYS 90.810 1 +ATOM 218 C CA . LYS 29 29 ? A 10.742 -20.500 -4.707 1.000 1 A LYS 90.810 1 +ATOM 219 C C . LYS 29 29 ? A 10.688 -20.734 -3.201 1.000 1 A LYS 90.810 1 +ATOM 220 O O . LYS 29 29 ? A 11.719 -20.969 -2.570 1.000 1 A LYS 90.810 1 +ATOM 221 C CB . LYS 29 29 ? A 10.812 -19.000 -5.000 1.000 1 A LYS 90.810 1 +ATOM 222 C CG . LYS 29 29 ? A 11.156 -18.656 -6.445 1.000 1 A LYS 90.810 1 +ATOM 223 C CD . LYS 29 29 ? A 11.164 -17.156 -6.684 1.000 1 A LYS 90.810 1 +ATOM 224 C CE . LYS 29 29 ? A 11.414 -16.828 -8.148 1.000 1 A LYS 90.810 1 +ATOM 225 N NZ . LYS 29 29 ? A 12.875 -16.703 -8.445 1.000 1 A LYS 90.810 1 +ATOM 226 N N . ILE 30 30 ? A 9.523 -20.625 -2.697 1.000 1 A ILE 90.750 1 +ATOM 227 C CA . ILE 30 30 ? A 9.367 -20.922 -1.275 1.000 1 A ILE 90.750 1 +ATOM 228 C C . ILE 30 30 ? A 9.758 -22.359 -0.997 1.000 1 A ILE 90.750 1 +ATOM 229 O O . ILE 30 30 ? A 10.438 -22.656 -0.007 1.000 1 A ILE 90.750 1 +ATOM 230 C CB . ILE 30 30 ? A 7.918 -20.656 -0.803 1.000 1 A ILE 90.750 1 +ATOM 231 C CG1 . ILE 30 30 ? A 7.578 -19.156 -0.923 1.000 1 A ILE 90.750 1 +ATOM 232 C CG2 . ILE 30 30 ? A 7.719 -21.141 0.634 1.000 1 A ILE 90.750 1 +ATOM 233 C CD1 . ILE 30 30 ? A 6.102 -18.844 -0.733 1.000 1 A ILE 90.750 1 +ATOM 234 N N . GLN 31 31 ? A 9.281 -23.266 -1.890 1.000 1 A GLN 90.310 1 +ATOM 235 C CA . GLN 31 31 ? 
A 9.641 -24.672 -1.733 1.000 1 A GLN 90.310 1 +ATOM 236 C C . GLN 31 31 ? A 11.156 -24.859 -1.729 1.000 1 A GLN 90.310 1 +ATOM 237 O O . GLN 31 31 ? A 11.695 -25.594 -0.909 1.000 1 A GLN 90.310 1 +ATOM 238 C CB . GLN 31 31 ? A 9.016 -25.516 -2.844 1.000 1 A GLN 90.310 1 +ATOM 239 C CG . GLN 31 31 ? A 9.328 -27.000 -2.742 1.000 1 A GLN 90.310 1 +ATOM 240 C CD . GLN 31 31 ? A 8.688 -27.812 -3.852 1.000 1 A GLN 90.310 1 +ATOM 241 O OE1 . GLN 31 31 ? A 8.531 -27.344 -4.980 1.000 1 A GLN 90.310 1 +ATOM 242 N NE2 . GLN 31 31 ? A 8.320 -29.047 -3.539 1.000 1 A GLN 90.310 1 +ATOM 243 N N . ASP 32 32 ? A 11.781 -24.141 -2.553 1.000 1 A ASP 89.750 1 +ATOM 244 C CA . ASP 32 32 ? A 13.234 -24.266 -2.652 1.000 1 A ASP 89.750 1 +ATOM 245 C C . ASP 32 32 ? A 13.914 -23.766 -1.380 1.000 1 A ASP 89.750 1 +ATOM 246 O O . ASP 32 32 ? A 14.914 -24.344 -0.935 1.000 1 A ASP 89.750 1 +ATOM 247 C CB . ASP 32 32 ? A 13.750 -23.484 -3.863 1.000 1 A ASP 89.750 1 +ATOM 248 C CG . ASP 32 32 ? A 13.414 -24.141 -5.188 1.000 1 A ASP 89.750 1 +ATOM 249 O OD1 . ASP 32 32 ? A 13.164 -25.375 -5.207 1.000 1 A ASP 89.750 1 +ATOM 250 O OD2 . ASP 32 32 ? A 13.406 -23.438 -6.219 1.000 1 A ASP 89.750 1 +ATOM 251 N N . LYS 33 33 ? A 13.320 -22.750 -0.858 1.000 1 A LYS 84.620 1 +ATOM 252 C CA . LYS 33 33 ? A 13.953 -22.141 0.306 1.000 1 A LYS 84.620 1 +ATOM 253 C C . LYS 33 33 ? A 13.555 -22.859 1.591 1.000 1 A LYS 84.620 1 +ATOM 254 O O . LYS 33 33 ? A 14.391 -23.062 2.479 1.000 1 A LYS 84.620 1 +ATOM 255 C CB . LYS 33 33 ? A 13.578 -20.656 0.400 1.000 1 A LYS 84.620 1 +ATOM 256 C CG . LYS 33 33 ? A 14.430 -19.750 -0.479 1.000 1 A LYS 84.620 1 +ATOM 257 C CD . LYS 33 33 ? A 14.391 -18.312 0.004 1.000 1 A LYS 84.620 1 +ATOM 258 C CE . LYS 33 33 ? A 15.336 -17.422 -0.789 1.000 1 A LYS 84.620 1 +ATOM 259 N NZ . LYS 33 33 ? A 15.328 -16.016 -0.293 1.000 1 A LYS 84.620 1 +ATOM 260 N N . GLU 34 34 ? A 12.297 -23.203 1.670 1.000 1 A GLU 86.000 1 +ATOM 261 C CA . GLU 34 34 ? 
A 11.773 -23.719 2.936 1.000 1 A GLU 86.000 1 +ATOM 262 C C . GLU 34 34 ? A 11.414 -25.188 2.834 1.000 1 A GLU 86.000 1 +ATOM 263 O O . GLU 34 34 ? A 11.133 -25.844 3.844 1.000 1 A GLU 86.000 1 +ATOM 264 C CB . GLU 34 34 ? A 10.547 -22.906 3.373 1.000 1 A GLU 86.000 1 +ATOM 265 C CG . GLU 34 34 ? A 10.852 -21.453 3.691 1.000 1 A GLU 86.000 1 +ATOM 266 C CD . GLU 34 34 ? A 11.742 -21.281 4.910 1.000 1 A GLU 86.000 1 +ATOM 267 O OE1 . GLU 34 34 ? A 12.734 -20.516 4.836 1.000 1 A GLU 86.000 1 +ATOM 268 O OE2 . GLU 34 34 ? A 11.453 -21.906 5.949 1.000 1 A GLU 86.000 1 +ATOM 269 N N . GLY 35 35 ? A 11.305 -25.812 1.676 1.000 1 A GLY 85.440 1 +ATOM 270 C CA . GLY 35 35 ? A 10.984 -27.219 1.487 1.000 1 A GLY 85.440 1 +ATOM 271 C C . GLY 35 35 ? A 9.500 -27.516 1.601 1.000 1 A GLY 85.440 1 +ATOM 272 O O . GLY 35 35 ? A 9.102 -28.656 1.831 1.000 1 A GLY 85.440 1 +ATOM 273 N N . ILE 36 36 ? A 8.727 -26.484 1.594 1.000 1 A ILE 85.000 1 +ATOM 274 C CA . ILE 36 36 ? A 7.281 -26.656 1.689 1.000 1 A ILE 85.000 1 +ATOM 275 C C . ILE 36 36 ? A 6.695 -26.875 0.296 1.000 1 A ILE 85.000 1 +ATOM 276 O O . ILE 36 36 ? A 6.809 -26.016 -0.575 1.000 1 A ILE 85.000 1 +ATOM 277 C CB . ILE 36 36 ? A 6.605 -25.453 2.371 1.000 1 A ILE 85.000 1 +ATOM 278 C CG1 . ILE 36 36 ? A 7.254 -25.172 3.732 1.000 1 A ILE 85.000 1 +ATOM 279 C CG2 . ILE 36 36 ? A 5.098 -25.688 2.523 1.000 1 A ILE 85.000 1 +ATOM 280 C CD1 . ILE 36 36 ? A 6.828 -23.859 4.363 1.000 1 A ILE 85.000 1 +ATOM 281 N N . PRO 37 37 ? A 6.043 -28.047 0.043 1.000 1 A PRO 88.560 1 +ATOM 282 C CA . PRO 37 37 ? A 5.441 -28.281 -1.271 1.000 1 A PRO 88.560 1 +ATOM 283 C C . PRO 37 37 ? A 4.410 -27.219 -1.647 1.000 1 A PRO 88.560 1 +ATOM 284 O O . PRO 37 37 ? A 3.660 -26.750 -0.788 1.000 1 A PRO 88.560 1 +ATOM 285 C CB . PRO 37 37 ? A 4.773 -29.656 -1.116 1.000 1 A PRO 88.560 1 +ATOM 286 C CG . PRO 37 37 ? A 5.488 -30.297 0.030 1.000 1 A PRO 88.560 1 +ATOM 287 C CD . PRO 37 37 ? A 5.988 -29.219 0.944 1.000 1 A PRO 88.560 1 +ATOM 288 N N . 
PRO 38 38 ? A 4.418 -26.781 -2.861 1.000 1 A PRO 88.440 1 +ATOM 289 C CA . PRO 38 38 ? A 3.479 -25.734 -3.297 1.000 1 A PRO 88.440 1 +ATOM 290 C C . PRO 38 38 ? A 2.027 -26.078 -2.971 1.000 1 A PRO 88.440 1 +ATOM 291 O O . PRO 38 38 ? A 1.232 -25.188 -2.658 1.000 1 A PRO 88.440 1 +ATOM 292 C CB . PRO 38 38 ? A 3.699 -25.672 -4.809 1.000 1 A PRO 88.440 1 +ATOM 293 C CG . PRO 38 38 ? A 5.098 -26.172 -5.012 1.000 1 A PRO 88.440 1 +ATOM 294 C CD . PRO 38 38 ? A 5.418 -27.156 -3.928 1.000 1 A PRO 88.440 1 +ATOM 295 N N . ASP 39 39 ? A 1.685 -27.469 -2.994 1.000 1 A ASP 84.620 1 +ATOM 296 C CA . ASP 39 39 ? A 0.307 -27.875 -2.746 1.000 1 A ASP 84.620 1 +ATOM 297 C C . ASP 39 39 ? A -0.080 -27.656 -1.287 1.000 1 A ASP 84.620 1 +ATOM 298 O O . ASP 39 39 ? A -1.267 -27.609 -0.953 1.000 1 A ASP 84.620 1 +ATOM 299 C CB . ASP 39 39 ? A 0.112 -29.359 -3.119 1.000 1 A ASP 84.620 1 +ATOM 300 C CG . ASP 39 39 ? A 1.071 -30.281 -2.393 1.000 1 A ASP 84.620 1 +ATOM 301 O OD1 . ASP 39 39 ? A 2.182 -29.844 -2.023 1.000 1 A ASP 84.620 1 +ATOM 302 O OD2 . ASP 39 39 ? A 0.715 -31.453 -2.189 1.000 1 A ASP 84.620 1 +ATOM 303 N N . GLN 40 40 ? A 0.870 -27.500 -0.499 1.000 1 A GLN 85.380 1 +ATOM 304 C CA . GLN 40 40 ? A 0.608 -27.266 0.917 1.000 1 A GLN 85.380 1 +ATOM 305 C C . GLN 40 40 ? A 0.654 -25.781 1.241 1.000 1 A GLN 85.380 1 +ATOM 306 O O . GLN 40 40 ? A 0.405 -25.375 2.379 1.000 1 A GLN 85.380 1 +ATOM 307 C CB . GLN 40 40 ? A 1.616 -28.031 1.784 1.000 1 A GLN 85.380 1 +ATOM 308 C CG . GLN 40 40 ? A 1.487 -29.547 1.697 1.000 1 A GLN 85.380 1 +ATOM 309 C CD . GLN 40 40 ? A 2.449 -30.266 2.619 1.000 1 A GLN 85.380 1 +ATOM 310 O OE1 . GLN 40 40 ? A 3.135 -29.641 3.438 1.000 1 A GLN 85.380 1 +ATOM 311 N NE2 . GLN 40 40 ? A 2.508 -31.594 2.496 1.000 1 A GLN 85.380 1 +ATOM 312 N N . GLN 41 41 ? A 1.011 -25.016 0.274 1.000 1 A GLN 87.060 1 +ATOM 313 C CA . GLN 41 41 ? A 1.165 -23.578 0.496 1.000 1 A GLN 87.060 1 +ATOM 314 C C . GLN 41 41 ? 
A -0.099 -22.828 0.098 1.000 1 A GLN 87.060 1 +ATOM 315 O O . GLN 41 41 ? A -0.688 -23.094 -0.949 1.000 1 A GLN 87.060 1 +ATOM 316 C CB . GLN 41 41 ? A 2.367 -23.047 -0.284 1.000 1 A GLN 87.060 1 +ATOM 317 C CG . GLN 41 41 ? A 3.695 -23.656 0.130 1.000 1 A GLN 87.060 1 +ATOM 318 C CD . GLN 41 41 ? A 4.859 -23.156 -0.706 1.000 1 A GLN 87.060 1 +ATOM 319 O OE1 . GLN 41 41 ? A 4.883 -22.000 -1.134 1.000 1 A GLN 87.060 1 +ATOM 320 N NE2 . GLN 41 41 ? A 5.832 -24.031 -0.944 1.000 1 A GLN 87.060 1 +ATOM 321 N N . ARG 42 42 ? A -0.602 -21.906 1.000 1.000 1 A ARG 84.310 1 +ATOM 322 C CA . ARG 42 42 ? A -1.598 -20.891 0.691 1.000 1 A ARG 84.310 1 +ATOM 323 C C . ARG 42 42 ? A -1.034 -19.484 0.915 1.000 1 A ARG 84.310 1 +ATOM 324 O O . ARG 42 42 ? A -0.644 -19.141 2.031 1.000 1 A ARG 84.310 1 +ATOM 325 C CB . ARG 42 42 ? A -2.854 -21.078 1.541 1.000 1 A ARG 84.310 1 +ATOM 326 C CG . ARG 42 42 ? A -3.756 -22.203 1.052 1.000 1 A ARG 84.310 1 +ATOM 327 C CD . ARG 42 42 ? A -5.012 -22.328 1.901 1.000 1 A ARG 84.310 1 +ATOM 328 N NE . ARG 42 42 ? A -5.809 -23.500 1.521 1.000 1 A ARG 84.310 1 +ATOM 329 C CZ . ARG 42 42 ? A -7.020 -23.766 1.992 1.000 1 A ARG 84.310 1 +ATOM 330 N NH1 . ARG 42 42 ? A -7.605 -22.969 2.873 1.000 1 A ARG 84.310 1 +ATOM 331 N NH2 . ARG 42 42 ? A -7.652 -24.859 1.579 1.000 1 A ARG 84.310 1 +ATOM 332 N N . LEU 43 43 ? A -0.884 -18.766 -0.194 1.000 1 A LEU 85.750 1 +ATOM 333 C CA . LEU 43 43 ? A -0.344 -17.406 -0.111 1.000 1 A LEU 85.750 1 +ATOM 334 C C . LEU 43 43 ? A -1.466 -16.375 -0.066 1.000 1 A LEU 85.750 1 +ATOM 335 O O . LEU 43 43 ? A -2.408 -16.438 -0.859 1.000 1 A LEU 85.750 1 +ATOM 336 C CB . LEU 43 43 ? A 0.579 -17.125 -1.300 1.000 1 A LEU 85.750 1 +ATOM 337 C CG . LEU 43 43 ? A 1.903 -17.891 -1.327 1.000 1 A LEU 85.750 1 +ATOM 338 C CD1 . LEU 43 43 ? A 2.641 -17.625 -2.635 1.000 1 A LEU 85.750 1 +ATOM 339 C CD2 . LEU 43 43 ? A 2.768 -17.500 -0.131 1.000 1 A LEU 85.750 1 +ATOM 340 N N . ILE 44 44 ? 
A -1.348 -15.430 0.903 1.000 1 A ILE 79.000 1 +ATOM 341 C CA . ILE 44 44 ? A -2.355 -14.391 1.107 1.000 1 A ILE 79.000 1 +ATOM 342 C C . ILE 44 44 ? A -1.707 -13.016 1.014 1.000 1 A ILE 79.000 1 +ATOM 343 O O . ILE 44 44 ? A -0.628 -12.789 1.566 1.000 1 A ILE 79.000 1 +ATOM 344 C CB . ILE 44 44 ? A -3.068 -14.555 2.469 1.000 1 A ILE 79.000 1 +ATOM 345 C CG1 . ILE 44 44 ? A -3.727 -15.938 2.562 1.000 1 A ILE 79.000 1 +ATOM 346 C CG2 . ILE 44 44 ? A -4.098 -13.445 2.680 1.000 1 A ILE 79.000 1 +ATOM 347 C CD1 . ILE 44 44 ? A -4.227 -16.297 3.955 1.000 1 A ILE 79.000 1 +ATOM 348 N N . PHE 45 45 ? A -2.264 -12.172 0.196 1.000 1 A PHE 76.620 1 +ATOM 349 C CA . PHE 45 45 ? A -1.839 -10.781 0.104 1.000 1 A PHE 76.620 1 +ATOM 350 C C . PHE 45 45 ? A -3.043 -9.844 0.099 1.000 1 A PHE 76.620 1 +ATOM 351 O O . PHE 45 45 ? A -4.004 -10.070 -0.638 1.000 1 A PHE 76.620 1 +ATOM 352 C CB . PHE 45 45 ? A -0.996 -10.555 -1.155 1.000 1 A PHE 76.620 1 +ATOM 353 C CG . PHE 45 45 ? A -0.457 -9.156 -1.284 1.000 1 A PHE 76.620 1 +ATOM 354 C CD1 . PHE 45 45 ? A -0.925 -8.305 -2.277 1.000 1 A PHE 76.620 1 +ATOM 355 C CD2 . PHE 45 45 ? A 0.519 -8.695 -0.411 1.000 1 A PHE 76.620 1 +ATOM 356 C CE1 . PHE 45 45 ? A -0.429 -7.008 -2.398 1.000 1 A PHE 76.620 1 +ATOM 357 C CE2 . PHE 45 45 ? A 1.020 -7.398 -0.526 1.000 1 A PHE 76.620 1 +ATOM 358 C CZ . PHE 45 45 ? A 0.545 -6.559 -1.521 1.000 1 A PHE 76.620 1 +ATOM 359 N N . ALA 46 46 ? A -3.045 -8.930 1.020 1.000 1 A ALA 63.690 1 +ATOM 360 C CA . ALA 46 46 ? A -4.129 -7.961 1.123 1.000 1 A ALA 63.690 1 +ATOM 361 C C . ALA 46 46 ? A -5.473 -8.656 1.330 1.000 1 A ALA 63.690 1 +ATOM 362 O O . ALA 46 46 ? A -6.473 -8.289 0.709 1.000 1 A ALA 63.690 1 +ATOM 363 C CB . ALA 46 46 ? A -4.176 -7.082 -0.124 1.000 1 A ALA 63.690 1 +ATOM 364 N N . GLY 47 47 ? A -5.438 -9.773 2.023 1.000 1 A GLY 58.660 1 +ATOM 365 C CA . GLY 47 47 ? A -6.656 -10.484 2.379 1.000 1 A GLY 58.660 1 +ATOM 366 C C . GLY 47 47 ? 
A -7.145 -11.422 1.291 1.000 1 A GLY 58.660 1 +ATOM 367 O O . GLY 47 47 ? A -8.227 -11.992 1.396 1.000 1 A GLY 58.660 1 +ATOM 368 N N . LYS 48 48 ? A -6.375 -11.367 0.274 1.000 1 A LYS 66.500 1 +ATOM 369 C CA . LYS 48 48 ? A -6.785 -12.219 -0.836 1.000 1 A LYS 66.500 1 +ATOM 370 C C . LYS 48 48 ? A -5.832 -13.406 -1.000 1.000 1 A LYS 66.500 1 +ATOM 371 O O . LYS 48 48 ? A -4.617 -13.250 -0.890 1.000 1 A LYS 66.500 1 +ATOM 372 C CB . LYS 48 48 ? A -6.848 -11.422 -2.137 1.000 1 A LYS 66.500 1 +ATOM 373 C CG . LYS 48 48 ? A -7.938 -10.359 -2.154 1.000 1 A LYS 66.500 1 +ATOM 374 C CD . LYS 48 48 ? A -8.016 -9.656 -3.504 1.000 1 A LYS 66.500 1 +ATOM 375 C CE . LYS 48 48 ? A -9.086 -8.578 -3.516 1.000 1 A LYS 66.500 1 +ATOM 376 N NZ . LYS 48 48 ? A -9.172 -7.891 -4.840 1.000 1 A LYS 66.500 1 +ATOM 377 N N . GLN 49 49 ? A -6.500 -14.594 -1.154 1.000 1 A GLN 73.880 1 +ATOM 378 C CA . GLN 49 49 ? A -5.688 -15.750 -1.519 1.000 1 A GLN 73.880 1 +ATOM 379 C C . GLN 49 49 ? A -5.215 -15.656 -2.967 1.000 1 A GLN 73.880 1 +ATOM 380 O O . GLN 49 49 ? A -6.012 -15.375 -3.867 1.000 1 A GLN 73.880 1 +ATOM 381 C CB . GLN 49 49 ? A -6.465 -17.047 -1.302 1.000 1 A GLN 73.880 1 +ATOM 382 C CG . GLN 49 49 ? A -5.629 -18.297 -1.493 1.000 1 A GLN 73.880 1 +ATOM 383 C CD . GLN 49 49 ? A -6.395 -19.578 -1.165 1.000 1 A GLN 73.880 1 +ATOM 384 O OE1 . GLN 49 49 ? A -7.098 -19.641 -0.150 1.000 1 A GLN 73.880 1 +ATOM 385 N NE2 . GLN 49 49 ? A -6.266 -20.578 -2.020 1.000 1 A GLN 73.880 1 +ATOM 386 N N . LEU 50 50 ? A -3.963 -15.859 -3.047 1.000 1 A LEU 82.310 1 +ATOM 387 C CA . LEU 50 50 ? A -3.367 -15.719 -4.371 1.000 1 A LEU 82.310 1 +ATOM 388 C C . LEU 50 50 ? A -3.521 -17.000 -5.180 1.000 1 A LEU 82.310 1 +ATOM 389 O O . LEU 50 50 ? A -3.344 -18.094 -4.648 1.000 1 A LEU 82.310 1 +ATOM 390 C CB . LEU 50 50 ? A -1.886 -15.344 -4.254 1.000 1 A LEU 82.310 1 +ATOM 391 C CG . LEU 50 50 ? A -1.555 -14.141 -3.379 1.000 1 A LEU 82.310 1 +ATOM 392 C CD1 . LEU 50 50 ? 
A -0.050 -13.891 -3.363 1.000 1 A LEU 82.310 1 +ATOM 393 C CD2 . LEU 50 50 ? A -2.301 -12.898 -3.869 1.000 1 A LEU 82.310 1 +ATOM 394 N N . GLU 51 51 ? A -3.920 -16.781 -6.406 1.000 1 A GLU 85.940 1 +ATOM 395 C CA . GLU 51 51 ? A -4.133 -17.938 -7.273 1.000 1 A GLU 85.940 1 +ATOM 396 C C . GLU 51 51 ? A -2.867 -18.281 -8.055 1.000 1 A GLU 85.940 1 +ATOM 397 O O . GLU 51 51 ? A -2.182 -17.391 -8.562 1.000 1 A GLU 85.940 1 +ATOM 398 C CB . GLU 51 51 ? A -5.293 -17.672 -8.242 1.000 1 A GLU 85.940 1 +ATOM 399 C CG . GLU 51 51 ? A -6.637 -17.484 -7.551 1.000 1 A GLU 85.940 1 +ATOM 400 C CD . GLU 51 51 ? A -7.781 -17.266 -8.523 1.000 1 A GLU 85.940 1 +ATOM 401 O OE1 . GLU 51 51 ? A -8.945 -17.156 -8.078 1.000 1 A GLU 85.940 1 +ATOM 402 O OE2 . GLU 51 51 ? A -7.516 -17.203 -9.742 1.000 1 A GLU 85.940 1 +ATOM 403 N N . ASP 52 52 ? A -2.600 -19.531 -8.125 1.000 1 A ASP 87.310 1 +ATOM 404 C CA . ASP 52 52 ? A -1.361 -20.062 -8.688 1.000 1 A ASP 87.310 1 +ATOM 405 C C . ASP 52 52 ? A -1.130 -19.516 -10.102 1.000 1 A ASP 87.310 1 +ATOM 406 O O . ASP 52 52 ? A 0.005 -19.219 -10.477 1.000 1 A ASP 87.310 1 +ATOM 407 C CB . ASP 52 52 ? A -1.386 -21.594 -8.711 1.000 1 A ASP 87.310 1 +ATOM 408 C CG . ASP 52 52 ? A -1.225 -22.203 -7.332 1.000 1 A ASP 87.310 1 +ATOM 409 O OD1 . ASP 52 52 ? A -0.831 -21.484 -6.387 1.000 1 A ASP 87.310 1 +ATOM 410 O OD2 . ASP 52 52 ? A -1.490 -23.422 -7.191 1.000 1 A ASP 87.310 1 +ATOM 411 N N . GLY 53 53 ? A -2.199 -19.391 -10.875 1.000 1 A GLY 86.120 1 +ATOM 412 C CA . GLY 53 53 ? A -2.059 -19.016 -12.273 1.000 1 A GLY 86.120 1 +ATOM 413 C C . GLY 53 53 ? A -1.900 -17.516 -12.469 1.000 1 A GLY 86.120 1 +ATOM 414 O O . GLY 53 53 ? A -1.688 -17.047 -13.594 1.000 1 A GLY 86.120 1 +ATOM 415 N N . ARG 54 54 ? A -1.936 -16.844 -11.492 1.000 1 A ARG 84.190 1 +ATOM 416 C CA . ARG 54 54 ? A -1.820 -15.383 -11.578 1.000 1 A ARG 84.190 1 +ATOM 417 C C . ARG 54 54 ? A -0.408 -14.930 -11.227 1.000 1 A ARG 84.190 1 +ATOM 418 O O . ARG 54 54 ? 
A 0.349 -15.664 -10.586 1.000 1 A ARG 84.190 1 +ATOM 419 C CB . ARG 54 54 ? A -2.834 -14.711 -10.648 1.000 1 A ARG 84.190 1 +ATOM 420 C CG . ARG 54 54 ? A -4.281 -14.930 -11.055 1.000 1 A ARG 84.190 1 +ATOM 421 C CD . ARG 54 54 ? A -5.195 -13.852 -10.484 1.000 1 A ARG 84.190 1 +ATOM 422 N NE . ARG 54 54 ? A -6.594 -14.094 -10.844 1.000 1 A ARG 84.190 1 +ATOM 423 C CZ . ARG 54 54 ? A -7.574 -13.211 -10.680 1.000 1 A ARG 84.190 1 +ATOM 424 N NH1 . ARG 54 54 ? A -7.332 -12.008 -10.164 1.000 1 A ARG 84.190 1 +ATOM 425 N NH2 . ARG 54 54 ? A -8.812 -13.531 -11.039 1.000 1 A ARG 84.190 1 +ATOM 426 N N . THR 55 55 ? A -0.102 -13.766 -11.750 1.000 1 A THR 82.940 1 +ATOM 427 C CA . THR 55 55 ? A 1.227 -13.219 -11.500 1.000 1 A THR 82.940 1 +ATOM 428 C C . THR 55 55 ? A 1.196 -12.242 -10.320 1.000 1 A THR 82.940 1 +ATOM 429 O O . THR 55 55 ? A 0.122 -11.852 -9.867 1.000 1 A THR 82.940 1 +ATOM 430 C CB . THR 55 55 ? A 1.784 -12.508 -12.742 1.000 1 A THR 82.940 1 +ATOM 431 O OG1 . THR 55 55 ? A 0.964 -11.375 -13.047 1.000 1 A THR 82.940 1 +ATOM 432 C CG2 . THR 55 55 ? A 1.802 -13.445 -13.945 1.000 1 A THR 82.940 1 +ATOM 433 N N . LEU 56 56 ? A 2.412 -11.977 -9.758 1.000 1 A LEU 82.690 1 +ATOM 434 C CA . LEU 56 56 ? A 2.521 -10.992 -8.680 1.000 1 A LEU 82.690 1 +ATOM 435 C C . LEU 56 56 ? A 1.945 -9.648 -9.117 1.000 1 A LEU 82.690 1 +ATOM 436 O O . LEU 56 56 ? A 1.292 -8.969 -8.320 1.000 1 A LEU 82.690 1 +ATOM 437 C CB . LEU 56 56 ? A 3.982 -10.828 -8.258 1.000 1 A LEU 82.690 1 +ATOM 438 C CG . LEU 56 56 ? A 4.617 -12.008 -7.531 1.000 1 A LEU 82.690 1 +ATOM 439 C CD1 . LEU 56 56 ? A 6.113 -11.781 -7.352 1.000 1 A LEU 82.690 1 +ATOM 440 C CD2 . LEU 56 56 ? A 3.941 -12.242 -6.184 1.000 1 A LEU 82.690 1 +ATOM 441 N N . SER 57 57 ? A 2.188 -9.367 -10.406 1.000 1 A SER 73.380 1 +ATOM 442 C CA . SER 57 57 ? A 1.684 -8.109 -10.938 1.000 1 A SER 73.380 1 +ATOM 443 C C . SER 57 57 ? A 0.159 -8.086 -10.953 1.000 1 A SER 73.380 1 +ATOM 444 O O . SER 57 57 ? 
A -0.452 -7.031 -10.766 1.000 1 A SER 73.380 1 +ATOM 445 C CB . SER 57 57 ? A 2.217 -7.871 -12.352 1.000 1 A SER 73.380 1 +ATOM 446 O OG . SER 57 57 ? A 1.901 -8.961 -13.203 1.000 1 A SER 73.380 1 +ATOM 447 N N . ASP 58 58 ? A -0.380 -9.242 -11.172 1.000 1 A ASP 71.880 1 +ATOM 448 C CA . ASP 58 58 ? A -1.838 -9.328 -11.172 1.000 1 A ASP 71.880 1 +ATOM 449 C C . ASP 58 58 ? A -2.410 -8.844 -9.836 1.000 1 A ASP 71.880 1 +ATOM 450 O O . ASP 58 58 ? A -3.514 -8.297 -9.797 1.000 1 A ASP 71.880 1 +ATOM 451 C CB . ASP 58 58 ? A -2.293 -10.766 -11.445 1.000 1 A ASP 71.880 1 +ATOM 452 C CG . ASP 58 58 ? A -2.139 -11.164 -12.906 1.000 1 A ASP 71.880 1 +ATOM 453 O OD1 . ASP 58 58 ? A -2.180 -10.273 -13.789 1.000 1 A ASP 71.880 1 +ATOM 454 O OD2 . ASP 58 58 ? A -1.975 -12.375 -13.172 1.000 1 A ASP 71.880 1 +ATOM 455 N N . TYR 59 59 ? A -1.726 -9.047 -8.727 1.000 1 A TYR 72.000 1 +ATOM 456 C CA . TYR 59 59 ? A -2.191 -8.695 -7.391 1.000 1 A TYR 72.000 1 +ATOM 457 C C . TYR 59 59 ? A -1.578 -7.375 -6.930 1.000 1 A TYR 72.000 1 +ATOM 458 O O . TYR 59 59 ? A -1.703 -7.000 -5.762 1.000 1 A TYR 72.000 1 +ATOM 459 C CB . TYR 59 59 ? A -1.851 -9.805 -6.391 1.000 1 A TYR 72.000 1 +ATOM 460 C CG . TYR 59 59 ? A -2.670 -11.055 -6.574 1.000 1 A TYR 72.000 1 +ATOM 461 C CD1 . TYR 59 59 ? A -4.027 -11.078 -6.262 1.000 1 A TYR 72.000 1 +ATOM 462 C CD2 . TYR 59 59 ? A -2.088 -12.227 -7.059 1.000 1 A TYR 72.000 1 +ATOM 463 C CE1 . TYR 59 59 ? A -4.785 -12.234 -6.430 1.000 1 A TYR 72.000 1 +ATOM 464 C CE2 . TYR 59 59 ? A -2.838 -13.383 -7.227 1.000 1 A TYR 72.000 1 +ATOM 465 C CZ . TYR 59 59 ? A -4.184 -13.375 -6.910 1.000 1 A TYR 72.000 1 +ATOM 466 O OH . TYR 59 59 ? A -4.930 -14.523 -7.078 1.000 1 A TYR 72.000 1 +ATOM 467 N N . ASN 60 60 ? A -0.880 -6.750 -7.871 1.000 1 A ASN 70.310 1 +ATOM 468 C CA . ASN 60 60 ? A -0.194 -5.496 -7.570 1.000 1 A ASN 70.310 1 +ATOM 469 C C . ASN 60 60 ? A 0.789 -5.660 -6.414 1.000 1 A ASN 70.310 1 +ATOM 470 O O . ASN 60 60 ? 
A 0.853 -4.812 -5.523 1.000 1 A ASN 70.310 1 +ATOM 471 C CB . ASN 60 60 ? A -1.206 -4.395 -7.258 1.000 1 A ASN 70.310 1 +ATOM 472 C CG . ASN 60 60 ? A -0.696 -3.014 -7.621 1.000 1 A ASN 70.310 1 +ATOM 473 O OD1 . ASN 60 60 ? A 0.269 -2.873 -8.375 1.000 1 A ASN 70.310 1 +ATOM 474 N ND2 . ASN 60 60 ? A -1.342 -1.983 -7.086 1.000 1 A ASN 70.310 1 +ATOM 475 N N . ILE 61 61 ? A 1.318 -6.879 -6.324 1.000 1 A ILE 72.380 1 +ATOM 476 C CA . ILE 61 61 ? A 2.369 -7.113 -5.344 1.000 1 A ILE 72.380 1 +ATOM 477 C C . ILE 61 61 ? A 3.654 -6.414 -5.781 1.000 1 A ILE 72.380 1 +ATOM 478 O O . ILE 61 61 ? A 4.148 -6.648 -6.887 1.000 1 A ILE 72.380 1 +ATOM 479 C CB . ILE 61 61 ? A 2.625 -8.625 -5.141 1.000 1 A ILE 72.380 1 +ATOM 480 C CG1 . ILE 61 61 ? A 1.410 -9.289 -4.480 1.000 1 A ILE 72.380 1 +ATOM 481 C CG2 . ILE 61 61 ? A 3.891 -8.852 -4.305 1.000 1 A ILE 72.380 1 +ATOM 482 C CD1 . ILE 61 61 ? A 1.444 -10.812 -4.512 1.000 1 A ILE 72.380 1 +ATOM 483 N N . GLN 62 62 ? A 4.070 -5.523 -4.934 1.000 1 A GLN 68.380 1 +ATOM 484 C CA . GLN 62 62 ? A 5.242 -4.715 -5.246 1.000 1 A GLN 68.380 1 +ATOM 485 C C . GLN 62 62 ? A 6.449 -5.152 -4.418 1.000 1 A GLN 68.380 1 +ATOM 486 O O . GLN 62 62 ? A 6.352 -6.090 -3.621 1.000 1 A GLN 68.380 1 +ATOM 487 C CB . GLN 62 62 ? A 4.953 -3.232 -5.012 1.000 1 A GLN 68.380 1 +ATOM 488 C CG . GLN 62 62 ? A 3.789 -2.695 -5.832 1.000 1 A GLN 68.380 1 +ATOM 489 C CD . GLN 62 62 ? A 3.855 -1.192 -6.031 1.000 1 A GLN 68.380 1 +ATOM 490 O OE1 . GLN 62 62 ? A 4.477 -0.474 -5.242 1.000 1 A GLN 68.380 1 +ATOM 491 N NE2 . GLN 62 62 ? A 3.217 -0.707 -7.090 1.000 1 A GLN 68.380 1 +ATOM 492 N N . LYS 63 63 ? A 7.520 -4.391 -4.730 1.000 1 A LYS 68.880 1 +ATOM 493 C CA . LYS 63 63 ? A 8.742 -4.668 -3.986 1.000 1 A LYS 68.880 1 +ATOM 494 C C . LYS 63 63 ? A 8.531 -4.469 -2.488 1.000 1 A LYS 68.880 1 +ATOM 495 O O . LYS 63 63 ? A 7.859 -3.525 -2.070 1.000 1 A LYS 68.880 1 +ATOM 496 C CB . LYS 63 63 ? A 9.883 -3.773 -4.477 1.000 1 A LYS 68.880 1 +ATOM 497 C CG . LYS 63 63 ? 
A 11.203 -4.004 -3.758 1.000 1 A LYS 68.880 1 +ATOM 498 C CD . LYS 63 63 ? A 12.281 -3.045 -4.246 1.000 1 A LYS 68.880 1 +ATOM 499 C CE . LYS 63 63 ? A 13.578 -3.205 -3.459 1.000 1 A LYS 68.880 1 +ATOM 500 N NZ . LYS 63 63 ? A 14.633 -2.260 -3.926 1.000 1 A LYS 68.880 1 +ATOM 501 N N . GLU 64 64 ? A 9.047 -5.414 -1.816 1.000 1 A GLU 66.190 1 +ATOM 502 C CA . GLU 64 64 ? A 9.062 -5.406 -0.357 1.000 1 A GLU 66.190 1 +ATOM 503 C C . GLU 64 64 ? A 7.664 -5.637 0.209 1.000 1 A GLU 66.190 1 +ATOM 504 O O . GLU 64 64 ? A 7.398 -5.316 1.370 1.000 1 A GLU 66.190 1 +ATOM 505 C CB . GLU 64 64 ? A 9.633 -4.090 0.171 1.000 1 A GLU 66.190 1 +ATOM 506 C CG . GLU 64 64 ? A 11.109 -3.889 -0.149 1.000 1 A GLU 66.190 1 +ATOM 507 C CD . GLU 64 64 ? A 11.680 -2.598 0.423 1.000 1 A GLU 66.190 1 +ATOM 508 O OE1 . GLU 64 64 ? A 12.922 -2.449 0.470 1.000 1 A GLU 66.190 1 +ATOM 509 O OE2 . GLU 64 64 ? A 10.875 -1.729 0.828 1.000 1 A GLU 66.190 1 +ATOM 510 N N . SER 65 65 ? A 6.707 -6.059 -0.716 1.000 1 A SER 70.120 1 +ATOM 511 C CA . SER 65 65 ? A 5.418 -6.516 -0.203 1.000 1 A SER 70.120 1 +ATOM 512 C C . SER 65 65 ? A 5.578 -7.754 0.670 1.000 1 A SER 70.120 1 +ATOM 513 O O . SER 65 65 ? A 6.543 -8.508 0.522 1.000 1 A SER 70.120 1 +ATOM 514 C CB . SER 65 65 ? A 4.457 -6.812 -1.354 1.000 1 A SER 70.120 1 +ATOM 515 O OG . SER 65 65 ? A 4.090 -5.617 -2.025 1.000 1 A SER 70.120 1 +ATOM 516 N N . THR 66 66 ? A 4.691 -7.828 1.617 1.000 1 A THR 74.250 1 +ATOM 517 C CA . THR 66 66 ? A 4.695 -9.000 2.482 1.000 1 A THR 74.250 1 +ATOM 518 C C . THR 66 66 ? A 3.502 -9.898 2.174 1.000 1 A THR 74.250 1 +ATOM 519 O O . THR 66 66 ? A 2.350 -9.469 2.262 1.000 1 A THR 74.250 1 +ATOM 520 C CB . THR 66 66 ? A 4.664 -8.602 3.969 1.000 1 A THR 74.250 1 +ATOM 521 O OG1 . THR 66 66 ? A 5.770 -7.730 4.246 1.000 1 A THR 74.250 1 +ATOM 522 C CG2 . THR 66 66 ? A 4.758 -9.828 4.871 1.000 1 A THR 74.250 1 +ATOM 523 N N . LEU 67 67 ? A 3.859 -11.094 1.738 1.000 1 A LEU 81.000 1 +ATOM 524 C CA . LEU 67 67 ? 
A 2.846 -12.125 1.564 1.000 1 A LEU 81.000 1 +ATOM 525 C C . LEU 67 67 ? A 2.756 -13.016 2.801 1.000 1 A LEU 81.000 1 +ATOM 526 O O . LEU 67 67 ? A 3.734 -13.164 3.537 1.000 1 A LEU 81.000 1 +ATOM 527 C CB . LEU 67 67 ? A 3.152 -12.977 0.330 1.000 1 A LEU 81.000 1 +ATOM 528 C CG . LEU 67 67 ? A 3.131 -12.258 -1.017 1.000 1 A LEU 81.000 1 +ATOM 529 C CD1 . LEU 67 67 ? A 4.293 -12.727 -1.890 1.000 1 A LEU 81.000 1 +ATOM 530 C CD2 . LEU 67 67 ? A 1.799 -12.477 -1.724 1.000 1 A LEU 81.000 1 +ATOM 531 N N . HIS 68 68 ? A 1.559 -13.477 3.090 1.000 1 A HIS 78.560 1 +ATOM 532 C CA . HIS 68 68 ? A 1.381 -14.414 4.191 1.000 1 A HIS 78.560 1 +ATOM 533 C C . HIS 68 68 ? A 1.261 -15.844 3.676 1.000 1 A HIS 78.560 1 +ATOM 534 O O . HIS 68 68 ? A 0.499 -16.109 2.744 1.000 1 A HIS 78.560 1 +ATOM 535 C CB . HIS 68 68 ? A 0.146 -14.047 5.016 1.000 1 A HIS 78.560 1 +ATOM 536 C CG . HIS 68 68 ? A 0.254 -12.719 5.699 1.000 1 A HIS 78.560 1 +ATOM 537 N ND1 . HIS 68 68 ? A 0.853 -12.570 6.930 1.000 1 A HIS 78.560 1 +ATOM 538 C CD2 . HIS 68 68 ? A -0.160 -11.492 5.320 1.000 1 A HIS 78.560 1 +ATOM 539 C CE1 . HIS 68 68 ? A 0.802 -11.289 7.281 1.000 1 A HIS 78.560 1 +ATOM 540 N NE2 . HIS 68 68 ? A 0.193 -10.617 6.320 1.000 1 A HIS 78.560 1 +ATOM 541 N N . LEU 69 69 ? A 2.053 -16.672 4.266 1.000 1 A LEU 85.120 1 +ATOM 542 C CA . LEU 69 69 ? A 2.039 -18.094 3.918 1.000 1 A LEU 85.120 1 +ATOM 543 C C . LEU 69 69 ? A 1.271 -18.891 4.961 1.000 1 A LEU 85.120 1 +ATOM 544 O O . LEU 69 69 ? A 1.605 -18.859 6.148 1.000 1 A LEU 85.120 1 +ATOM 545 C CB . LEU 69 69 ? A 3.467 -18.625 3.791 1.000 1 A LEU 85.120 1 +ATOM 546 C CG . LEU 69 69 ? A 3.607 -20.109 3.465 1.000 1 A LEU 85.120 1 +ATOM 547 C CD1 . LEU 69 69 ? A 3.053 -20.406 2.076 1.000 1 A LEU 85.120 1 +ATOM 548 C CD2 . LEU 69 69 ? A 5.066 -20.547 3.566 1.000 1 A LEU 85.120 1 +ATOM 549 N N . VAL 70 70 ? A 0.207 -19.547 4.457 1.000 1 A VAL 79.620 1 +ATOM 550 C CA . VAL 70 70 ? A -0.541 -20.469 5.293 1.000 1 A VAL 79.620 1 +ATOM 551 C C . 
VAL 70 70 ? A -0.403 -21.891 4.742 1.000 1 A VAL 79.620 1 +ATOM 552 O O . VAL 70 70 ? A -0.612 -22.125 3.549 1.000 1 A VAL 79.620 1 +ATOM 553 C CB . VAL 70 70 ? A -2.031 -20.078 5.391 1.000 1 A VAL 79.620 1 +ATOM 554 C CG1 . VAL 70 70 ? A -2.797 -21.078 6.266 1.000 1 A VAL 79.620 1 +ATOM 555 C CG2 . VAL 70 70 ? A -2.176 -18.656 5.930 1.000 1 A VAL 79.620 1 +ATOM 556 N N . LEU 71 71 ? A -0.002 -22.719 5.676 1.000 1 A LEU 79.940 1 +ATOM 557 C CA . LEU 71 71 ? A 0.196 -24.109 5.250 1.000 1 A LEU 79.940 1 +ATOM 558 C C . LEU 71 71 ? A -1.105 -24.891 5.340 1.000 1 A LEU 79.940 1 +ATOM 559 O O . LEU 71 71 ? A -1.881 -24.719 6.285 1.000 1 A LEU 79.940 1 +ATOM 560 C CB . LEU 71 71 ? A 1.275 -24.781 6.102 1.000 1 A LEU 79.940 1 +ATOM 561 C CG . LEU 71 71 ? A 2.670 -24.156 6.051 1.000 1 A LEU 79.940 1 +ATOM 562 C CD1 . LEU 71 71 ? A 3.578 -24.797 7.094 1.000 1 A LEU 79.940 1 +ATOM 563 C CD2 . LEU 71 71 ? A 3.268 -24.281 4.656 1.000 1 A LEU 79.940 1 +ATOM 564 N N . ARG 72 72 ? A -1.445 -25.484 4.273 1.000 1 A ARG 67.880 1 +ATOM 565 C CA . ARG 72 72 ? A -2.568 -26.422 4.309 1.000 1 A ARG 67.880 1 +ATOM 566 C C . ARG 72 72 ? A -2.133 -27.781 4.840 1.000 1 A ARG 67.880 1 +ATOM 567 O O . ARG 72 72 ? A -1.101 -28.312 4.422 1.000 1 A ARG 67.880 1 +ATOM 568 C CB . ARG 72 72 ? A -3.188 -26.562 2.918 1.000 1 A ARG 67.880 1 +ATOM 569 C CG . ARG 72 72 ? A -4.680 -26.859 2.938 1.000 1 A ARG 67.880 1 +ATOM 570 C CD . ARG 72 72 ? A -5.258 -26.938 1.531 1.000 1 A ARG 67.880 1 +ATOM 571 N NE . ARG 72 72 ? A -6.680 -27.281 1.550 1.000 1 A ARG 67.880 1 +ATOM 572 C CZ . ARG 72 72 ? A -7.395 -27.594 0.478 1.000 1 A ARG 67.880 1 +ATOM 573 N NH1 . ARG 72 72 ? A -6.836 -27.641 -0.725 1.000 1 A ARG 67.880 1 +ATOM 574 N NH2 . ARG 72 72 ? A -8.680 -27.891 0.609 1.000 1 A ARG 67.880 1 +ATOM 575 N N . LEU 73 73 ? A -2.594 -28.141 5.977 1.000 1 A LEU 55.970 1 +ATOM 576 C CA . LEU 73 73 ? A -2.318 -29.484 6.461 1.000 1 A LEU 55.970 1 +ATOM 577 C C . LEU 73 73 ? 
A -2.932 -30.531 5.535 1.000 1 A LEU 55.970 1 +ATOM 578 O O . LEU 73 73 ? A -4.027 -30.344 5.008 1.000 1 A LEU 55.970 1 +ATOM 579 C CB . LEU 73 73 ? A -2.859 -29.672 7.883 1.000 1 A LEU 55.970 1 +ATOM 580 C CG . LEU 73 73 ? A -2.137 -28.891 8.984 1.000 1 A LEU 55.970 1 +ATOM 581 C CD1 . LEU 73 73 ? A -3.000 -28.828 10.242 1.000 1 A LEU 55.970 1 +ATOM 582 C CD2 . LEU 73 73 ? A -0.786 -29.531 9.289 1.000 1 A LEU 55.970 1 +ATOM 583 N N . ARG 74 74 ? A -2.180 -31.234 4.695 1.000 1 A ARG 47.720 1 +ATOM 584 C CA . ARG 74 74 ? A -2.740 -32.438 4.062 1.000 1 A ARG 47.720 1 +ATOM 585 C C . ARG 74 74 ? A -3.514 -33.281 5.066 1.000 1 A ARG 47.720 1 +ATOM 586 O O . ARG 74 74 ? A -2.953 -33.719 6.070 1.000 1 A ARG 47.720 1 +ATOM 587 C CB . ARG 74 74 ? A -1.633 -33.250 3.416 1.000 1 A ARG 47.720 1 +ATOM 588 C CG . ARG 74 74 ? A -1.240 -32.781 2.023 1.000 1 A ARG 47.720 1 +ATOM 589 C CD . ARG 74 74 ? A -0.197 -33.719 1.395 1.000 1 A ARG 47.720 1 +ATOM 590 N NE . ARG 74 74 ? A 0.157 -33.281 0.046 1.000 1 A ARG 47.720 1 +ATOM 591 C CZ . ARG 74 74 ? A 0.994 -33.906 -0.763 1.000 1 A ARG 47.720 1 +ATOM 592 N NH1 . ARG 74 74 ? A 1.582 -35.031 -0.373 1.000 1 A ARG 47.720 1 +ATOM 593 N NH2 . ARG 74 74 ? A 1.245 -33.438 -1.971 1.000 1 A ARG 47.720 1 +ATOM 594 N N . GLY 75 75 ? A -4.605 -33.000 5.594 1.000 1 A GLY 39.810 1 +ATOM 595 C CA . GLY 75 75 ? A -5.195 -34.156 6.273 1.000 1 A GLY 39.810 1 +ATOM 596 C C . GLY 75 75 ? A -5.000 -35.469 5.527 1.000 1 A GLY 39.810 1 +ATOM 597 O O . GLY 75 75 ? A -4.836 -35.469 4.305 1.000 1 A GLY 39.810 1 +ATOM 598 N N . GLY 76 76 ? A -4.582 -36.656 6.297 1.000 1 A GLY 31.770 1 +ATOM 599 C CA . GLY 76 76 ? A -4.969 -38.000 5.949 1.000 1 A GLY 31.770 1 +ATOM 600 C C . GLY 76 76 ? A -6.383 -38.125 5.414 1.000 1 A GLY 31.770 1 +ATOM 601 O O . GLY 76 76 ? A -7.227 -37.250 5.707 1.000 1 A GLY 31.770 1 +ATOM 602 N N . MET 1 1 ? B 13.938 18.703 17.719 1.000 2 B MET 64.120 1 +ATOM 603 C CA . MET 1 1 ? 
B 12.859 19.250 16.906 1.000 2 B MET 64.120 1 +ATOM 604 C C . MET 1 1 ? B 11.500 18.875 17.469 1.000 2 B MET 64.120 1 +ATOM 605 O O . MET 1 1 ? B 11.352 17.797 18.078 1.000 2 B MET 64.120 1 +ATOM 606 C CB . MET 1 1 ? B 12.977 18.766 15.461 1.000 2 B MET 64.120 1 +ATOM 607 C CG . MET 1 1 ? B 14.008 19.516 14.633 1.000 2 B MET 64.120 1 +ATOM 608 S SD . MET 1 1 ? B 14.094 18.938 12.898 1.000 2 B MET 64.120 1 +ATOM 609 C CE . MET 1 1 ? B 15.250 20.156 12.211 1.000 2 B MET 64.120 1 +ATOM 610 N N . PRO 2 2 ? B 10.672 19.969 17.688 1.000 2 B PRO 82.810 1 +ATOM 611 C CA . PRO 2 2 ? B 9.367 19.609 18.250 1.000 2 B PRO 82.810 1 +ATOM 612 C C . PRO 2 2 ? B 8.711 18.438 17.531 1.000 2 B PRO 82.810 1 +ATOM 613 O O . PRO 2 2 ? B 8.891 18.266 16.328 1.000 2 B PRO 82.810 1 +ATOM 614 C CB . PRO 2 2 ? B 8.547 20.891 18.062 1.000 2 B PRO 82.810 1 +ATOM 615 C CG . PRO 2 2 ? B 9.289 21.672 17.031 1.000 2 B PRO 82.810 1 +ATOM 616 C CD . PRO 2 2 ? B 10.688 21.156 16.953 1.000 2 B PRO 82.810 1 +ATOM 617 N N . SER 3 3 ? B 8.320 17.500 18.297 1.000 2 B SER 87.810 1 +ATOM 618 C CA . SER 3 3 ? B 7.652 16.312 17.766 1.000 2 B SER 87.810 1 +ATOM 619 C C . SER 3 3 ? B 6.191 16.250 18.203 1.000 2 B SER 87.810 1 +ATOM 620 O O . SER 3 3 ? B 5.805 16.891 19.172 1.000 2 B SER 87.810 1 +ATOM 621 C CB . SER 3 3 ? B 8.375 15.039 18.219 1.000 2 B SER 87.810 1 +ATOM 622 O OG . SER 3 3 ? B 8.516 15.016 19.625 1.000 2 B SER 87.810 1 +ATOM 623 N N . ILE 4 4 ? B 5.441 15.680 17.422 1.000 2 B ILE 88.120 1 +ATOM 624 C CA . ILE 4 4 ? B 4.035 15.445 17.734 1.000 2 B ILE 88.120 1 +ATOM 625 C C . ILE 4 4 ? B 3.736 13.953 17.672 1.000 2 B ILE 88.120 1 +ATOM 626 O O . ILE 4 4 ? B 4.445 13.195 17.000 1.000 2 B ILE 88.120 1 +ATOM 627 C CB . ILE 4 4 ? B 3.109 16.219 16.766 1.000 2 B ILE 88.120 1 +ATOM 628 C CG1 . ILE 4 4 ? B 3.242 15.664 15.344 1.000 2 B ILE 88.120 1 +ATOM 629 C CG2 . ILE 4 4 ? B 3.418 17.719 16.797 1.000 2 B ILE 88.120 1 +ATOM 630 C CD1 . ILE 4 4 ? 
B 2.152 16.125 14.391 1.000 2 B ILE 88.120 1 +ATOM 631 N N . LYS 5 5 ? B 2.742 13.617 18.453 1.000 2 B LYS 90.500 1 +ATOM 632 C CA . LYS 5 5 ? B 2.326 12.211 18.484 1.000 2 B LYS 90.500 1 +ATOM 633 C C . LYS 5 5 ? B 1.053 12.000 17.672 1.000 2 B LYS 90.500 1 +ATOM 634 O O . LYS 5 5 ? B 0.047 12.680 17.906 1.000 2 B LYS 90.500 1 +ATOM 635 C CB . LYS 5 5 ? B 2.115 11.742 19.922 1.000 2 B LYS 90.500 1 +ATOM 636 C CG . LYS 5 5 ? B 3.387 11.719 20.766 1.000 2 B LYS 90.500 1 +ATOM 637 C CD . LYS 5 5 ? B 3.113 11.227 22.172 1.000 2 B LYS 90.500 1 +ATOM 638 C CE . LYS 5 5 ? B 4.355 11.328 23.062 1.000 2 B LYS 90.500 1 +ATOM 639 N NZ . LYS 5 5 ? B 4.070 10.930 24.469 1.000 2 B LYS 90.500 1 +ATOM 640 N N . LEU 6 6 ? B 1.191 11.023 16.734 1.000 2 B LEU 91.440 1 +ATOM 641 C CA . LEU 6 6 ? B 0.038 10.648 15.930 1.000 2 B LEU 91.440 1 +ATOM 642 C C . LEU 6 6 ? B -0.444 9.250 16.297 1.000 2 B LEU 91.440 1 +ATOM 643 O O . LEU 6 6 ? B 0.363 8.328 16.438 1.000 2 B LEU 91.440 1 +ATOM 644 C CB . LEU 6 6 ? B 0.383 10.711 14.445 1.000 2 B LEU 91.440 1 +ATOM 645 C CG . LEU 6 6 ? B 0.949 12.039 13.938 1.000 2 B LEU 91.440 1 +ATOM 646 C CD1 . LEU 6 6 ? B 1.329 11.922 12.461 1.000 2 B LEU 91.440 1 +ATOM 647 C CD2 . LEU 6 6 ? B -0.057 13.164 14.148 1.000 2 B LEU 91.440 1 +ATOM 648 N N . GLN 7 7 ? B -1.772 9.164 16.438 1.000 2 B GLN 89.560 1 +ATOM 649 C CA . GLN 7 7 ? B -2.348 7.871 16.781 1.000 2 B GLN 89.560 1 +ATOM 650 C C . GLN 7 7 ? B -3.180 7.309 15.633 1.000 2 B GLN 89.560 1 +ATOM 651 O O . GLN 7 7 ? B -4.125 7.949 15.172 1.000 2 B GLN 89.560 1 +ATOM 652 C CB . GLN 7 7 ? B -3.207 7.984 18.047 1.000 2 B GLN 89.560 1 +ATOM 653 C CG . GLN 7 7 ? B -3.729 6.652 18.547 1.000 2 B GLN 89.560 1 +ATOM 654 C CD . GLN 7 7 ? B -4.531 6.789 19.828 1.000 2 B GLN 89.560 1 +ATOM 655 O OE1 . GLN 7 7 ? B -5.277 7.758 20.016 1.000 2 B GLN 89.560 1 +ATOM 656 N NE2 . GLN 7 7 ? B -4.387 5.820 20.734 1.000 2 B GLN 89.560 1 +ATOM 657 N N . SER 8 8 ? B -2.766 6.109 15.188 1.000 2 B SER 88.060 1 +ATOM 658 C CA . 
SER 8 8 ? B -3.490 5.453 14.102 1.000 2 B SER 88.060 1 +ATOM 659 C C . SER 8 8 ? B -4.863 4.977 14.562 1.000 2 B SER 88.060 1 +ATOM 660 O O . SER 8 8 ? B -5.164 5.000 15.758 1.000 2 B SER 88.060 1 +ATOM 661 C CB . SER 8 8 ? B -2.688 4.273 13.555 1.000 2 B SER 88.060 1 +ATOM 662 O OG . SER 8 8 ? B -2.531 3.271 14.547 1.000 2 B SER 88.060 1 +ATOM 663 N N . SER 9 9 ? B -5.699 4.559 13.555 1.000 2 B SER 83.120 1 +ATOM 664 C CA . SER 9 9 ? B -7.051 4.082 13.836 1.000 2 B SER 83.120 1 +ATOM 665 C C . SER 9 9 ? B -7.023 2.824 14.695 1.000 2 B SER 83.120 1 +ATOM 666 O O . SER 9 9 ? B -7.961 2.566 15.461 1.000 2 B SER 83.120 1 +ATOM 667 C CB . SER 9 9 ? B -7.805 3.807 12.539 1.000 2 B SER 83.120 1 +ATOM 668 O OG . SER 9 9 ? B -7.113 2.855 11.742 1.000 2 B SER 83.120 1 +ATOM 669 N N . ASP 10 10 ? B -5.934 2.092 14.555 1.000 2 B ASP 81.620 1 +ATOM 670 C CA . ASP 10 10 ? B -5.809 0.855 15.320 1.000 2 B ASP 81.620 1 +ATOM 671 C C . ASP 10 10 ? B -5.113 1.105 16.656 1.000 2 B ASP 81.620 1 +ATOM 672 O O . ASP 10 10 ? B -4.793 0.161 17.375 1.000 2 B ASP 81.620 1 +ATOM 673 C CB . ASP 10 10 ? B -5.039 -0.198 14.516 1.000 2 B ASP 81.620 1 +ATOM 674 C CG . ASP 10 10 ? B -3.768 0.346 13.891 1.000 2 B ASP 81.620 1 +ATOM 675 O OD1 . ASP 10 10 ? B -3.572 1.580 13.883 1.000 2 B ASP 81.620 1 +ATOM 676 O OD2 . ASP 10 10 ? B -2.953 -0.468 13.406 1.000 2 B ASP 81.620 1 +ATOM 677 N N . GLY 11 11 ? B -4.750 2.320 17.047 1.000 2 B GLY 85.500 1 +ATOM 678 C CA . GLY 11 11 ? B -4.297 2.688 18.375 1.000 2 B GLY 85.500 1 +ATOM 679 C C . GLY 11 11 ? B -2.789 2.783 18.484 1.000 2 B GLY 85.500 1 +ATOM 680 O O . GLY 11 11 ? B -2.252 2.910 19.594 1.000 2 B GLY 85.500 1 +ATOM 681 N N . GLU 12 12 ? B -2.086 2.678 17.422 1.000 2 B GLU 88.380 1 +ATOM 682 C CA . GLU 12 12 ? B -0.634 2.816 17.453 1.000 2 B GLU 88.380 1 +ATOM 683 C C . GLU 12 12 ? B -0.218 4.285 17.422 1.000 2 B GLU 88.380 1 +ATOM 684 O O . GLU 12 12 ? B -0.803 5.082 16.688 1.000 2 B GLU 88.380 1 +ATOM 685 C CB . GLU 12 12 ? 
B 0.010 2.055 16.297 1.000 2 B GLU 88.380 1 +ATOM 686 C CG . GLU 12 12 ? B 1.492 1.771 16.500 1.000 2 B GLU 88.380 1 +ATOM 687 C CD . GLU 12 12 ? B 2.027 0.700 15.555 1.000 2 B GLU 88.380 1 +ATOM 688 O OE1 . GLU 12 12 ? B 3.266 0.545 15.453 1.000 2 B GLU 88.380 1 +ATOM 689 O OE2 . GLU 12 12 ? B 1.201 0.009 14.922 1.000 2 B GLU 88.380 1 +ATOM 690 N N . ILE 13 13 ? B 0.771 4.562 18.344 1.000 2 B ILE 91.310 1 +ATOM 691 C CA . ILE 13 13 ? B 1.227 5.945 18.453 1.000 2 B ILE 91.310 1 +ATOM 692 C C . ILE 13 13 ? B 2.566 6.105 17.734 1.000 2 B ILE 91.310 1 +ATOM 693 O O . ILE 13 13 ? B 3.475 5.293 17.922 1.000 2 B ILE 91.310 1 +ATOM 694 C CB . ILE 13 13 ? B 1.351 6.383 19.922 1.000 2 B ILE 91.310 1 +ATOM 695 C CG1 . ILE 13 13 ? B -0.006 6.270 20.641 1.000 2 B ILE 91.310 1 +ATOM 696 C CG2 . ILE 13 13 ? B 1.903 7.809 20.031 1.000 2 B ILE 91.310 1 +ATOM 697 C CD1 . ILE 13 13 ? B 0.007 6.746 22.078 1.000 2 B ILE 91.310 1 +ATOM 698 N N . PHE 14 14 ? B 2.590 7.055 16.859 1.000 2 B PHE 91.060 1 +ATOM 699 C CA . PHE 14 14 ? B 3.814 7.375 16.141 1.000 2 B PHE 91.060 1 +ATOM 700 C C . PHE 14 14 ? B 4.312 8.766 16.500 1.000 2 B PHE 91.060 1 +ATOM 701 O O . PHE 14 14 ? B 3.551 9.734 16.469 1.000 2 B PHE 91.060 1 +ATOM 702 C CB . PHE 14 14 ? B 3.586 7.281 14.625 1.000 2 B PHE 91.060 1 +ATOM 703 C CG . PHE 14 14 ? B 3.217 5.898 14.156 1.000 2 B PHE 91.060 1 +ATOM 704 C CD1 . PHE 14 14 ? B 4.191 5.027 13.688 1.000 2 B PHE 91.060 1 +ATOM 705 C CD2 . PHE 14 14 ? B 1.895 5.473 14.180 1.000 2 B PHE 91.060 1 +ATOM 706 C CE1 . PHE 14 14 ? B 3.854 3.748 13.242 1.000 2 B PHE 91.060 1 +ATOM 707 C CE2 . PHE 14 14 ? B 1.550 4.195 13.742 1.000 2 B PHE 91.060 1 +ATOM 708 C CZ . PHE 14 14 ? B 2.529 3.336 13.273 1.000 2 B PHE 91.060 1 +ATOM 709 N N . GLU 15 15 ? B 5.551 8.867 16.969 1.000 2 B GLU 90.690 1 +ATOM 710 C CA . GLU 15 15 ? B 6.184 10.156 17.219 1.000 2 B GLU 90.690 1 +ATOM 711 C C . GLU 15 15 ? B 6.863 10.695 15.961 1.000 2 B GLU 90.690 1 +ATOM 712 O O . GLU 15 15 ? 
B 7.797 10.086 15.445 1.000 2 B GLU 90.690 1 +ATOM 713 C CB . GLU 15 15 ? B 7.199 10.039 18.359 1.000 2 B GLU 90.690 1 +ATOM 714 C CG . GLU 15 15 ? B 7.672 11.383 18.906 1.000 2 B GLU 90.690 1 +ATOM 715 C CD . GLU 15 15 ? B 8.594 11.258 20.094 1.000 2 B GLU 90.690 1 +ATOM 716 O OE1 . GLU 15 15 ? B 9.047 12.297 20.641 1.000 2 B GLU 90.690 1 +ATOM 717 O OE2 . GLU 15 15 ? B 8.875 10.102 20.500 1.000 2 B GLU 90.690 1 +ATOM 718 N N . VAL 16 16 ? B 6.367 11.781 15.500 1.000 2 B VAL 90.750 1 +ATOM 719 C CA . VAL 16 16 ? B 6.836 12.344 14.234 1.000 2 B VAL 90.750 1 +ATOM 720 C C . VAL 16 16 ? B 7.281 13.789 14.445 1.000 2 B VAL 90.750 1 +ATOM 721 O O . VAL 16 16 ? B 6.738 14.500 15.297 1.000 2 B VAL 90.750 1 +ATOM 722 C CB . VAL 16 16 ? B 5.738 12.281 13.148 1.000 2 B VAL 90.750 1 +ATOM 723 C CG1 . VAL 16 16 ? B 6.309 12.680 11.781 1.000 2 B VAL 90.750 1 +ATOM 724 C CG2 . VAL 16 16 ? B 5.129 10.883 13.086 1.000 2 B VAL 90.750 1 +ATOM 725 N N . ASP 17 17 ? B 8.328 14.203 13.695 1.000 2 B ASP 90.380 1 +ATOM 726 C CA . ASP 17 17 ? B 8.758 15.602 13.711 1.000 2 B ASP 90.380 1 +ATOM 727 C C . ASP 17 17 ? B 7.656 16.516 13.195 1.000 2 B ASP 90.380 1 +ATOM 728 O O . ASP 17 17 ? B 6.996 16.219 12.203 1.000 2 B ASP 90.380 1 +ATOM 729 C CB . ASP 17 17 ? B 10.023 15.781 12.875 1.000 2 B ASP 90.380 1 +ATOM 730 C CG . ASP 17 17 ? B 10.758 17.078 13.172 1.000 2 B ASP 90.380 1 +ATOM 731 O OD1 . ASP 17 17 ? B 10.094 18.125 13.297 1.000 2 B ASP 90.380 1 +ATOM 732 O OD2 . ASP 17 17 ? B 12.000 17.047 13.273 1.000 2 B ASP 90.380 1 +ATOM 733 N N . VAL 18 18 ? B 7.516 17.641 13.891 1.000 2 B VAL 88.190 1 +ATOM 734 C CA . VAL 18 18 ? B 6.469 18.594 13.555 1.000 2 B VAL 88.190 1 +ATOM 735 C C . VAL 18 18 ? B 6.652 19.062 12.109 1.000 2 B VAL 88.190 1 +ATOM 736 O O . VAL 18 18 ? B 5.676 19.234 11.375 1.000 2 B VAL 88.190 1 +ATOM 737 C CB . VAL 18 18 ? B 6.461 19.797 14.516 1.000 2 B VAL 88.190 1 +ATOM 738 C CG1 . VAL 18 18 ? B 5.520 20.891 14.008 1.000 2 B VAL 88.190 1 +ATOM 739 C CG2 . 
VAL 18 18 ? B 6.070 19.359 15.922 1.000 2 B VAL 88.190 1 +ATOM 740 N N . GLU 19 19 ? B 7.891 19.281 11.695 1.000 2 B GLU 88.380 1 +ATOM 741 C CA . GLU 19 19 ? B 8.164 19.734 10.336 1.000 2 B GLU 88.380 1 +ATOM 742 C C . GLU 19 19 ? B 7.680 18.719 9.305 1.000 2 B GLU 88.380 1 +ATOM 743 O O . GLU 19 19 ? B 7.188 19.094 8.242 1.000 2 B GLU 88.380 1 +ATOM 744 C CB . GLU 19 19 ? B 9.664 19.984 10.148 1.000 2 B GLU 88.380 1 +ATOM 745 C CG . GLU 19 19 ? B 10.188 21.172 10.945 1.000 2 B GLU 88.380 1 +ATOM 746 C CD . GLU 19 19 ? B 11.609 21.578 10.570 1.000 2 B GLU 88.380 1 +ATOM 747 O OE1 . GLU 19 19 ? B 11.953 22.766 10.680 1.000 2 B GLU 88.380 1 +ATOM 748 O OE2 . GLU 19 19 ? B 12.383 20.688 10.156 1.000 2 B GLU 88.380 1 +ATOM 749 N N . ILE 20 20 ? B 7.789 17.422 9.648 1.000 2 B ILE 91.380 1 +ATOM 750 C CA . ILE 20 20 ? B 7.348 16.344 8.781 1.000 2 B ILE 91.380 1 +ATOM 751 C C . ILE 20 20 ? B 5.820 16.266 8.797 1.000 2 B ILE 91.380 1 +ATOM 752 O O . ILE 20 20 ? B 5.188 16.125 7.746 1.000 2 B ILE 91.380 1 +ATOM 753 C CB . ILE 20 20 ? B 7.965 14.992 9.195 1.000 2 B ILE 91.380 1 +ATOM 754 C CG1 . ILE 20 20 ? B 9.492 15.039 9.047 1.000 2 B ILE 91.380 1 +ATOM 755 C CG2 . ILE 20 20 ? B 7.367 13.852 8.375 1.000 2 B ILE 91.380 1 +ATOM 756 C CD1 . ILE 20 20 ? B 10.195 13.789 9.562 1.000 2 B ILE 91.380 1 +ATOM 757 N N . ALA 21 21 ? B 5.250 16.438 9.969 1.000 2 B ALA 89.750 1 +ATOM 758 C CA . ALA 21 21 ? B 3.803 16.344 10.133 1.000 2 B ALA 89.750 1 +ATOM 759 C C . ALA 21 21 ? B 3.096 17.500 9.422 1.000 2 B ALA 89.750 1 +ATOM 760 O O . ALA 21 21 ? B 1.956 17.359 8.969 1.000 2 B ALA 89.750 1 +ATOM 761 C CB . ALA 21 21 ? B 3.434 16.328 11.609 1.000 2 B ALA 89.750 1 +ATOM 762 N N . LYS 22 22 ? B 3.754 18.594 9.219 1.000 2 B LYS 88.750 1 +ATOM 763 C CA . LYS 22 22 ? B 3.156 19.781 8.625 1.000 2 B LYS 88.750 1 +ATOM 764 C C . LYS 22 22 ? B 3.006 19.641 7.117 1.000 2 B LYS 88.750 1 +ATOM 765 O O . LYS 22 22 ? B 2.412 20.500 6.461 1.000 2 B LYS 88.750 1 +ATOM 766 C CB . LYS 22 22 ? 
B 3.998 21.016 8.953 1.000 2 B LYS 88.750 1
+ATOM 767 C CG . LYS 22 22 ? B 3.768 21.578 10.352 1.000 2 B LYS 88.750 1
+ATOM 768 C CD . LYS 22 22 ? B 4.629 22.812 10.609 1.000 2 B LYS 88.750 1
+ATOM 769 C CE . LYS 22 22 ? B 4.332 23.422 11.969 1.000 2 B LYS 88.750 1
+ATOM 770 N NZ . LYS 22 22 ? B 5.188 24.609 12.242 1.000 2 B LYS 88.750 1
+ATOM 771 N N . GLN 23 23 ? B 3.625 18.547 6.637 1.000 2 B GLN 90.690 1
+ATOM 772 C CA . GLN 23 23 ? B 3.377 18.266 5.230 1.000 2 B GLN 90.690 1
+ATOM 773 C C . GLN 23 23 ? B 1.909 17.922 4.988 1.000 2 B GLN 90.690 1
+ATOM 774 O O . GLN 23 23 ? B 1.417 18.031 3.863 1.000 2 B GLN 90.690 1
+ATOM 775 C CB . GLN 23 23 ? B 4.270 17.109 4.750 1.000 2 B GLN 90.690 1
+ATOM 776 C CG . GLN 23 23 ? B 5.746 17.469 4.672 1.000 2 B GLN 90.690 1
+ATOM 777 C CD . GLN 23 23 ? B 6.590 16.375 4.062 1.000 2 B GLN 90.690 1
+ATOM 778 O OE1 . GLN 23 23 ? B 6.281 15.859 2.980 1.000 2 B GLN 90.690 1
+ATOM 779 N NE2 . GLN 23 23 ? B 7.656 15.984 4.754 1.000 2 B GLN 90.690 1
+ATOM 780 N N . SER 24 24 ? B 1.292 17.500 6.055 1.000 2 B SER 91.380 1
+ATOM 781 C CA . SER 24 24 ? B -0.161 17.375 6.027 1.000 2 B SER 91.380 1
+ATOM 782 C C . SER 24 24 ? B -0.846 18.703 6.332 1.000 2 B SER 91.380 1
+ATOM 783 O O . SER 24 24 ? B -0.670 19.266 7.414 1.000 2 B SER 91.380 1
+ATOM 784 C CB . SER 24 24 ? B -0.622 16.312 7.027 1.000 2 B SER 91.380 1
+ATOM 785 O OG . SER 24 24 ? B -2.020 16.391 7.242 1.000 2 B SER 91.380 1
+ATOM 786 N N . VAL 25 25 ? B -1.625 19.172 5.406 1.000 2 B VAL 90.560 1
+ATOM 787 C CA . VAL 25 25 ? B -2.311 20.453 5.613 1.000 2 B VAL 90.560 1
+ATOM 788 C C . VAL 25 25 ? B -3.312 20.312 6.758 1.000 2 B VAL 90.560 1
+ATOM 789 O O . VAL 25 25 ? B -3.496 21.250 7.539 1.000 2 B VAL 90.560 1
+ATOM 790 C CB . VAL 25 25 ? B -3.027 20.922 4.328 1.000 2 B VAL 90.560 1
+ATOM 791 C CG1 . VAL 25 25 ? B -3.814 22.203 4.590 1.000 2 B VAL 90.560 1
+ATOM 792 C CG2 . VAL 25 25 ? B -2.020 21.125 3.199 1.000 2 B VAL 90.560 1
+ATOM 793 N N . THR 26 26 ? B -3.883 19.141 6.852 1.000 2 B THR 89.810 1
+ATOM 794 C CA . THR 26 26 ? B -4.840 18.891 7.922 1.000 2 B THR 89.810 1
+ATOM 795 C C . THR 26 26 ? B -4.172 19.031 9.289 1.000 2 B THR 89.810 1
+ATOM 796 O O . THR 26 26 ? B -4.684 19.734 10.164 1.000 2 B THR 89.810 1
+ATOM 797 C CB . THR 26 26 ? B -5.453 17.484 7.797 1.000 2 B THR 89.810 1
+ATOM 798 O OG1 . THR 26 26 ? B -6.090 17.359 6.516 1.000 2 B THR 89.810 1
+ATOM 799 C CG2 . THR 26 26 ? B -6.484 17.234 8.891 1.000 2 B THR 89.810 1
+ATOM 800 N N . ILE 27 27 ? B -3.020 18.406 9.469 1.000 2 B ILE 89.060 1
+ATOM 801 C CA . ILE 27 27 ? B -2.316 18.453 10.750 1.000 2 B ILE 89.060 1
+ATOM 802 C C . ILE 27 27 ? B -1.785 19.875 10.992 1.000 2 B ILE 89.060 1
+ATOM 803 O O . ILE 27 27 ? B -1.871 20.391 12.109 1.000 2 B ILE 89.060 1
+ATOM 804 C CB . ILE 27 27 ? B -1.158 17.438 10.797 1.000 2 B ILE 89.060 1
+ATOM 805 C CG1 . ILE 27 27 ? B -1.707 16.000 10.727 1.000 2 B ILE 89.060 1
+ATOM 806 C CG2 . ILE 27 27 ? B -0.310 17.641 12.055 1.000 2 B ILE 89.060 1
+ATOM 807 C CD1 . ILE 27 27 ? B -0.630 14.930 10.633 1.000 2 B ILE 89.060 1
+ATOM 808 N N . LYS 28 28 ? B -1.307 20.469 9.945 1.000 2 B LYS 89.190 1
+ATOM 809 C CA . LYS 28 28 ? B -0.809 21.844 10.055 1.000 2 B LYS 89.190 1
+ATOM 810 C C . LYS 28 28 ? B -1.896 22.781 10.562 1.000 2 B LYS 89.190 1
+ATOM 811 O O . LYS 28 28 ? B -1.666 23.562 11.484 1.000 2 B LYS 89.190 1
+ATOM 812 C CB . LYS 28 28 ? B -0.281 22.328 8.703 1.000 2 B LYS 89.190 1
+ATOM 813 C CG . LYS 28 28 ? B 0.281 23.734 8.734 1.000 2 B LYS 89.190 1
+ATOM 814 C CD . LYS 28 28 ? B 0.769 24.172 7.359 1.000 2 B LYS 89.190 1
+ATOM 815 C CE . LYS 28 28 ? B 1.255 25.625 7.367 1.000 2 B LYS 89.190 1
+ATOM 816 N NZ . LYS 28 28 ? B 1.718 26.062 6.020 1.000 2 B LYS 89.190 1
+ATOM 817 N N . THR 29 29 ? B -3.010 22.703 9.914 1.000 2 B THR 89.750 1
+ATOM 818 C CA . THR 29 29 ? B -4.133 23.547 10.289 1.000 2 B THR 89.750 1
+ATOM 819 C C . THR 29 29 ? B -4.566 23.266 11.727 1.000 2 B THR 89.750 1
+ATOM 820 O O . THR 29 29 ? B -4.863 24.188 12.484 1.000 2 B THR 89.750 1
+ATOM 821 C CB . THR 29 29 ? B -5.328 23.344 9.344 1.000 2 B THR 89.750 1
+ATOM 822 O OG1 . THR 29 29 ? B -4.918 23.625 8.000 1.000 2 B THR 89.750 1
+ATOM 823 C CG2 . THR 29 29 ? B -6.484 24.281 9.711 1.000 2 B THR 89.750 1
+ATOM 824 N N . MET 30 30 ? B -4.523 22.000 12.023 1.000 2 B MET 84.000 1
+ATOM 825 C CA . MET 30 30 ? B -4.906 21.625 13.383 1.000 2 B MET 84.000 1
+ATOM 826 C C . MET 30 30 ? B -3.932 22.203 14.406 1.000 2 B MET 84.000 1
+ATOM 827 O O . MET 30 30 ? B -4.348 22.719 15.438 1.000 2 B MET 84.000 1
+ATOM 828 C CB . MET 30 30 ? B -4.969 20.109 13.523 1.000 2 B MET 84.000 1
+ATOM 829 C CG . MET 30 30 ? B -6.199 19.484 12.883 1.000 2 B MET 84.000 1
+ATOM 830 S SD . MET 30 30 ? B -6.266 17.656 13.117 1.000 2 B MET 84.000 1
+ATOM 831 C CE . MET 30 30 ? B -7.512 17.219 11.867 1.000 2 B MET 84.000 1
+ATOM 832 N N . LEU 31 31 ? B -2.740 22.109 14.117 1.000 2 B LEU 82.440 1
+ATOM 833 C CA . LEU 31 31 ? B -1.719 22.625 15.016 1.000 2 B LEU 82.440 1
+ATOM 834 C C . LEU 31 31 ? B -1.827 24.141 15.141 1.000 2 B LEU 82.440 1
+ATOM 835 O O . LEU 31 31 ? B -1.675 24.703 16.234 1.000 2 B LEU 82.440 1
+ATOM 836 C CB . LEU 31 31 ? B -0.321 22.234 14.523 1.000 2 B LEU 82.440 1
+ATOM 837 C CG . LEU 31 31 ? B 0.048 20.766 14.625 1.000 2 B LEU 82.440 1
+ATOM 838 C CD1 . LEU 31 31 ? B 1.416 20.516 14.000 1.000 2 B LEU 82.440 1
+ATOM 839 C CD2 . LEU 31 31 ? B 0.028 20.297 16.078 1.000 2 B LEU 82.440 1
+ATOM 840 N N . GLU 32 32 ? B -2.045 24.828 14.102 1.000 2 B GLU 84.250 1
+ATOM 841 C CA . GLU 32 32 ? B -2.199 26.281 14.102 1.000 2 B GLU 84.250 1
+ATOM 842 C C . GLU 32 32 ? B -3.432 26.703 14.891 1.000 2 B GLU 84.250 1
+ATOM 843 O O . GLU 32 32 ? B -3.391 27.688 15.633 1.000 2 B GLU 84.250 1
+ATOM 844 C CB . GLU 32 32 ? B -2.285 26.797 12.672 1.000 2 B GLU 84.250 1
+ATOM 845 C CG . GLU 32 32 ? B -0.983 26.688 11.891 1.000 2 B GLU 84.250 1
+ATOM 846 C CD . GLU 32 32 ? B -1.104 27.156 10.445 1.000 2 B GLU 84.250 1
+ATOM 847 O OE1 . GLU 32 32 ? B -0.067 27.250 9.750 1.000 2 B GLU 84.250 1
+ATOM 848 O OE2 . GLU 32 32 ? B -2.244 27.422 10.008 1.000 2 B GLU 84.250 1
+ATOM 849 N N . ASP 33 33 ? B -4.445 25.922 14.680 1.000 2 B ASP 81.440 1
+ATOM 850 C CA . ASP 33 33 ? B -5.699 26.234 15.367 1.000 2 B ASP 81.440 1
+ATOM 851 C C . ASP 33 33 ? B -5.570 26.016 16.875 1.000 2 B ASP 81.440 1
+ATOM 852 O O . ASP 33 33 ? B -6.133 26.781 17.656 1.000 2 B ASP 81.440 1
+ATOM 853 C CB . ASP 33 33 ? B -6.840 25.375 14.812 1.000 2 B ASP 81.440 1
+ATOM 854 C CG . ASP 33 33 ? B -8.203 26.031 14.977 1.000 2 B ASP 81.440 1
+ATOM 855 O OD1 . ASP 33 33 ? B -8.305 27.266 14.875 1.000 2 B ASP 81.440 1
+ATOM 856 O OD2 . ASP 33 33 ? B -9.180 25.297 15.227 1.000 2 B ASP 81.440 1
+ATOM 857 N N . LEU 34 34 ? B -4.926 24.953 17.219 1.000 2 B LEU 74.440 1
+ATOM 858 C CA . LEU 34 34 ? B -4.773 24.625 18.641 1.000 2 B LEU 74.440 1
+ATOM 859 C C . LEU 34 34 ? B -3.645 25.438 19.266 1.000 2 B LEU 74.440 1
+ATOM 860 O O . LEU 34 34 ? B -3.482 25.453 20.484 1.000 2 B LEU 74.440 1
+ATOM 861 C CB . LEU 34 34 ? B -4.500 23.141 18.812 1.000 2 B LEU 74.440 1
+ATOM 862 C CG . LEU 34 34 ? B -5.645 22.188 18.438 1.000 2 B LEU 74.440 1
+ATOM 863 C CD1 . LEU 34 34 ? B -5.164 20.750 18.469 1.000 2 B LEU 74.440 1
+ATOM 864 C CD2 . LEU 34 34 ? B -6.832 22.391 19.375 1.000 2 B LEU 74.440 1
+ATOM 865 N N . GLY 35 35 ? B -3.021 26.109 18.531 1.000 2 B GLY 70.620 1
+ATOM 866 C CA . GLY 35 35 ? B -1.907 26.906 19.016 1.000 2 B GLY 70.620 1
+ATOM 867 C C . GLY 35 35 ? B -0.696 26.078 19.391 1.000 2 B GLY 70.620 1
+ATOM 868 O O . GLY 35 35 ? B 0.045 26.438 20.312 1.000 2 B GLY 70.620 1
+ATOM 869 N N . MET 36 36 ? B -0.739 24.719 18.984 1.000 2 B MET 62.220 1
+ATOM 870 C CA . MET 36 36 ? B 0.328 23.797 19.344 1.000 2 B MET 62.220 1
+ATOM 871 C C . MET 36 36 ? B 1.516 23.922 18.406 1.000 2 B MET 62.220 1
+ATOM 872 O O . MET 36 36 ? B 2.270 22.969 18.203 1.000 2 B MET 62.220 1
+ATOM 873 C CB . MET 36 36 ? B -0.184 22.344 19.344 1.000 2 B MET 62.220 1
+ATOM 874 C CG . MET 36 36 ? B -1.163 22.047 20.469 1.000 2 B MET 62.220 1
+ATOM 875 S SD . MET 36 36 ? B -1.022 20.312 21.078 1.000 2 B MET 62.220 1
+ATOM 876 C CE . MET 36 36 ? B -2.777 19.906 21.297 1.000 2 B MET 62.220 1
+ATOM 877 N N . ASP 37 37 ? B 1.688 25.031 17.703 1.000 2 B ASP 58.910 1
+ATOM 878 C CA . ASP 37 37 ? B 2.746 25.188 16.719 1.000 2 B ASP 58.910 1
+ATOM 879 C C . ASP 37 37 ? B 4.070 24.641 17.234 1.000 2 B ASP 58.910 1
+ATOM 880 O O . ASP 37 37 ? B 4.832 24.031 16.484 1.000 2 B ASP 58.910 1
+ATOM 881 C CB . ASP 37 37 ? B 2.908 26.656 16.328 1.000 2 B ASP 58.910 1
+ATOM 882 C CG . ASP 37 37 ? B 1.845 27.141 15.359 1.000 2 B ASP 58.910 1
+ATOM 883 O OD1 . ASP 37 37 ? B 1.428 26.359 14.477 1.000 2 B ASP 58.910 1
+ATOM 884 O OD2 . ASP 37 37 ? B 1.422 28.312 15.477 1.000 2 B ASP 58.910 1
+ATOM 885 N N . ASP 38 38 ? B 4.656 25.125 18.359 1.000 2 B ASP 56.620 1
+ATOM 886 C CA . ASP 38 38 ? B 6.031 24.859 18.781 1.000 2 B ASP 56.620 1
+ATOM 887 C C . ASP 38 38 ? B 6.090 23.797 19.875 1.000 2 B ASP 56.620 1
+ATOM 888 O O . ASP 38 38 ? B 7.164 23.297 20.188 1.000 2 B ASP 56.620 1
+ATOM 889 C CB . ASP 38 38 ? B 6.691 26.156 19.266 1.000 2 B ASP 56.620 1
+ATOM 890 C CG . ASP 38 38 ? B 7.461 26.875 18.172 1.000 2 B ASP 56.620 1
+ATOM 891 O OD1 . ASP 38 38 ? B 7.977 26.203 17.250 1.000 2 B ASP 56.620 1
+ATOM 892 O OD2 . ASP 38 38 ? B 7.562 28.125 18.234 1.000 2 B ASP 56.620 1
+ATOM 893 N N . GLU 39 39 ? B 5.035 23.594 20.797 1.000 2 B GLU 54.060 1
+ATOM 894 C CA . GLU 39 39 ? B 5.168 22.672 21.922 1.000 2 B GLU 54.060 1
+ATOM 895 C C . GLU 39 39 ? B 4.500 21.328 21.641 1.000 2 B GLU 54.060 1
+ATOM 896 O O . GLU 39 39 ? B 3.434 21.281 21.031 1.000 2 B GLU 54.060 1
+ATOM 897 C CB . GLU 39 39 ? B 4.574 23.297 23.203 1.000 2 B GLU 54.060 1
+ATOM 898 C CG . GLU 39 39 ? B 5.363 24.484 23.719 1.000 2 B GLU 54.060 1
+ATOM 899 C CD . GLU 39 39 ? B 4.980 24.875 25.141 1.000 2 B GLU 54.060 1
+ATOM 900 O OE1 . GLU 39 39 ? B 5.727 25.641 25.781 1.000 2 B GLU 54.060 1
+ATOM 901 O OE2 . GLU 39 39 ? B 3.930 24.391 25.625 1.000 2 B GLU 54.060 1
+ATOM 902 N N . GLY 40 40 ? B 5.215 20.266 21.109 1.000 2 B GLY 54.880 1
+ATOM 903 C CA . GLY 40 40 ? B 4.816 18.875 21.000 1.000 2 B GLY 54.880 1
+ATOM 904 C C . GLY 40 40 ? B 3.732 18.484 22.000 1.000 2 B GLY 54.880 1
+ATOM 905 O O . GLY 40 40 ? B 3.996 18.359 23.188 1.000 2 B GLY 54.880 1
+ATOM 906 N N . ASP 41 41 ? B 2.541 19.031 21.781 1.000 2 B ASP 57.440 1
+ATOM 907 C CA . ASP 41 41 ? B 1.497 18.516 22.656 1.000 2 B ASP 57.440 1
+ATOM 908 C C . ASP 41 41 ? B 1.618 17.016 22.828 1.000 2 B ASP 57.440 1
+ATOM 909 O O . ASP 41 41 ? B 1.911 16.297 21.859 1.000 2 B ASP 57.440 1
+ATOM 910 C CB . ASP 41 41 ? B 0.112 18.875 22.109 1.000 2 B ASP 57.440 1
+ATOM 911 C CG . ASP 41 41 ? B -0.864 19.297 23.203 1.000 2 B ASP 57.440 1
+ATOM 912 O OD1 . ASP 41 41 ? B -0.545 19.125 24.391 1.000 2 B ASP 57.440 1
+ATOM 913 O OD2 . ASP 41 41 ? B -1.964 19.781 22.859 1.000 2 B ASP 57.440 1
+ATOM 914 N N . ASP 42 42 ? B 1.878 16.625 24.047 1.000 2 B ASP 67.620 1
+ATOM 915 C CA . ASP 42 42 ? B 2.086 15.242 24.469 1.000 2 B ASP 67.620 1
+ATOM 916 C C . ASP 42 42 ? B 0.863 14.375 24.156 1.000 2 B ASP 67.620 1
+ATOM 917 O O . ASP 42 42 ? B 0.923 13.148 24.250 1.000 2 B ASP 67.620 1
+ATOM 918 C CB . ASP 42 42 ? B 2.404 15.180 25.969 1.000 2 B ASP 67.620 1
+ATOM 919 C CG . ASP 42 42 ? B 3.725 14.492 26.266 1.000 2 B ASP 67.620 1
+ATOM 920 O OD1 . ASP 42 42 ? B 4.195 13.688 25.422 1.000 2 B ASP 67.620 1
+ATOM 921 O OD2 . ASP 42 42 ? B 4.297 14.758 27.344 1.000 2 B ASP 67.620 1
+ATOM 922 N N . ASP 43 43 ? B -0.172 15.070 23.656 1.000 2 B ASP 76.120 1
+ATOM 923 C CA . ASP 43 43 ? B -1.348 14.242 23.406 1.000 2 B ASP 76.120 1
+ATOM 924 C C . ASP 43 43 ? B -1.385 13.758 21.969 1.000 2 B ASP 76.120 1
+ATOM 925 O O . ASP 43 43 ? B -1.191 14.547 21.031 1.000 2 B ASP 76.120 1
+ATOM 926 C CB . ASP 43 43 ? B -2.627 15.016 23.750 1.000 2 B ASP 76.120 1
+ATOM 927 C CG . ASP 43 43 ? B -3.018 14.914 25.203 1.000 2 B ASP 76.120 1
+ATOM 928 O OD1 . ASP 43 43 ? B -3.072 13.789 25.750 1.000 2 B ASP 76.120 1
+ATOM 929 O OD2 . ASP 43 43 ? B -3.275 15.977 25.828 1.000 2 B ASP 76.120 1
+ATOM 930 N N . PRO 44 44 ? B -1.523 12.578 21.750 1.000 2 B PRO 86.310 1
+ATOM 931 C CA . PRO 44 44 ? B -1.562 12.023 20.406 1.000 2 B PRO 86.310 1
+ATOM 932 C C . PRO 44 44 ? B -2.779 12.492 19.609 1.000 2 B PRO 86.310 1
+ATOM 933 O O . PRO 44 44 ? B -3.863 12.648 20.172 1.000 2 B PRO 86.310 1
+ATOM 934 C CB . PRO 44 44 ? B -1.611 10.516 20.641 1.000 2 B PRO 86.310 1
+ATOM 935 C CG . PRO 44 44 ? B -2.084 10.367 22.062 1.000 2 B PRO 86.310 1
+ATOM 936 C CD . PRO 44 44 ? B -1.819 11.656 22.781 1.000 2 B PRO 86.310 1
+ATOM 937 N N . VAL 45 45 ? B -2.500 13.008 18.453 1.000 2 B VAL 85.310 1
+ATOM 938 C CA . VAL 45 45 ? B -3.562 13.367 17.531 1.000 2 B VAL 85.310 1
+ATOM 939 C C . VAL 45 45 ? B -4.160 12.102 16.922 1.000 2 B VAL 85.310 1
+ATOM 940 O O . VAL 45 45 ? B -3.479 11.367 16.203 1.000 2 B VAL 85.310 1
+ATOM 941 C CB . VAL 45 45 ? B -3.049 14.305 16.406 1.000 2 B VAL 85.310 1
+ATOM 942 C CG1 . VAL 45 45 ? B -4.172 14.648 15.430 1.000 2 B VAL 85.310 1
+ATOM 943 C CG2 . VAL 45 45 ? B -2.449 15.570 17.016 1.000 2 B VAL 85.310 1
+ATOM 944 N N . PRO 46 46 ? B -5.387 11.914 17.188 1.000 2 B PRO 87.120 1
+ATOM 945 C CA . PRO 46 46 ? B -6.023 10.711 16.656 1.000 2 B PRO 87.120 1
+ATOM 946 C C . PRO 46 46 ? B -6.336 10.820 15.172 1.000 2 B PRO 87.120 1
+ATOM 947 O O . PRO 46 46 ? B -6.832 11.852 14.711 1.000 2 B PRO 87.120 1
+ATOM 948 C CB . PRO 46 46 ? B -7.312 10.594 17.484 1.000 2 B PRO 87.120 1
+ATOM 949 C CG . PRO 46 46 ? B -7.629 12.000 17.891 1.000 2 B PRO 87.120 1
+ATOM 950 C CD . PRO 46 46 ? B -6.344 12.773 17.953 1.000 2 B PRO 87.120 1
+ATOM 951 N N . LEU 47 47 ? B -5.949 9.789 14.414 1.000 2 B LEU 87.190 1
+ATOM 952 C CA . LEU 47 47 ? B -6.266 9.648 13.000 1.000 2 B LEU 87.190 1
+ATOM 953 C C . LEU 47 47 ? B -7.152 8.438 12.758 1.000 2 B LEU 87.190 1
+ATOM 954 O O . LEU 47 47 ? B -6.656 7.352 12.438 1.000 2 B LEU 87.190 1
+ATOM 955 C CB . LEU 47 47 ? B -4.984 9.539 12.172 1.000 2 B LEU 87.190 1
+ATOM 956 C CG . LEU 47 47 ? B -3.969 10.672 12.328 1.000 2 B LEU 87.190 1
+ATOM 957 C CD1 . LEU 47 47 ? B -2.697 10.359 11.547 1.000 2 B LEU 87.190 1
+ATOM 958 C CD2 . LEU 47 47 ? B -4.570 11.992 11.875 1.000 2 B LEU 87.190 1
+ATOM 959 N N . PRO 48 48 ? B -8.422 8.648 12.852 1.000 2 B PRO 84.500 1
+ATOM 960 C CA . PRO 48 48 ? B -9.367 7.527 12.828 1.000 2 B PRO 84.500 1
+ATOM 961 C C . PRO 48 48 ? B -9.477 6.875 11.453 1.000 2 B PRO 84.500 1
+ATOM 962 O O . PRO 48 48 ? B -9.945 5.738 11.344 1.000 2 B PRO 84.500 1
+ATOM 963 C CB . PRO 48 48 ? B -10.688 8.180 13.234 1.000 2 B PRO 84.500 1
+ATOM 964 C CG . PRO 48 48 ? B -10.523 9.625 12.914 1.000 2 B PRO 84.500 1
+ATOM 965 C CD . PRO 48 48 ? B -9.062 9.977 13.039 1.000 2 B PRO 84.500 1
+ATOM 966 N N . ASN 49 49 ? B -9.102 7.473 10.422 1.000 2 B ASN 82.060 1
+ATOM 967 C CA . ASN 49 49 ? B -9.258 6.945 9.070 1.000 2 B ASN 82.060 1
+ATOM 968 C C . ASN 49 49 ? B -7.945 6.402 8.516 1.000 2 B ASN 82.060 1
+ATOM 969 O O . ASN 49 49 ? B -7.840 6.117 7.324 1.000 2 B ASN 82.060 1
+ATOM 970 C CB . ASN 49 49 ? B -9.820 8.023 8.133 1.000 2 B ASN 82.060 1
+ATOM 971 C CG . ASN 49 49 ? B -11.242 8.414 8.484 1.000 2 B ASN 82.060 1
+ATOM 972 O OD1 . ASN 49 49 ? B -12.062 7.566 8.859 1.000 2 B ASN 82.060 1
+ATOM 973 N ND2 . ASN 49 49 ? B -11.555 9.703 8.367 1.000 2 B ASN 82.060 1
+ATOM 974 N N . VAL 50 50 ? B -6.945 6.309 9.367 1.000 2 B VAL 87.060 1
+ATOM 975 C CA . VAL 50 50 ? B -5.637 5.840 8.922 1.000 2 B VAL 87.060 1
+ATOM 976 C C . VAL 50 50 ? B -5.121 4.766 9.875 1.000 2 B VAL 87.060 1
+ATOM 977 O O . VAL 50 50 ? B -4.926 5.023 11.070 1.000 2 B VAL 87.060 1
+ATOM 978 C CB . VAL 50 50 ? B -4.617 7.000 8.828 1.000 2 B VAL 87.060 1
+ATOM 979 C CG1 . VAL 50 50 ? B -3.318 6.523 8.188 1.000 2 B VAL 87.060 1
+ATOM 980 C CG2 . VAL 50 50 ? B -5.211 8.164 8.047 1.000 2 B VAL 87.060 1
+ATOM 981 N N . ASN 51 51 ? B -4.914 3.613 9.328 1.000 2 B ASN 84.880 1
+ATOM 982 C CA . ASN 51 51 ? B -4.371 2.562 10.180 1.000 2 B ASN 84.880 1
+ATOM 983 C C . ASN 51 51 ? B -2.863 2.701 10.352 1.000 2 B ASN 84.880 1
+ATOM 984 O O . ASN 51 51 ? B -2.230 3.514 9.672 1.000 2 B ASN 84.880 1
+ATOM 985 C CB . ASN 51 51 ? B -4.719 1.182 9.625 1.000 2 B ASN 84.880 1
+ATOM 986 C CG . ASN 51 51 ? B -4.113 0.936 8.258 1.000 2 B ASN 84.880 1
+ATOM 987 O OD1 . ASN 51 51 ? B -2.900 1.078 8.070 1.000 2 B ASN 84.880 1
+ATOM 988 N ND2 . ASN 51 51 ? B -4.945 0.564 7.293 1.000 2 B ASN 84.880 1
+ATOM 989 N N . ALA 52 52 ? B -2.297 1.890 11.219 1.000 2 B ALA 85.190 1
+ATOM 990 C CA . ALA 52 52 ? B -0.897 2.023 11.609 1.000 2 B ALA 85.190 1
+ATOM 991 C C . ALA 52 52 ? B 0.032 1.740 10.430 1.000 2 B ALA 85.190 1
+ATOM 992 O O . ALA 52 52 ? B 1.015 2.455 10.227 1.000 2 B ALA 85.190 1
+ATOM 993 C CB . ALA 52 52 ? B -0.578 1.086 12.773 1.000 2 B ALA 85.190 1
+ATOM 994 N N . ALA 53 53 ? B -0.316 0.700 9.742 1.000 2 B ALA 82.440 1
+ATOM 995 C CA . ALA 53 53 ? B 0.528 0.306 8.617 1.000 2 B ALA 82.440 1
+ATOM 996 C C . ALA 53 53 ? B 0.605 1.416 7.574 1.000 2 B ALA 82.440 1
+ATOM 997 O O . ALA 53 53 ? B 1.686 1.723 7.062 1.000 2 B ALA 82.440 1
+ATOM 998 C CB . ALA 53 53 ? B 0.004 -0.980 7.984 1.000 2 B ALA 82.440 1
+ATOM 999 N N . ILE 54 54 ? B -0.521 2.000 7.289 1.000 2 B ILE 88.310 1
+ATOM 1000 C CA . ILE 54 54 ? B -0.610 3.072 6.305 1.000 2 B ILE 88.310 1
+ATOM 1001 C C . ILE 54 54 ? B 0.049 4.336 6.855 1.000 2 B ILE 88.310 1
+ATOM 1002 O O . ILE 54 54 ? B 0.791 5.016 6.145 1.000 2 B ILE 88.310 1
+ATOM 1003 C CB . ILE 54 54 ? B -2.076 3.361 5.910 1.000 2 B ILE 88.310 1
+ATOM 1004 C CG1 . ILE 54 54 ? B -2.717 2.117 5.285 1.000 2 B ILE 88.310 1
+ATOM 1005 C CG2 . ILE 54 54 ? B -2.156 4.559 4.961 1.000 2 B ILE 88.310 1
+ATOM 1006 C CD1 . ILE 54 54 ? B -2.018 1.634 4.020 1.000 2 B ILE 88.310 1
+ATOM 1007 N N . LEU 55 55 ? B -0.215 4.613 8.109 1.000 2 B LEU 92.810 1
+ATOM 1008 C CA . LEU 55 55 ? B 0.367 5.793 8.742 1.000 2 B LEU 92.810 1
+ATOM 1009 C C . LEU 55 55 ? B 1.891 5.738 8.695 1.000 2 B LEU 92.810 1
+ATOM 1010 O O . LEU 55 55 ? B 2.543 6.750 8.422 1.000 2 B LEU 92.810 1
+ATOM 1011 C CB . LEU 55 55 ? B -0.103 5.906 10.195 1.000 2 B LEU 92.810 1
+ATOM 1012 C CG . LEU 55 55 ? B 0.333 7.160 10.953 1.000 2 B LEU 92.810 1
+ATOM 1013 C CD1 . LEU 55 55 ? B -0.079 8.414 10.188 1.000 2 B LEU 92.810 1
+ATOM 1014 C CD2 . LEU 55 55 ? B -0.259 7.168 12.359 1.000 2 B LEU 92.810 1
+ATOM 1015 N N . LYS 56 56 ? B 2.400 4.637 8.984 1.000 2 B LYS 90.500 1
+ATOM 1016 C CA . LYS 56 56 ? B 3.848 4.457 8.922 1.000 2 B LYS 90.500 1
+ATOM 1017 C C . LYS 56 56 ? B 4.391 4.809 7.543 1.000 2 B LYS 90.500 1
+ATOM 1018 O O . LYS 56 56 ? B 5.414 5.484 7.422 1.000 2 B LYS 90.500 1
+ATOM 1019 C CB . LYS 56 56 ? B 4.227 3.021 9.289 1.000 2 B LYS 90.500 1
+ATOM 1020 C CG . LYS 56 56 ? B 5.719 2.809 9.508 1.000 2 B LYS 90.500 1
+ATOM 1021 C CD . LYS 56 56 ? B 6.023 1.378 9.930 1.000 2 B LYS 90.500 1
+ATOM 1022 C CE . LYS 56 56 ? B 7.520 1.157 10.117 1.000 2 B LYS 90.500 1
+ATOM 1023 N NZ . LYS 56 56 ? B 7.820 -0.234 10.570 1.000 2 B LYS 90.500 1
+ATOM 1024 N N . LYS 57 57 ? B 3.693 4.367 6.551 1.000 2 B LYS 91.310 1
+ATOM 1025 C CA . LYS 57 57 ? B 4.109 4.645 5.180 1.000 2 B LYS 91.310 1
+ATOM 1026 C C . LYS 57 57 ? B 3.994 6.133 4.859 1.000 2 B LYS 91.310 1
+ATOM 1027 O O . LYS 57 57 ? B 4.871 6.707 4.207 1.000 2 B LYS 91.310 1
+ATOM 1028 C CB . LYS 57 57 ? B 3.277 3.832 4.188 1.000 2 B LYS 91.310 1
+ATOM 1029 C CG . LYS 57 57 ? B 3.594 2.344 4.191 1.000 2 B LYS 91.310 1
+ATOM 1030 C CD . LYS 57 57 ? B 2.881 1.618 3.057 1.000 2 B LYS 91.310 1
+ATOM 1031 C CE . LYS 57 57 ? B 3.318 0.162 2.959 1.000 2 B LYS 91.310 1
+ATOM 1032 N NZ . LYS 57 57 ? B 4.332 -0.041 1.883 1.000 2 B LYS 91.310 1
+ATOM 1033 N N . VAL 58 58 ? B 2.957 6.707 5.336 1.000 2 B VAL 93.940 1
+ATOM 1034 C CA . VAL 58 58 ? B 2.750 8.133 5.113 1.000 2 B VAL 93.940 1
+ATOM 1035 C C . VAL 58 58 ? B 3.879 8.930 5.766 1.000 2 B VAL 93.940 1
+ATOM 1036 O O . VAL 58 58 ? B 4.434 9.844 5.156 1.000 2 B VAL 93.940 1
+ATOM 1037 C CB . VAL 58 58 ? B 1.383 8.602 5.660 1.000 2 B VAL 93.940 1
+ATOM 1038 C CG1 . VAL 58 58 ? B 1.285 10.125 5.641 1.000 2 B VAL 93.940 1
+ATOM 1039 C CG2 . VAL 58 58 ? B 0.245 7.980 4.852 1.000 2 B VAL 93.940 1
+ATOM 1040 N N . ILE 59 59 ? B 4.238 8.570 6.969 1.000 2 B ILE 93.380 1
+ATOM 1041 C CA . ILE 59 59 ? B 5.305 9.258 7.691 1.000 2 B ILE 93.380 1
+ATOM 1042 C C . ILE 59 59 ? B 6.629 9.070 6.953 1.000 2 B ILE 93.380 1
+ATOM 1043 O O . ILE 59 59 ? B 7.406 10.016 6.816 1.000 2 B ILE 93.380 1
+ATOM 1044 C CB . ILE 59 59 ? B 5.422 8.742 9.141 1.000 2 B ILE 93.380 1
+ATOM 1045 C CG1 . ILE 59 59 ? B 4.180 9.141 9.953 1.000 2 B ILE 93.380 1
+ATOM 1046 C CG2 . ILE 59 59 ? B 6.695 9.281 9.805 1.000 2 B ILE 93.380 1
+ATOM 1047 C CD1 . ILE 59 59 ? B 4.062 8.430 11.289 1.000 2 B ILE 93.380 1
+ATOM 1048 N N . GLN 60 60 ? B 6.812 7.848 6.520 1.000 2 B GLN 91.690 1
+ATOM 1049 C CA . GLN 60 60 ? B 8.031 7.559 5.762 1.000 2 B GLN 91.690 1
+ATOM 1050 C C . GLN 60 60 ? B 8.133 8.453 4.527 1.000 2 B GLN 91.690 1
+ATOM 1051 O O . GLN 60 60 ? B 9.188 9.016 4.254 1.000 2 B GLN 91.690 1
+ATOM 1052 C CB . GLN 60 60 ? B 8.070 6.086 5.352 1.000 2 B GLN 91.690 1
+ATOM 1053 C CG . GLN 60 60 ? B 9.367 5.672 4.676 1.000 2 B GLN 91.690 1
+ATOM 1054 C CD . GLN 60 60 ? B 9.367 4.219 4.238 1.000 2 B GLN 91.690 1
+ATOM 1055 O OE1 . GLN 60 60 ? B 8.383 3.500 4.438 1.000 2 B GLN 91.690 1
+ATOM 1056 N NE2 . GLN 60 60 ? B 10.469 3.781 3.639 1.000 2 B GLN 91.690 1
+ATOM 1057 N N . TRP 61 61 ? B 7.051 8.531 3.750 1.000 2 B TRP 93.620 1
+ATOM 1058 C CA . TRP 61 61 ? B 7.031 9.367 2.557 1.000 2 B TRP 93.620 1
+ATOM 1059 C C . TRP 61 61 ? B 7.305 10.820 2.910 1.000 2 B TRP 93.620 1
+ATOM 1060 O O . TRP 61 61 ? B 8.148 11.477 2.291 1.000 2 B TRP 93.620 1
+ATOM 1061 C CB . TRP 61 61 ? B 5.688 9.242 1.832 1.000 2 B TRP 93.620 1
+ATOM 1062 C CG . TRP 61 61 ? B 5.668 9.883 0.476 1.000 2 B TRP 93.620 1
+ATOM 1063 C CD1 . TRP 61 61 ? B 5.973 9.297 -0.719 1.000 2 B TRP 93.620 1
+ATOM 1064 C CD2 . TRP 61 61 ? B 5.328 11.242 0.178 1.000 2 B TRP 93.620 1
+ATOM 1065 N NE1 . TRP 61 61 ? B 5.840 10.203 -1.742 1.000 2 B TRP 93.620 1
+ATOM 1066 C CE2 . TRP 61 61 ? B 5.449 11.406 -1.220 1.000 2 B TRP 93.620 1
+ATOM 1067 C CE3 . TRP 61 61 ? B 4.934 12.336 0.958 1.000 2 B TRP 93.620 1
+ATOM 1068 C CZ2 . TRP 61 61 ? B 5.188 12.625 -1.854 1.000 2 B TRP 93.620 1
+ATOM 1069 C CZ3 . TRP 61 61 ? B 4.676 13.547 0.325 1.000 2 B TRP 93.620 1
+ATOM 1070 C CH2 . TRP 61 61 ? B 4.801 13.680 -1.068 1.000 2 B TRP 93.620 1
+ATOM 1071 N N . CYS 62 62 ? B 6.676 11.336 3.941 1.000 2 B CYS 93.120 1
+ATOM 1072 C CA . CYS 62 62 ? B 6.832 12.727 4.352 1.000 2 B CYS 93.120 1
+ATOM 1073 C C . CYS 62 62 ? B 8.258 13.000 4.816 1.000 2 B CYS 93.120 1
+ATOM 1074 O O . CYS 62 62 ? B 8.797 14.078 4.578 1.000 2 B CYS 93.120 1
+ATOM 1075 C CB . CYS 62 62 ? B 5.848 13.062 5.469 1.000 2 B CYS 93.120 1
+ATOM 1076 S SG . CYS 62 62 ? B 4.125 13.133 4.930 1.000 2 B CYS 93.120 1
+ATOM 1077 N N . THR 63 63 ? B 8.820 12.008 5.496 1.000 2 B THR 92.310 1
+ATOM 1078 C CA . THR 63 63 ? B 10.195 12.164 5.973 1.000 2 B THR 92.310 1
+ATOM 1079 C C . THR 63 63 ? B 11.164 12.258 4.801 1.000 2 B THR 92.310 1
+ATOM 1080 O O . THR 63 63 ? B 12.094 13.078 4.816 1.000 2 B THR 92.310 1
+ATOM 1081 C CB . THR 63 63 ? B 10.602 11.000 6.891 1.000 2 B THR 92.310 1
+ATOM 1082 O OG1 . THR 63 63 ? B 9.695 10.922 7.992 1.000 2 B THR 92.310 1
+ATOM 1083 C CG2 . THR 63 63 ? B 12.016 11.188 7.426 1.000 2 B THR 92.310 1
+ATOM 1084 N N . HIS 64 64 ? B 10.945 11.438 3.832 1.000 2 B HIS 90.690 1
+ATOM 1085 C CA . HIS 64 64 ? B 11.805 11.445 2.656 1.000 2 B HIS 90.690 1
+ATOM 1086 C C . HIS 64 64 ? B 11.672 12.750 1.877 1.000 2 B HIS 90.690 1
+ATOM 1087 O O . HIS 64 64 ? B 12.656 13.250 1.330 1.000 2 B HIS 90.690 1
+ATOM 1088 C CB . HIS 64 64 ? B 11.477 10.258 1.750 1.000 2 B HIS 90.690 1
+ATOM 1089 C CG . HIS 64 64 ? B 12.344 10.172 0.533 1.000 2 B HIS 90.690 1
+ATOM 1090 N ND1 . HIS 64 64 ? B 12.047 10.836 -0.638 1.000 2 B HIS 90.690 1
+ATOM 1091 C CD2 . HIS 64 64 ? B 13.500 9.508 0.309 1.000 2 B HIS 90.690 1
+ATOM 1092 C CE1 . HIS 64 64 ? B 12.984 10.578 -1.533 1.000 2 B HIS 90.690 1
+ATOM 1093 N NE2 . HIS 64 64 ? B 13.883 9.773 -0.983 1.000 2 B HIS 90.690 1
+ATOM 1094 N N . HIS 65 65 ? B 10.484 13.312 1.833 1.000 2 B HIS 90.560 1
+ATOM 1095 C CA . HIS 65 65 ? B 10.211 14.484 1.011 1.000 2 B HIS 90.560 1
+ATOM 1096 C C . HIS 65 65 ? B 10.211 15.758 1.849 1.000 2 B HIS 90.560 1
+ATOM 1097 O O . HIS 65 65 ? B 9.680 16.781 1.422 1.000 2 B HIS 90.560 1
+ATOM 1098 C CB . HIS 65 65 ? B 8.867 14.320 0.292 1.000 2 B HIS 90.560 1
+ATOM 1099 C CG . HIS 65 65 ? B 8.898 13.336 -0.828 1.000 2 B HIS 90.560 1
+ATOM 1100 N ND1 . HIS 65 65 ? B 9.477 13.602 -2.047 1.000 2 B HIS 90.560 1
+ATOM 1101 C CD2 . HIS 65 65 ? B 8.422 12.070 -0.908 1.000 2 B HIS 90.560 1
+ATOM 1102 C CE1 . HIS 65 65 ? B 9.359 12.547 -2.832 1.000 2 B HIS 90.560 1
+ATOM 1103 N NE2 . HIS 65 65 ? B 8.719 11.602 -2.164 1.000 2 B HIS 90.560 1
+ATOM 1104 N N . LYS 66 66 ? B 10.656 15.766 3.014 1.000 2 B LYS 85.620 1
+ATOM 1105 C CA . LYS 66 66 ? B 10.641 16.906 3.918 1.000 2 B LYS 85.620 1
+ATOM 1106 C C . LYS 66 66 ? B 11.406 18.094 3.330 1.000 2 B LYS 85.620 1
+ATOM 1107 O O . LYS 66 66 ? B 11.047 19.250 3.549 1.000 2 B LYS 85.620 1
+ATOM 1108 C CB . LYS 66 66 ? B 11.227 16.531 5.281 1.000 2 B LYS 85.620 1
+ATOM 1109 C CG . LYS 66 66 ? B 12.711 16.203 5.242 1.000 2 B LYS 85.620 1
+ATOM 1110 C CD . LYS 66 66 ? B 13.250 15.898 6.637 1.000 2 B LYS 85.620 1
+ATOM 1111 C CE . LYS 66 66 ? B 14.742 15.602 6.605 1.000 2 B LYS 85.620 1
+ATOM 1112 N NZ . LYS 66 66 ? B 15.312 15.500 7.980 1.000 2 B LYS 85.620 1
+ATOM 1113 N N . ASP 67 67 ? B 12.453 17.688 2.582 1.000 2 B ASP 82.690 1
+ATOM 1114 C CA . ASP 67 67 ? B 13.305 18.766 2.055 1.000 2 B ASP 82.690 1
+ATOM 1115 C C . ASP 67 67 ? B 12.938 19.078 0.606 1.000 2 B ASP 82.690 1
+ATOM 1116 O O . ASP 67 67 ? B 13.609 19.891 -0.041 1.000 2 B ASP 82.690 1
+ATOM 1117 C CB . ASP 67 67 ? B 14.773 18.375 2.154 1.000 2 B ASP 82.690 1
+ATOM 1118 C CG . ASP 67 67 ? B 15.250 18.219 3.588 1.000 2 B ASP 82.690 1
+ATOM 1119 O OD1 . ASP 67 67 ? B 15.125 19.172 4.383 1.000 2 B ASP 82.690 1
+ATOM 1120 O OD2 . ASP 67 67 ? B 15.758 17.125 3.926 1.000 2 B ASP 82.690 1
+ATOM 1121 N N . ASP 68 68 ? B 11.883 18.359 0.148 1.000 2 B ASP 78.880 1
+ATOM 1122 C CA . ASP 68 68 ? B 11.461 18.609 -1.227 1.000 2 B ASP 78.880 1
+ATOM 1123 C C . ASP 68 68 ? B 10.742 19.953 -1.347 1.000 2 B ASP 78.880 1
+ATOM 1124 O O . ASP 68 68 ? B 10.070 20.375 -0.410 1.000 2 B ASP 78.880 1
+ATOM 1125 C CB . ASP 68 68 ? B 10.555 17.469 -1.723 1.000 2 B ASP 78.880 1
+ATOM 1126 C CG . ASP 68 68 ? B 11.328 16.234 -2.131 1.000 2 B ASP 78.880 1
+ATOM 1127 O OD1 . ASP 68 68 ? B 12.578 16.266 -2.150 1.000 2 B ASP 78.880 1
+ATOM 1128 O OD2 . ASP 68 68 ? B 10.680 15.211 -2.439 1.000 2 B ASP 78.880 1
+ATOM 1129 N N . PRO 69 69 ? B 11.070 20.641 -2.387 1.000 2 B PRO 67.810 1
+ATOM 1130 C CA . PRO 69 69 ? B 10.305 21.875 -2.609 1.000 2 B PRO 67.810 1
+ATOM 1131 C C . PRO 69 69 ? B 8.805 21.609 -2.742 1.000 2 B PRO 67.810 1
+ATOM 1132 O O . PRO 69 69 ? B 8.391 20.516 -3.111 1.000 2 B PRO 67.810 1
+ATOM 1133 C CB . PRO 69 69 ? B 10.883 22.406 -3.918 1.000 2 B PRO 67.810 1
+ATOM 1134 C CG . PRO 69 69 ? B 11.641 21.266 -4.508 1.000 2 B PRO 67.810 1
+ATOM 1135 C CD . PRO 69 69 ? B 11.805 20.203 -3.451 1.000 2 B PRO 67.810 1
+ATOM 1136 N N . PRO 70 70 ? B 8.000 22.438 -2.197 1.000 2 B PRO 60.220 1
+ATOM 1137 C CA . PRO 70 70 ? B 6.551 22.266 -2.340 1.000 2 B PRO 60.220 1
+ATOM 1138 C C . PRO 70 70 ? B 6.125 22.016 -3.783 1.000 2 B PRO 60.220 1
+ATOM 1139 O O . PRO 70 70 ? B 6.793 22.469 -4.719 1.000 2 B PRO 60.220 1
+ATOM 1140 C CB . PRO 70 70 ? B 5.992 23.609 -1.836 1.000 2 B PRO 60.220 1
+ATOM 1141 C CG . PRO 70 70 ? B 7.188 24.484 -1.681 1.000 2 B PRO 60.220 1
+ATOM 1142 C CD . PRO 70 70 ? B 8.422 23.656 -1.892 1.000 2 B PRO 60.220 1
+ATOM 1143 N N . PRO 71 71 ? B 5.324 20.859 -4.000 1.000 2 B PRO 57.720 1
+ATOM 1144 C CA . PRO 71 71 ? B 4.887 20.594 -5.375 1.000 2 B PRO 57.720 1
+ATOM 1145 C C . PRO 71 71 ? B 4.488 21.875 -6.113 1.000 2 B PRO 57.720 1
+ATOM 1146 O O . PRO 71 71 ? B 4.016 22.828 -5.492 1.000 2 B PRO 57.720 1
+ATOM 1147 C CB . PRO 71 71 ? B 3.682 19.672 -5.195 1.000 2 B PRO 57.720 1
+ATOM 1148 C CG . PRO 71 71 ? B 3.393 19.703 -3.729 1.000 2 B PRO 57.720 1
+ATOM 1149 C CD . PRO 71 71 ? B 4.512 20.422 -3.035 1.000 2 B PRO 57.720 1
+ATOM 1150 N N . PRO 72 72 ? B 5.125 22.031 -7.297 1.000 2 B PRO 50.280 1
+ATOM 1151 C CA . PRO 72 72 ? B 4.754 23.250 -8.031 1.000 2 B PRO 50.280 1
+ATOM 1152 C C . PRO 72 72 ? B 3.244 23.484 -8.062 1.000 2 B PRO 50.280 1
+ATOM 1153 O O . PRO 72 72 ? B 2.469 22.531 -7.980 1.000 2 B PRO 50.280 1
+ATOM 1154 C CB . PRO 72 72 ? B 5.297 23.000 -9.438 1.000 2 B PRO 50.280 1
+ATOM 1155 C CG . PRO 72 72 ? B 5.637 21.547 -9.453 1.000 2 B PRO 50.280 1
+ATOM 1156 C CD . PRO 72 72 ? B 5.555 21.016 -8.047 1.000 2 B PRO 50.280 1
+ATOM 1157 N N . GLU 73 73 ? B 2.707 24.547 -7.621 1.000 2 B GLU 49.620 1
+ATOM 1158 C CA . GLU 73 73 ? B 1.303 24.906 -7.773 1.000 2 B GLU 49.620 1
+ATOM 1159 C C . GLU 73 73 ? B 0.718 24.328 -9.062 1.000 2 B GLU 49.620 1
+ATOM 1160 O O . GLU 73 73 ? B 1.452 24.062 -10.008 1.000 2 B GLU 49.620 1
+ATOM 1161 C CB . GLU 73 73 ? B 1.140 26.438 -7.758 1.000 2 B GLU 49.620 1
+ATOM 1162 C CG . GLU 73 73 ? B 1.304 27.047 -6.379 1.000 2 B GLU 49.620 1
+ATOM 1163 C CD . GLU 73 73 ? B 0.638 28.422 -6.250 1.000 2 B GLU 49.620 1
+ATOM 1164 O OE1 . GLU 73 73 ? B 0.249 28.797 -5.121 1.000 2 B GLU 49.620 1
+ATOM 1165 O OE2 . GLU 73 73 ? B 0.503 29.109 -7.285 1.000 2 B GLU 49.620 1
+ATOM 1166 N N . ASP 74 74 ? B -0.482 23.672 -9.039 1.000 2 B ASP 48.750 1
+ATOM 1167 C CA . ASP 74 74 ? B -1.240 23.156 -10.172 1.000 2 B ASP 48.750 1
+ATOM 1168 C C . ASP 74 74 ? B -0.948 23.953 -11.445 1.000 2 B ASP 48.750 1
+ATOM 1169 O O . ASP 74 74 ? B -1.459 25.062 -11.617 1.000 2 B ASP 48.750 1
+ATOM 1170 C CB . ASP 74 74 ? B -2.740 23.172 -9.875 1.000 2 B ASP 48.750 1
+ATOM 1171 C CG . ASP 74 74 ? B -3.354 21.781 -9.812 1.000 2 B ASP 48.750 1
+ATOM 1172 O OD1 . ASP 74 74 ? B -2.846 20.859 -10.492 1.000 2 B ASP 48.750 1
+ATOM 1173 O OD2 . ASP 74 74 ? B -4.355 21.609 -9.086 1.000 2 B ASP 48.750 1
+ATOM 1174 N N . ASP 75 75 ? B 0.282 24.000 -11.836 1.000 2 B ASP 45.380 1
+ATOM 1175 C CA . ASP 75 75 ? B 0.243 24.547 -13.188 1.000 2 B ASP 45.380 1
+ATOM 1176 C C . ASP 75 75 ? B -0.658 23.719 -14.094 1.000 2 B ASP 45.380 1
+ATOM 1177 O O . ASP 75 75 ? B -0.336 22.562 -14.406 1.000 2 B ASP 45.380 1
+ATOM 1178 C CB . ASP 75 75 ? B 1.652 24.625 -13.773 1.000 2 B ASP 45.380 1
+ATOM 1179 C CG . ASP 75 75 ? B 1.882 25.891 -14.602 1.000 2 B ASP 45.380 1
+ATOM 1180 O OD1 . ASP 75 75 ? B 0.896 26.531 -15.008 1.000 2 B ASP 45.380 1
+ATOM 1181 O OD2 . ASP 75 75 ? B 3.059 26.219 -14.844 1.000 2 B ASP 45.380 1
+ATOM 1182 N N . GLU 76 76 ? B -1.999 23.906 -14.047 1.000 2 B GLU 48.620 1
+ATOM 1183 C CA . GLU 76 76 ? B -3.053 23.312 -14.867 1.000 2 B GLU 48.620 1
+ATOM 1184 C C . GLU 76 76 ? B -2.535 22.953 -16.266 1.000 2 B GLU 48.620 1
+ATOM 1185 O O . GLU 76 76 ? B -3.047 22.031 -16.906 1.000 2 B GLU 48.620 1
+ATOM 1186 C CB . GLU 76 76 ? B -4.246 24.266 -14.977 1.000 2 B GLU 48.620 1
+ATOM 1187 C CG . GLU 76 76 ? B -5.125 24.297 -13.734 1.000 2 B GLU 48.620 1
+ATOM 1188 C CD . GLU 76 76 ? B -6.422 25.062 -13.938 1.000 2 B GLU 48.620 1
+ATOM 1189 O OE1 . GLU 76 76 ? B -7.348 24.922 -13.109 1.000 2 B GLU 48.620 1
+ATOM 1190 O OE2 . GLU 76 76 ? B -6.516 25.812 -14.938 1.000 2 B GLU 48.620 1
+ATOM 1191 N N . ASN 77 77 ? B -1.673 23.875 -16.844 1.000 2 B ASN 46.910 1
+ATOM 1192 C CA . ASN 77 77 ? B -1.264 23.766 -18.234 1.000 2 B ASN 46.910 1
+ATOM 1193 C C . ASN 77 77 ? B -0.156 22.734 -18.422 1.000 2 B ASN 46.910 1
+ATOM 1194 O O . ASN 77 77 ? B 0.308 22.500 -19.531 1.000 2 B ASN 46.910 1
+ATOM 1195 C CB . ASN 77 77 ? B -0.813 25.125 -18.781 1.000 2 B ASN 46.910 1
+ATOM 1196 C CG . ASN 77 77 ? B -1.937 25.875 -19.469 1.000 2 B ASN 46.910 1
+ATOM 1197 O OD1 . ASN 77 77 ? B -2.809 25.281 -20.094 1.000 2 B ASN 46.910 1
+ATOM 1198 N ND2 . ASN 77 77 ? B -1.923 27.203 -19.328 1.000 2 B ASN 46.910 1
+ATOM 1199 N N . LYS 78 78 ? B 0.530 22.359 -17.344 1.000 2 B LYS 49.620 1
+ATOM 1200 C CA . LYS 78 78 ? B 1.572 21.391 -17.641 1.000 2 B LYS 49.620 1
+ATOM 1201 C C . LYS 78 78 ? B 0.990 19.984 -17.766 1.000 2 B LYS 49.620 1
+ATOM 1202 O O . LYS 78 78 ? B 0.152 19.578 -16.953 1.000 2 B LYS 49.620 1
+ATOM 1203 C CB . LYS 78 78 ? B 2.662 21.406 -16.578 1.000 2 B LYS 49.620 1
+ATOM 1204 C CG . LYS 78 78 ? B 3.922 22.156 -16.984 1.000 2 B LYS 49.620 1
+ATOM 1205 C CD . LYS 78 78 ? B 4.992 22.094 -15.906 1.000 2 B LYS 49.620 1
+ATOM 1206 C CE . LYS 78 78 ? B 6.238 22.875 -16.312 1.000 2 B LYS 49.620 1
+ATOM 1207 N NZ . LYS 78 78 ? B 7.277 22.844 -15.234 1.000 2 B LYS 49.620 1
+ATOM 1208 N N . GLU 79 79 ? B 0.884 19.406 -18.891 1.000 2 B GLU 47.410 1
+ATOM 1209 C CA . GLU 79 79 ? B 0.485 18.031 -19.156 1.000 2 B GLU 47.410 1
+ATOM 1210 C C . GLU 79 79 ? B 0.966 17.078 -18.062 1.000 2 B GLU 47.410 1
+ATOM 1211 O O . GLU 79 79 ? B 2.162 17.016 -17.766 1.000 2 B GLU 47.410 1
+ATOM 1212 C CB . GLU 79 79 ? B 1.024 17.562 -20.516 1.000 2 B GLU 47.410 1
+ATOM 1213 C CG . GLU 79 79 ? B 0.090 17.859 -21.672 1.000 2 B GLU 47.410 1
+ATOM 1214 C CD . GLU 79 79 ? B 0.626 17.375 -23.016 1.000 2 B GLU 47.410 1
+ATOM 1215 O OE1 . GLU 79 79 ? B -0.177 17.172 -23.953 1.000 2 B GLU 47.410 1
+ATOM 1216 O OE2 . GLU 79 79 ? B 1.859 17.188 -23.125 1.000 2 B GLU 47.410 1
+ATOM 1217 N N . LYS 80 80 ? B 0.240 17.094 -16.875 1.000 2 B LYS 52.310 1
+ATOM 1218 C CA . LYS 80 80 ? B 0.505 16.109 -15.844 1.000 2 B LYS 52.310 1
+ATOM 1219 C C . LYS 80 80 ? B 1.017 14.805 -16.438 1.000 2 B LYS 52.310 1
+ATOM 1220 O O . LYS 80 80 ? B 0.242 14.031 -17.016 1.000 2 B LYS 52.310 1
+ATOM 1221 C CB . LYS 80 80 ? B -0.756 15.852 -15.016 1.000 2 B LYS 52.310 1
+ATOM 1222 C CG . LYS 80 80 ? B -1.194 17.031 -14.164 1.000 2 B LYS 52.310 1
+ATOM 1223 C CD . LYS 80 80 ? B -2.416 16.688 -13.320 1.000 2 B LYS 52.310 1
+ATOM 1224 C CE . LYS 80 80 ? B -2.852 17.875 -12.461 1.000 2 B LYS 52.310 1
+ATOM 1225 N NZ . LYS 80 80 ? B -4.086 17.562 -11.680 1.000 2 B LYS 52.310 1
+ATOM 1226 N N . ARG 81 81 ? B 2.279 14.898 -16.703 1.000 2 B ARG 58.590 1
+ATOM 1227 C CA . ARG 81 81 ? B 2.814 13.648 -17.234 1.000 2 B ARG 58.590 1
+ATOM 1228 C C . ARG 81 81 ? B 2.766 12.547 -16.188 1.000 2 B ARG 58.590 1
+ATOM 1229 O O . ARG 81 81 ? B 3.104 12.766 -15.023 1.000 2 B ARG 58.590 1
+ATOM 1230 C CB . ARG 81 81 ? B 4.250 13.844 -17.734 1.000 2 B ARG 58.590 1
+ATOM 1231 C CG . ARG 81 81 ? B 4.371 14.773 -18.922 1.000 2 B ARG 58.590 1
+ATOM 1232 C CD . ARG 81 81 ? B 5.785 14.781 -19.500 1.000 2 B ARG 58.590 1
+ATOM 1233 N NE . ARG 81 81 ? B 6.242 16.141 -19.781 1.000 2 B ARG 58.590 1
+ATOM 1234 C CZ . ARG 81 81 ? B 7.340 16.438 -20.469 1.000 2 B ARG 58.590 1
+ATOM 1235 N NH1 . ARG 81 81 ? B 8.117 15.477 -20.969 1.000 2 B ARG 58.590 1
+ATOM 1236 N NH2 . ARG 81 81 ? B 7.664 17.703 -20.672 1.000 2 B ARG 58.590 1
+ATOM 1237 N N . THR 82 82 ? B 1.875 11.562 -16.391 1.000 2 B THR 61.560 1
+ATOM 1238 C CA . THR 82 82 ? B 1.746 10.391 -15.539 1.000 2 B THR 61.560 1
+ATOM 1239 C C . THR 82 82 ? B 3.100 9.711 -15.336 1.000 2 B THR 61.560 1
+ATOM 1240 O O . THR 82 82 ? B 3.246 8.852 -14.469 1.000 2 B THR 61.560 1
+ATOM 1241 C CB . THR 82 82 ? B 0.742 9.375 -16.125 1.000 2 B THR 61.560 1
+ATOM 1242 O OG1 . THR 82 82 ? B 1.121 9.055 -17.469 1.000 2 B THR 61.560 1
+ATOM 1243 C CG2 . THR 82 82 ? B -0.673 9.945 -16.125 1.000 2 B THR 61.560 1
+ATOM 1244 N N . ASP 83 83 ? B 4.102 10.172 -16.125 1.000 2 B ASP 64.560 1
+ATOM 1245 C CA . ASP 83 83 ? B 5.406 9.523 -16.094 1.000 2 B ASP 64.560 1
+ATOM 1246 C C . ASP 83 83 ? B 6.371 10.281 -15.180 1.000 2 B ASP 64.560 1
+ATOM 1247 O O . ASP 83 83 ? B 7.500 9.836 -14.953 1.000 2 B ASP 64.560 1
+ATOM 1248 C CB . ASP 83 83 ? B 5.988 9.406 -17.500 1.000 2 B ASP 64.560 1
+ATOM 1249 C CG . ASP 83 83 ? B 5.891 10.703 -18.297 1.000 2 B ASP 64.560 1
+ATOM 1250 O OD1 . ASP 83 83 ? B 5.016 11.539 -17.984 1.000 2 B ASP 64.560 1
+ATOM 1251 O OD2 . ASP 83 83 ? B 6.691 10.883 -19.234 1.000 2 B ASP 64.560 1
+ATOM 1252 N N . ASP 84 84 ? B 5.844 11.281 -14.547 1.000 2 B ASP 75.750 1
+ATOM 1253 C CA . ASP 84 84 ? B 6.758 12.117 -13.773 1.000 2 B ASP 75.750 1
+ATOM 1254 C C . ASP 84 84 ? B 6.707 11.766 -12.289 1.000 2 B ASP 75.750 1
+ATOM 1255 O O . ASP 84 84 ? B 6.352 12.609 -11.461 1.000 2 B ASP 75.750 1
+ATOM 1256 C CB . ASP 84 84 ? B 6.426 13.602 -13.977 1.000 2 B ASP 75.750 1
+ATOM 1257 C CG . ASP 84 84 ? B 7.625 14.508 -13.773 1.000 2 B ASP 75.750 1
+ATOM 1258 O OD1 . ASP 84 84 ? B 8.766 14.008 -13.672 1.000 2 B ASP 75.750 1
+ATOM 1259 O OD2 . ASP 84 84 ? B 7.430 15.742 -13.703 1.000 2 B ASP 75.750 1
+ATOM 1260 N N . ILE 85 85 ? B 6.910 10.516 -12.000 1.000 2 B ILE 80.810 1
+ATOM 1261 C CA . ILE 85 85 ? B 7.016 10.070 -10.609 1.000 2 B ILE 80.810 1
+ATOM 1262 C C . ILE 85 85 ? B 8.477 9.805 -10.266 1.000 2 B ILE 80.810 1
+ATOM 1263 O O . ILE 85 85 ? B 9.148 9.000 -10.922 1.000 2 B ILE 80.810 1
+ATOM 1264 C CB . ILE 85 85 ? B 6.172 8.797 -10.367 1.000 2 B ILE 80.810 1
+ATOM 1265 C CG1 . ILE 85 85 ? B 4.703 9.055 -10.734 1.000 2 B ILE 80.810 1
+ATOM 1266 C CG2 . ILE 85 85 ? B 6.297 8.336 -8.914 1.000 2 B ILE 80.810 1
+ATOM 1267 C CD1 . ILE 85 85 ? B 3.820 7.816 -10.641 1.000 2 B ILE 80.810 1
+ATOM 1268 N N . PRO 86 86 ? B 8.984 10.562 -9.281 1.000 2 B PRO 83.120 1
+ATOM 1269 C CA . PRO 86 86 ? B 10.367 10.305 -8.859 1.000 2 B PRO 83.120 1
+ATOM 1270 C C . PRO 86 86 ? B 10.641 8.828 -8.609 1.000 2 B PRO 83.120 1
+ATOM 1271 O O . PRO 86 86 ? B 9.742 8.094 -8.188 1.000 2 B PRO 83.120 1
+ATOM 1272 C CB . PRO 86 86 ? B 10.484 11.109 -7.559 1.000 2 B PRO 83.120 1
+ATOM 1273 C CG . PRO 86 86 ? B 9.430 12.164 -7.664 1.000 2 B PRO 83.120 1
+ATOM 1274 C CD . PRO 86 86 ? B 8.273 11.609 -8.445 1.000 2 B PRO 83.120 1
+ATOM 1275 N N . VAL 87 87 ? B 11.914 8.430 -8.859 1.000 2 B VAL 81.750 1
+ATOM 1276 C CA . VAL 87 87 ? B 12.305 7.027 -8.766 1.000 2 B VAL 81.750 1
+ATOM 1277 C C . VAL 87 87 ? B 12.016 6.504 -7.363 1.000 2 B VAL 81.750 1
+ATOM 1278 O O . VAL 87 87 ? B 11.484 5.402 -7.203 1.000 2 B VAL 81.750 1
+ATOM 1279 C CB . VAL 87 87 ? B 13.797 6.832 -9.109 1.000 2 B VAL 81.750 1
+ATOM 1280 C CG1 . VAL 87 87 ? B 14.227 5.391 -8.844 1.000 2 B VAL 81.750 1
+ATOM 1281 C CG2 . VAL 87 87 ? B 14.062 7.215 -10.562 1.000 2 B VAL 81.750 1
+ATOM 1282 N N . TRP 88 88 ? B 12.320 7.258 -6.445 1.000 2 B TRP 80.750 1
+ATOM 1283 C CA . TRP 88 88 ? B 12.086 6.812 -5.074 1.000 2 B TRP 80.750 1
+ATOM 1284 C C . TRP 88 88 ? B 10.602 6.582 -4.820 1.000 2 B TRP 80.750 1
+ATOM 1285 O O . TRP 88 88 ? B 10.219 5.578 -4.219 1.000 2 B TRP 80.750 1
+ATOM 1286 C CB . TRP 88 88 ? B 12.641 7.836 -4.078 1.000 2 B TRP 80.750 1
+ATOM 1287 C CG . TRP 88 88 ? B 12.578 7.387 -2.646 1.000 2 B TRP 80.750 1
+ATOM 1288 C CD1 . TRP 88 88 ? B 13.531 6.691 -1.960 1.000 2 B TRP 80.750 1
+ATOM 1289 C CD2 . TRP 88 88 ? B 11.500 7.598 -1.731 1.000 2 B TRP 80.750 1
+ATOM 1290 N NE1 . TRP 88 88 ? B 13.109 6.457 -0.671 1.000 2 B TRP 80.750 1
+ATOM 1291 C CE2 . TRP 88 88 ? B 11.867 7.004 -0.505 1.000 2 B TRP 80.750 1
+ATOM 1292 C CE3 . TRP 88 88 ? B 10.250 8.234 -1.829 1.000 2 B TRP 80.750 1
+ATOM 1293 C CZ2 . TRP 88 88 ? B 11.031 7.027 0.617 1.000 2 B TRP 80.750 1
+ATOM 1294 C CZ3 . TRP 88 88 ? B 9.422 8.258 -0.712 1.000 2 B TRP 80.750 1
+ATOM 1295 C CH2 . TRP 88 88 ? B 9.820 7.656 0.494 1.000 2 B TRP 80.750 1
+ATOM 1296 N N . ASP 89 89 ? B 9.734 7.461 -5.312 1.000 2 B ASP 86.000 1
+ATOM 1297 C CA . ASP 89 89 ? B 8.289 7.332 -5.133 1.000 2 B ASP 86.000 1
+ATOM 1298 C C . ASP 89 89 ? B 7.746 6.113 -5.883 1.000 2 B ASP 86.000 1
+ATOM 1299 O O . ASP 89 89 ? B 6.824 5.449 -5.410 1.000 2 B ASP 86.000 1
+ATOM 1300 C CB . ASP 89 89 ? B 7.574 8.602 -5.605 1.000 2 B ASP 86.000 1
+ATOM 1301 C CG . ASP 89 89 ? B 7.785 9.781 -4.672 1.000 2 B ASP 86.000 1
+ATOM 1302 O OD1 . ASP 89 89 ? B 7.930 9.570 -3.449 1.000 2 B ASP 86.000 1
+ATOM 1303 O OD2 . ASP 89 89 ? B 7.801 10.930 -5.168 1.000 2 B ASP 86.000 1
+ATOM 1304 N N . GLN 90 90 ? B 8.305 5.906 -7.004 1.000 2 B GLN 78.940 1
+ATOM 1305 C CA . GLN 90 90 ? B 7.906 4.719 -7.750 1.000 2 B GLN 78.940 1
+ATOM 1306 C C . GLN 90 90 ? B 8.141 3.451 -6.934 1.000 2 B GLN 78.940 1
+ATOM 1307 O O . GLN 90 90 ? B 7.285 2.562 -6.895 1.000 2 B GLN 78.940 1
+ATOM 1308 C CB . GLN 90 90 ? B 8.664 4.637 -9.078 1.000 2 B GLN 78.940 1
+ATOM 1309 C CG . GLN 90 90 ? B 8.266 5.715 -10.078 1.000 2 B GLN 78.940 1
+ATOM 1310 C CD . GLN 90 90 ? B 9.055 5.633 -11.367 1.000 2 B GLN 78.940 1
+ATOM 1311 O OE1 . GLN 90 90 ? B 9.359 4.539 -11.859 1.000 2 B GLN 78.940 1
+ATOM 1312 N NE2 . GLN 90 90 ? B 9.398 6.785 -11.938 1.000 2 B GLN 78.940 1
+ATOM 1313 N N . GLU 91 91 ? B 9.312 3.416 -6.359 1.000 2 B GLU 78.380 1
+ATOM 1314 C CA . GLU 91 91 ? B 9.648 2.260 -5.535 1.000 2 B GLU 78.380 1
+ATOM 1315 C C . GLU 91 91 ? B 8.781 2.211 -4.281 1.000 2 B GLU 78.380 1
+ATOM 1316 O O . GLU 91 91 ? B 8.312 1.142 -3.885 1.000 2 B GLU 78.380 1
+ATOM 1317 C CB . GLU 91 91 ? B 11.125 2.279 -5.148 1.000 2 B GLU 78.380 1
+ATOM 1318 C CG . GLU 91 91 ? B 11.641 0.952 -4.613 1.000 2 B GLU 78.380 1
+ATOM 1319 C CD . GLU 91 91 ? B 11.703 -0.139 -5.668 1.000 2 B GLU 78.380 1
+ATOM 1320 O OE1 . GLU 91 91 ? B 11.719 -1.337 -5.305 1.000 2 B GLU 78.380 1
+ATOM 1321 O OE2 . GLU 91 91 ? B 11.734 0.207 -6.871 1.000 2 B GLU 78.380 1
+ATOM 1322 N N .
PHE 92 92 ? B 8.500 3.395 -3.711 1.000 2 B PHE 81.000 1 +ATOM 1323 C CA . PHE 92 92 ? B 7.684 3.525 -2.514 1.000 2 B PHE 81.000 1 +ATOM 1324 C C . PHE 92 92 ? B 6.258 3.057 -2.777 1.000 2 B PHE 81.000 1 +ATOM 1325 O O . PHE 92 92 ? B 5.645 2.408 -1.927 1.000 2 B PHE 81.000 1 +ATOM 1326 C CB . PHE 92 92 ? B 7.676 4.977 -2.023 1.000 2 B PHE 81.000 1 +ATOM 1327 C CG . PHE 92 92 ? B 6.824 5.203 -0.802 1.000 2 B PHE 81.000 1 +ATOM 1328 C CD1 . PHE 92 92 ? B 5.539 5.719 -0.917 1.000 2 B PHE 81.000 1 +ATOM 1329 C CD2 . PHE 92 92 ? B 7.309 4.891 0.462 1.000 2 B PHE 81.000 1 +ATOM 1330 C CE1 . PHE 92 92 ? B 4.750 5.930 0.211 1.000 2 B PHE 81.000 1 +ATOM 1331 C CE2 . PHE 92 92 ? B 6.527 5.098 1.595 1.000 2 B PHE 81.000 1 +ATOM 1332 C CZ . PHE 92 92 ? B 5.246 5.617 1.467 1.000 2 B PHE 81.000 1 +ATOM 1333 N N . LEU 93 93 ? B 5.797 3.293 -3.990 1.000 2 B LEU 83.190 1 +ATOM 1334 C CA . LEU 93 93 ? B 4.398 3.027 -4.309 1.000 2 B LEU 83.190 1 +ATOM 1335 C C . LEU 93 93 ? B 4.242 1.664 -4.973 1.000 2 B LEU 83.190 1 +ATOM 1336 O O . LEU 93 93 ? B 3.174 1.346 -5.504 1.000 2 B LEU 83.190 1 +ATOM 1337 C CB . LEU 93 93 ? B 3.836 4.121 -5.219 1.000 2 B LEU 83.190 1 +ATOM 1338 C CG . LEU 93 93 ? B 3.670 5.508 -4.594 1.000 2 B LEU 83.190 1 +ATOM 1339 C CD1 . LEU 93 93 ? B 3.328 6.535 -5.664 1.000 2 B LEU 83.190 1 +ATOM 1340 C CD2 . LEU 93 93 ? B 2.596 5.477 -3.510 1.000 2 B LEU 83.190 1 +ATOM 1341 N N . LYS 94 94 ? B 5.355 0.990 -4.957 1.000 2 B LYS 73.690 1 +ATOM 1342 C CA . LYS 94 94 ? B 5.254 -0.395 -5.410 1.000 2 B LYS 73.690 1 +ATOM 1343 C C . LYS 94 94 ? B 4.602 -1.272 -4.344 1.000 2 B LYS 73.690 1 +ATOM 1344 O O . LYS 94 94 ? B 5.270 -2.100 -3.719 1.000 2 B LYS 73.690 1 +ATOM 1345 C CB . LYS 94 94 ? B 6.637 -0.945 -5.766 1.000 2 B LYS 73.690 1 +ATOM 1346 C CG . LYS 94 94 ? B 6.613 -2.039 -6.824 1.000 2 B LYS 73.690 1 +ATOM 1347 C CD . LYS 94 94 ? B 8.016 -2.443 -7.242 1.000 2 B LYS 73.690 1 +ATOM 1348 C CE . LYS 94 94 ? B 7.996 -3.566 -8.273 1.000 2 B LYS 73.690 1 +ATOM 1349 N NZ . 
LYS 94 94 ? B 9.367 -3.875 -8.781 1.000 2 B LYS 73.690 1 +ATOM 1350 N N . VAL 95 95 ? B 3.377 -0.963 -4.031 1.000 2 B VAL 69.940 1 +ATOM 1351 C CA . VAL 95 95 ? B 2.578 -1.674 -3.037 1.000 2 B VAL 69.940 1 +ATOM 1352 C C . VAL 95 95 ? B 1.359 -2.303 -3.709 1.000 2 B VAL 69.940 1 +ATOM 1353 O O . VAL 95 95 ? B 1.105 -2.068 -4.895 1.000 2 B VAL 69.940 1 +ATOM 1354 C CB . VAL 95 95 ? B 2.133 -0.738 -1.893 1.000 2 B VAL 69.940 1 +ATOM 1355 C CG1 . VAL 95 95 ? B 3.342 -0.200 -1.131 1.000 2 B VAL 69.940 1 +ATOM 1356 C CG2 . VAL 95 95 ? B 1.289 0.412 -2.441 1.000 2 B VAL 69.940 1 +ATOM 1357 N N . ASP 96 96 ? B 0.685 -3.127 -3.018 1.000 2 B ASP 64.250 1 +ATOM 1358 C CA . ASP 96 96 ? B -0.537 -3.713 -3.561 1.000 2 B ASP 64.250 1 +ATOM 1359 C C . ASP 96 96 ? B -1.641 -2.666 -3.684 1.000 2 B ASP 64.250 1 +ATOM 1360 O O . ASP 96 96 ? B -1.559 -1.594 -3.080 1.000 2 B ASP 64.250 1 +ATOM 1361 C CB . ASP 96 96 ? B -1.010 -4.875 -2.684 1.000 2 B ASP 64.250 1 +ATOM 1362 C CG . ASP 96 96 ? B -1.460 -4.430 -1.302 1.000 2 B ASP 64.250 1 +ATOM 1363 O OD1 . ASP 96 96 ? B -2.668 -4.176 -1.106 1.000 2 B ASP 64.250 1 +ATOM 1364 O OD2 . ASP 96 96 ? B -0.598 -4.332 -0.402 1.000 2 B ASP 64.250 1 +ATOM 1365 N N . GLN 97 97 ? B -2.605 -2.914 -4.453 1.000 2 B GLN 68.690 1 +ATOM 1366 C CA . GLN 97 97 ? B -3.682 -1.973 -4.746 1.000 2 B GLN 68.690 1 +ATOM 1367 C C . GLN 97 97 ? B -4.398 -1.546 -3.467 1.000 2 B GLN 68.690 1 +ATOM 1368 O O . GLN 97 97 ? B -4.766 -0.379 -3.316 1.000 2 B GLN 68.690 1 +ATOM 1369 C CB . GLN 97 97 ? B -4.680 -2.584 -5.727 1.000 2 B GLN 68.690 1 +ATOM 1370 C CG . GLN 97 97 ? B -4.188 -2.617 -7.168 1.000 2 B GLN 68.690 1 +ATOM 1371 C CD . GLN 97 97 ? B -5.199 -3.217 -8.125 1.000 2 B GLN 68.690 1 +ATOM 1372 O OE1 . GLN 97 97 ? B -5.773 -4.277 -7.855 1.000 2 B GLN 68.690 1 +ATOM 1373 N NE2 . GLN 97 97 ? B -5.426 -2.543 -9.250 1.000 2 B GLN 68.690 1 +ATOM 1374 N N . GLY 98 98 ? B -4.629 -2.537 -2.703 1.000 2 B GLY 67.310 1 +ATOM 1375 C CA . GLY 98 98 ? 
B -5.273 -2.197 -1.446 1.000 2 B GLY 67.310 1 +ATOM 1376 C C . GLY 98 98 ? B -4.531 -1.134 -0.661 1.000 2 B GLY 67.310 1 +ATOM 1377 O O . GLY 98 98 ? B -5.137 -0.168 -0.189 1.000 2 B GLY 67.310 1 +ATOM 1378 N N . THR 99 99 ? B -3.238 -1.430 -0.528 1.000 2 B THR 77.190 1 +ATOM 1379 C CA . THR 99 99 ? B -2.393 -0.473 0.178 1.000 2 B THR 77.190 1 +ATOM 1380 C C . THR 99 99 ? B -2.373 0.869 -0.548 1.000 2 B THR 77.190 1 +ATOM 1381 O O . THR 99 99 ? B -2.430 1.925 0.086 1.000 2 B THR 77.190 1 +ATOM 1382 C CB . THR 99 99 ? B -0.954 -1.001 0.326 1.000 2 B THR 77.190 1 +ATOM 1383 O OG1 . THR 99 99 ? B -0.982 -2.262 1.006 1.000 2 B THR 77.190 1 +ATOM 1384 C CG2 . THR 99 99 ? B -0.090 -0.027 1.121 1.000 2 B THR 77.190 1 +ATOM 1385 N N . LEU 100 100 ? B -2.307 0.826 -1.896 1.000 2 B LEU 82.000 1 +ATOM 1386 C CA . LEU 100 100 ? B -2.301 2.057 -2.680 1.000 2 B LEU 82.000 1 +ATOM 1387 C C . LEU 100 100 ? B -3.572 2.861 -2.438 1.000 2 B LEU 82.000 1 +ATOM 1388 O O . LEU 100 100 ? B -3.514 4.078 -2.240 1.000 2 B LEU 82.000 1 +ATOM 1389 C CB . LEU 100 100 ? B -2.160 1.741 -4.172 1.000 2 B LEU 82.000 1 +ATOM 1390 C CG . LEU 100 100 ? B -2.080 2.943 -5.113 1.000 2 B LEU 82.000 1 +ATOM 1391 C CD1 . LEU 100 100 ? B -0.885 3.820 -4.754 1.000 2 B LEU 82.000 1 +ATOM 1392 C CD2 . LEU 100 100 ? B -1.991 2.480 -6.562 1.000 2 B LEU 82.000 1 +ATOM 1393 N N . PHE 101 101 ? B -4.637 2.201 -2.344 1.000 2 B PHE 80.620 1 +ATOM 1394 C CA . PHE 101 101 ? B -5.914 2.865 -2.105 1.000 2 B PHE 80.620 1 +ATOM 1395 C C . PHE 101 101 ? B -5.949 3.488 -0.715 1.000 2 B PHE 80.620 1 +ATOM 1396 O O . PHE 101 101 ? B -6.395 4.625 -0.550 1.000 2 B PHE 80.620 1 +ATOM 1397 C CB . PHE 101 101 ? B -7.074 1.878 -2.270 1.000 2 B PHE 80.620 1 +ATOM 1398 C CG . PHE 101 101 ? B -7.535 1.714 -3.691 1.000 2 B PHE 80.620 1 +ATOM 1399 C CD1 . PHE 101 101 ? B -8.195 2.748 -4.348 1.000 2 B PHE 80.620 1 +ATOM 1400 C CD2 . PHE 101 101 ? B -7.309 0.526 -4.375 1.000 2 B PHE 80.620 1 +ATOM 1401 C CE1 . PHE 101 101 ? 
B -8.625 2.598 -5.664 1.000 2 B PHE 80.620 1 +ATOM 1402 C CE2 . PHE 101 101 ? B -7.734 0.369 -5.691 1.000 2 B PHE 80.620 1 +ATOM 1403 C CZ . PHE 101 101 ? B -8.391 1.406 -6.332 1.000 2 B PHE 80.620 1 +ATOM 1404 N N . GLU 102 102 ? B -5.504 2.660 0.245 1.000 2 B GLU 83.120 1 +ATOM 1405 C CA . GLU 102 102 ? B -5.465 3.188 1.605 1.000 2 B GLU 83.120 1 +ATOM 1406 C C . GLU 102 102 ? B -4.547 4.402 1.701 1.000 2 B GLU 83.120 1 +ATOM 1407 O O . GLU 102 102 ? B -4.836 5.352 2.432 1.000 2 B GLU 83.120 1 +ATOM 1408 C CB . GLU 102 102 ? B -5.016 2.105 2.590 1.000 2 B GLU 83.120 1 +ATOM 1409 C CG . GLU 102 102 ? B -6.047 1.008 2.812 1.000 2 B GLU 83.120 1 +ATOM 1410 C CD . GLU 102 102 ? B -5.684 0.070 3.953 1.000 2 B GLU 83.120 1 +ATOM 1411 O OE1 . GLU 102 102 ? B -6.598 -0.372 4.688 1.000 2 B GLU 83.120 1 +ATOM 1412 O OE2 . GLU 102 102 ? B -4.480 -0.226 4.117 1.000 2 B GLU 83.120 1 +ATOM 1413 N N . LEU 103 103 ? B -3.479 4.293 0.898 1.000 2 B LEU 90.190 1 +ATOM 1414 C CA . LEU 103 103 ? B -2.564 5.430 0.862 1.000 2 B LEU 90.190 1 +ATOM 1415 C C . LEU 103 103 ? B -3.240 6.652 0.246 1.000 2 B LEU 90.190 1 +ATOM 1416 O O . LEU 103 103 ? B -3.082 7.770 0.740 1.000 2 B LEU 90.190 1 +ATOM 1417 C CB . LEU 103 103 ? B -1.302 5.078 0.072 1.000 2 B LEU 90.190 1 +ATOM 1418 C CG . LEU 103 103 ? B -0.235 4.277 0.822 1.000 2 B LEU 90.190 1 +ATOM 1419 C CD1 . LEU 103 103 ? B 0.874 3.848 -0.134 1.000 2 B LEU 90.190 1 +ATOM 1420 C CD2 . LEU 103 103 ? B 0.335 5.094 1.977 1.000 2 B LEU 90.190 1 +ATOM 1421 N N . ILE 104 104 ? B -3.953 6.441 -0.834 1.000 2 B ILE 87.880 1 +ATOM 1422 C CA . ILE 104 104 ? B -4.660 7.539 -1.484 1.000 2 B ILE 87.880 1 +ATOM 1423 C C . ILE 104 104 ? B -5.668 8.148 -0.514 1.000 2 B ILE 87.880 1 +ATOM 1424 O O . ILE 104 104 ? B -5.727 9.375 -0.357 1.000 2 B ILE 87.880 1 +ATOM 1425 C CB . ILE 104 104 ? B -5.371 7.066 -2.771 1.000 2 B ILE 87.880 1 +ATOM 1426 C CG1 . ILE 104 104 ? B -4.344 6.703 -3.848 1.000 2 B ILE 87.880 1 +ATOM 1427 C CG2 . ILE 104 104 ? 
B -6.340 8.141 -3.279 1.000 2 B ILE 87.880 1 +ATOM 1428 C CD1 . ILE 104 104 ? B -4.934 5.977 -5.047 1.000 2 B ILE 87.880 1 +ATOM 1429 N N . LEU 105 105 ? B -6.355 7.309 0.181 1.000 2 B LEU 86.880 1 +ATOM 1430 C CA . LEU 105 105 ? B -7.348 7.773 1.147 1.000 2 B LEU 86.880 1 +ATOM 1431 C C . LEU 105 105 ? B -6.680 8.555 2.277 1.000 2 B LEU 86.880 1 +ATOM 1432 O O . LEU 105 105 ? B -7.156 9.625 2.660 1.000 2 B LEU 86.880 1 +ATOM 1433 C CB . LEU 105 105 ? B -8.133 6.590 1.721 1.000 2 B LEU 86.880 1 +ATOM 1434 C CG . LEU 105 105 ? B -9.078 5.875 0.757 1.000 2 B LEU 86.880 1 +ATOM 1435 C CD1 . LEU 105 105 ? B -9.750 4.695 1.453 1.000 2 B LEU 86.880 1 +ATOM 1436 C CD2 . LEU 105 105 ? B -10.125 6.848 0.213 1.000 2 B LEU 86.880 1 +ATOM 1437 N N . ALA 106 106 ? B -5.602 7.973 2.711 1.000 2 B ALA 92.120 1 +ATOM 1438 C CA . ALA 106 106 ? B -4.871 8.648 3.779 1.000 2 B ALA 92.120 1 +ATOM 1439 C C . ALA 106 106 ? B -4.324 9.992 3.305 1.000 2 B ALA 92.120 1 +ATOM 1440 O O . ALA 106 106 ? B -4.422 11.000 4.016 1.000 2 B ALA 92.120 1 +ATOM 1441 C CB . ALA 106 106 ? B -3.732 7.766 4.289 1.000 2 B ALA 92.120 1 +ATOM 1442 N N . ALA 107 107 ? B -3.828 10.023 2.127 1.000 2 B ALA 92.560 1 +ATOM 1443 C CA . ALA 107 107 ? B -3.285 11.266 1.568 1.000 2 B ALA 92.560 1 +ATOM 1444 C C . ALA 107 107 ? B -4.375 12.320 1.421 1.000 2 B ALA 92.560 1 +ATOM 1445 O O . ALA 107 107 ? B -4.141 13.500 1.684 1.000 2 B ALA 92.560 1 +ATOM 1446 C CB . ALA 107 107 ? B -2.625 10.992 0.219 1.000 2 B ALA 92.560 1 +ATOM 1447 N N . ASN 108 108 ? B -5.473 11.945 0.963 1.000 2 B ASN 88.810 1 +ATOM 1448 C CA . ASN 108 108 ? B -6.605 12.859 0.839 1.000 2 B ASN 88.810 1 +ATOM 1449 C C . ASN 108 108 ? B -7.094 13.336 2.203 1.000 2 B ASN 88.810 1 +ATOM 1450 O O . ASN 108 108 ? B -7.258 14.539 2.422 1.000 2 B ASN 88.810 1 +ATOM 1451 C CB . ASN 108 108 ? B -7.750 12.188 0.075 1.000 2 B ASN 88.810 1 +ATOM 1452 C CG . ASN 108 108 ? B -8.867 13.148 -0.264 1.000 2 B ASN 88.810 1 +ATOM 1453 O OD1 . ASN 108 108 ? 
B -8.680 14.367 -0.266 1.000 2 B ASN 88.810 1 +ATOM 1454 N ND2 . ASN 108 108 ? B -10.047 12.609 -0.554 1.000 2 B ASN 88.810 1 +ATOM 1455 N N . TYR 109 109 ? B -7.238 12.320 3.109 1.000 2 B TYR 89.380 1 +ATOM 1456 C CA . TYR 109 109 ? B -7.691 12.641 4.461 1.000 2 B TYR 89.380 1 +ATOM 1457 C C . TYR 109 109 ? B -6.746 13.625 5.137 1.000 2 B TYR 89.380 1 +ATOM 1458 O O . TYR 109 109 ? B -7.188 14.594 5.758 1.000 2 B TYR 89.380 1 +ATOM 1459 C CB . TYR 109 109 ? B -7.809 11.359 5.301 1.000 2 B TYR 89.380 1 +ATOM 1460 C CG . TYR 109 109 ? B -8.094 11.617 6.762 1.000 2 B TYR 89.380 1 +ATOM 1461 C CD1 . TYR 109 109 ? B -7.148 11.312 7.738 1.000 2 B TYR 89.380 1 +ATOM 1462 C CD2 . TYR 109 109 ? B -9.305 12.172 7.164 1.000 2 B TYR 89.380 1 +ATOM 1463 C CE1 . TYR 109 109 ? B -7.406 11.547 9.086 1.000 2 B TYR 89.380 1 +ATOM 1464 C CE2 . TYR 109 109 ? B -9.570 12.406 8.508 1.000 2 B TYR 89.380 1 +ATOM 1465 C CZ . TYR 109 109 ? B -8.617 12.094 9.461 1.000 2 B TYR 89.380 1 +ATOM 1466 O OH . TYR 109 109 ? B -8.875 12.328 10.789 1.000 2 B TYR 89.380 1 +ATOM 1467 N N . LEU 110 110 ? B -5.461 13.398 4.922 1.000 2 B LEU 92.690 1 +ATOM 1468 C CA . LEU 110 110 ? B -4.453 14.219 5.578 1.000 2 B LEU 92.690 1 +ATOM 1469 C C . LEU 110 110 ? B -4.062 15.406 4.695 1.000 2 B LEU 92.690 1 +ATOM 1470 O O . LEU 110 110 ? B -3.273 16.250 5.109 1.000 2 B LEU 92.690 1 +ATOM 1471 C CB . LEU 110 110 ? B -3.215 13.391 5.914 1.000 2 B LEU 92.690 1 +ATOM 1472 C CG . LEU 110 110 ? B -3.387 12.320 6.996 1.000 2 B LEU 92.690 1 +ATOM 1473 C CD1 . LEU 110 110 ? B -2.160 11.422 7.051 1.000 2 B LEU 92.690 1 +ATOM 1474 C CD2 . LEU 110 110 ? B -3.641 12.969 8.352 1.000 2 B LEU 92.690 1 +ATOM 1475 N N . ASP 111 111 ? B -4.621 15.547 3.576 1.000 2 B ASP 88.500 1 +ATOM 1476 C CA . ASP 111 111 ? B -4.387 16.625 2.627 1.000 2 B ASP 88.500 1 +ATOM 1477 C C . ASP 111 111 ? B -2.896 16.797 2.342 1.000 2 B ASP 88.500 1 +ATOM 1478 O O . ASP 111 111 ? B -2.342 17.875 2.520 1.000 2 B ASP 88.500 1 +ATOM 1479 C CB . 
ASP 111 111 ? B -4.973 17.938 3.152 1.000 2 B ASP 88.500 1 +ATOM 1480 C CG . ASP 111 111 ? B -5.059 19.016 2.088 1.000 2 B ASP 88.500 1 +ATOM 1481 O OD1 . ASP 111 111 ? B -4.633 18.781 0.938 1.000 2 B ASP 88.500 1 +ATOM 1482 O OD2 . ASP 111 111 ? B -5.551 20.125 2.402 1.000 2 B ASP 88.500 1 +ATOM 1483 N N . ILE 112 112 ? B -2.301 15.680 1.958 1.000 2 B ILE 92.560 1 +ATOM 1484 C CA . ILE 112 112 ? B -0.917 15.688 1.499 1.000 2 B ILE 92.560 1 +ATOM 1485 C C . ILE 112 112 ? B -0.879 15.664 -0.028 1.000 2 B ILE 92.560 1 +ATOM 1486 O O . ILE 112 112 ? B -0.891 14.602 -0.644 1.000 2 B ILE 92.560 1 +ATOM 1487 C CB . ILE 112 112 ? B -0.129 14.484 2.068 1.000 2 B ILE 92.560 1 +ATOM 1488 C CG1 . ILE 112 112 ? B -0.241 14.445 3.596 1.000 2 B ILE 92.560 1 +ATOM 1489 C CG2 . ILE 112 112 ? B 1.337 14.539 1.628 1.000 2 B ILE 92.560 1 +ATOM 1490 C CD1 . ILE 112 112 ? B 0.331 13.188 4.227 1.000 2 B ILE 92.560 1 +ATOM 1491 N N . LYS 113 113 ? B -0.780 16.797 -0.537 1.000 2 B LYS 87.120 1 +ATOM 1492 C CA . LYS 113 113 ? B -0.952 16.984 -1.975 1.000 2 B LYS 87.120 1 +ATOM 1493 C C . LYS 113 113 ? B 0.109 16.219 -2.762 1.000 2 B LYS 87.120 1 +ATOM 1494 O O . LYS 113 113 ? B -0.201 15.578 -3.764 1.000 2 B LYS 87.120 1 +ATOM 1495 C CB . LYS 113 113 ? B -0.899 18.469 -2.330 1.000 2 B LYS 87.120 1 +ATOM 1496 C CG . LYS 113 113 ? B -2.084 19.281 -1.812 1.000 2 B LYS 87.120 1 +ATOM 1497 C CD . LYS 113 113 ? B -2.062 20.703 -2.338 1.000 2 B LYS 87.120 1 +ATOM 1498 C CE . LYS 113 113 ? B -3.285 21.484 -1.879 1.000 2 B LYS 87.120 1 +ATOM 1499 N NZ . LYS 113 113 ? B -3.141 22.953 -2.152 1.000 2 B LYS 87.120 1 +ATOM 1500 N N . GLY 114 114 ? B 1.335 16.266 -2.330 1.000 2 B GLY 88.880 1 +ATOM 1501 C CA . GLY 114 114 ? B 2.395 15.547 -3.018 1.000 2 B GLY 88.880 1 +ATOM 1502 C C . GLY 114 114 ? B 2.148 14.055 -3.098 1.000 2 B GLY 88.880 1 +ATOM 1503 O O . GLY 114 114 ? B 2.248 13.453 -4.172 1.000 2 B GLY 88.880 1 +ATOM 1504 N N . LEU 115 115 ? 
B 1.807 13.516 -1.985 1.000 2 B LEU 92.440 1 +ATOM 1505 C CA . LEU 115 115 ? B 1.548 12.086 -1.922 1.000 2 B LEU 92.440 1 +ATOM 1506 C C . LEU 115 115 ? B 0.305 11.719 -2.727 1.000 2 B LEU 92.440 1 +ATOM 1507 O O . LEU 115 115 ? B 0.298 10.719 -3.449 1.000 2 B LEU 92.440 1 +ATOM 1508 C CB . LEU 115 115 ? B 1.378 11.633 -0.469 1.000 2 B LEU 92.440 1 +ATOM 1509 C CG . LEU 115 115 ? B 1.144 10.141 -0.245 1.000 2 B LEU 92.440 1 +ATOM 1510 C CD1 . LEU 115 115 ? B 2.283 9.328 -0.852 1.000 2 B LEU 92.440 1 +ATOM 1511 C CD2 . LEU 115 115 ? B 0.998 9.836 1.242 1.000 2 B LEU 92.440 1 +ATOM 1512 N N . LEU 116 116 ? B -0.753 12.516 -2.646 1.000 2 B LEU 90.060 1 +ATOM 1513 C CA . LEU 116 116 ? B -1.970 12.289 -3.418 1.000 2 B LEU 90.060 1 +ATOM 1514 C C . LEU 116 116 ? B -1.676 12.305 -4.914 1.000 2 B LEU 90.060 1 +ATOM 1515 O O . LEU 116 116 ? B -2.143 11.438 -5.652 1.000 2 B LEU 90.060 1 +ATOM 1516 C CB . LEU 116 116 ? B -3.025 13.344 -3.082 1.000 2 B LEU 90.060 1 +ATOM 1517 C CG . LEU 116 116 ? B -4.398 13.164 -3.727 1.000 2 B LEU 90.060 1 +ATOM 1518 C CD1 . LEU 116 116 ? B -5.004 11.828 -3.312 1.000 2 B LEU 90.060 1 +ATOM 1519 C CD2 . LEU 116 116 ? B -5.324 14.312 -3.352 1.000 2 B LEU 90.060 1 +ATOM 1520 N N . ASP 117 117 ? B -0.819 13.211 -5.242 1.000 2 B ASP 86.690 1 +ATOM 1521 C CA . ASP 117 117 ? B -0.478 13.344 -6.656 1.000 2 B ASP 86.690 1 +ATOM 1522 C C . ASP 117 117 ? B 0.225 12.086 -7.168 1.000 2 B ASP 86.690 1 +ATOM 1523 O O . ASP 117 117 ? B -0.208 11.484 -8.156 1.000 2 B ASP 86.690 1 +ATOM 1524 C CB . ASP 117 117 ? B 0.406 14.570 -6.883 1.000 2 B ASP 86.690 1 +ATOM 1525 C CG . ASP 117 117 ? B 0.704 14.820 -8.352 1.000 2 B ASP 86.690 1 +ATOM 1526 O OD1 . ASP 117 117 ? B -0.239 15.086 -9.133 1.000 2 B ASP 86.690 1 +ATOM 1527 O OD2 . ASP 117 117 ? B 1.893 14.758 -8.734 1.000 2 B ASP 86.690 1 +ATOM 1528 N N . VAL 118 118 ? B 1.255 11.703 -6.445 1.000 2 B VAL 89.060 1 +ATOM 1529 C CA . VAL 118 118 ? B 2.059 10.586 -6.934 1.000 2 B VAL 89.060 1 +ATOM 1530 C C . 
VAL 118 118 ? B 1.252 9.297 -6.848 1.000 2 B VAL 89.060 1 +ATOM 1531 O O . VAL 118 118 ? B 1.382 8.414 -7.707 1.000 2 B VAL 89.060 1 +ATOM 1532 C CB . VAL 118 118 ? B 3.377 10.445 -6.141 1.000 2 B VAL 89.060 1 +ATOM 1533 C CG1 . VAL 118 118 ? B 4.305 11.625 -6.426 1.000 2 B VAL 89.060 1 +ATOM 1534 C CG2 . VAL 118 118 ? B 3.090 10.336 -4.645 1.000 2 B VAL 89.060 1 +ATOM 1535 N N . THR 119 119 ? B 0.431 9.148 -5.836 1.000 2 B THR 88.690 1 +ATOM 1536 C CA . THR 119 119 ? B -0.394 7.949 -5.734 1.000 2 B THR 88.690 1 +ATOM 1537 C C . THR 119 119 ? B -1.421 7.902 -6.863 1.000 2 B THR 88.690 1 +ATOM 1538 O O . THR 119 119 ? B -1.636 6.848 -7.469 1.000 2 B THR 88.690 1 +ATOM 1539 C CB . THR 119 119 ? B -1.116 7.883 -4.375 1.000 2 B THR 88.690 1 +ATOM 1540 O OG1 . THR 119 119 ? B -1.792 9.125 -4.141 1.000 2 B THR 88.690 1 +ATOM 1541 C CG2 . THR 119 119 ? B -0.130 7.629 -3.240 1.000 2 B THR 88.690 1 +ATOM 1542 N N . CYS 120 120 ? B -2.016 8.977 -7.160 1.000 2 B CYS 84.190 1 +ATOM 1543 C CA . CYS 120 120 ? B -2.982 9.047 -8.250 1.000 2 B CYS 84.190 1 +ATOM 1544 C C . CYS 120 120 ? B -2.320 8.750 -9.586 1.000 2 B CYS 84.190 1 +ATOM 1545 O O . CYS 120 120 ? B -2.877 8.023 -10.414 1.000 2 B CYS 84.190 1 +ATOM 1546 C CB . CYS 120 120 ? B -3.641 10.430 -8.297 1.000 2 B CYS 84.190 1 +ATOM 1547 S SG . CYS 120 120 ? B -4.836 10.703 -6.969 1.000 2 B CYS 84.190 1 +ATOM 1548 N N . LYS 121 121 ? B -1.180 9.328 -9.766 1.000 2 B LYS 82.500 1 +ATOM 1549 C CA . LYS 121 121 ? B -0.440 9.055 -10.992 1.000 2 B LYS 82.500 1 +ATOM 1550 C C . LYS 121 121 ? B -0.111 7.566 -11.117 1.000 2 B LYS 82.500 1 +ATOM 1551 O O . LYS 121 121 ? B -0.172 7.000 -12.211 1.000 2 B LYS 82.500 1 +ATOM 1552 C CB . LYS 121 121 ? B 0.846 9.875 -11.047 1.000 2 B LYS 82.500 1 +ATOM 1553 C CG . LYS 121 121 ? B 0.623 11.352 -11.359 1.000 2 B LYS 82.500 1 +ATOM 1554 C CD . LYS 121 121 ? B 1.923 12.141 -11.281 1.000 2 B LYS 82.500 1 +ATOM 1555 C CE . LYS 121 121 ? 
B 1.704 13.609 -11.617 1.000 2 B LYS 82.500 1 +ATOM 1556 N NZ . LYS 121 121 ? B 2.938 14.422 -11.383 1.000 2 B LYS 82.500 1 +ATOM 1557 N N . THR 122 122 ? B 0.209 7.023 -10.031 1.000 2 B THR 82.750 1 +ATOM 1558 C CA . THR 122 122 ? B 0.506 5.594 -10.016 1.000 2 B THR 82.750 1 +ATOM 1559 C C . THR 122 122 ? B -0.702 4.785 -10.484 1.000 2 B THR 82.750 1 +ATOM 1560 O O . THR 122 122 ? B -0.568 3.887 -11.320 1.000 2 B THR 82.750 1 +ATOM 1561 C CB . THR 122 122 ? B 0.928 5.121 -8.617 1.000 2 B THR 82.750 1 +ATOM 1562 O OG1 . THR 122 122 ? B 2.111 5.824 -8.219 1.000 2 B THR 82.750 1 +ATOM 1563 C CG2 . THR 122 122 ? B 1.213 3.623 -8.602 1.000 2 B THR 82.750 1 +ATOM 1564 N N . VAL 123 123 ? B -1.856 5.129 -9.992 1.000 2 B VAL 79.060 1 +ATOM 1565 C CA . VAL 123 123 ? B -3.078 4.445 -10.406 1.000 2 B VAL 79.060 1 +ATOM 1566 C C . VAL 123 123 ? B -3.316 4.672 -11.898 1.000 2 B VAL 79.060 1 +ATOM 1567 O O . VAL 123 123 ? B -3.645 3.736 -12.625 1.000 2 B VAL 79.060 1 +ATOM 1568 C CB . VAL 123 123 ? B -4.301 4.926 -9.586 1.000 2 B VAL 79.060 1 +ATOM 1569 C CG1 . VAL 123 123 ? B -5.598 4.383 -10.188 1.000 2 B VAL 79.060 1 +ATOM 1570 C CG2 . VAL 123 123 ? B -4.164 4.504 -8.125 1.000 2 B VAL 79.060 1 +ATOM 1571 N N . ALA 124 124 ? B -3.076 5.883 -12.328 1.000 2 B ALA 77.250 1 +ATOM 1572 C CA . ALA 124 124 ? B -3.250 6.211 -13.742 1.000 2 B ALA 77.250 1 +ATOM 1573 C C . ALA 124 124 ? B -2.346 5.352 -14.617 1.000 2 B ALA 77.250 1 +ATOM 1574 O O . ALA 124 124 ? B -2.779 4.840 -15.656 1.000 2 B ALA 77.250 1 +ATOM 1575 C CB . ALA 124 124 ? B -2.967 7.691 -13.984 1.000 2 B ALA 77.250 1 +ATOM 1576 N N . ASN 125 125 ? B -1.164 5.156 -14.141 1.000 2 B ASN 74.250 1 +ATOM 1577 C CA . ASN 125 125 ? B -0.204 4.352 -14.891 1.000 2 B ASN 74.250 1 +ATOM 1578 C C . ASN 125 125 ? B -0.601 2.877 -14.906 1.000 2 B ASN 74.250 1 +ATOM 1579 O O . ASN 125 125 ? B -0.335 2.172 -15.883 1.000 2 B ASN 74.250 1 +ATOM 1580 C CB . ASN 125 125 ? B 1.202 4.516 -14.305 1.000 2 B ASN 74.250 1 +ATOM 1581 C CG . 
ASN 125 125 ? B 1.877 5.793 -14.766 1.000 2 B ASN 74.250 1 +ATOM 1582 O OD1 . ASN 125 125 ? B 1.541 6.348 -15.812 1.000 2 B ASN 74.250 1 +ATOM 1583 N ND2 . ASN 125 125 ? B 2.832 6.273 -13.977 1.000 2 B ASN 74.250 1 +ATOM 1584 N N . MET 126 126 ? B -1.264 2.465 -13.836 1.000 2 B MET 70.310 1 +ATOM 1585 C CA . MET 126 126 ? B -1.716 1.080 -13.750 1.000 2 B MET 70.310 1 +ATOM 1586 C C . MET 126 126 ? B -2.879 0.828 -14.703 1.000 2 B MET 70.310 1 +ATOM 1587 O O . MET 126 126 ? B -3.035 -0.280 -15.227 1.000 2 B MET 70.310 1 +ATOM 1588 C CB . MET 126 126 ? B -2.129 0.734 -12.320 1.000 2 B MET 70.310 1 +ATOM 1589 C CG . MET 126 126 ? B -0.958 0.604 -11.359 1.000 2 B MET 70.310 1 +ATOM 1590 S SD . MET 126 126 ? B -1.480 0.066 -9.688 1.000 2 B MET 70.310 1 +ATOM 1591 C CE . MET 126 126 ? B -0.344 1.045 -8.664 1.000 2 B MET 70.310 1 +ATOM 1592 N N . ILE 127 127 ? B -3.672 1.869 -14.867 1.000 2 B ILE 68.810 1 +ATOM 1593 C CA . ILE 127 127 ? B -4.867 1.752 -15.695 1.000 2 B ILE 68.810 1 +ATOM 1594 C C . ILE 127 127 ? B -4.504 1.986 -17.156 1.000 2 B ILE 68.810 1 +ATOM 1595 O O . ILE 127 127 ? B -5.148 1.441 -18.062 1.000 2 B ILE 68.810 1 +ATOM 1596 C CB . ILE 127 127 ? B -5.965 2.746 -15.250 1.000 2 B ILE 68.810 1 +ATOM 1597 C CG1 . ILE 127 127 ? B -6.391 2.457 -13.805 1.000 2 B ILE 68.810 1 +ATOM 1598 C CG2 . ILE 127 127 ? B -7.164 2.686 -16.188 1.000 2 B ILE 68.810 1 +ATOM 1599 C CD1 . ILE 127 127 ? B -7.293 3.523 -13.195 1.000 2 B ILE 68.810 1 +ATOM 1600 N N . LYS 128 128 ? B -3.439 2.818 -17.234 1.000 2 B LYS 66.880 1 +ATOM 1601 C CA . LYS 128 128 ? B -3.059 3.180 -18.594 1.000 2 B LYS 66.880 1 +ATOM 1602 C C . LYS 128 128 ? B -2.641 1.947 -19.391 1.000 2 B LYS 66.880 1 +ATOM 1603 O O . LYS 128 128 ? B -1.796 1.169 -18.953 1.000 2 B LYS 66.880 1 +ATOM 1604 C CB . LYS 128 128 ? B -1.923 4.203 -18.578 1.000 2 B LYS 66.880 1 +ATOM 1605 C CG . LYS 128 128 ? B -1.874 5.090 -19.812 1.000 2 B LYS 66.880 1 +ATOM 1606 C CD . LYS 128 128 ? 
B -0.771 6.137 -19.719 1.000 2 B LYS 66.880 1 +ATOM 1607 C CE . LYS 128 128 ? B -0.672 6.977 -20.984 1.000 2 B LYS 66.880 1 +ATOM 1608 N NZ . LYS 128 128 ? B 0.357 8.047 -20.859 1.000 2 B LYS 66.880 1 +ATOM 1609 N N . GLY 129 129 ? B -3.270 1.808 -20.500 1.000 2 B GLY 64.310 1 +ATOM 1610 C CA . GLY 129 129 ? B -2.947 0.764 -21.453 1.000 2 B GLY 64.310 1 +ATOM 1611 C C . GLY 129 129 ? B -3.541 -0.584 -21.094 1.000 2 B GLY 64.310 1 +ATOM 1612 O O . GLY 129 129 ? B -3.242 -1.593 -21.734 1.000 2 B GLY 64.310 1 +ATOM 1613 N N . LYS 130 130 ? B -4.160 -0.646 -19.906 1.000 2 B LYS 62.160 1 +ATOM 1614 C CA . LYS 130 130 ? B -4.754 -1.929 -19.547 1.000 2 B LYS 62.160 1 +ATOM 1615 C C . LYS 130 130 ? B -6.176 -2.055 -20.078 1.000 2 B LYS 62.160 1 +ATOM 1616 O O . LYS 130 130 ? B -6.879 -1.054 -20.234 1.000 2 B LYS 62.160 1 +ATOM 1617 C CB . LYS 130 130 ? B -4.754 -2.102 -18.016 1.000 2 B LYS 62.160 1 +ATOM 1618 C CG . LYS 130 130 ? B -3.369 -2.307 -17.422 1.000 2 B LYS 62.160 1 +ATOM 1619 C CD . LYS 130 130 ? B -3.451 -2.838 -16.000 1.000 2 B LYS 62.160 1 +ATOM 1620 C CE . LYS 130 130 ? B -2.066 -3.061 -15.406 1.000 2 B LYS 62.160 1 +ATOM 1621 N NZ . LYS 130 130 ? B -2.139 -3.680 -14.047 1.000 2 B LYS 62.160 1 +ATOM 1622 N N . THR 131 131 ? B -6.367 -3.252 -20.516 1.000 2 B THR 61.280 1 +ATOM 1623 C CA . THR 131 131 ? B -7.723 -3.584 -20.938 1.000 2 B THR 61.280 1 +ATOM 1624 C C . THR 131 131 ? B -8.664 -3.662 -19.734 1.000 2 B THR 61.280 1 +ATOM 1625 O O . THR 131 131 ? B -8.219 -3.758 -18.594 1.000 2 B THR 61.280 1 +ATOM 1626 C CB . THR 131 131 ? B -7.754 -4.918 -21.703 1.000 2 B THR 61.280 1 +ATOM 1627 O OG1 . THR 131 131 ? B -7.332 -5.977 -20.828 1.000 2 B THR 61.280 1 +ATOM 1628 C CG2 . THR 131 131 ? B -6.828 -4.879 -22.922 1.000 2 B THR 61.280 1 +ATOM 1629 N N . PRO 132 132 ? B -9.977 -3.377 -20.016 1.000 2 B PRO 53.380 1 +ATOM 1630 C CA . PRO 132 132 ? B -10.930 -3.510 -18.906 1.000 2 B PRO 53.380 1 +ATOM 1631 C C . PRO 132 132 ? 
B -10.719 -4.781 -18.094 1.000 2 B PRO 53.380 1 +ATOM 1632 O O . PRO 132 132 ? B -10.867 -4.766 -16.875 1.000 2 B PRO 53.380 1 +ATOM 1633 C CB . PRO 132 132 ? B -12.289 -3.539 -19.625 1.000 2 B PRO 53.380 1 +ATOM 1634 C CG . PRO 132 132 ? B -12.047 -2.889 -20.938 1.000 2 B PRO 53.380 1 +ATOM 1635 C CD . PRO 132 132 ? B -10.602 -3.090 -21.312 1.000 2 B PRO 53.380 1 +ATOM 1636 N N . GLU 133 133 ? B -10.438 -5.785 -18.750 1.000 2 B GLU 54.750 1 +ATOM 1637 C CA . GLU 133 133 ? B -10.195 -7.059 -18.078 1.000 2 B GLU 54.750 1 +ATOM 1638 C C . GLU 133 133 ? B -8.992 -6.973 -17.156 1.000 2 B GLU 54.750 1 +ATOM 1639 O O . GLU 133 133 ? B -9.016 -7.496 -16.031 1.000 2 B GLU 54.750 1 +ATOM 1640 C CB . GLU 133 133 ? B -9.992 -8.180 -19.109 1.000 2 B GLU 54.750 1 +ATOM 1641 C CG . GLU 133 133 ? B -11.289 -8.859 -19.531 1.000 2 B GLU 54.750 1 +ATOM 1642 C CD . GLU 133 133 ? B -11.156 -9.609 -20.844 1.000 2 B GLU 54.750 1 +ATOM 1643 O OE1 . GLU 133 133 ? B -12.117 -10.312 -21.234 1.000 2 B GLU 54.750 1 +ATOM 1644 O OE2 . GLU 133 133 ? B -10.094 -9.492 -21.500 1.000 2 B GLU 54.750 1 +ATOM 1645 N N . GLU 134 134 ? B -8.070 -6.336 -17.641 1.000 2 B GLU 57.090 1 +ATOM 1646 C CA . GLU 134 134 ? B -6.848 -6.176 -16.859 1.000 2 B GLU 57.090 1 +ATOM 1647 C C . GLU 134 134 ? B -7.070 -5.258 -15.672 1.000 2 B GLU 57.090 1 +ATOM 1648 O O . GLU 134 134 ? B -6.535 -5.500 -14.586 1.000 2 B GLU 57.090 1 +ATOM 1649 C CB . GLU 134 134 ? B -5.719 -5.629 -17.750 1.000 2 B GLU 57.090 1 +ATOM 1650 C CG . GLU 134 134 ? B -5.098 -6.672 -18.656 1.000 2 B GLU 57.090 1 +ATOM 1651 C CD . GLU 134 134 ? B -4.062 -6.094 -19.609 1.000 2 B GLU 57.090 1 +ATOM 1652 O OE1 . GLU 134 134 ? B -3.006 -6.730 -19.828 1.000 2 B GLU 57.090 1 +ATOM 1653 O OE2 . GLU 134 134 ? B -4.312 -4.992 -20.156 1.000 2 B GLU 57.090 1 +ATOM 1654 N N . ILE 135 135 ? B -7.910 -4.219 -15.906 1.000 2 B ILE 54.340 1 +ATOM 1655 C CA . ILE 135 135 ? B -8.242 -3.275 -14.852 1.000 2 B ILE 54.340 1 +ATOM 1656 C C . ILE 135 135 ? 
B -9.000 -3.998 -13.734 1.000 2 B ILE 54.340 1 +ATOM 1657 O O . ILE 135 135 ? B -8.688 -3.824 -12.555 1.000 2 B ILE 54.340 1 +ATOM 1658 C CB . ILE 135 135 ? B -9.086 -2.096 -15.383 1.000 2 B ILE 54.340 1 +ATOM 1659 C CG1 . ILE 135 135 ? B -8.289 -1.306 -16.438 1.000 2 B ILE 54.340 1 +ATOM 1660 C CG2 . ILE 135 135 ? B -9.531 -1.187 -14.242 1.000 2 B ILE 54.340 1 +ATOM 1661 C CD1 . ILE 135 135 ? B -9.094 -0.230 -17.141 1.000 2 B ILE 54.340 1 +ATOM 1662 N N . ARG 136 136 ? B -9.844 -4.793 -14.148 1.000 2 B ARG 53.620 1 +ATOM 1663 C CA . ARG 136 136 ? B -10.625 -5.543 -13.172 1.000 2 B ARG 53.620 1 +ATOM 1664 C C . ARG 136 136 ? B -9.734 -6.445 -12.328 1.000 2 B ARG 53.620 1 +ATOM 1665 O O . ARG 136 136 ? B -9.922 -6.562 -11.117 1.000 2 B ARG 53.620 1 +ATOM 1666 C CB . ARG 136 136 ? B -11.695 -6.379 -13.875 1.000 2 B ARG 53.620 1 +ATOM 1667 C CG . ARG 136 136 ? B -12.977 -5.613 -14.172 1.000 2 B ARG 53.620 1 +ATOM 1668 C CD . ARG 136 136 ? B -14.023 -6.500 -14.828 1.000 2 B ARG 53.620 1 +ATOM 1669 N NE . ARG 136 136 ? B -15.164 -5.727 -15.305 1.000 2 B ARG 53.620 1 +ATOM 1670 C CZ . ARG 136 136 ? B -15.797 -5.934 -16.469 1.000 2 B ARG 53.620 1 +ATOM 1671 N NH1 . ARG 136 136 ? B -15.398 -6.902 -17.281 1.000 2 B ARG 53.620 1 +ATOM 1672 N NH2 . ARG 136 136 ? B -16.828 -5.168 -16.797 1.000 2 B ARG 53.620 1 +ATOM 1673 N N . LYS 137 137 ? B -8.938 -7.094 -13.039 1.000 2 B LYS 52.810 1 +ATOM 1674 C CA . LYS 137 137 ? B -8.008 -7.988 -12.359 1.000 2 B LYS 52.810 1 +ATOM 1675 C C . LYS 137 137 ? B -7.117 -7.223 -11.383 1.000 2 B LYS 52.810 1 +ATOM 1676 O O . LYS 137 137 ? B -6.840 -7.699 -10.281 1.000 2 B LYS 52.810 1 +ATOM 1677 C CB . LYS 137 137 ? B -7.145 -8.742 -13.375 1.000 2 B LYS 52.810 1 +ATOM 1678 C CG . LYS 137 137 ? B -6.793 -10.164 -12.961 1.000 2 B LYS 52.810 1 +ATOM 1679 C CD . LYS 137 137 ? B -5.973 -10.867 -14.031 1.000 2 B LYS 52.810 1 +ATOM 1680 C CE . LYS 137 137 ? B -5.559 -12.266 -13.586 1.000 2 B LYS 52.810 1 +ATOM 1681 N NZ . LYS 137 137 ? 
B -4.758 -12.961 -14.633 1.000 2 B LYS 52.810 1 +ATOM 1682 N N . THR 138 138 ? B -6.750 -6.023 -11.883 1.000 2 B THR 53.340 1 +ATOM 1683 C CA . THR 138 138 ? B -5.855 -5.176 -11.102 1.000 2 B THR 53.340 1 +ATOM 1684 C C . THR 138 138 ? B -6.555 -4.668 -9.844 1.000 2 B THR 53.340 1 +ATOM 1685 O O . THR 138 138 ? B -5.953 -4.621 -8.773 1.000 2 B THR 53.340 1 +ATOM 1686 C CB . THR 138 138 ? B -5.359 -3.979 -11.938 1.000 2 B THR 53.340 1 +ATOM 1687 O OG1 . THR 138 138 ? B -4.789 -4.457 -13.156 1.000 2 B THR 53.340 1 +ATOM 1688 C CG2 . THR 138 138 ? B -4.301 -3.184 -11.172 1.000 2 B THR 53.340 1 +ATOM 1689 N N . PHE 139 139 ? B -7.816 -4.324 -9.922 1.000 2 B PHE 49.060 1 +ATOM 1690 C CA . PHE 139 139 ? B -8.523 -3.680 -8.820 1.000 2 B PHE 49.060 1 +ATOM 1691 C C . PHE 139 139 ? B -9.328 -4.695 -8.023 1.000 2 B PHE 49.060 1 +ATOM 1692 O O . PHE 139 139 ? B -10.039 -4.332 -7.086 1.000 2 B PHE 49.060 1 +ATOM 1693 C CB . PHE 139 139 ? B -9.438 -2.570 -9.344 1.000 2 B PHE 49.060 1 +ATOM 1694 C CG . PHE 139 139 ? B -8.703 -1.344 -9.812 1.000 2 B PHE 49.060 1 +ATOM 1695 C CD1 . PHE 139 139 ? B -8.336 -0.350 -8.914 1.000 2 B PHE 49.060 1 +ATOM 1696 C CD2 . PHE 139 139 ? B -8.383 -1.185 -11.156 1.000 2 B PHE 49.060 1 +ATOM 1697 C CE1 . PHE 139 139 ? B -7.656 0.786 -9.352 1.000 2 B PHE 49.060 1 +ATOM 1698 C CE2 . PHE 139 139 ? B -7.703 -0.052 -11.594 1.000 2 B PHE 49.060 1 +ATOM 1699 C CZ . PHE 139 139 ? B -7.340 0.933 -10.688 1.000 2 B PHE 49.060 1 +ATOM 1700 N N . ASN 140 140 ? B -8.914 -6.023 -8.141 1.000 2 B ASN 44.690 1 +ATOM 1701 C CA . ASN 140 140 ? B -9.641 -7.090 -7.453 1.000 2 B ASN 44.690 1 +ATOM 1702 C C . ASN 140 140 ? B -11.070 -6.676 -7.133 1.000 2 B ASN 44.690 1 +ATOM 1703 O O . ASN 140 140 ? B -11.547 -6.895 -6.016 1.000 2 B ASN 44.690 1 +ATOM 1704 C CB . ASN 140 140 ? B -8.906 -7.504 -6.180 1.000 2 B ASN 44.690 1 +ATOM 1705 C CG . ASN 140 140 ? B -8.984 -9.000 -5.918 1.000 2 B ASN 44.690 1 +ATOM 1706 O OD1 . ASN 140 140 ? 
B -9.320 -9.781 -6.812 1.000 2 B ASN 44.690 1 +ATOM 1707 N ND2 . ASN 140 140 ? B -8.672 -9.406 -4.691 1.000 2 B ASN 44.690 1 +ATOM 1708 N N . ILE 141 141 ? B -11.812 -5.793 -7.684 1.000 2 B ILE 39.310 1 +ATOM 1709 C CA . ILE 141 141 ? B -13.227 -5.449 -7.551 1.000 2 B ILE 39.310 1 +ATOM 1710 C C . ILE 141 141 ? B -14.078 -6.719 -7.590 1.000 2 B ILE 39.310 1 +ATOM 1711 O O . ILE 141 141 ? B -14.055 -7.457 -8.578 1.000 2 B ILE 39.310 1 +ATOM 1712 C CB . ILE 141 141 ? B -13.680 -4.469 -8.656 1.000 2 B ILE 39.310 1 +ATOM 1713 C CG1 . ILE 141 141 ? B -12.820 -3.199 -8.625 1.000 2 B ILE 39.310 1 +ATOM 1714 C CG2 . ILE 141 141 ? B -15.164 -4.133 -8.500 1.000 2 B ILE 39.310 1 +ATOM 1715 C CD1 . ILE 141 141 ? B -13.078 -2.256 -9.789 1.000 2 B ILE 39.310 1 +ATOM 1716 N N . LYS 142 142 ? B -14.266 -7.473 -6.430 1.000 2 B LYS 37.720 1 +ATOM 1717 C CA . LYS 142 142 ? B -15.258 -8.547 -6.379 1.000 2 B LYS 37.720 1 +ATOM 1718 C C . LYS 142 142 ? B -16.609 -8.070 -6.906 1.000 2 B LYS 37.720 1 +ATOM 1719 O O . LYS 142 142 ? B -16.969 -6.902 -6.754 1.000 2 B LYS 37.720 1 +ATOM 1720 C CB . LYS 142 142 ? B -15.406 -9.070 -4.953 1.000 2 B LYS 37.720 1 +ATOM 1721 C CG . LYS 142 142 ? B -14.266 -9.984 -4.508 1.000 2 B LYS 37.720 1 +ATOM 1722 C CD . LYS 142 142 ? B -14.516 -10.547 -3.115 1.000 2 B LYS 37.720 1 +ATOM 1723 C CE . LYS 142 142 ? B -13.375 -11.453 -2.666 1.000 2 B LYS 37.720 1 +ATOM 1724 N NZ . LYS 142 142 ? B -13.602 -11.977 -1.287 1.000 2 B LYS 37.720 1 +ATOM 1725 N N . ASN 143 143 ? B -17.203 -8.695 -7.969 1.000 2 B ASN 37.620 1 +ATOM 1726 C CA . ASN 143 143 ? B -18.609 -8.469 -8.273 1.000 2 B ASN 37.620 1 +ATOM 1727 C C . ASN 143 143 ? B -19.484 -8.688 -7.043 1.000 2 B ASN 37.620 1 +ATOM 1728 O O . ASN 143 143 ? B -19.594 -9.805 -6.543 1.000 2 B ASN 37.620 1 +ATOM 1729 C CB . ASN 143 143 ? B -19.062 -9.367 -9.422 1.000 2 B ASN 37.620 1 +ATOM 1730 C CG . ASN 143 143 ? B -20.109 -8.711 -10.297 1.000 2 B ASN 37.620 1 +ATOM 1731 O OD1 . ASN 143 143 ? 
B -20.594 -7.617 -10.000 1.000 2 B ASN 37.620 1 +ATOM 1732 N ND2 . ASN 143 143 ? B -20.484 -9.383 -11.383 1.000 2 B ASN 37.620 1 +ATOM 1733 N N . ASP 144 144 ? B -19.547 -7.867 -6.133 1.000 2 B ASP 38.910 1 +ATOM 1734 C CA . ASP 144 144 ? B -20.547 -7.969 -5.070 1.000 2 B ASP 38.910 1 +ATOM 1735 C C . ASP 144 144 ? B -21.953 -8.141 -5.645 1.000 2 B ASP 38.910 1 +ATOM 1736 O O . ASP 144 144 ? B -22.938 -7.910 -4.949 1.000 2 B ASP 38.910 1 +ATOM 1737 C CB . ASP 144 144 ? B -20.500 -6.734 -4.168 1.000 2 B ASP 38.910 1 +ATOM 1738 C CG . ASP 144 144 ? B -19.359 -6.773 -3.172 1.000 2 B ASP 38.910 1 +ATOM 1739 O OD1 . ASP 144 144 ? B -19.000 -7.871 -2.697 1.000 2 B ASP 38.910 1 +ATOM 1740 O OD2 . ASP 144 144 ? B -18.812 -5.695 -2.855 1.000 2 B ASP 38.910 1 +ATOM 1741 N N . PHE 145 145 ? B -22.375 -7.980 -7.012 1.000 2 B PHE 34.500 1 +ATOM 1742 C CA . PHE 145 145 ? B -23.797 -8.078 -7.254 1.000 2 B PHE 34.500 1 +ATOM 1743 C C . PHE 145 145 ? B -24.312 -9.477 -6.941 1.000 2 B PHE 34.500 1 +ATOM 1744 O O . PHE 145 145 ? B -23.672 -10.469 -7.277 1.000 2 B PHE 34.500 1 +ATOM 1745 C CB . PHE 145 145 ? B -24.125 -7.711 -8.703 1.000 2 B PHE 34.500 1 +ATOM 1746 C CG . PHE 145 145 ? B -24.281 -6.230 -8.938 1.000 2 B PHE 34.500 1 +ATOM 1747 C CD1 . PHE 145 145 ? B -25.516 -5.613 -8.773 1.000 2 B PHE 34.500 1 +ATOM 1748 C CD2 . PHE 145 145 ? B -23.203 -5.457 -9.312 1.000 2 B PHE 34.500 1 +ATOM 1749 C CE1 . PHE 145 145 ? B -25.672 -4.246 -8.984 1.000 2 B PHE 34.500 1 +ATOM 1750 C CE2 . PHE 145 145 ? B -23.344 -4.090 -9.531 1.000 2 B PHE 34.500 1 +ATOM 1751 C CZ . PHE 145 145 ? B -24.578 -3.484 -9.367 1.000 2 B PHE 34.500 1 +ATOM 1752 N N . THR 146 146 ? B -25.188 -9.562 -5.984 1.000 2 B THR 38.780 1 +ATOM 1753 C CA . THR 146 146 ? B -25.984 -10.773 -5.828 1.000 2 B THR 38.780 1 +ATOM 1754 C C . THR 146 146 ? B -26.719 -11.102 -7.125 1.000 2 B THR 38.780 1 +ATOM 1755 O O . THR 146 146 ? B -26.859 -10.250 -8.008 1.000 2 B THR 38.780 1 +ATOM 1756 C CB . THR 146 146 ? 
B -27.000 -10.625 -4.684 1.000 2 B THR 38.780 1 +ATOM 1757 O OG1 . THR 146 146 ? B -27.766 -9.430 -4.875 1.000 2 B THR 38.780 1 +ATOM 1758 C CG2 . THR 146 146 ? B -26.312 -10.555 -3.330 1.000 2 B THR 38.780 1 +ATOM 1759 N N . GLU 147 147 ? B -27.109 -12.172 -7.531 1.000 2 B GLU 39.340 1 +ATOM 1760 C CA . GLU 147 147 ? B -27.984 -12.422 -8.672 1.000 2 B GLU 39.340 1 +ATOM 1761 C C . GLU 147 147 ? B -29.141 -11.438 -8.703 1.000 2 B GLU 39.340 1 +ATOM 1762 O O . GLU 147 147 ? B -29.500 -10.914 -9.766 1.000 2 B GLU 39.340 1 +ATOM 1763 C CB . GLU 147 147 ? B -28.500 -13.859 -8.641 1.000 2 B GLU 39.340 1 +ATOM 1764 C CG . GLU 147 147 ? B -28.625 -14.500 -10.016 1.000 2 B GLU 39.340 1 +ATOM 1765 C CD . GLU 147 147 ? B -29.078 -15.945 -9.969 1.000 2 B GLU 39.340 1 +ATOM 1766 O OE1 . GLU 147 147 ? B -29.719 -16.422 -10.938 1.000 2 B GLU 39.340 1 +ATOM 1767 O OE2 . GLU 147 147 ? B -28.797 -16.625 -8.953 1.000 2 B GLU 39.340 1 +ATOM 1768 N N . GLU 148 148 ? B -29.812 -11.266 -7.531 1.000 2 B GLU 41.970 1 +ATOM 1769 C CA . GLU 148 148 ? B -30.984 -10.414 -7.359 1.000 2 B GLU 41.970 1 +ATOM 1770 C C . GLU 148 148 ? B -30.641 -8.938 -7.566 1.000 2 B GLU 41.970 1 +ATOM 1771 O O . GLU 148 148 ? B -31.375 -8.211 -8.227 1.000 2 B GLU 41.970 1 +ATOM 1772 C CB . GLU 148 148 ? B -31.594 -10.617 -5.973 1.000 2 B GLU 41.970 1 +ATOM 1773 C CG . GLU 148 148 ? B -32.625 -11.727 -5.922 1.000 2 B GLU 41.970 1 +ATOM 1774 C CD . GLU 148 148 ? B -33.250 -11.906 -4.543 1.000 2 B GLU 41.970 1 +ATOM 1775 O OE1 . GLU 148 148 ? B -34.406 -12.391 -4.445 1.000 2 B GLU 41.970 1 +ATOM 1776 O OE2 . GLU 148 148 ? B -32.562 -11.570 -3.547 1.000 2 B GLU 41.970 1 +ATOM 1777 N N . GLU 149 149 ? B -29.516 -8.430 -6.977 1.000 2 B GLU 39.190 1 +ATOM 1778 C CA . GLU 149 149 ? B -29.156 -7.023 -7.145 1.000 2 B GLU 39.190 1 +ATOM 1779 C C . GLU 149 149 ? B -28.594 -6.754 -8.539 1.000 2 B GLU 39.190 1 +ATOM 1780 O O . GLU 149 149 ? B -28.891 -5.719 -9.141 1.000 2 B GLU 39.190 1 +ATOM 1781 C CB . GLU 149 149 ? 
B -28.156 -6.586 -6.078 1.000 2 B GLU 39.190 1 +ATOM 1782 C CG . GLU 149 149 ? B -28.781 -6.168 -4.762 1.000 2 B GLU 39.190 1 +ATOM 1783 C CD . GLU 149 149 ? B -27.766 -5.828 -3.684 1.000 2 B GLU 39.190 1 +ATOM 1784 O OE1 . GLU 149 149 ? B -27.922 -4.785 -3.008 1.000 2 B GLU 39.190 1 +ATOM 1785 O OE2 . GLU 149 149 ? B -26.812 -6.613 -3.516 1.000 2 B GLU 39.190 1 +ATOM 1786 N N . GLU 150 150 ? B -27.797 -7.598 -9.258 1.000 2 B GLU 40.470 1 +ATOM 1787 C CA . GLU 150 150 ? B -27.469 -7.535 -10.680 1.000 2 B GLU 40.470 1 +ATOM 1788 C C . GLU 150 150 ? B -28.734 -7.520 -11.539 1.000 2 B GLU 40.470 1 +ATOM 1789 O O . GLU 150 150 ? B -28.812 -6.770 -12.508 1.000 2 B GLU 40.470 1 +ATOM 1790 C CB . GLU 150 150 ? B -26.578 -8.711 -11.078 1.000 2 B GLU 40.470 1 +ATOM 1791 C CG . GLU 150 150 ? B -25.438 -8.336 -12.023 1.000 2 B GLU 40.470 1 +ATOM 1792 C CD . GLU 150 150 ? B -24.578 -9.516 -12.438 1.000 2 B GLU 40.470 1 +ATOM 1793 O OE1 . GLU 150 150 ? B -23.766 -9.383 -13.375 1.000 2 B GLU 40.470 1 +ATOM 1794 O OE2 . GLU 150 150 ? B -24.734 -10.594 -11.812 1.000 2 B GLU 40.470 1 +ATOM 1795 N N . ALA 151 151 ? B -29.562 -8.359 -11.195 1.000 2 B ALA 48.090 1 +ATOM 1796 C CA . ALA 151 151 ? B -30.844 -8.383 -11.883 1.000 2 B ALA 48.090 1 +ATOM 1797 C C . ALA 151 151 ? B -31.594 -7.062 -11.695 1.000 2 B ALA 48.090 1 +ATOM 1798 O O . ALA 151 151 ? B -32.219 -6.570 -12.625 1.000 2 B ALA 48.090 1 +ATOM 1799 C CB . ALA 151 151 ? B -31.703 -9.547 -11.367 1.000 2 B ALA 48.090 1 +ATOM 1800 N N . GLN 152 152 ? B -31.516 -6.512 -10.500 1.000 2 B GLN 42.410 1 +ATOM 1801 C CA . GLN 152 152 ? B -32.125 -5.215 -10.219 1.000 2 B GLN 42.410 1 +ATOM 1802 C C . GLN 152 152 ? B -31.406 -4.098 -10.969 1.000 2 B GLN 42.410 1 +ATOM 1803 O O . GLN 152 152 ? B -32.031 -3.195 -11.516 1.000 2 B GLN 42.410 1 +ATOM 1804 C CB . GLN 152 152 ? B -32.125 -4.926 -8.719 1.000 2 B GLN 42.410 1 +ATOM 1805 C CG . GLN 152 152 ? B -33.344 -4.137 -8.234 1.000 2 B GLN 42.410 1 +ATOM 1806 C CD . GLN 152 152 ? 
B -34.000 -4.766 -7.031 1.000 2 B GLN 42.410 1 +ATOM 1807 O OE1 . GLN 152 152 ? B -33.531 -5.766 -6.488 1.000 2 B GLN 42.410 1 +ATOM 1808 N NE2 . GLN 152 152 ? B -35.125 -4.184 -6.602 1.000 2 B GLN 42.410 1 +ATOM 1809 N N . VAL 153 153 ? B -30.078 -4.070 -11.062 1.000 2 B VAL 40.470 1 +ATOM 1810 C CA . VAL 153 153 ? B -29.328 -3.088 -11.828 1.000 2 B VAL 40.470 1 +ATOM 1811 C C . VAL 153 153 ? B -29.516 -3.324 -13.320 1.000 2 B VAL 40.470 1 +ATOM 1812 O O . VAL 153 153 ? B -29.672 -2.373 -14.094 1.000 2 B VAL 40.470 1 +ATOM 1813 C CB . VAL 153 153 ? B -27.812 -3.139 -11.477 1.000 2 B VAL 40.470 1 +ATOM 1814 C CG1 . VAL 153 153 ? B -27.016 -2.270 -12.430 1.000 2 B VAL 40.470 1 +ATOM 1815 C CG2 . VAL 153 153 ? B -27.594 -2.697 -10.031 1.000 2 B VAL 40.470 1 +ATOM 1816 N N . ARG 154 154 ? B -29.406 -4.516 -13.703 1.000 2 B ARG 40.810 1 +ATOM 1817 C CA . ARG 154 154 ? B -29.781 -4.777 -15.094 1.000 2 B ARG 40.810 1 +ATOM 1818 C C . ARG 154 154 ? B -31.188 -4.258 -15.391 1.000 2 B ARG 40.810 1 +ATOM 1819 O O . ARG 154 154 ? B -31.422 -3.689 -16.453 1.000 2 B ARG 40.810 1 +ATOM 1820 C CB . ARG 154 154 ? B -29.703 -6.277 -15.398 1.000 2 B ARG 40.810 1 +ATOM 1821 C CG . ARG 154 154 ? B -29.062 -6.602 -16.734 1.000 2 B ARG 40.810 1 +ATOM 1822 C CD . ARG 154 154 ? B -28.984 -8.102 -16.969 1.000 2 B ARG 40.810 1 +ATOM 1823 N NE . ARG 154 154 ? B -28.375 -8.406 -18.266 1.000 2 B ARG 40.810 1 +ATOM 1824 C CZ . ARG 154 154 ? B -28.094 -9.633 -18.703 1.000 2 B ARG 40.810 1 +ATOM 1825 N NH1 . ARG 154 154 ? B -28.359 -10.695 -17.938 1.000 2 B ARG 40.810 1 +ATOM 1826 N NH2 . ARG 154 154 ? B -27.562 -9.805 -19.906 1.000 2 B ARG 40.810 1 +ATOM 1827 N N . LYS 155 155 ? B -32.125 -4.582 -14.477 1.000 2 B LYS 47.190 1 +ATOM 1828 C CA . LYS 155 155 ? B -33.469 -4.090 -14.648 1.000 2 B LYS 47.190 1 +ATOM 1829 C C . LYS 155 155 ? B -33.500 -2.566 -14.680 1.000 2 B LYS 47.190 1 +ATOM 1830 O O . LYS 155 155 ? B -34.250 -1.974 -15.469 1.000 2 B LYS 47.190 1 +ATOM 1831 C CB . LYS 155 155 ? 
B -34.375 -4.613 -13.531 1.000 2 B LYS 47.190 1 +ATOM 1832 C CG . LYS 155 155 ? B -35.750 -5.043 -14.008 1.000 2 B LYS 47.190 1 +ATOM 1833 C CD . LYS 155 155 ? B -36.594 -5.621 -12.867 1.000 2 B LYS 47.190 1 +ATOM 1834 C CE . LYS 155 155 ? B -38.000 -5.984 -13.320 1.000 2 B LYS 47.190 1 +ATOM 1835 N NZ . LYS 155 155 ? B -38.812 -6.523 -12.195 1.000 2 B LYS 47.190 1 +ATOM 1836 N N . GLU 156 156 ? B -32.719 -1.896 -13.859 1.000 2 B GLU 39.500 1 +ATOM 1837 C CA . GLU 156 156 ? B -32.781 -0.439 -13.789 1.000 2 B GLU 39.500 1 +ATOM 1838 C C . GLU 156 156 ? B -32.000 0.198 -14.938 1.000 2 B GLU 39.500 1 +ATOM 1839 O O . GLU 156 156 ? B -32.312 1.312 -15.359 1.000 2 B GLU 39.500 1 +ATOM 1840 C CB . GLU 156 156 ? B -32.219 0.047 -12.445 1.000 2 B GLU 39.500 1 +ATOM 1841 C CG . GLU 156 156 ? B -33.188 -0.082 -11.289 1.000 2 B GLU 39.500 1 +ATOM 1842 C CD . GLU 156 156 ? B -32.656 0.442 -9.969 1.000 2 B GLU 39.500 1 +ATOM 1843 O OE1 . GLU 156 156 ? B -33.438 0.624 -9.016 1.000 2 B GLU 39.500 1 +ATOM 1844 O OE2 . GLU 156 156 ? B -31.422 0.674 -9.898 1.000 2 B GLU 39.500 1 +ATOM 1845 N N . ASN 157 157 ? B -30.922 -0.350 -15.477 1.000 2 B ASN 34.940 1 +ATOM 1846 C CA . ASN 157 157 ? B -30.156 0.206 -16.594 1.000 2 B ASN 34.940 1 +ATOM 1847 C C . ASN 157 157 ? B -30.641 -0.343 -17.938 1.000 2 B ASN 34.940 1 +ATOM 1848 O O . ASN 157 157 ? B -29.969 -0.155 -18.953 1.000 2 B ASN 34.940 1 +ATOM 1849 C CB . ASN 157 157 ? B -28.656 -0.071 -16.406 1.000 2 B ASN 34.940 1 +ATOM 1850 C CG . ASN 157 157 ? B -27.906 1.119 -15.844 1.000 2 B ASN 34.940 1 +ATOM 1851 O OD1 . ASN 157 157 ? B -28.281 2.271 -16.062 1.000 2 B ASN 34.940 1 +ATOM 1852 N ND2 . ASN 157 157 ? B -26.844 0.849 -15.102 1.000 2 B ASN 34.940 1 +ATOM 1853 N N . GLN 158 158 ? B -31.578 -1.138 -18.109 1.000 2 B GLN 39.310 1 +ATOM 1854 C CA . GLN 158 158 ? B -32.219 -1.520 -19.359 1.000 2 B GLN 39.310 1 +ATOM 1855 C C . GLN 158 158 ? B -32.438 -0.304 -20.266 1.000 2 B GLN 39.310 1 +ATOM 1856 O O . GLN 158 158 ? 
B -32.500 -0.432 -21.484 1.000 2 B GLN 39.310 1 +ATOM 1857 C CB . GLN 158 158 ? B -33.562 -2.225 -19.094 1.000 2 B GLN 39.310 1 +ATOM 1858 C CG . GLN 158 158 ? B -33.531 -3.707 -19.453 1.000 2 B GLN 39.310 1 +ATOM 1859 C CD . GLN 158 158 ? B -34.844 -4.402 -19.094 1.000 2 B GLN 39.310 1 +ATOM 1860 O OE1 . GLN 158 158 ? B -35.688 -3.830 -18.406 1.000 2 B GLN 39.310 1 +ATOM 1861 N NE2 . GLN 158 158 ? B -35.000 -5.641 -19.547 1.000 2 B GLN 39.310 1 +ATOM 1862 N N . TRP 159 159 ? B -32.625 0.851 -19.672 1.000 2 B TRP 36.380 1 +ATOM 1863 C CA . TRP 159 159 ? B -32.938 1.958 -20.562 1.000 2 B TRP 36.380 1 +ATOM 1864 C C . TRP 159 159 ? B -31.750 2.369 -21.406 1.000 2 B TRP 36.380 1 +ATOM 1865 O O . TRP 159 159 ? B -31.906 2.943 -22.484 1.000 2 B TRP 36.380 1 +ATOM 1866 C CB . TRP 159 159 ? B -33.469 3.156 -19.766 1.000 2 B TRP 36.380 1 +ATOM 1867 C CG . TRP 159 159 ? B -34.250 2.781 -18.547 1.000 2 B TRP 36.380 1 +ATOM 1868 C CD1 . TRP 159 159 ? B -35.031 1.654 -18.375 1.000 2 B TRP 36.380 1 +ATOM 1869 C CD2 . TRP 159 159 ? B -34.375 3.537 -17.344 1.000 2 B TRP 36.380 1 +ATOM 1870 N NE1 . TRP 159 159 ? B -35.594 1.665 -17.141 1.000 2 B TRP 36.380 1 +ATOM 1871 C CE2 . TRP 159 159 ? B -35.219 2.807 -16.484 1.000 2 B TRP 36.380 1 +ATOM 1872 C CE3 . TRP 159 159 ? B -33.875 4.762 -16.906 1.000 2 B TRP 36.380 1 +ATOM 1873 C CZ2 . TRP 159 159 ? B -35.562 3.264 -15.203 1.000 2 B TRP 36.380 1 +ATOM 1874 C CZ3 . TRP 159 159 ? B -34.188 5.211 -15.633 1.000 2 B TRP 36.380 1 +ATOM 1875 C CH2 . TRP 159 159 ? B -35.031 4.465 -14.797 1.000 2 B TRP 36.380 1 +ATOM 1876 N N . CYS 160 160 ? B -30.500 2.383 -20.875 1.000 2 B CYS 32.000 1 +ATOM 1877 C CA . CYS 160 160 ? B -29.438 2.963 -21.672 1.000 2 B CYS 32.000 1 +ATOM 1878 C C . CYS 160 160 ? B -29.062 2.043 -22.828 1.000 2 B CYS 32.000 1 +ATOM 1879 O O . CYS 160 160 ? B -28.219 2.398 -23.672 1.000 2 B CYS 32.000 1 +ATOM 1880 C CB . CYS 160 160 ? B -28.203 3.236 -20.812 1.000 2 B CYS 32.000 1 +ATOM 1881 S SG . CYS 160 160 ? 
B -28.297 4.781 -19.875 1.000 2 B CYS 32.000 1 +ATOM 1882 N N . GLU 161 161 ? B -29.375 0.712 -22.750 1.000 2 B GLU 33.220 1 +ATOM 1883 C CA . GLU 161 161 ? B -29.156 -0.074 -23.969 1.000 2 B GLU 33.220 1 +ATOM 1884 C C . GLU 161 161 ? B -30.078 0.378 -25.094 1.000 2 B GLU 33.220 1 +ATOM 1885 O O . GLU 161 161 ? B -29.844 0.060 -26.266 1.000 2 B GLU 33.220 1 +ATOM 1886 C CB . GLU 161 161 ? B -29.359 -1.564 -23.688 1.000 2 B GLU 33.220 1 +ATOM 1887 C CG . GLU 161 161 ? B -28.125 -2.262 -23.141 1.000 2 B GLU 33.220 1 +ATOM 1888 C CD . GLU 161 161 ? B -28.281 -3.770 -23.016 1.000 2 B GLU 33.220 1 +ATOM 1889 O OE1 . GLU 161 161 ? B -27.266 -4.500 -23.125 1.000 2 B GLU 33.220 1 +ATOM 1890 O OE2 . GLU 161 161 ? B -29.438 -4.223 -22.828 1.000 2 B GLU 33.220 1 +ATOM 1891 N N . GLU 162 162 ? B -31.172 0.975 -24.672 1.000 2 B GLU 32.030 1 +ATOM 1892 C CA . GLU 162 162 ? B -32.031 1.327 -25.797 1.000 2 B GLU 32.030 1 +ATOM 1893 C C . GLU 162 162 ? B -31.578 2.607 -26.484 1.000 2 B GLU 32.030 1 +ATOM 1894 O O . GLU 162 162 ? B -31.891 2.854 -27.641 1.000 2 B GLU 32.030 1 +ATOM 1895 C CB . GLU 162 162 ? B -33.500 1.483 -25.328 1.000 2 B GLU 32.030 1 +ATOM 1896 C CG . GLU 162 162 ? B -34.281 0.183 -25.328 1.000 2 B GLU 32.030 1 +ATOM 1897 C CD . GLU 162 162 ? B -35.750 0.374 -24.984 1.000 2 B GLU 32.030 1 +ATOM 1898 O OE1 . GLU 162 162 ? B -36.594 -0.335 -25.547 1.000 2 B GLU 32.030 1 +ATOM 1899 O OE2 . GLU 162 162 ? B -36.062 1.240 -24.125 1.000 2 B GLU 32.030 1 +ATOM 1900 N N . LYS 163 163 ? B -30.609 3.463 -26.031 1.000 2 B LYS 25.800 1 +ATOM 1901 C CA . LYS 163 163 ? B -30.328 4.449 -27.078 1.000 2 B LYS 25.800 1 +ATOM 1902 C C . LYS 163 163 ? B -29.234 3.957 -28.016 1.000 2 B LYS 25.800 1 +ATOM 1903 O O . LYS 163 163 ? B -28.328 3.234 -27.594 1.000 2 B LYS 25.800 1 +ATOM 1904 C CB . LYS 163 163 ? B -29.922 5.785 -26.453 1.000 2 B LYS 25.800 1 +ATOM 1905 C CG . LYS 163 163 ? B -31.047 6.812 -26.391 1.000 2 B LYS 25.800 1 +ATOM 1906 C CD . LYS 163 163 ? 
B -30.547 8.148 -25.844 1.000 2 B LYS 25.800 1 +ATOM 1907 C CE . LYS 163 163 ? B -31.672 9.180 -25.812 1.000 2 B LYS 25.800 1 +ATOM 1908 N NZ . LYS 163 163 ? B -31.188 10.508 -25.328 1.000 2 B LYS 25.800 1 +# +# +loop_ +_atom_type.symbol +C +N +O +S +# +# +loop_ +_ma_entry_associated_files.id +_ma_entry_associated_files.entry_id +_ma_entry_associated_files.file_url +_ma_entry_associated_files.file_type +_ma_entry_associated_files.file_format +_ma_entry_associated_files.file_content +_ma_entry_associated_files.details +_ma_entry_associated_files.data_id +1 692-UBIQUITIN_SKP1 692-Ubiquitin_SKP1.zip archive zip +'archive with multiple files' . . +# +# +loop_ +_ma_associated_archive_file_details.id +_ma_associated_archive_file_details.archive_file_id +_ma_associated_archive_file_details.file_path +_ma_associated_archive_file_details.file_format +_ma_associated_archive_file_details.file_content +_ma_associated_archive_file_details.description +_ma_associated_archive_file_details.data_id +1 1 692-Ubiquitin_SKP1_local_pairwise_qa.cif cif 'local pairwise QA scores' +'Predicted aligned error' . +2 1 692-Ubiquitin_SKP1_coverage.png other other +'PNG file showing number of sequences in the MSA covering each position in the target sequences' +. +3 1 692-Ubiquitin_SKP1_plddt.png other other +'PNG file showing pLDDT at each residue position for each of the 5 models produced' +. +4 1 692-Ubiquitin_SKP1_pae.png other other +'PNG file showing the PAE matrices for each of the 5 models produced' . +5 1 692-Ubiquitin_SKP1_rank_2_model_5_seed_000.cif cif other +'#2 ranked model; pLDDT 69.9, pTM 0.531, ipTM 0.144' . +6 1 692-Ubiquitin_SKP1_rank_3_model_1_seed_000.cif cif other +'#3 ranked model; pLDDT 69.4, pTM 0.495, ipTM 0.14' . 
+# +# +loop_ +_ma_qa_metric.id +_ma_qa_metric.name +_ma_qa_metric.description +_ma_qa_metric.type +_ma_qa_metric.mode +_ma_qa_metric.type_other_details +_ma_qa_metric.software_group_id +1 pLDDT 'Predicted accuracy according to the CA-only lDDT in [0,100]' pLDDT +global . . +2 pTM 'Predicted accuracy according to the TM-score score in [0,1]' pTM global . +. +3 ipTM 'Predicted protein-protein interface score based on TM-score in [0,1]' +ipTM global . . +4 pLDDT 'Predicted accuracy according to the CA-only lDDT in [0,100]' pLDDT +local . . +5 PAE 'Predicted aligned error (in Angstroms)' PAE local-pairwise . . +# +# +loop_ +_ma_qa_metric_global.ordinal_id +_ma_qa_metric_global.model_id +_ma_qa_metric_global.metric_id +_ma_qa_metric_global.metric_value +1 1 1 74.100 +2 1 2 0.544 +3 1 3 0.163 +# +# +loop_ +_ma_qa_metric_local.ordinal_id +_ma_qa_metric_local.model_id +_ma_qa_metric_local.label_asym_id +_ma_qa_metric_local.label_seq_id +_ma_qa_metric_local.label_comp_id +_ma_qa_metric_local.metric_id +_ma_qa_metric_local.metric_value +1 1 A 1 MET 4 73.940 +2 1 A 2 GLN 4 80.620 +3 1 A 3 ILE 4 82.810 +4 1 A 4 PHE 4 79.250 +5 1 A 5 VAL 4 84.380 +6 1 A 6 LYS 4 77.310 +7 1 A 7 THR 4 73.000 +8 1 A 8 LEU 4 63.970 +9 1 A 9 THR 4 63.780 +10 1 A 10 GLY 4 62.690 +11 1 A 11 LYS 4 71.190 +12 1 A 12 THR 4 73.000 +13 1 A 13 ILE 4 80.880 +14 1 A 14 THR 4 82.310 +15 1 A 15 LEU 4 87.060 +16 1 A 16 GLU 4 86.250 +17 1 A 17 VAL 4 86.060 +18 1 A 18 GLU 4 83.250 +19 1 A 19 PRO 4 79.440 +20 1 A 20 SER 4 83.810 +21 1 A 21 ASP 4 87.880 +22 1 A 22 THR 4 88.690 +23 1 A 23 ILE 4 91.690 +24 1 A 24 GLU 4 91.810 +25 1 A 25 ASN 4 91.750 +26 1 A 26 VAL 4 93.190 +27 1 A 27 LYS 4 91.190 +28 1 A 28 ALA 4 93.560 +29 1 A 29 LYS 4 90.810 +30 1 A 30 ILE 4 90.750 +31 1 A 31 GLN 4 90.310 +32 1 A 32 ASP 4 89.750 +33 1 A 33 LYS 4 84.620 +34 1 A 34 GLU 4 86.000 +35 1 A 35 GLY 4 85.440 +36 1 A 36 ILE 4 85.000 +37 1 A 37 PRO 4 88.560 +38 1 A 38 PRO 4 88.440 +39 1 A 39 ASP 4 84.620 +40 1 A 40 GLN 4 85.380 +41 1 A 41 GLN 
4 87.060 +42 1 A 42 ARG 4 84.310 +43 1 A 43 LEU 4 85.750 +44 1 A 44 ILE 4 79.000 +45 1 A 45 PHE 4 76.620 +46 1 A 46 ALA 4 63.690 +47 1 A 47 GLY 4 58.660 +48 1 A 48 LYS 4 66.500 +49 1 A 49 GLN 4 73.880 +50 1 A 50 LEU 4 82.310 +51 1 A 51 GLU 4 85.940 +52 1 A 52 ASP 4 87.310 +53 1 A 53 GLY 4 86.120 +54 1 A 54 ARG 4 84.190 +55 1 A 55 THR 4 82.940 +56 1 A 56 LEU 4 82.690 +57 1 A 57 SER 4 73.380 +58 1 A 58 ASP 4 71.880 +59 1 A 59 TYR 4 72.000 +60 1 A 60 ASN 4 70.310 +61 1 A 61 ILE 4 72.380 +62 1 A 62 GLN 4 68.380 +63 1 A 63 LYS 4 68.880 +64 1 A 64 GLU 4 66.190 +65 1 A 65 SER 4 70.120 +66 1 A 66 THR 4 74.250 +67 1 A 67 LEU 4 81.000 +68 1 A 68 HIS 4 78.560 +69 1 A 69 LEU 4 85.120 +70 1 A 70 VAL 4 79.620 +71 1 A 71 LEU 4 79.940 +72 1 A 72 ARG 4 67.880 +73 1 A 73 LEU 4 55.970 +74 1 A 74 ARG 4 47.720 +75 1 A 75 GLY 4 39.810 +76 1 A 76 GLY 4 31.770 +77 1 B 1 MET 4 64.120 +78 1 B 2 PRO 4 82.810 +79 1 B 3 SER 4 87.810 +80 1 B 4 ILE 4 88.120 +81 1 B 5 LYS 4 90.500 +82 1 B 6 LEU 4 91.440 +83 1 B 7 GLN 4 89.560 +84 1 B 8 SER 4 88.060 +85 1 B 9 SER 4 83.120 +86 1 B 10 ASP 4 81.620 +87 1 B 11 GLY 4 85.500 +88 1 B 12 GLU 4 88.380 +89 1 B 13 ILE 4 91.310 +90 1 B 14 PHE 4 91.060 +91 1 B 15 GLU 4 90.690 +92 1 B 16 VAL 4 90.750 +93 1 B 17 ASP 4 90.380 +94 1 B 18 VAL 4 88.190 +95 1 B 19 GLU 4 88.380 +96 1 B 20 ILE 4 91.380 +97 1 B 21 ALA 4 89.750 +98 1 B 22 LYS 4 88.750 +99 1 B 23 GLN 4 90.690 +100 1 B 24 SER 4 91.380 +101 1 B 25 VAL 4 90.560 +102 1 B 26 THR 4 89.810 +103 1 B 27 ILE 4 89.060 +104 1 B 28 LYS 4 89.190 +105 1 B 29 THR 4 89.750 +106 1 B 30 MET 4 84.000 +107 1 B 31 LEU 4 82.440 +108 1 B 32 GLU 4 84.250 +109 1 B 33 ASP 4 81.440 +110 1 B 34 LEU 4 74.440 +111 1 B 35 GLY 4 70.620 +112 1 B 36 MET 4 62.220 +113 1 B 37 ASP 4 58.910 +114 1 B 38 ASP 4 56.620 +115 1 B 39 GLU 4 54.060 +116 1 B 40 GLY 4 54.880 +117 1 B 41 ASP 4 57.440 +118 1 B 42 ASP 4 67.620 +119 1 B 43 ASP 4 76.120 +120 1 B 44 PRO 4 86.310 +121 1 B 45 VAL 4 85.310 +122 1 B 46 PRO 4 87.120 +123 1 B 47 LEU 4 87.190 +124 1 
B 48 PRO 4 84.500 +125 1 B 49 ASN 4 82.060 +126 1 B 50 VAL 4 87.060 +127 1 B 51 ASN 4 84.880 +128 1 B 52 ALA 4 85.190 +129 1 B 53 ALA 4 82.440 +130 1 B 54 ILE 4 88.310 +131 1 B 55 LEU 4 92.810 +132 1 B 56 LYS 4 90.500 +133 1 B 57 LYS 4 91.310 +134 1 B 58 VAL 4 93.940 +135 1 B 59 ILE 4 93.380 +136 1 B 60 GLN 4 91.690 +137 1 B 61 TRP 4 93.620 +138 1 B 62 CYS 4 93.120 +139 1 B 63 THR 4 92.310 +140 1 B 64 HIS 4 90.690 +141 1 B 65 HIS 4 90.560 +142 1 B 66 LYS 4 85.620 +143 1 B 67 ASP 4 82.690 +144 1 B 68 ASP 4 78.880 +145 1 B 69 PRO 4 67.810 +146 1 B 70 PRO 4 60.220 +147 1 B 71 PRO 4 57.720 +148 1 B 72 PRO 4 50.280 +149 1 B 73 GLU 4 49.620 +150 1 B 74 ASP 4 48.750 +151 1 B 75 ASP 4 45.380 +152 1 B 76 GLU 4 48.620 +153 1 B 77 ASN 4 46.910 +154 1 B 78 LYS 4 49.620 +155 1 B 79 GLU 4 47.410 +156 1 B 80 LYS 4 52.310 +157 1 B 81 ARG 4 58.590 +158 1 B 82 THR 4 61.560 +159 1 B 83 ASP 4 64.560 +160 1 B 84 ASP 4 75.750 +161 1 B 85 ILE 4 80.810 +162 1 B 86 PRO 4 83.120 +163 1 B 87 VAL 4 81.750 +164 1 B 88 TRP 4 80.750 +165 1 B 89 ASP 4 86.000 +166 1 B 90 GLN 4 78.940 +167 1 B 91 GLU 4 78.380 +168 1 B 92 PHE 4 81.000 +169 1 B 93 LEU 4 83.190 +170 1 B 94 LYS 4 73.690 +171 1 B 95 VAL 4 69.940 +172 1 B 96 ASP 4 64.250 +173 1 B 97 GLN 4 68.690 +174 1 B 98 GLY 4 67.310 +175 1 B 99 THR 4 77.190 +176 1 B 100 LEU 4 82.000 +177 1 B 101 PHE 4 80.620 +178 1 B 102 GLU 4 83.120 +179 1 B 103 LEU 4 90.190 +180 1 B 104 ILE 4 87.880 +181 1 B 105 LEU 4 86.880 +182 1 B 106 ALA 4 92.120 +183 1 B 107 ALA 4 92.560 +184 1 B 108 ASN 4 88.810 +185 1 B 109 TYR 4 89.380 +186 1 B 110 LEU 4 92.690 +187 1 B 111 ASP 4 88.500 +188 1 B 112 ILE 4 92.560 +189 1 B 113 LYS 4 87.120 +190 1 B 114 GLY 4 88.880 +191 1 B 115 LEU 4 92.440 +192 1 B 116 LEU 4 90.060 +193 1 B 117 ASP 4 86.690 +194 1 B 118 VAL 4 89.060 +195 1 B 119 THR 4 88.690 +196 1 B 120 CYS 4 84.190 +197 1 B 121 LYS 4 82.500 +198 1 B 122 THR 4 82.750 +199 1 B 123 VAL 4 79.060 +200 1 B 124 ALA 4 77.250 +201 1 B 125 ASN 4 74.250 +202 1 B 126 MET 4 70.310 +203 
1 B 127 ILE 4 68.810 +204 1 B 128 LYS 4 66.880 +205 1 B 129 GLY 4 64.310 +206 1 B 130 LYS 4 62.160 +207 1 B 131 THR 4 61.280 +208 1 B 132 PRO 4 53.380 +209 1 B 133 GLU 4 54.750 +210 1 B 134 GLU 4 57.090 +211 1 B 135 ILE 4 54.340 +212 1 B 136 ARG 4 53.620 +213 1 B 137 LYS 4 52.810 +214 1 B 138 THR 4 53.340 +215 1 B 139 PHE 4 49.060 +216 1 B 140 ASN 4 44.690 +217 1 B 141 ILE 4 39.310 +218 1 B 142 LYS 4 37.720 +219 1 B 143 ASN 4 37.620 +220 1 B 144 ASP 4 38.910 +221 1 B 145 PHE 4 34.500 +222 1 B 146 THR 4 38.780 +223 1 B 147 GLU 4 39.340 +224 1 B 148 GLU 4 41.970 +225 1 B 149 GLU 4 39.190 +226 1 B 150 GLU 4 40.470 +227 1 B 151 ALA 4 48.090 +228 1 B 152 GLN 4 42.410 +229 1 B 153 VAL 4 40.470 +230 1 B 154 ARG 4 40.810 +231 1 B 155 LYS 4 47.190 +232 1 B 156 GLU 4 39.500 +233 1 B 157 ASN 4 34.940 +234 1 B 158 GLN 4 39.310 +235 1 B 159 TRP 4 36.380 +236 1 B 160 CYS 4 32.000 +237 1 B 161 GLU 4 33.220 +238 1 B 162 GLU 4 32.030 +239 1 B 163 LYS 4 25.800 +# diff --git a/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.zip b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.zip new file mode 100644 index 0000000000000000000000000000000000000000..bb99842efc45c158ebda51d83bdaaff10c770203 Binary files /dev/null and b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/692-Ubiquitin_SKP1.zip differ diff --git a/projects/2024-04-ma-dm-prc/minimal_example_modelcif/issues.json b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/issues.json new file mode 100644 index 0000000000000000000000000000000000000000..77e12053c3d69c186b06ae8c442b0890bc565104 --- /dev/null +++ b/projects/2024-04-ma-dm-prc/minimal_example_modelcif/issues.json @@ -0,0 +1 @@ 
+[["692-Ubiquitin_SKP1","up_extra",["Ubiquitin_HUMAN_SKP1_HUMAN_unrelaxed_rank_001_alphafold2_multimer_v3_model_4_seed_000","A","P0CG48",76,0,609,0],[[],[1,76],[1,76]]],["692-Ubiquitin_SKP1","up_extra",["Ubiquitin_HUMAN_SKP1_HUMAN_unrelaxed_rank_002_alphafold2_multimer_v3_model_5_seed_000","A","P0CG48",76,0,609,0],[[],[1,76],[1,76]]],["692-Ubiquitin_SKP1","up_extra",["Ubiquitin_HUMAN_SKP1_HUMAN_unrelaxed_rank_003_alphafold2_multimer_v3_model_1_seed_000","A","P0CG48",76,0,609,0],[[],[1,76],[1,76]]]] \ No newline at end of file diff --git a/projects/2024-04-ma-dm-prc/translate2modelcif.py b/projects/2024-04-ma-dm-prc/translate2modelcif.py new file mode 100644 index 0000000000000000000000000000000000000000..04083c598e63a8c8578353abf907e08f0279263f --- /dev/null +++ b/projects/2024-04-ma-dm-prc/translate2modelcif.py @@ -0,0 +1,1854 @@ +#! /usr/local/bin/ost +# -*- coding: utf-8 -*- + +"""Translate PRC models for Juntao from PDB + extra data into ModelCIF.""" + +# EXAMPLES for running: +# ost translate2modelcif.py ./minimal_example ./modelcif + +import argparse +import datetime +import gzip +import os +import shutil +import sys +import zipfile + +from timeit import default_timer as timer +import numpy as np +import requests +import ujson as json + +import ihm +import ihm.citations +import modelcif +import modelcif.associated +import modelcif.dumper +import modelcif.model +import modelcif.protocol +import modelcif.reference + +import pandas as pd +from ost import io, seq + +################################################################################ +# GENERAL HELPER FUNCTIONS +################################################################################ +def _abort_msg(msg, exit_code=1): + """Write error message and exit with exit_code.""" + print(f"{msg}\nAborting.", file=sys.stderr) + sys.exit(exit_code) + + +def _warn_msg(msg): + """Write a warning message to stdout.""" + print(f"WARNING: {msg}") + + +def _check_file(file_path): + """Make sure a file exists 
and is actually a file.""" + if not os.path.exists(file_path): + _abort_msg(f"File not found: '{file_path}'.") + if not os.path.isfile(file_path): + _abort_msg(f"File path does not point to file: '{file_path}'.") + + +def _check_folder(dir_path): + """Make sure a folder exists and is actually a folder.""" + if not os.path.exists(dir_path): + _abort_msg(f"Path not found: '{dir_path}'.") + if not os.path.isdir(dir_path): + _abort_msg(f"Path does not point to a directory: '{dir_path}'.") + + +def _check_opts_folder(dir_path): + """Remove trailing '/' (return fixed one) and check if path valid.""" + if dir_path.endswith("/"): + dir_path = dir_path[:-1] + _check_folder(dir_path) + return dir_path + + +def _get_res_num(r, use_auth=False): + """Get res. num. from auth. IDs if reading from mmCIF files.""" + if use_auth: + return int(r.GetStringProp("pdb_auth_resnum")) + return r.number.num + + +def _get_ch_name(ch, use_auth=False): + """Get chain name from auth. IDs if reading from mmCIF files.""" + if use_auth: + return ch.GetStringProp("pdb_auth_chain_name") + return ch.name + + +def _get_sequence(chn, use_auth=False): + """Get the sequence out of an OST chain incl. '-' for gaps in resnums.""" + # initialise (add gaps if first is not at num.
1) + lst_rn = _get_res_num(chn.residues[0], use_auth) + idx = 1 + sqe = "-" * (lst_rn - 1) + chn.residues[0].one_letter_code + + for res in chn.residues[idx:]: + lst_rn += 1 + while lst_rn != _get_res_num(res, use_auth): + sqe += "-" + lst_rn += 1 + sqe += res.one_letter_code + return sqe +################################################################################ + +################################################################################ +# DATA HANDLING +################################################################################ +def _parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description=__doc__, + ) + + parser.add_argument( + "input_data_path", + type=str, + metavar="<INPUT DATA PATH>", + help="Data as provided by depositors. Expected to contain files " + + "Annotations.csv and Annotations.json with metadata, Config_Files " + + "directory with config_[X].json files for all X listed in Config " + + "column of Annotations.csv, and Zip_Files directory with files named " + + "[X]-[NAME].zip for each X listed in the metadata files.", + ) + parser.add_argument( + "out_dir", + type=str, + metavar="<OUTPUT DIR>", + help="Path to directory to store results ([X]-[NAME].* files and " + + "issues.json with any observed issues).", + ) + parser.add_argument( + "--compress", + default=False, + action="store_true", + help="Compress ModelCIF file with gzip.", + ) + parser.add_argument( + "--checks-only", + default=False, + action="store_true", + help="Perform only checks without producing ModelCIF files.", + ) + parser.add_argument( + "--no-extra-files", + default=False, + action="store_true", + help="Skip writing extra models, PNGs, and PAE (for testing).", + ) + parser.add_argument( + "--single-model", + type=str, + #metavar="<PDB WEB PATH>", + default=None, + help="If provided, only the model matching the provided string in the " + + "Annotations.json will 
be converted.", + ) + + opts = parser.parse_args() + + # check input + opts.input_data_path = _check_opts_folder(opts.input_data_path) + _check_file(os.path.join(opts.input_data_path, "Annotations.csv")) + _check_file(os.path.join(opts.input_data_path, "Annotations.json")) + _check_folder(os.path.join(opts.input_data_path, "Config_Files")) + _check_folder(os.path.join(opts.input_data_path, "Zip_Files")) + if opts.out_dir.endswith("/"): + opts.out_dir = opts.out_dir[:-1] + if not os.path.exists(opts.out_dir): + os.makedirs(opts.out_dir, exist_ok=True) + return opts + + +def _get_audit_authors(): + """Return the list of authors that produced this model.""" + return ( + "Shafiq, Tiasha A.", + "Yu, Juntao", + "Feng, Wenzhi", + "Zhang, Yizhe", + "Zhou, Haining", + "Paulo, Joao A.", + "Gygi, Steven P.", + "Moazed, Danesh", + ) + + +def _get_metadata(input_data_path, single_model=None): + """Read various metadata files and prepare for next steps.""" + metadata_json_dict = json.load(open( + os.path.join(input_data_path, "Annotations.json") + )) + metadata_csv = pd.read_csv(os.path.join(input_data_path, "Annotations.csv")) + if len(set(metadata_csv["Number"])) != len(metadata_csv): + raise RuntimeError("Model numbers in CSV file not unique") + metadata_csv_dict = metadata_csv.set_index("Number").to_dict(orient="index") + # fetch configs + configs = {} # key = X in config_[X].json, value = dict + config_path = os.path.join(input_data_path, "Config_Files") + for f_name in os.listdir(config_path): + if f_name.endswith(".json"): + # assume name config_[X].json + ss = os.path.splitext(f_name)[0].split("_") + if len(ss) != 2: + raise RuntimeError(f"Unexpected config file name {f_name}") + config_key = int(ss[1]) + # load config + configs[config_key] = json.load( + open(os.path.join(config_path, f_name)) + ) + # fetch zip file paths + zip_files = {} # key = int(X) in [X]-[NAME].zip, value = dict + zip_files_named = {} # key = NAME in [X]-[NAME].zip, value = dict + zip_path =
os.path.join(input_data_path, "Zip_Files")
+    for f_name in os.listdir(zip_path):
+        if f_name.endswith(".zip"):
+            # assume name [X]-[NAME].zip
+            mdl_id = os.path.splitext(f_name)[0]
+            ss = mdl_id.split("-")
+            if len(ss) != 2:
+                raise RuntimeError(f"Unexpected zip file name {f_name}")
+            zip_dict = {
+                "mdl_id": mdl_id,
+                "zip_file_path": os.path.join(zip_path, f_name)
+            }
+            zip_files[int(ss[0])] = zip_dict
+            assert ss[1] not in zip_files_named
+            zip_files_named[ss[1]] = zip_dict
+    # join infos into metadata list of dicts
+    if len(metadata_json_dict) != len(metadata_csv_dict):
+        _warn_msg(
+            "Mismatch in number of entries between JSON and CSV. " \
+            "Will ignore extra entries in CSV."
+        )
+    if len(zip_files) != len(metadata_json_dict):
+        _warn_msg(
+            "Mismatch in number of entries between JSON and ZIP files. " \
+            "Will ignore extra ZIP files."
+        )
+    metadata_all = []
+    for key, data in metadata_json_dict.items():
+        # restrict if needed
+        if single_model is not None and key != single_model:
+            continue
+        # sanity checks on data
+        if sorted(data.keys()) != ["abstract", "chains", "title"]:
+            raise RuntimeError(f"Wrong dict-keys observed for mdl {key}")
+        name_checks = []
+        for chain in data["chains"]:
+            if sorted(chain.keys()) != ["chain", "name", "up_ac"]:
+                raise RuntimeError(
+                    f"Wrong dict-keys observed in chain for mdl {key}"
+                )
+            name_checks.extend([chain["name"], chain["up_ac"]])
+        str_name_checks = "|".join(name_checks)
+        # combine with infos in CSV
+        if int(key) not in metadata_csv_dict:
+            raise RuntimeError(f"Missing CSV entry for mdl {key}")
+        data.update(metadata_csv_dict[int(key)])
+        for csv_key in ["Component_1", "Component_2",
+                        "Component_1_UniProtKB ID", "Component_2_UniProtKB ID"]:
+            tst = data[csv_key]
+            if len(data["chains"]) == 2 and not pd.isna(tst) \
+               and tst not in str_name_checks:
+                _warn_msg(
+                    f"Data in column {csv_key} for mdl {key} not in chains"
+                )
+            if not pd.isna(tst) and tst not in data["abstract"]:
+                _warn_msg(
+                    f"Data in column {csv_key} for mdl {key} not in abstract"
+                )
+        acc_check_txt1 = (
+            f"The pLDDT score of the rank 1 model is {data['pLDDT_1']:g}, " \
+            f"the pTM score of the rank 1 model is {data['pTM_1']:g}, and " \
+            f"the ipTM score of the rank 1 model is {data['ipTM_1']:g}."
+        )
+        acc_check_txt2 = (
+            f"The average pLDDT score of all five models is " \
+            f"{data['pLDDT_Avg']:g}, the average pTM score of all five " \
+            f"models is {data['pTM_Avg']:g}, and the average ipTM score of " \
+            f"all five models is {data['ipTM_Avg']:g}."
+        )
+        if acc_check_txt1 not in data["abstract"]:
+            _warn_msg(
+                f"Unexpected abstract text for mdl {key} not containing " \
+                f"'{acc_check_txt1}'."
+            )
+        # expected to be missing for 742
+        if acc_check_txt2 not in data["abstract"] and key != "742":
+            _warn_msg(
+                f"Unexpected abstract text for mdl {key} not containing " \
+                f"'{acc_check_txt2}'."
+            )
+        # combine with ZIP file infos
+        if int(key) not in zip_files:
+            raise RuntimeError(f"Missing ZIP file for mdl {key}")
+        zip_dict = zip_files[int(key)]
+        # HACK: does it match?
+        exp_name = f"{data['Component_1']}_{data['Component_2']}"
+        exp_mdl_id = f"{int(key):03d}-{exp_name}"
+        zip_mdl_id = zip_dict["mdl_id"]
+        if exp_mdl_id != zip_mdl_id and len(data["chains"]) == 2:
+            if exp_name not in zip_files_named:
+                raise RuntimeError(f"Missing ZIP file {exp_mdl_id}")
+            zip_dict = zip_files_named[exp_name]
+            zip_dict["orig_mdl_id"] = zip_dict["mdl_id"]
+            zip_dict["mdl_id"] = exp_mdl_id
+            _warn_msg(
+                f"Badly named ZIP file for mdl {key}. " \
+                f"Expected {exp_mdl_id}, found {zip_mdl_id}. " \
+                f"Revert to match to {zip_dict['orig_mdl_id']}.zip."
+            )
+        #
+        data.update(zip_dict)
+        # combine with config
+        if data["Config"] not in configs:
+            raise RuntimeError(f"Missing config file for mdl {key}")
+        data["config_dict"] = configs[data["Config"]]
+        # keep it
+        metadata_all.append(data)
+    return metadata_all
+
+
+def _zip_file_check(zf, file_name, desired_ranks, data_from_zip):
+    """Fill data_from_zip with info from given file in zipped file handle zf if
+    data is to be added (depends on file_name).
+    Only ranks from 1 to desired_ranks are parsed.
+    Return False if the file was not parsed (fine if file_name is a hidden
+    file; unexpected otherwise).
+    """
+    file_base_path = os.path.basename(file_name)
+    # check for PNGs
+    for png_type in ["coverage", "plddt", "pae"]:
+        if file_name.endswith(f"_{png_type}.png"):
+            data_from_zip[f"{png_type}_png_data"] = zf.open(file_name).read()
+            data_from_zip[f"{png_type}_png_file_name"] = file_base_path
+            return True
+    # check for PDB and JSON of given rank
+    # -> PDB file name example: ..._unrelaxed_rank_001_..._model_1_seed_000.pdb
+    # -> JSON file name example: ..._scores_rank_001_..._model_1_seed_000.json
+    file_base, file_ext = os.path.splitext(file_base_path)
+    if file_ext in [".pdb", ".json"] and not file_base.startswith('.'):
+        ss = file_base.split('_')
+        mdl_rank = int(ss[ss.index("rank") + 1])
+        if mdl_rank < 1 or mdl_rank > desired_ranks:
+            return False
+        mdl_rank_key = f"rank_{mdl_rank}"
+        if mdl_rank_key not in data_from_zip:
+            data_from_zip[mdl_rank_key] = {}
+        mdl_dict = data_from_zip[mdl_rank_key]
+        if file_ext == ".pdb":
+            # duplicate = extra unhandled file
+            if "ent" in mdl_dict:
+                return False
+            mdl_dict["mdl_file_base"] = file_base
+            mdl_dict["mdl_id"] = '_'.join(ss[ss.index("model"):])
+            mdl_dict["ent"] = io.PDBStrToEntity(
+                zf.open(file_name).read(),
+                profile=io.profiles["DEFAULT"],
+                process=True
+            )
+            return True
+        elif file_ext == ".json":
+            # duplicate = extra unhandled file
+            if "scores" in mdl_dict:
+                return False
+            mdl_dict["scores"] = json.load(zf.open(file_name))
+            return
True
+    return False
+
+
+def _parse_zip_file(zip_file_path, desired_ranks):
+    """Parse data in the provided ZIP file and check for desired ranks.
+    Returns dict. with keys "rank_X" for X from 1 to desired_ranks incl.:
+    - mdl_file_base: file name of selected PDB file w/o directory and extension
+    - mdl_id: ID of selected PDB file (e.g. "model_1_seed_000")
+    - ent: OST entity for PDB file
+    - scores: dict. loaded from scores JSON
+    It further contains info on PNG files as:
+    - [png_type]_png_data: data in file (to be written back into accomp. zip)
+    - [png_type]_png_file_name: file name w/o directory used in ZIP file
+    for [png_type] in [coverage, plddt, pae].
+    If expected files are not found, an exception is raised.
+    If additional files are found, a warning is shown.
+    """
+    data_from_zip = {}
+    unparsed_files = []
+    with zipfile.ZipFile(zip_file_path) as zf:
+        for file_name in zf.namelist():
+            check = _zip_file_check(zf, file_name, desired_ranks, data_from_zip)
+            if not check and not os.path.basename(file_name).startswith('.'):
+                unparsed_files.append(file_name)
+    # check if complete
+    exp_keys = [f"rank_{num+1}" for num in range(desired_ranks)]
+    for png_type in ["coverage", "plddt", "pae"]:
+        exp_keys.extend([
+            f"{png_type}_png_data", f"{png_type}_png_file_name"
+        ])
+    unmatched_keys = set(exp_keys) - set(data_from_zip)
+    if len(unmatched_keys) != 0:
+        raise RuntimeError(
+            f"Could not find expected files in {zip_file_path}. " \
+            f"Missing {sorted(unmatched_keys)}."
+        )
+    exp_mdl_keys = sorted(["mdl_file_base", "mdl_id", "ent", "scores"])
+    for num in range(desired_ranks):
+        if exp_mdl_keys != sorted(data_from_zip[f"rank_{num+1}"].keys()):
+            raise RuntimeError(
+                f"Could not find expected files in {zip_file_path} " \
+                f"for rank {num + 1}."
+            )
+    extra_stuff = sorted(set(data_from_zip) - set(exp_keys))
+    extra_stuff += sorted(unparsed_files)
+    if len(extra_stuff) != 0:
+        _warn_msg(
+            f"Extra unexpected content found in {zip_file_path}: " \
+            f"{extra_stuff}"
+        )
+    return data_from_zip
+
+
+def _check_scores(mdl_data_from_zip, metadata, mdl_rank):
+    """Check scores JSON.
+    Bad issues raise exceptions, minor ones are in returned list
+    (compatible with list returned by _get_entities)
+    """
+    issues = []
+    scores_json = mdl_data_from_zip["scores"]
+    # NOTE: cannot deal with gapped sequences here as we cannot map
+    # multiple chains to scores
+    ost_ent = mdl_data_from_zip["ent"]
+    exp_len = ost_ent.residue_count
+    assert "ptm" in scores_json
+    assert "iptm" in scores_json
+    assert len(scores_json["pae"]) == exp_len
+    assert len(scores_json["pae"][0]) == exp_len
+    # check actual scores
+    for score_id in ["pLDDT", "pTM", "ipTM"]:
+        score = scores_json[score_id.lower()]
+        md_score = metadata[f"{score_id}_{mdl_rank}"]
+        if score_id == "pLDDT":
+            score = np.mean(score)
+            check = abs(score - md_score) < 0.11
+        else:
+            check = abs(score - md_score) < 0.011
+        if not check:
+            issues.append((
+                metadata['mdl_id'],
+                "score_mismatch",
+                (mdl_rank, score_id, score, md_score),
+                ()
+            ))
+    # b-factor vs pLDDT in expected range?
+    ent_plddts = []
+    for i, res in enumerate(ost_ent.residues):
+        b_factors = [a.b_factor for a in res.atoms]
+        assert len(set(b_factors)) == 1  # must all be equal!
+        ent_plddts.append(b_factors[0])
+    scores_plddts = scores_json["plddt"]
+    assert len(ent_plddts) == len(scores_plddts)
+    plddt_max_diff = max([
+        abs(s1 - s2) for s1, s2 in zip(ent_plddts, scores_plddts)
+    ])
+    # threshold due to 0.01 accuracy in PDB file + numerical rounding
+    if plddt_max_diff > 0.0051:
+        issues.append((
+            metadata['mdl_id'],
+            "plddt_vs_bf_mismatch",
+            (mdl_rank, plddt_max_diff),
+            ()
+        ))
+    return issues
+
+
+def _get_n_parse_up_entry(up_ac, up_txt_path):
+    """Get data for a UniProtKB entry and parse it."""
+    # This is a simple parser for the UniProtKB txt format. Instead of breaking
+    # it up into multiple functions, we just allow many branches & statements
+    # here.
+    # pylint: disable=too-many-branches,too-many-statements
+    data = {}
+    data["up_organism"] = ""
+    data["up_sequence"] = ""
+    data["up_ac"] = up_ac
+    # check if we read from file or URL
+    if up_txt_path.startswith("http"):
+        rspns = requests.get(up_txt_path, timeout=180)
+        lines = rspns.iter_lines(decode_unicode=True)
+    else:
+        lines = open(up_txt_path).readlines()
+    for line_ in lines:
+        # need to strip trailing characters if reading from file (doesn't hurt)
+        line = line_.rstrip()
+        if line.startswith("ID   "):
+            sline = line.split()
+            if len(sline) != 5:
+                raise RuntimeError(f"Unusual UniProtKB ID line found:\n" \
+                                   f"'{line}'")
+            data["up_id"] = sline[1]
+        elif line.startswith("OX   NCBI_TaxID="):
+            # Following strictly the UniProtKB format: 'OX   NCBI_TaxID=<ID>;'
+            data["up_ncbi_taxid"] = line[len("OX   NCBI_TaxID=") : -1]
+            data["up_ncbi_taxid"] = data["up_ncbi_taxid"].split("{")[0].strip()
+        elif line.startswith("OS   "):
+            # multiple lines possible; last one ends in "."
+            if line[-1] == ".":
+                data["up_organism"] += line[len("OS   ") : -1]
+            else:
+                data["up_organism"] += line[len("OS   ") :] + " "
+        elif line.startswith("SQ   "):
+            sline = line.split()
+            if len(sline) != 8:
+                raise RuntimeError(f"Unusual UniProtKB SQ line found:\n" \
+                                   f"'{line}'")
+            data["up_seqlen"] = int(sline[2])
+            data["up_crc64"] = sline[6]
+        elif line.startswith("     "):
+            sline = line.split()
+            if len(sline) > 6:
+                raise RuntimeError(
+                    "Unusual UniProtKB sequence data line "
+                    + f"found:\n'{line}'"
+                )
+            data["up_sequence"] += "".join(sline)
+        elif line.startswith("DT   "):
+            dt_flds = line[len("DT   ") :].split(", ")
+            if dt_flds[1].upper().startswith("SEQUENCE VERSION "):
+                data["up_last_mod"] = datetime.datetime.strptime(
+                    dt_flds[0], "%d-%b-%Y"
+                )
+            elif dt_flds[1].upper().startswith("ENTRY VERSION "):
+                data["up_entry_version"] = dt_flds[1][len("ENTRY VERSION ") :]
+                if data["up_entry_version"][-1] == ".":
+                    data["up_entry_version"] = data["up_entry_version"][:-1]
+                data["up_entry_version"] = int(data["up_entry_version"])
+        elif line.startswith("GN   Name="):
+            data["up_gn"] = line[len("GN   Name=") :].split(";")[0]
+            data["up_gn"] = data["up_gn"].split("{")[0].strip()
+
+    # in UP isoforms are identified in the AC so no need for this...
+    # -> in PDB (e.g. 8TRE), we see unset _struct_ref.pdbx_db_isoform in such cases
+    data["up_isoform"] = None
+
+    # NOTE: no gene names in this set (use provided names instead)
+    # if "up_gn" not in data:
+    #     _warn_msg(
+    #         f"No gene name found for UniProtKB entry '{up_ac}', using "
+    #         + "UniProtKB AC instead."
+    #     )
+    #     data["up_gn"] = up_ac
+    if "up_last_mod" not in data:
+        raise RuntimeError(f"No sequence version found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if "up_crc64" not in data:
+        raise RuntimeError(f"No CRC64 value found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if len(data["up_sequence"]) == 0:
+        raise RuntimeError(f"No sequence found for UniProtKB entry '{up_ac}'.")
+    # check that the sequence length matches the SQ line
+    if data["up_seqlen"] != len(data["up_sequence"]):
+        raise RuntimeError(
+            "Sequence length of SQ line and sequence data differ for "
+            + f"UniProtKB entry '{up_ac}': {data['up_seqlen']} != "
+            + f"{len(data['up_sequence'])}"
+        )
+
+    if "up_id" not in data:
+        raise RuntimeError(f"No ID found for UniProtKB entry '{up_ac}'.")
+    if "up_ncbi_taxid" not in data:
+        raise RuntimeError(f"No NCBI taxonomy ID found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if len(data["up_organism"]) == 0:
+        raise RuntimeError(f"No organism species found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    return data
+
+
+def _fetch_upkb_entry(up_ac):
+    """Get a UniProtKB entry."""
+    return _get_n_parse_up_entry(
+        up_ac, f"https://rest.uniprot.org/uniprotkb/{up_ac}.txt"
+    )
+
+
+def _fetch_unisave_entry(up_ac, version):
+    """Get a UniSave entry; in contrast to a UniProtKB entry, this allows us
+    to specify a version."""
+    return _get_n_parse_up_entry(
+        up_ac,
+        f"https://rest.uniprot.org/unisave/{up_ac}?format=txt&"
+        + f"versions={version}",
+    )
+
+
+# for cache below
+upkb_entry_cache = {}  # key = (up_ac, up_version)
+def _fetch_upkb_cached(up_ac, up_version=None):
+    """Get versioned UniProtKB entry (version=None means latest).
+    Get it from cache if already fetched.
+    Return None if failed to parse entry.
+    """
+    # check if in cache already
+    cache_key = (up_ac, up_version)
+    if cache_key in upkb_entry_cache:
+        return upkb_entry_cache[cache_key]
+    # fetch and parse
+    if up_version is None:
+        up_data = _fetch_upkb_entry(up_ac)
+    else:
+        try:
+            # note: can fail to parse very old UP versions...
+            up_data = _fetch_unisave_entry(up_ac, up_version)
+        except RuntimeError as ex:
+            #_warn_msg(f"Error in parsing v{next_v} of {up_ac}:\n{ex}")
+            up_data = None
+    # keep in cache
+    upkb_entry_cache[cache_key] = up_data
+    return up_data
+
+
+def _align_sequences(mdl_sqe, ref_sqe, atomseq_aln=True, ref_fixes=[],
+                     gapped_aa="XOUBJZ"):
+    """Compare sequences while paying attention to non-standard amino acids.
+
+    Can pass list of tuples for OLCs expected to be changed between ref and mdl.
+    E.g. Jason set would have ref_fixes=[('B', 'D'), ('J', 'L'), ('Z', 'E')].
+    Non-def. AA (listed in gapped_aa) in ref_sqe are assumed to be gaps (-) in
+    mdl_sqe (this is def. in CF/AF for "XOUBJZ").
+
+    Returns (mismatches, ref_range, mdl_range, covered_aln, mdl_seqres):
+    - mismatches = list of (ref_pos, mdl_pos, olc_ref, olc_mdl)
+      (positions are 1-indexed, None if gap and only if in range)
+    - ref_range / mdl_range = (start, end) tuples with 1-indexed positions of
+      start and end of covered range (mdl_range with respect to mdl_sqe!).
+      Extra non-covered residues in mdl or ref can be counted by comparing
+      ref_range / mdl_range with lengths of ref_sqe / mdl_sqe.
+    - covered_aln = alignment (seq. 0 = REF, seq. 1 = MDL) within covered range
+      (i.e. first and last column have no gaps). If atomseq_aln is True, the
+      alignment only includes non-gap residues of mdl_sqe. OST seq. offsets are
+      set with respect to mdl_sqe, ref_sqe (0-indexed). Note that offsets are
+      only guaranteed to fit ref_range / mdl_range if atomseq_aln is False.
+    - mdl_seqres = mdl_sqe with gaps (-) replaced with seq. from ref. if
+      non-def-AA there or with X otherwise (i.e. both have same length!).
+      Here guaranteed for mdl_seqres to match mdl_sqe if AA in gapped_aa and X
+      are replaced by gaps (-).
+    """
+    # add fixes if needed
+    ref_sqe_fixed = ref_sqe
+    for olc1, olc2 in ref_fixes:
+        ref_sqe_fixed = ref_sqe_fixed.replace(olc1, olc2)
+    # put gaps for parts not modelled by AF2 (i.e. any non-def-AA)
+    ref_sqe_fixed_gapped = ref_sqe_fixed
+    for olc in gapped_aa:
+        assert olc not in mdl_sqe
+        ref_sqe_fixed_gapped = ref_sqe_fixed_gapped.replace(olc, '-')
+    # easy and preferred case: mdl_sqe is subset of ref_sqe
+    ref_idx = ref_sqe_fixed_gapped.find(mdl_sqe)
+    if ref_idx >= 0:
+        mismatches = []
+        ref_range = (ref_idx + 1, ref_idx + len(mdl_sqe))
+        mdl_range = (1, len(mdl_sqe))
+        mdl_seqres = ref_sqe_fixed[ref_idx : ref_idx + len(mdl_sqe)]
+        # we handle covered_aln afterwards...
+        aln_s_ref = ref_sqe[ref_idx : ref_idx + len(mdl_sqe)]
+        aln_s_mdl = mdl_seqres
+    else:
+        # align and report mismatches
+        ref_seq = seq.CreateSequence("REF", ref_sqe_fixed)
+        # use X as first guess for gaps in model
+        mdl_seq = seq.CreateSequence("MDL", mdl_sqe.replace('-', 'x'))
+        aln = seq.alg.SemiGlobalAlign(ref_seq, mdl_seq, seq.alg.BLOSUM62)[0]
+        # get range
+        aligned_indices = [i for i, c in enumerate(aln) \
+                           if c[0] != '-' and c[1] != '-']
+        ref_range = (
+            aln.GetResidueIndex(0, aligned_indices[0]) + 1,
+            aln.GetResidueIndex(0, aligned_indices[-1]) + 1,
+        )
+        mdl_range = (
+            aln.GetResidueIndex(1, aligned_indices[0]) + 1,
+            aln.GetResidueIndex(1, aligned_indices[-1]) + 1,
+        )
+        # build up strings as we go
+        aln_s_ref = ""
+        aln_s_mdl = ""
+        mdl_seqres = ""
+        # collect mismatches and fix seqs as we go
+        mismatches = []
+        for idx, (olc_ref, olc_mdl) in enumerate(aln):
+            # fix seqres as needed
+            if olc_mdl == 'x' and olc_ref in gapped_aa:
+                olc_mdl = olc_ref
+            if olc_mdl != '-':
+                mdl_seqres += olc_mdl
+            if idx >= aligned_indices[0] and idx <= aligned_indices[-1]:
+                # fill aln_s_x as needed
+                if olc_ref != '-':
+                    # must fetch from ref_sqe
+                    ref_idx = aln.GetResidueIndex(0, idx)
+                    aln_s_ref += ref_sqe[ref_idx]
+                    ref_pos = ref_idx + 1
+                else:
+                    aln_s_ref += '-'
+                    ref_pos = None
+                if olc_mdl != '-':
+                    # fetch from mdl_seqres here
+                    # (revert back to mdl_sqe afterwards)
+                    mdl_idx = aln.GetResidueIndex(1, idx)
+                    aln_s_mdl += mdl_seqres[mdl_idx]
+                    mdl_pos = mdl_idx + 1
+                else:
+                    aln_s_mdl += '-'
+                    mdl_pos = None
+                if olc_ref != olc_mdl:
+                    mismatches.append((ref_pos, mdl_pos, olc_ref, olc_mdl))
+        # fix remaining x in mdl_seqres
+        mdl_seqres = mdl_seqres.replace('x', 'X')
+    # create covered_aln
+    s_ref_offset = ref_range[0] - 1
+    s_mdl_offset = mdl_range[0] - 1
+    covered_aln = seq.CreateAlignment(
+        seq.CreateSequence("REF", aln_s_ref),
+        seq.CreateSequence("MDL", aln_s_mdl.replace('x', 'X'))
+    )
+    # cut it once again if needed (only for atomseq_aln)
+    if atomseq_aln:
+        # revert
+        new_cols = [
+            (
+                olc_ref,
+                '-' if olc_mdl == 'x' or olc_mdl in gapped_aa else olc_mdl
+            ) for olc_ref, olc_mdl in zip(aln_s_ref, aln_s_mdl)
+        ]
+        aligned_indices = [i for i, c in enumerate(new_cols) \
+                           if c[0] != '-' and c[1] != '-']
+        s_ref_offset += covered_aln.GetResidueIndex(0, aligned_indices[0])
+        s_mdl_offset += covered_aln.GetResidueIndex(1, aligned_indices[0])
+        cut_cols = new_cols[aligned_indices[0]:aligned_indices[-1]+1]
+        aln_s_ref = "".join([olc_ref for olc_ref, _ in cut_cols])
+        aln_s_mdl = "".join([olc_mdl for _, olc_mdl in cut_cols])
+        covered_aln = seq.CreateAlignment(
+            seq.CreateSequence("REF", aln_s_ref),
+            seq.CreateSequence("MDL", aln_s_mdl)
+        )
+    covered_aln.SetSequenceOffset(0, s_ref_offset)
+    covered_aln.SetSequenceOffset(1, s_mdl_offset)
+    # check post assertions (as in docstring)
+    assert ref_sqe[covered_aln.GetSequenceOffset(0):]\
+        .startswith(covered_aln.sequences[0].gapless_string)
+    if atomseq_aln:
+        assert mdl_sqe[covered_aln.GetSequenceOffset(1)] \
+            == covered_aln.sequences[1].gapless_string[0]
+        assert mdl_sqe[covered_aln.GetSequenceOffset(1):].replace('-', '')\
+            .startswith(covered_aln.sequences[1].gapless_string)
+    else:
+        assert
covered_aln.sequences[0].gapless_string \
+            == ref_sqe[ref_range[0]-1:ref_range[1]]
+        assert covered_aln.sequences[1].gapless_string \
+            == mdl_seqres[mdl_range[0]-1:mdl_range[1]]
+        assert mdl_seqres[covered_aln.GetSequenceOffset(1):]\
+            .startswith(covered_aln.sequences[1].gapless_string)
+    assert len(mdl_seqres) == len(mdl_sqe)
+    mdl_sqe_check = mdl_seqres.replace('X', '-')
+    for olc in gapped_aa:
+        mdl_sqe_check = mdl_sqe_check.replace(olc, '-')
+    assert mdl_sqe_check == mdl_sqe.replace('X', '-')
+    #
+    return mismatches, ref_range, mdl_range, covered_aln, mdl_seqres
+
+
+def _get_entities(mdl_data_from_zip, metadata):
+    """Gather data for the mmCIF (target) entities.
+    Returns (list of cif_ents, list of issues)
+    """
+    # merge info for matching chains
+    unique_chains = {}  # key = sqe_gaps, value = partial cif_ent
+    chain_info = {ch["chain"]: {
+        "name": ch["name"], "up_ac": ch["up_ac"]
+    } for ch in metadata["chains"]}
+    ost_ent = mdl_data_from_zip["ent"]
+    for chn in ost_ent.chains:
+        pdb_chain_id = _get_ch_name(chn, False)
+        if pdb_chain_id not in chain_info:
+            raise RuntimeError(
+                f"Non-described chain {pdb_chain_id} in " \
+                f"{metadata['mdl_id']}/{mdl_data_from_zip['mdl_file_base']}"
+            )
+        sqe_gaps = _get_sequence(chn)
+        cif_ent = {
+            "pdb_sequence": sqe_gaps,
+            "pdb_chain_ids": [_get_ch_name(chn, False)],
+            "description": chain_info[pdb_chain_id]["name"],
+            "up_ac": chain_info[pdb_chain_id]["up_ac"],
+        }
+        if sqe_gaps in unique_chains:
+            other_cif_ent = unique_chains[sqe_gaps]
+            # sanity checks
+            for key, value in other_cif_ent.items():
+                if key != "pdb_chain_ids" and value != cif_ent[key]:
+                    raise RuntimeError(
+                        f"Inconsistent info {key} for identical chains for " \
+                        f"chain {pdb_chain_id} vs chains " \
+                        f"{other_cif_ent['pdb_chain_ids']}."
+                    )
+            # add to list of chains
+            other_cif_ent['pdb_chain_ids'].append(pdb_chain_id)
+        else:
+            unique_chains[sqe_gaps] = cif_ent
+    # sort by model chain name (should ensure same order of chains in mmCIF)
+    entities = sorted(
+        unique_chains.values(),
+        key=lambda x: min(x["pdb_chain_ids"])
+    )
+    # compare with info from UP and complete data to return
+    issues = []
+    for cif_ent in entities:
+        sqe_gaps = cif_ent["pdb_sequence"]
+        up_ac = cif_ent["up_ac"]
+        up_data = _fetch_upkb_cached(up_ac)
+        up_sqe = up_data["up_sequence"]
+        mismatches, up_range, mdl_range, _, mdl_seqres = _align_sequences(
+            sqe_gaps, up_sqe, atomseq_aln=False
+        )
+        num_extra_ref = len(up_sqe) - (up_range[1] - up_range[0] + 1)
+        len_mdl_covered = (mdl_range[1] - mdl_range[0] + 1)
+        num_extra_mdl = len(sqe_gaps) - len_mdl_covered
+        if len(mismatches) > 0 or num_extra_ref > 0 or num_extra_mdl > 0:
+            # ok to cover subset of UP usually (e.g. Ubiquitin), rest big issue
+            if len(mismatches) > 0 or num_extra_mdl > 0:
+                issue_type = "up_mismatch"
+            else:
+                issue_type = "up_extra"
+            chain_names = ",".join(cif_ent["pdb_chain_ids"])
+            short_data = (
+                mdl_data_from_zip['mdl_file_base'], chain_names, up_ac,
+                len_mdl_covered, len(mismatches), num_extra_ref, num_extra_mdl
+            )
+            long_data = (mismatches, up_range, mdl_range)
+            issues.append(
+                (metadata['mdl_id'], issue_type, short_data, long_data)
+            )
+        # cannot deal with gapped sequences here as we cannot map to scores
+        if sqe_gaps != mdl_seqres:
+            issues.append((
+                metadata['mdl_id'],
+                "gapped_seq",
+                (cif_ent['pdb_chain_ids']),
+                (sqe_gaps, mdl_seqres)
+            ))
+        cif_ent["seqres"] = mdl_seqres
+        cif_ent["up_range"] = up_range
+        cif_ent.update(up_data)
+    return entities, issues
+
+
+def _get_cf_config(cf_config, ur30_db_version=None, tpl_db=None,
+                   tpl_db_version=None):
+    """Define ColabFold setup.
+    Extra info needed from depositor for DBs used (depends on MMseqs2 server):
+    - ur30_db_version options: see dict in _get_sequence_dbs
+    - tpl_db options: None, "PDB70", "PDB100"
+    - tpl_db_version options: see dict in _get_sequence_dbs
+    -> can be set to None if DB not used at all (incl. custom tpls)
+    Note on versions used over time:
+    - first: 2021_03 version of UniRef30, unclear what PDB70
+    - after 13.7.22: updated the UniRef30 to 2022_02 and PDB70 to 220313
+    - after 12.6.23: UniRef30 2023_02, PDB100 (instead of PDB70) 230517
+    - also to define if DB used at all for tpls or custom tpls
+    - db versions only relevant if actually used
+    """
+    # NOTES:
+    # - UP-TO-DATE (as of March 2024) generic parser given a config.json dict
+    # - custom MSA is assumed to be complemented with extra step (as for Jason)
+
+    # keep version indep. of params (and add commit since versions are meh)
+    cf_version = cf_config["version"]
+    if "commit" in cf_config and cf_config["commit"] is not None:
+        cf_version += f" ({cf_config['commit'][:7]})"
+    # drop fields which are not relevant for model building
+    cf_config = cf_config.copy()
+    for key in ["num_queries", "commit", "version"]:
+        if key in cf_config:
+            del cf_config[key]
+
+    # NOTE: following code from
+    # https://github.com/sokrypton/ColabFold/blob/main/colabfold/batch.py to
+    # understand config
+    # -> should be backward compatible with Tara and Niko sets
+    # -> see also https://github.com/sokrypton/ColabFold/wiki/v1.5.0
+
+    # deal with old names (some settings changed name in v1.5)
+    # -> code taken almost verbatim from https://github.com/sokrypton/ColabFold
+    old_names = {"MMseqs2 (UniRef+Environmental)": "mmseqs2_uniref_env",
+                 "MMseqs2 (UniRef only)": "mmseqs2_uniref",
+                 "unpaired+paired": "unpaired_paired",
+                 "AlphaFold2-multimer-v1": "alphafold2_multimer_v1",
+                 "AlphaFold2-multimer-v2": "alphafold2_multimer_v2",
+                 "AlphaFold2-multimer-v3": "alphafold2_multimer_v3",
+                 "AlphaFold2-ptm": "alphafold2_ptm",
+                 "AlphaFold2": "alphafold2"}
+    msa_mode = old_names.get(cf_config["msa_mode"], cf_config["msa_mode"])
+    if "pair_mode" in cf_config:
+        pair_mode = old_names.get(cf_config["pair_mode"], cf_config["pair_mode"])
+    model_type = old_names.get(cf_config["model_type"], cf_config["model_type"])
+
+    # fix v1.5 defaults for num_recycles and recycle_early_stop_tolerance
+    # -> def. (set as "null" in config):
+    #    - num_recycles == 20 if alphafold2_multimer_v3 else 3
+    #    - recycle_early_stop_tolerance == 0.5 if multimer else 0.0
+    # -> valid from 1.5.0 until 1.5.5 (and probably later)
+    # -> defined in alphafold/model/config.py of steineggerlab/alphafold repo
+    if "num_recycles" in cf_config and cf_config["num_recycles"] is None:
+        if "multimer" in model_type and model_type not in [
+            "alphafold2_multimer_v1", "alphafold2_multimer_v2"
+        ]:
+            cf_config["num_recycles"] = 20
+        else:
+            cf_config["num_recycles"] = 3
+    if "recycle_early_stop_tolerance" in cf_config \
+       and cf_config["recycle_early_stop_tolerance"] is None:
+        cf_config["recycle_early_stop_tolerance"] = \
+            0.5 if "multimer" in model_type else 0.0
+
+    # fetch relevant data
+    # -> MSA mode
+    if msa_mode == "mmseqs2_uniref_env":
+        seq_dbs = ["UniRef", "Environmental"]
+        use_mmseqs = True
+        use_msa = True
+    elif msa_mode == "mmseqs2_uniref":
+        seq_dbs = ["UniRef"]
+        use_mmseqs = True
+        use_msa = True
+    elif msa_mode == "single_sequence":
+        seq_dbs = []
+        use_mmseqs = False
+        use_msa = False
+    elif msa_mode == "custom":
+        seq_dbs = []
+        use_mmseqs = False
+        use_msa = True
+    else:
+        raise ValueError(f"Unknown msa_mode {cf_config['msa_mode']}")
+
+    # -> model type
+    if model_type == "alphafold2_multimer_v1":
+        # AF-Multimer as introduced in AlphaFold v2.1.0
+        use_multimer = True
+        multimer_version = 1
+    elif model_type == "alphafold2_multimer_v2":
+        # AF-Multimer as introduced in AlphaFold v2.2.0
+        use_multimer = True
+        multimer_version = 2
+    elif model_type == "alphafold2_multimer_v3":
+        # AF-Multimer as introduced in AlphaFold v2.3.0
+        use_multimer = True
+        multimer_version = 3
+    elif model_type == "alphafold2_ptm":
+        use_multimer = False
+        multimer_version = None
+    else:
+        raise ValueError(f"Unknown model_type {cf_config['model_type']}")
+
+    # write modeling description
+    mdl_description = f"Model generated using ColabFold v{cf_version}"
+    if use_multimer:
+        mdl_description += f" with AlphaFold-Multimer (v{multimer_version})"
+    else:
+        mdl_description += " with AlphaFold"
+    # early stopping feature of ColabFold
+    upto_mdl = ""
+    upto_rec = ""
+    if cf_config.get("stop_at_score", 100) < 100:
+        upto_mdl = "up to "
+        upto_rec = "up to "
+    if cf_config.get("recycle_early_stop_tolerance", 0) > 0:
+        upto_rec = "up to "
+    if cf_config.get("num_seeds", 1) > 1:
+        mdl_str = f"{cf_config['num_models'] * cf_config['num_seeds']} " \
+                  f"models ({cf_config['num_seeds']} random seeds per " \
+                  f"parameter set)"
+    else:
+        mdl_str = f"{cf_config['num_models']} models"
+    mdl_description += f" producing {upto_mdl}{mdl_str} with {upto_rec}" \
+                       f"{cf_config['num_recycles']} recycles each"
+    if cf_config.get("use_amber", False) or \
+       cf_config.get("num_relax", 0) > 0:
+        mdl_description += ", with AMBER relaxation"
+    else:
+        mdl_description += ", without model relaxation"
+    if cf_config["use_templates"]:
+        # tpl_db == None meant to mean that custom templates were used
+        # -> no need to stress it but just visible in search DBs
+        mdl_description += ", using templates"
+    else:
+        mdl_description += ", without templates"
+    if cf_config["rank_by"] == "plddt":
+        mdl_description += ", ranked by pLDDT"
+    elif cf_config["rank_by"] == "ptmscore":
+        mdl_description += ", ranked by pTM"
+    elif cf_config["rank_by"] == "multimer":
+        mdl_description += ", ranked by 80*ipTM+20*pTM"
+    else:
+        raise ValueError(f"Unknown rank_by {cf_config['rank_by']}")
+    if use_msa:
+        mdl_description += ", starting from"
+        if use_mmseqs:
+            msa_type = "MSA"
+        else:
+            msa_type = "custom MSA"
+        if use_multimer:
+            if pair_mode == "unpaired_paired":
+                mdl_description += f" paired and unpaired {msa_type}s"
+            elif pair_mode == "paired":
+                mdl_description += f" paired {msa_type}s"
+            elif pair_mode == "unpaired":
+                mdl_description += f" unpaired {msa_type}s"
+            else:
+                raise ValueError(f"Unknown pair_mode {cf_config['pair_mode']}")
+        elif msa_type.startswith('M'):
+            mdl_description += f" an {msa_type}"
+        else:
+            mdl_description += f" a {msa_type}"
+        if use_mmseqs:
+            mdl_description += f" from MMseqs2 ({'+'.join(seq_dbs)})"
+    else:
+        mdl_description += " without an MSA"
+    mdl_description += "."
+
+    return {
+        "params": cf_config,
+        "version": cf_version,
+        "seq_dbs": seq_dbs,
+        "use_mmseqs": use_mmseqs,
+        "use_msa": use_msa,
+        "ur30_db_version": ur30_db_version,
+        "tpl_db": tpl_db,
+        "tpl_db_version": tpl_db_version,
+        "use_multimer": use_multimer,
+        "multimer_version": multimer_version,
+        "description": mdl_description,
+    }
+
+
+def _get_mmseqs2_software(version=None):
+    """Get MMseqs2 as a dictionary, suitable to create a modelcif software
+    object."""
+    return {
+        "name": "MMseqs2",
+        "classification": "data collection",
+        "description": "Many-against-Many sequence searching",
+        "citation": ihm.citations.mmseqs2,
+        "location": "https://github.com/soedinglab/mmseqs2",
+        "type": "package",
+        "version": version,
+    }
+
+
+def _get_colabfold_software(version=None):
+    """Get ColabFold as a dictionary, suitable to create a modelcif software
+    object."""
+    return {
+        "name": "ColabFold",
+        "classification": "model building",
+        "description": "Structure prediction",
+        "citation": ihm.citations.colabfold,
+        "location": "https://github.com/sokrypton/ColabFold",
+        "type": "package",
+        "version": version,
+    }
+
+
+def _get_af2_software(version=None, is_multimer=False):
+    """Get AF2 as dictionary, suitable to create a modelcif software object."""
+    if is_multimer:
+        return {
+            "name": "AlphaFold-Multimer",
+            "classification": "model building",
+            "description": "Structure prediction",
+            "citation": ihm.Citation(
+                pmid=None,
+                title="Protein complex prediction with "
+                + "AlphaFold-Multimer.",
+                journal="bioRxiv",
+                volume=None,
+                page_range=None,
+                year=2021,
+                authors=[
+                    "Evans, R.",
+                    "O'Neill, M.",
+                    "Pritzel, A.",
+                    "Antropova, N.",
+                    "Senior, A.",
+                    "Green, T.",
+                    "Zidek, A.",
+                    "Bates, R.",
+                    "Blackwell, S.",
+                    "Yim, J.",
+                    "Ronneberger, O.",
+                    "Bodenstein, S.",
+                    "Zielinski, M.",
+                    "Bridgland, A.",
+                    "Potapenko, A.",
+                    "Cowie, A.",
+                    "Tunyasuvunakool, K.",
+                    "Jain, R.",
+                    "Clancy, E.",
+                    "Kohli, P.",
+                    "Jumper, J.",
+                    "Hassabis, D.",
+                ],
+                doi="10.1101/2021.10.04.463034",
+            ),
+            "location": "https://github.com/deepmind/alphafold",
+            "type": "package",
+            "version": version,
+        }
+    else:
+        return {
+            "name": "AlphaFold",
+            "classification": "model building",
+            "description": "Structure prediction",
+            "citation": ihm.citations.alphafold2,
+            "location": "https://github.com/deepmind/alphafold",
+            "type": "package",
+            "version": version,
+        }
+
+
+def _get_protocol_steps_and_software(cf_config):
+    """Create the list of protocol steps with software and parameters used."""
+    protocol = []
+
+    # build up SW
+    sw_plus_params = [
+        (
+            _get_colabfold_software(cf_config["version"]), cf_config["params"]
+        )
+    ]
+    if cf_config["use_mmseqs"]:
+        sw_plus_params.append((_get_mmseqs2_software(), {}))
+    sw_plus_params.append((
+        _get_af2_software(is_multimer=cf_config["use_multimer"]), {}
+    ))
+
+    # modelling step
+    protocol.append({
+        "method_type": "modeling",
+        "name": None,
+        "details": cf_config["description"],
+        "input": "target_sequences_and_ref_DBs",
+        "output": "model",
+        "software_plus_params": sw_plus_params,
+    })
+
+    return protocol
+
+
+def _get_title(metadata):
+    """Get a title for this modelling experiment."""
+    return metadata["title"].strip()
+
+
+def _get_model_details(metadata):
+    """Get the model description."""
+    return metadata["abstract"].strip()
+################################################################################
+
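A quick sanity sketch of what `_get_protocol_steps_and_software` assembles from a parsed ColabFold config: one modeling step carrying (software, parameters) pairs. The `sketch_protocol_steps` helper and `demo_cfg` values below are hypothetical stand-ins (plain dicts instead of the script's `_get_*_software` helpers and `ihm` citations), not part of this diff:

```python
def sketch_protocol_steps(cf_config):
    """Build the single modeling step with its (software, parameters) pairs."""
    # ColabFold always comes first, carrying the remaining config as parameters
    sw_plus_params = [
        ({"name": "ColabFold", "version": cf_config["version"]},
         cf_config["params"]),
    ]
    # MMseqs2 only took part if the MSA came from the MMseqs2 server
    if cf_config["use_mmseqs"]:
        sw_plus_params.append(({"name": "MMseqs2", "version": None}, {}))
    # the prediction network, multimer or monomer flavour
    af_name = "AlphaFold-Multimer" if cf_config["use_multimer"] else "AlphaFold"
    sw_plus_params.append(({"name": af_name, "version": None}, {}))
    return [{
        "method_type": "modeling",
        "name": None,
        "details": cf_config["description"],
        "input": "target_sequences_and_ref_DBs",
        "output": "model",
        "software_plus_params": sw_plus_params,
    }]


# hypothetical config as _get_cf_config would return it
demo_cfg = {
    "version": "1.5.2 (abc1234)",
    "params": {"num_recycles": 20, "model_type": "alphafold2_multimer_v3"},
    "use_mmseqs": True,
    "use_multimer": True,
    "description": "Model generated using ColabFold v1.5.2 ...",
}
steps = sketch_protocol_steps(demo_cfg)
```

The real function wires these dicts into `modelcif` software objects; the shape of the step list is the same.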
+################################################################################
+# ModelCIF HANDLING
+################################################################################
+# pylint: disable=too-few-public-methods
+class _GlobalPTM(modelcif.qa_metric.Global, modelcif.qa_metric.PTM):
+    """Predicted accuracy according to the TM-score in [0,1]"""
+
+    name = "pTM"
+    software = None
+
+
+class _GlobalIpTM(modelcif.qa_metric.Global, modelcif.qa_metric.IpTM):
+    """Predicted protein-protein interface score based on TM-score in [0,1]"""
+
+    name = "ipTM"
+    software = None
+
+
+class _GlobalPLDDT(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT):
+    """Predicted accuracy according to the CA-only lDDT in [0,100]"""
+
+    name = "pLDDT"
+    software = None
+
+
+class _LocalPLDDT(modelcif.qa_metric.Local, modelcif.qa_metric.PLDDT):
+    """Predicted accuracy according to the CA-only lDDT in [0,100]"""
+
+    name = "pLDDT"
+    software = None
+
+
+class _LocalPairwisePAE(modelcif.qa_metric.LocalPairwise, modelcif.qa_metric.PAE):
+    """Predicted aligned error (in Angstroms)"""
+
+    name = "PAE"
+    software = None
+
+
+class _LPeptideAlphabetWithXO(ihm.LPeptideAlphabet):
+    """Have the default amino acid alphabet plus 'X' for unknown residues
+    and 'O' as allowed non-def. AA (U already in alphabet)."""
+
+    # extra entry added according to LPeptideAlphabet def. in
+    # https://python-ihm.readthedocs.io/en/latest/_modules/ihm.html
+    # and https://files.rcsb.org/view/1NTH.cif for values for 'O'.
+
+    def __init__(self):
+        """Create the alphabet."""
+        super().__init__()
+        self._comps["X"] = self._comps["UNK"]
+        self._comps['O'] = ihm.LPeptideChemComp(
+            "PYL", "O", "O", "PYRROLYSINE", "C12 H21 N3 O3"
+        )
+# pylint: enable=too-few-public-methods
+
+
+class _OST2ModelCIF(modelcif.model.AbInitioModel):
+    """Map OST entity elements to ihm.model"""
+
+    def __init__(self, *args, **kwargs):
+        """Initialise a model"""
+        for i in ["ost_entity", "asym", "scores_json", "incl_pae"]:
+            if i not in kwargs:
+                raise TypeError(f"Required keyword argument '{i}' not found.")
+        self.ost_entity = kwargs.pop("ost_entity")
+        self.asym = kwargs.pop("asym")
+        self.scores_json = kwargs.pop("scores_json")
+        self.incl_pae = kwargs.pop("incl_pae")
+
+        # use auth IDs for res. nums and chain names
+        self.use_auth = False
+        # what accuracy to use for PAE? (writer uses 3 anyway)
+        self.pae_digits = 3
+
+        super().__init__(*args, **kwargs)
+
+    def get_atoms(self):
+        # ToDo [internal]: Take B-factor out since it's not a B-factor?
+        # NOTE: this assumes that _get_res_num maps residue to pos. in seqres
+        # within asym
+        for atm in self.ost_entity.atoms:
+            yield modelcif.model.Atom(
+                asym_unit=self.asym[_get_ch_name(atm.chain, self.use_auth)],
+                seq_id=_get_res_num(atm.residue, self.use_auth),
+                atom_id=atm.name,
+                type_symbol=atm.element,
+                x=atm.pos[0],
+                y=atm.pos[1],
+                z=atm.pos[2],
+                het=atm.is_hetatom,
+                biso=atm.b_factor,
+                occupancy=atm.occupancy,
+            )
+
+    def add_scores(self):
+        """Add QA metrics from AF2 scores."""
+        # global scores
+        self.qa_metrics.extend(
+            (
+                _GlobalPLDDT(self.scores_json["plddt_global"]),
+                _GlobalPTM(self.scores_json["ptm"]),
+                _GlobalIpTM(self.scores_json["iptm"]),
+            )
+        )
+
+        # NOTE: none of the below is expected to work if we have unmodelled gaps!
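The NOTE above points at the central assumption of the local-score bookkeeping that follows: the AF2 scores JSON stores pLDDT as one flat list and PAE as one square matrix, both ordered chain by chain, residue by residue, so a single pair of counters (`i`, `j`) walks both. A standalone sketch of that walk over toy data (chain layout and score values are made up):

```python
# Hypothetical stand-in for the OST entity: chain name -> residue labels.
chains = {"A": ["M1", "K2"], "B": ["G1"]}
n_res = sum(len(res) for res in chains.values())

# Toy scores in the AF2 JSON layout (values invented for illustration).
plddt = [91.2, 88.7, 76.5]
pae = [[0.8, 1.5, 9.0],
       [1.4, 0.7, 8.2],
       [9.1, 8.0, 0.9]]

local_plddt, local_pae = {}, {}
i = 0
for ch_i, residues_i in chains.items():
    for res_i in residues_i:
        local_plddt[(ch_i, res_i)] = plddt[i]
        j = 0
        for ch_j, residues_j in chains.items():
            for res_j in residues_j:
                local_pae[((ch_i, res_i), (ch_j, res_j))] = pae[i][j]
                j += 1
        i += 1

# every residue got a pLDDT, every residue pair a PAE
assert len(local_plddt) == n_res and len(local_pae) == n_res ** 2
```

An unmodelled gap would shift `i` relative to the scores list, which is exactly why the code above refuses to handle gapped models.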
+ + # local scores + lpae = [] + i = 0 + for chn_i in self.ost_entity.chains: + ch_name_i = _get_ch_name(chn_i, self.use_auth) + for res_i in chn_i.residues: + # local pLDDT + res_num_i = _get_res_num(res_i, self.use_auth) + self.qa_metrics.append( + _LocalPLDDT( + self.asym[ch_name_i].residue(res_num_i), + self.scores_json["plddt"][i], + ) + ) + + # pairwise alignment error + if self.incl_pae: + j = 0 + for chn_j in self.ost_entity.chains: + ch_name_j = _get_ch_name(chn_j, self.use_auth) + for res_j in chn_j.residues: + res_num_j = _get_res_num(res_j, self.use_auth) + pae_ij = self.scores_json["pae"][i][j] + lpae.append( + _LocalPairwisePAE( + self.asym[ch_name_i].residue(res_num_i), + self.asym[ch_name_j].residue(res_num_j), + round(pae_ij, self.pae_digits), + ) + ) + j += 1 + + i += 1 + + if self.incl_pae: + self.qa_metrics.extend(lpae) + + +def _get_modelcif_entities(target_ents, asym_units, system): + """Create ModelCIF entities and asymmetric units.""" + alphabet = _LPeptideAlphabetWithXO() + for cif_ent in target_ents: + # collect references + up_ref = modelcif.reference.UniProt( + code=cif_ent["up_id"], + accession=cif_ent["up_ac"], + isoform=cif_ent["up_isoform"], + ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"], + organism_scientific=cif_ent["up_organism"], + sequence_version_date=cif_ent["up_last_mod"], + sequence_crc64=cif_ent["up_crc64"], + sequence=cif_ent["up_sequence"], + ) + # ASSUME: full model covered w/o mismatches + # -> NOTE: sequence passed above is cut based on alignments! 
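The ASSUME/NOTE comments above state that the stored UniProt sequence is cut to the modelled range, so entity positions 1..len(seqres) map directly onto `db_begin`..`db_end`. Locating that range can be sketched as an exact-substring search; `up_alignment_range` is a hypothetical helper, not part of the script:

```python
def up_alignment_range(up_sequence, entity_sequence):
    """Return (db_begin, db_end), 1-based inclusive, of the entity sequence
    within the full UniProt sequence; assumes an exact match, mirroring the
    'full model covered w/o mismatches' assumption above."""
    pos = up_sequence.find(entity_sequence)
    if pos < 0:
        raise ValueError("entity sequence not found in UniProt sequence")
    return pos + 1, pos + len(entity_sequence)
```

With mismatches, this exact-match search no longer applies; the 2024-07 project README notes that `modelcif.reference.SeqDif` is used for those cases instead.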
+        up_ref.alignments.append(modelcif.reference.Alignment(
+            db_begin=cif_ent["up_range"][0],
+            db_end=cif_ent["up_range"][1],
+            entity_begin=1,
+            entity_end=len(cif_ent["seqres"]),
+            seq_dif=[]
+        ))
+        #
+        references = [up_ref]
+        # combine into ModelCIF entity
+        mdlcif_ent = modelcif.Entity(
+            cif_ent["seqres"],
+            description=cif_ent["description"],
+            alphabet=alphabet,
+            source=ihm.source.Natural(
+                ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"],
+                scientific_name=cif_ent["up_organism"],
+            ),
+            references=references,
+        )
+        # NOTE: this assigns (potentially new) alphabetic chain names
+        for pdb_chain_id in cif_ent["pdb_chain_ids"]:
+            asym_units[pdb_chain_id] = modelcif.AsymUnit(
+                mdlcif_ent, strand_id=pdb_chain_id,
+            )
+        system.entities.append(mdlcif_ent)
+
+
+def _get_assoc_pae_file(entry_id, mdl_name):
+    """Generate an associated file object to extract PAE into an extra file."""
+    return modelcif.associated.LocalPairwiseQAScoresFile(
+        f"{mdl_name}_local_pairwise_qa.cif",
+        categories=["_ma_qa_metric_local_pairwise"],
+        copy_categories=["_ma_qa_metric"],
+        entry_id=entry_id,
+        entry_details="This file is an associated file consisting "
+        + "of local pairwise QA metrics. This is a partial mmCIF "
+        + "file and can be validated by merging with the main "
+        + "mmCIF file containing the model coordinates and other "
+        + "associated data.",
+        details="Predicted aligned error",
+    )
+
+
+def _get_assoc_png_file(fle_path, png_type):
+    """Generate a modelcif.associated.File object pointing to PNG file
+    with content defined by png_type (coverage, plddt, or pae).
+ """ + details = { + "coverage": "PNG file showing number of sequences in the MSA covering " + + "each position in the target sequences", + "plddt": "PNG file showing pLDDT at each residue position for each " + + "of the 5 models produced", + "pae": "PNG file showing the PAE matrices for each of the 5 models " + + "produced", + } + afile = modelcif.associated.File( + fle_path, + details=details[png_type], + ) + # NOTE: file_format can be set to "png" in future ModelCIF versions + # (i.e. when https://github.com/ihmwg/ModelCIF/issues/17 is resolved) + afile.file_format = "other" + afile.file_content = "other" + return afile + + +def _get_assoc_mdl_file(fle_path, data_json): + """Generate a modelcif.associated.File object that looks like a CIF file. + The dedicated CIFFile functionality in modelcif would also try to write it. + """ + cfile = modelcif.associated.File( + fle_path, + details=f"#{data_json['mdl_rank']} ranked model; " + + f"pLDDT {round(data_json['plddt_global'], 1)}, " + + f"pTM {round(data_json['ptm'], 3)}, " + + f"ipTM {round(data_json['iptm'], 3)}", + ) + cfile.file_format = "cif" + return cfile + + +def _get_assoc_zip_file(fle_path, data_json): + """Create a modelcif.associated.File object that looks like a ZIP file. + This is NOT the archive ZIP file for the PAEs but to store that in the + ZIP archive of the selected model.""" + zfile = modelcif.associated.File( + fle_path, + details="archive with multiple files for " + + f"#{data_json['mdl_rank']} ranked model", + ) + zfile.file_format = "other" + return zfile + + +def _get_associated_files(mdl_name, arc_files): + """Create entry for associated files.""" + # package all into zip file + return modelcif.associated.Repository( + "", + [modelcif.associated.ZipFile(f"{mdl_name}.zip", files=arc_files)], + ) + # NOTE: by convention MA expects zip file with same name as model-cif + + +def _get_sequence_dbs(config_data): + """Get ColabFold seq. 
DBs.""" + # Uses HC list of known DBs used in ColabFold + # -> see also notes in _get_config + db_dict = { + "UniRef_2021_03": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2103.tar.gz", + version="2021_03", + ), + "UniRef_2022_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2202.tar.gz", + version="2022_02", + ), + "UniRef_2023_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2302.tar.gz", + version="2023_02", + ), + "Environmental": modelcif.ReferenceDatabase( + "ColabFold DB", + "https://wwwuser.gwdg.de/~compbiol/colabfold/" + + "colabfold_envdb_202108.tar.gz", + version="2021_08", + ), + "PDB100_230517": modelcif.ReferenceDatabase( + "PDB100", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb100_foldseek_230517.tar.gz", + release_date=datetime.datetime(2023, 5, 17) + ), + "PDB70_211027": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211027.tar.gz", + release_date=datetime.datetime(2021, 10, 27) + ), + "PDB70_211117": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211117.tar.gz", + release_date=datetime.datetime(2021, 11, 17) + ), + "PDB70_220313": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_220313.tar.gz", + release_date=datetime.datetime(2022, 3, 13) + ), + } + # fill list of DBs + seq_dbs = [] + search_keys = [] + for seq_db in config_data["seq_dbs"]: + if seq_db == "UniRef": + if config_data['ur30_db_version'] is None: + raise ValueError("Cannot use UniRef without version") + search_key = f"UniRef_{config_data['ur30_db_version']}" + else: + search_key = seq_db + search_keys.append(search_key) + if 
config_data["tpl_db"] is not None: + if config_data["tpl_db_version"] is None: + raise ValueError("Cannot have tpl DB without version") + search_keys.append( + f"{config_data['tpl_db']}_{config_data['tpl_db_version']}" + ) + for search_key in search_keys: + if search_key not in db_dict: + raise ValueError(f"Unknown seq. DB {search_key}") + seq_dbs.append(db_dict[search_key]) + return seq_dbs + + +def _assemble_modelcif_software(soft_dict, params_dict): + """Create a modelcif.SoftwareWithParameters instance from dictionaries.""" + # create SW object + sw = modelcif.Software( + soft_dict["name"], + soft_dict["classification"], + soft_dict["description"], + soft_dict["location"], + soft_dict["type"], + soft_dict["version"], + citation=soft_dict["citation"], + ) + # assemble parameters + params = [] + for key, val in params_dict.items(): + params.append(modelcif.SoftwareParameter(key, val)) + # put them together + return modelcif.SoftwareWithParameters(sw, params) + + +def _get_modelcif_protocol_software(js_step): + """Assemble software entries for a ModelCIF protocol step.""" + # new setup in python-modelcif (as of late 2023): params with each SW + sw_list = [] + for sw, sw_params in js_step["software_plus_params"]: + sw_list.append(_assemble_modelcif_software(sw, sw_params)) + # group and done... 
+ if sw_list: + return modelcif.SoftwareGroup(sw_list) + else: + return None + + +def _get_modelcif_protocol_data(data_label, target_entities, model, ref_dbs): + """Assemble data for a ModelCIF protocol step.""" + if data_label == "target_sequences_and_ref_DBs": + data = modelcif.data.DataGroup(target_entities) + data.extend(ref_dbs) + elif data_label == "model": + data = model + else: + raise RuntimeError(f"Unknown protocol data: '{data_label}'") + return data + + +def _get_modelcif_protocol(protocol_steps, target_entities, model, ref_dbs): + """Create the protocol for the ModelCIF file.""" + protocol = modelcif.protocol.Protocol() + for js_step in protocol_steps: + sftwre = _get_modelcif_protocol_software(js_step) + input_data = _get_modelcif_protocol_data( + js_step["input"], target_entities, model, ref_dbs + ) + output_data = _get_modelcif_protocol_data( + js_step["output"], target_entities, model, ref_dbs + ) + + protocol.steps.append( + modelcif.protocol.Step( + input_data=input_data, + output_data=output_data, + name=js_step["name"], + details=js_step["details"], + software=sftwre, + ) + ) + protocol.steps[-1].method_type = js_step["method_type"] + return protocol + + +def _compress_cif_file(cif_file): + """Compress cif file and delete original.""" + with open(cif_file, "rb") as f_in: + with gzip.open(cif_file + ".gz", "wb") as f_out: + shutil.copyfileobj(f_in, f_out) + os.remove(cif_file) + + +def _package_associated_files(repo): + """Compress associated files into single zip file and delete original.""" + # zip settings tested for good speed vs compression + for archive in repo.files: + with zipfile.ZipFile(archive.path, "w", zipfile.ZIP_BZIP2) as cif_zip: + for zfile in archive.files: + cif_zip.write(zfile.path, arcname=zfile.path) + os.remove(zfile.path) + + +def _store_as_modelcif(data_json, ost_ent, out_dir, mdl_name, compress, + add_pae, add_pngs, add_files): + """Mix all the data into a ModelCIF file.""" + print(" generating ModelCIF objects...", 
end="") + pstart = timer() + # create system to gather all the data + system = modelcif.System( + title=data_json["title"], + id=data_json["mdl_id"].upper(), + model_details=data_json["model_details"], + ) + + # create an asymmetric unit and an entity per target sequence + asym_units = {} + _get_modelcif_entities(data_json["target_entities"], asym_units, system) + + # audit_authors + system.authors.extend(data_json["audit_authors"]) + + # set up the model to produce coordinates + model = _OST2ModelCIF( + assembly=modelcif.Assembly(asym_units.values()), + asym=asym_units, + ost_entity=ost_ent, + scores_json=data_json, + name=data_json["mdl_name"], + incl_pae=add_pae, + ) + print(f" ({timer()-pstart:.2f}s)") + print(" processing QA scores...", end="", flush=True) + pstart = timer() + model.add_scores() + print(f" ({timer()-pstart:.2f}s)") + + model_group = modelcif.model.ModelGroup([model]) + system.model_groups.append(model_group) + + # handle additional files + arc_files = [] + if add_pae: + arc_files.append(_get_assoc_pae_file(system.id, mdl_name)) + if add_pngs: + for png_type in ["coverage", "plddt", "pae"]: + arc_files.append(_get_assoc_png_file( + data_json[f"{png_type}_png_file_name"], png_type + )) + arc_files.extend(add_files) + if arc_files: + system.repositories.append(_get_associated_files(mdl_name, arc_files)) + + # get data and steps + ref_dbs = _get_sequence_dbs(data_json["cf_config"]) + protocol = _get_modelcif_protocol( + data_json["protocol"], system.entities, model, ref_dbs, + ) + system.protocols.append(protocol) + + # write modelcif System to file + print(" write to disk...", end="", flush=True) + pstart = timer() + # NOTE: this will dump PAE on path provided in add_scores + # -> hence we cheat by changing path and back while being exception-safe... 
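The "cheat" described in the comment above, changing into the output directory so the dumper drops the PAE file next to the model and changing back even on error, is a general try/finally pattern. Wrapped as a context manager it looks like this (a sketch, not part of the script):

```python
import os
from contextlib import contextmanager


@contextmanager
def pushd(path):
    """Temporarily switch the working directory; restore it on exit even
    if the body raises -- the same guarantee the script's try/finally gives."""
    oldpwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(oldpwd)
```

With such a helper, the write block could read `with pushd(out_dir): ...` and drop the explicit `oldpwd` bookkeeping.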
+ oldpwd = os.getcwd() + os.chdir(out_dir) + mdl_fle = f"{mdl_name}.cif" + try: + with open(mdl_fle, "w", encoding="ascii") as mmcif_fh: + modelcif.dumper.write(mmcif_fh, [system]) + if add_pngs: + for png_type in ["coverage", "plddt", "pae"]: + with open(data_json[f"{png_type}_png_file_name"], "wb") as fh: + fh.write(data_json[f"{png_type}_png_data"]) + if arc_files: + _package_associated_files(system.repositories[0]) + if compress: + _compress_cif_file(mdl_fle) + mdl_fle += ".gz" + finally: + os.chdir(oldpwd) + print(f" ({timer()-pstart:.2f}s)") + assoc_files = [_get_assoc_mdl_file(mdl_fle, data_json)] + if arc_files: + assoc_files.append( + _get_assoc_zip_file(system.repositories[0].files[0].path, data_json) + ) + return assoc_files +################################################################################ + +################################################################################ +# HANDLE FULL DATA SET +################################################################################ +def _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, entities, mdl_rank, add_files=[] +): + """Convert a single model with its accompanying data to ModelCIF.""" + mdl_data_from_zip = data_from_zip[f"rank_{mdl_rank}"] + mdl_af_id = mdl_data_from_zip["mdl_id"] + if mdl_rank > 1: + mdl_id += f"_rank_{mdl_rank}_{mdl_af_id}" + + print(f" translating {mdl_id}...") + pdb_start = timer() + + # gather data into JSON-like structure + print(" preparing data...", end="") + pstart = timer() + + mdlcf_json = {} + # note: must override num_recycles with Recycle_number in config + config_dict = metadata["config_dict"].copy() + config_dict["num_recycles"] = metadata["Recycle_number"] + cf_config = _get_cf_config(config_dict, ur30_db_version="2023_02") + mdlcf_json["audit_authors"] = _get_audit_authors() + mdlcf_json["protocol"] = _get_protocol_steps_and_software(cf_config) + mdlcf_json["cf_config"] = cf_config + mdlcf_json["mdl_id"] = mdl_id # used for entry ID + 
mdlcf_json["mdl_rank"] = mdl_rank + if mdl_rank == 1: + mdlcf_json["mdl_name"] = f"Top ranked model ({mdl_af_id})" + else: + mdlcf_json["mdl_name"] = f"#{mdl_rank} ranked model ({mdl_af_id})" + mdlcf_json["target_entities"] = entities + for scores_key in ["plddt", "pae"]: + mdlcf_json[scores_key] = mdl_data_from_zip["scores"][scores_key] + # override global scores with higher accuracy ones in metadata + mdlcf_json["plddt_global"] = metadata[f"pLDDT_{mdl_rank}"] + mdlcf_json["ptm"] = metadata[f"pTM_{mdl_rank}"] + mdlcf_json["iptm"] = metadata[f"ipTM_{mdl_rank}"] + # + mdlcf_json["title"] = _get_title(metadata) + if mdl_rank != 1: + mdlcf_json["title"] += f" (#{mdl_rank} ranked model)" + mdlcf_json["model_details"] = _get_model_details(metadata) + # fill PNG data + for png_type in ["coverage", "plddt", "pae"]: + mdlcf_json[f"{png_type}_png_file_name"] = f"{mdl_id}_{png_type}.png" + mdlcf_json[f"{png_type}_png_data"] = data_from_zip[ + f"{png_type}_png_data" + ] + + print(f" ({timer()-pstart:.2f}s)") + + # save ModelCIF + assoc_files = _store_as_modelcif( + data_json=mdlcf_json, + ost_ent=mdl_data_from_zip["ent"], + out_dir=opts.out_dir, + mdl_name=mdl_id, + compress=(mdl_rank == 1 and opts.compress), + add_pae=(mdl_rank == 1 and not opts.no_extra_files), + add_pngs=(mdl_rank == 1 and not opts.no_extra_files), + add_files=add_files + ) + + # check if result can be read and has expected seq. + mdl_path = os.path.join(opts.out_dir, assoc_files[0].path) + ent, ss = io.LoadMMCIF(mdl_path, seqres=True) + exp_seqs = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqs += [trg_ent["pdb_sequence"]] * len(trg_ent["pdb_chain_ids"]) + assert ent.chain_count == len(exp_seqs), f"Bad chain count {mdl_id}" + # NOTE: here we expect auth = label IDs + ent_seq = "".join([_get_sequence(chn, False) for chn in ent.chains]) + ent_seq_a = "".join([_get_sequence(chn, True) for chn in ent.chains]) + assert ent_seq == ent_seq_a + assert ent_seq == "".join(exp_seqs), f"Bad seq. 
{mdl_id}" + ent_seqres = [ss.FindSequence(chn.name).string for chn in ent.chains] + exp_seqres = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqres += [trg_ent["seqres"]] * len(trg_ent["pdb_chain_ids"]) + assert ent_seqres == exp_seqres, f"Bad seqres {mdl_id}" + + print(f" ... done with {mdl_id} ({timer()-pdb_start:.2f}s).") + + return assoc_files + + +def _translate2modelcif(metadata, opts, desired_ranks=3): + """Convert a model with its accompanying data to ModelCIF.""" + mdl_id = metadata["mdl_id"] + # skip if done already (done later here due to info to be returned) + if opts.compress: + cifext = "cif.gz" + else: + cifext = "cif" + mdl_path = os.path.join(opts.out_dir, f"{mdl_id}.{cifext}") + + # prepare data for models to convert (also gets all issues) + issues = [] + ent_dict = {} + data_from_zip = _parse_zip_file( + metadata["zip_file_path"], desired_ranks=desired_ranks + ) + for num in range(desired_ranks): + mdl_rank = num + 1 + mdl_data_from_zip = data_from_zip[f"rank_{mdl_rank}"] + entities, ent_issues = _get_entities(mdl_data_from_zip, metadata) + issues.extend(ent_issues) + ent_dict[mdl_rank] = entities + scores_issues = _check_scores(mdl_data_from_zip, metadata, mdl_rank) + issues.extend(scores_issues) + + # abort here if already done + if opts.checks_only: + return issues + if os.path.exists(mdl_path): + print(f" {mdl_id} already done...") + return issues + + # convert models if needed starting from lower ranked ones + add_files = [] + if not opts.no_extra_files: + for num in range(1, desired_ranks): + mdl_rank = num + 1 + entities = ent_dict[mdl_rank] + assoc_files = _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, entities, mdl_rank + ) + add_files.extend(assoc_files) + # do top ranked one with assoc. 
files + _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, ent_dict[1], + mdl_rank=1, add_files=add_files + ) + return issues + + +def _main(): + """Run as script.""" + + # parse/fetch global data + opts = _parse_args() + metadata_all = _get_metadata(opts.input_data_path, opts.single_model) + + # iterate over models + print(f"Working on models in {opts.input_data_path}...") + issues = [] + for metadata in metadata_all: + new_issues = _translate2modelcif(metadata, opts) + issues.extend(new_issues) + print(f"... done with models in {opts.input_data_path}.") + + if opts.single_model is None: + # dump issues + issues_file_path = os.path.join(opts.out_dir, "issues.json") + json.dump(issues, open(issues_file_path, "w")) + # dump info on which ones to export to 3D-Beacons + to_export_file_path = os.path.join(opts.out_dir, "MA_to_export.json") + to_export = { + metadata["mdl_id"]: (metadata["Interaction_note"] == "known interaction") \ + for metadata in metadata_all + } + json.dump(to_export, open(to_export_file_path, "w")) + + # TEST: to judge res. needed on cluster + import resource + print('mem', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000) + + +if __name__ == "__main__": + _main() diff --git a/projects/2024-07-ma-osf-ppp2r2a/README.md b/projects/2024-07-ma-osf-ppp2r2a/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d8db478f7e54f80c44d07a77994ec310e8ab83b9 --- /dev/null +++ b/projects/2024-07-ma-osf-ppp2r2a/README.md @@ -0,0 +1,30 @@ +# Modelling of PP2A-B55-design + +[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-osf-ppp2r2a) (incl. background on project itself) + +Input files for conversion: +- info_of_submitted_structures.csv with metadata (incl. 
per chain mapping to entity description, subsetting, UniProtKB AC, iPlddt and iPAE scores, and binding experiment results)
+- 4 model directories: screen_256, holoenzyme, relaxes_figures and design
+- config.json in every directory above
+- [NAME].pdb for each model NAME listed in info_of_submitted_structures.csv, in the directory indicated in the metadata
+- [NAME].json with their respective scores, for every model except the relaxed models containing the string 'only_pep' in their NAME
+
+Modelling setup:
+- Classic ColabFold setup with PDB coordinates, scores JSON, and link to UniProt
+- Two custom model quality scores (ipLDDT, iPAE)
+- Experimental validation of binding; only validated or known interactions exported to 3D-Beacons
+- Four different modelling setups (PPI screening, relaxed models of selected screening results, full holoenzyme complex, design of protein-peptide complex)
+
+Special features here compared to the PRC-complexes script:
+- UniProt link: deals with subsets of sequences and with old versions; fixed cache of UniProt data (keyed also on entity sequence)
+- Use of modelcif.reference.SeqDif to deal with mismatches between UniProt and entity sequence
+- Deals with synthetic constructs (design) which do not have a link to UniProt
+- Custom cut of JSON scores for PAE and pLDDT for some relaxed models
+- Updated _get_cf_config function to deal with null values in ColabFold's config.json and to reset tpl_db and tpl_db_version if no templates are used
+
+Content:
+- translate2modelcif.py: script to do the conversion (run in a virtual environment with the same setup as the Docker container here, but with OST 2.8 and the latest main branch of python-modelcif and python-ihm from 20.6.2024)
+- modelarchive_submission.zip: example inputs to convert a selected complex from this set, compressed
+- expected_output_modelcif.zip: compressed output from running the conversion of modelarchive_submission with the following command:
+`python3 translate2modelcif.py ./modelarchive_submission 
./modelcif --no-extra-files` + diff --git a/projects/2024-07-ma-osf-ppp2r2a/expected_output_modelcif.zip b/projects/2024-07-ma-osf-ppp2r2a/expected_output_modelcif.zip new file mode 100644 index 0000000000000000000000000000000000000000..f73737bf5a1d9e4e3682525ba1162edee68b5524 Binary files /dev/null and b/projects/2024-07-ma-osf-ppp2r2a/expected_output_modelcif.zip differ diff --git a/projects/2024-07-ma-osf-ppp2r2a/modelarchive_submission.zip b/projects/2024-07-ma-osf-ppp2r2a/modelarchive_submission.zip new file mode 100644 index 0000000000000000000000000000000000000000..48ec525bc0390aba8156f3754aec35224aade901 Binary files /dev/null and b/projects/2024-07-ma-osf-ppp2r2a/modelarchive_submission.zip differ diff --git a/projects/2024-07-ma-osf-ppp2r2a/translate2modelcif.py b/projects/2024-07-ma-osf-ppp2r2a/translate2modelcif.py new file mode 100644 index 0000000000000000000000000000000000000000..725d7976725b1804c136490daaef5ce1ca5d1cc0 --- /dev/null +++ b/projects/2024-07-ma-osf-ppp2r2a/translate2modelcif.py @@ -0,0 +1,1831 @@ +#! 
/usr/local/bin/ost +# -*- coding: utf-8 -*- + +"""Translate PP2A-B55 models for Julia from PDB + extra data into ModelCIF.""" + +# EXAMPLES for running: +# ost translate2modelcif.py ./modelarchive_submission ./modelcif + +import argparse +import datetime +import gzip +import os +import shutil +import sys +import zipfile + +from timeit import default_timer as timer +import numpy as np +import requests +import ujson as json + +import ihm +import ihm.citations +import modelcif +import modelcif.associated +import modelcif.dumper +import modelcif.model +import modelcif.protocol +import modelcif.reference + +import pandas as pd +from ost import io, seq + + +################################################################################ +# GENERAL HELPER FUNCTIONS +################################################################################ +def _abort_msg(msg, exit_code=1): + """Write error message and exit with exit_code.""" + print(f"{msg}\nAborting.", file=sys.stderr) + sys.exit(exit_code) + + +def _warn_msg(msg): + """Write a warning message to stdout.""" + print(f"WARNING: {msg}") + + +def _check_file(file_path): + """Make sure a file exists and is actually a file.""" + if not os.path.exists(file_path): + _abort_msg(f"File not found: '{file_path}'.") + if not os.path.isfile(file_path): + _abort_msg(f"File path does not point to file: '{file_path}'.") + + +def _check_folder(dir_path): + """Make sure a file exists and is actually a file.""" + if not os.path.exists(dir_path): + _abort_msg(f"Path not found: '{dir_path}'.") + if not os.path.isdir(dir_path): + _abort_msg(f"Path does not point to a directory: '{dir_path}'.") + + +def _check_opts_folder(dir_path): + """Remove trailing '/' (return fixed one) and check if path valid.""" + if dir_path.endswith("/"): + dir_path = dir_path[:-1] + _check_folder(dir_path) + return dir_path + + +def _get_res_num(r, use_auth=False): + """Get res. num. from auth. 
IDs if reading from mmCIF files.""" + if use_auth: + return int(r.GetStringProp("pdb_auth_resnum")) + return r.number.num + + +def _get_ch_name(ch, use_auth=False): + """Get chain name from auth. IDs if reading from mmCIF files.""" + if use_auth: + return ch.GetStringProp("pdb_auth_chain_name") + return ch.name + + +def _get_sequence(chn, use_auth=False): + """Get the sequence out of an OST chain incl. '-' for gaps in resnums.""" + # initialise (add gaps if first is not at num. 1) + lst_rn = _get_res_num(chn.residues[0], use_auth) + idx = 1 + sqe = "-" * (lst_rn - 1) + chn.residues[0].one_letter_code + + for res in chn.residues[idx:]: + lst_rn += 1 + while lst_rn != _get_res_num(res, use_auth): + sqe += "-" + lst_rn += 1 + sqe += res.one_letter_code + return sqe +################################################################################ + + +################################################################################ +# DATA HANDLING +################################################################################ +def _parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description=__doc__, + ) + + parser.add_argument( + "input_data_path", + type=str, + metavar="<INPUT DATA PATH>", + help="Data as provided by depositors. 
Expected to contain files " + + "Annotations.csv and Annotations.json with metadata, Config_Files " + + "directory with config_[X].json files for all X listed in Config " + + "column of Annotations.csv, and Zip_Files directory with files named " + + "[X]-[NAME].zip for each X listed in the metadata files.", + ) + parser.add_argument( + "out_dir", + type=str, + metavar="<OUTPUT DIR>", + help="Path to directory to store results ([X]-[NAME].* files and " + + "issues.json with any observed issues).", + ) + parser.add_argument( + "--compress", + default=False, + action="store_true", + help="Compress ModelCIF file with gzip.", + ) + parser.add_argument( + "--checks-only", + default=False, + action="store_true", + help="Perform only checks without producing ModelCIF files.", + ) + parser.add_argument( + "--no-extra-files", + default=False, + action="store_true", + help="Skip writing extra models, PNGs, and PAE (for testing).", + ) + parser.add_argument( + "--single-model", + type=str, + #metavar="<PDB WEB PATH>", + default=None, + help="If provided, only the model matching the provided string in the " + + "Annotations.json will be converted.", + ) + + opts = parser.parse_args() + + # check input + opts.input_data_path = _check_opts_folder(opts.input_data_path) + _check_file(os.path.join(opts.input_data_path, "info_of_submitted_structures.csv")) + _check_folder(os.path.join(opts.input_data_path, "screen_256")) + if opts.out_dir.endswith("/"): + opts.out_dir = opts.out_dir[:-1] + if not os.path.exists(opts.out_dir): + os.makedirs(opts.out_dir, exist_ok=True) + return opts + + +def _get_audit_authors(): + """Return the list of authors that produced this model.""" + return ( + "Schueler-Furman, Ora", + "Varga, Julia Kornelia", + ) + + +def _check_scores(mdl_data, metadata): + """Check scores JSON. 
+ Bad issues raise exceptions, minor ones are in returned list + (compatible with list returned by _get_entities) + """ + issues = [] + scores_json = mdl_data["scores"] + # NOTE: cannot deal with gapped sequences here as we cannot map + # multiple chains to scores + ost_ent = mdl_data["ent"] + exp_len = ost_ent.residue_count + assert "ptm" in scores_json + assert "iptm" in scores_json + assert len(scores_json["pae"]) == exp_len + assert len(scores_json["pae"][0]) == exp_len + # b-factor vs pLDDT in expected range? + ent_plddts = [] + for i, res in enumerate(ost_ent.residues): + b_factors = [a.b_factor for a in res.atoms] + assert len(set(b_factors)) == 1 # must all be equal! + ent_plddts.append(b_factors[0]) + scores_plddts = scores_json["plddt"] + assert len(ent_plddts) == len(scores_plddts) + plddt_max_diff = max([ + abs(s1 - s2) for s1, s2 in zip(ent_plddts, scores_plddts) + ]) + # threshold due to 0.01 accuracy in PDB file + numerical rounding + if plddt_max_diff > 0.0051: + issues.append(( + metadata['mdl_id'], + "plddt_vs_bf_mismatch", + (plddt_max_diff), + () + )) + return issues + + +def _get_n_parse_up_entry(up_ac, up_txt_path): + """Get data for an UniProtKB entry and parse it.""" + # This is a simple parser for UniProtKB txt format, instead of breaking it + # up into multiple functions, we just allow many many branches & statements, + # here. 
+ # pylint: disable=too-many-branches,too-many-statements + data = {} + data["up_organism"] = "" + data["up_sequence"] = "" + data["up_ac"] = up_ac + # check if we read from file or URL + if up_txt_path.startswith("http"): + rspns = requests.get(up_txt_path, timeout=180) + lines = rspns.iter_lines(decode_unicode=True) + else: + lines = open(up_txt_path).readlines() + for line_ in lines: + # need to strip trailing characters if reading from file (doesn't hurt) + line = line_.rstrip() + if line.startswith("ID "): + sline = line.split() + if len(sline) != 5: + raise RuntimeError(f"Unusual UniProtKB ID line found:\n" \ + f"'{line}'") + data["up_id"] = sline[1] + elif line.startswith("OX NCBI_TaxID="): + # Following strictly the UniProtKB format: 'OX NCBI_TaxID=<ID>;' + data["up_ncbi_taxid"] = line[len("OX NCBI_TaxID=") : -1] + data["up_ncbi_taxid"] = data["up_ncbi_taxid"].split("{")[0].strip() + elif line.startswith("OS "): + # multiple lines possible; last one ends in "." + if line[-1] == ".": + data["up_organism"] += line[len("OS ") : -1] + else: + data["up_organism"] += line[len("OS ") :] + " " + elif line.startswith("SQ "): + sline = line.split() + if len(sline) != 8: + raise RuntimeError(f"Unusual UniProtKB SQ line found:\n" \ + f"'{line}'") + data["up_seqlen"] = int(sline[2]) + data["up_crc64"] = sline[6] + elif line.startswith(" "): + sline = line.split() + if len(sline) > 6: + raise RuntimeError( + "Unusual UniProtKB sequence data line " + + f"found:\n'{line}'" + ) + data["up_sequence"] += "".join(sline) + elif line.startswith("DT "): + dt_flds = line[len("DT ") :].split(", ") + if dt_flds[1].upper().startswith("SEQUENCE VERSION "): + data["up_last_mod"] = datetime.datetime.strptime( + dt_flds[0], "%d-%b-%Y" + ) + elif dt_flds[1].upper().startswith("ENTRY VERSION "): + data["up_entry_version"] = dt_flds[1][len("ENTRY VERSION ") :] + if data["up_entry_version"][-1] == ".": + data["up_entry_version"] = data["up_entry_version"][:-1] + data["up_entry_version"] = 
int(data["up_entry_version"]) + elif line.startswith("GN Name="): + data["up_gn"] = line[len("GN Name=") :].split(";")[0] + data["up_gn"] = data["up_gn"].split("{")[0].strip() + + # in UP isoforms are identified in the AC so no need for this... + # -> in PDB (e.g. 8TRE), we see unset _struct_ref.pdbx_db_isoform in such cases + data["up_isoform"] = None + + # NOTE: no gene names in this set (use provided names instead) + if "up_gn" not in data: + _warn_msg( + f"No gene name found for UniProtKB entry '{up_ac}', using " + + "UniProtKB AC instead." + ) + data["up_gn"] = up_ac + if "up_last_mod" not in data: + raise RuntimeError(f"No sequence version found for UniProtKB entry " \ + f"'{up_ac}'.") + if "up_crc64" not in data: + raise RuntimeError(f"No CRC64 value found for UniProtKB entry " \ + f"'{up_ac}'.") + if len(data["up_sequence"]) == 0: + raise RuntimeError(f"No sequence found for UniProtKB entry '{up_ac}'.") + # check that sequence length and CRC64 is correct + if data["up_seqlen"] != len(data["up_sequence"]): + raise RuntimeError( + "Sequence length of SQ line and sequence data differ for " + + f"UniProtKB entry '{up_ac}': {data['up_seqlen']} != " + + f"{len(data['up_sequence'])}" + ) + if "up_id" not in data: + raise RuntimeError(f"No ID found for UniProtKB entry '{up_ac}'.") + if "up_ncbi_taxid" not in data: + raise RuntimeError(f"No NCBI taxonomy ID found for UniProtKB entry " + f"'{up_ac}'.") + if len(data["up_organism"]) == 0: + raise RuntimeError(f"No organism species found for UniProtKB entry " + f"'{up_ac}'.") + return data + + +def _fetch_upkb_entry(up_ac): + """Get an UniProtKB entry.""" + return _get_n_parse_up_entry( + up_ac, f"https://rest.uniprot.org/uniprotkb/{up_ac}.txt" + ) + + +def _fetch_unisave_entry(up_ac, version): + """Get an UniSave entry, in contrast to an UniProtKB entry, that allows us + to specify a version.""" + return _get_n_parse_up_entry( + up_ac, + f"https://rest.uniprot.org/unisave/{up_ac}?format=txt&" + + 
f"versions={version}", + ) + + +# for cache below +upkb_entry_cache = {} # key = (up_ac, up_version, mdl_sequence) +def _fetch_upkb_cached(sqe, up_ac, up_version=None): + """Get best matching UniProtKB entry for sequence sqe. + Get it from cache if already fetched. + up_version defines starting version in UP to check. + Note that the returned UP entry may be a different version than the one in up_version. + Returned UP data includes result of _align_sequences. + If no perfect match found, it prints a warning. + """ + # check if in cache already + cache_key = (up_ac, up_version, sqe) + if cache_key in upkb_entry_cache: + return upkb_entry_cache[cache_key] + # fetch and parse first guess + if up_version is None: + up_data = _fetch_upkb_entry(up_ac) + else: + up_data = _fetch_unisave_entry(up_ac, up_version) + # find best match starting from first guess + min_up_data = None + while True: + mismatches, up_range, mdl_range, covered_aln, mdl_seqres = _align_sequences( + sqe, up_data["up_sequence"], atomseq_aln=False) + + if min_up_data is None or \ + len(mismatches) < len(min_up_data["mismatches"]): + min_up_data = up_data + min_up_data["mismatches"] = mismatches + min_up_data["up_range"] = up_range + min_up_data["mdl_range"] = mdl_range + min_up_data["covered_aln"] = covered_aln + min_up_data["mdl_seqres"] = mdl_seqres + + if len(mismatches) == 0: + # found hit; done + break + # fetch next one (skip if exceptions happen) + next_v = up_data["up_entry_version"] - 1 + while next_v > 0: + try: + # note: can fail to parse very old UP versions... 
+ up_data = _fetch_unisave_entry(up_ac, next_v) + # can move on if no exception happened + break + except RuntimeError as ex: + # _warn_msg(f"Error in parsing v{next_v} of {up_ac}:\n{ex}") + # try next one + next_v -= 1 + if next_v == 0: + # warn user about failure to find match and abort + msg = f"Sequences not equal from file: {sqe}, from UniProtKB: " \ + f"{min_up_data['up_sequence']} ({up_ac}), checked entire " \ + f"entry history and best match had the following mismatches " \ + f"in v{min_up_data['up_entry_version']} (range " \ + f"{min_up_data['up_range']}): {min_up_data['mismatches']}." + _warn_msg(msg) + # raise RuntimeError(msg) + break + + # keep in cache (store the best match, which is what we return) + upkb_entry_cache[cache_key] = min_up_data + return min_up_data + + +def _align_sequences(mdl_sqe, ref_sqe, atomseq_aln=True, ref_fixes=(), + gapped_aa="XOUBJZ"): + """Compare sequences while paying attention to non-standard amino acids. + + Can pass list of tuples for OLCs expected to be changed between ref and mdl. + E.g. Jason set would have ref_fixes=[('B', 'D'), ('J', 'L'), ('Z', 'E')]. + Non-def. AA (listed in gapped_aa) in ref_sqe are assumed to be gaps (-) in + mdl_sqe (this is def. in CF/AF for "XOUBJZ"). + + Returns (mismatches, ref_range, mdl_range, covered_aln, mdl_seqres): + - mismatches = list of (ref_pos, mdl_pos, olc_ref, olc_mdl) + (positions are 1-indexed, None if gap and only if in range) + - ref_range / mdl_range = (start, end) tuples with 1-indexed positions of + start and end of covered range (mdl_range with respect to mdl_sqe!). + Extra non-covered residues in mdl or ref can be counted by comparing + ref_range / mdl_range with lengths of ref_sqe / mdl_sqe. + - covered_aln = alignment (seq. 0 = REF, seq. 1 = MDL) within covered range + (i.e. first and last column have no gaps). If atomseq_aln is True, the + alignment only includes non-gap residues of mdl_sqe. OST seq. offsets are + set with respect to mdl_sqe, ref_sqe (0-indexed). 
Note that offsets are + only guaranteed to fit ref_range / mdl_range if atomseq_aln is False. + - mdl_seqres = mdl_sqe with gaps (-) replaced with seq. from ref. if + non-def-AA there or with X otherwise (i.e. both have same length!). + Here guaranteed for mdl_seqres to match mdl_sqe if AA in gapped_aa and X + are replaced by gaps (-). + """ + # add fixes if needed + ref_sqe_fixed = ref_sqe + for olc1, olc2 in ref_fixes: + ref_sqe_fixed = ref_sqe_fixed.replace(olc1, olc2) + # put gaps for parts not modelled by AF2 (i.e. any non-def-AA) + ref_sqe_fixed_gapped = ref_sqe_fixed + for olc in gapped_aa: + assert olc not in mdl_sqe + ref_sqe_fixed_gapped = ref_sqe_fixed_gapped.replace(olc, '-') + # easy and preferred case: mdl_sqe is subset of ref_sqe + ref_idx = ref_sqe_fixed_gapped.find(mdl_sqe) + if ref_idx >= 0: + mismatches = [] + ref_range = (ref_idx + 1, ref_idx + len(mdl_sqe)) + mdl_range = (1, len(mdl_sqe)) + mdl_seqres = ref_sqe_fixed[ref_idx : ref_idx + len(mdl_sqe)] + # we handle covered_aln afterwards... 
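For the common case, the alignment routine first tries a plain substring match before falling back to a semi-global alignment. A tiny sketch of that range arithmetic, using a hypothetical helper name and the 1-indexed inclusive ranges described in the docstring:

```python
def substring_ranges(mdl_sqe, ref_sqe):
    """Sketch of the preferred 'substring' case: if the (gap-masked) model
    sequence occurs verbatim in the reference, covered ranges follow
    directly from str.find (1-indexed, inclusive)."""
    ref_idx = ref_sqe.find(mdl_sqe)
    if ref_idx < 0:
        return None  # a real implementation would fall back to alignment
    ref_range = (ref_idx + 1, ref_idx + len(mdl_sqe))
    mdl_range = (1, len(mdl_sqe))
    return ref_range, mdl_range
```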
+ aln_s_ref = ref_sqe[ref_idx : ref_idx + len(mdl_sqe)] + aln_s_mdl = mdl_seqres + else: + # align and report mismatches + ref_seq = seq.CreateSequence("REF", ref_sqe_fixed) + # use X as first guess for gaps in model + mdl_seq = seq.CreateSequence("MDL", mdl_sqe.replace('-', 'x')) + aln = seq.alg.SemiGlobalAlign(ref_seq, mdl_seq, seq.alg.BLOSUM62)[0] + # get range + aligned_indices = [i for i, c in enumerate(aln) \ + if c[0] != '-' and c[1] != '-'] + ref_range = ( + aln.GetResidueIndex(0, aligned_indices[0]) + 1, + aln.GetResidueIndex(0, aligned_indices[-1]) + 1, + ) + mdl_range = ( + aln.GetResidueIndex(1, aligned_indices[0]) + 1, + aln.GetResidueIndex(1, aligned_indices[-1]) + 1, + ) + # build up strings as we go + aln_s_ref = "" + aln_s_mdl = "" + mdl_seqres = "" + # collect mismatches and fix seqs as we go + mismatches = [] + for idx, (olc_ref, olc_mdl) in enumerate(aln): + # fix seqres as needed + if olc_mdl == 'x' and olc_ref in gapped_aa: + olc_mdl = olc_ref + if olc_mdl != '-': + mdl_seqres += olc_mdl + if idx >= aligned_indices[0] and idx <= aligned_indices[-1]: + # fill aln_s_x as needed + if olc_ref != '-': + # must fetch from ref_sqe + ref_idx = aln.GetResidueIndex(0, idx) + aln_s_ref += ref_sqe[ref_idx] + ref_pos = ref_idx + 1 + else: + aln_s_ref += '-' + ref_pos = None + if olc_mdl != '-': + # fetch from mdl_seqres here + # (revert back to mdl_sqe afterwards) + mdl_idx = aln.GetResidueIndex(1, idx) + aln_s_mdl += mdl_seqres[mdl_idx] + mdl_pos = mdl_idx + 1 + else: + aln_s_mdl += '-' + mdl_pos = None + if olc_ref != olc_mdl: + mismatches.append((ref_pos, mdl_pos, olc_ref, olc_mdl)) + # fix remaining x in mdl_seqres + mdl_seqres = mdl_seqres.replace('x', 'X') + # create covered_aln + s_ref_offset = ref_range[0] - 1 + s_mdl_offset = mdl_range[0] - 1 + covered_aln = seq.CreateAlignment( + seq.CreateSequence("REF", aln_s_ref), + seq.CreateSequence("MDL", aln_s_mdl.replace('x', 'X')) + ) + # cut it once again if needed (only for atomseq_aln) + if 
atomseq_aln: + # revert + new_cols = [ + ( + olc_ref, + '-' if olc_mdl == 'x' or olc_mdl in gapped_aa else olc_mdl + ) for olc_ref, olc_mdl in zip(aln_s_ref, aln_s_mdl) + ] + aligned_indices = [i for i, c in enumerate(new_cols) \ + if c[0] != '-' and c[1] != '-'] + s_ref_offset += covered_aln.GetResidueIndex(0, aligned_indices[0]) + s_mdl_offset += covered_aln.GetResidueIndex(1, aligned_indices[0]) + cut_cols = new_cols[aligned_indices[0]:aligned_indices[-1]+1] + aln_s_ref = "".join([olc_ref for olc_ref, _ in cut_cols]) + aln_s_mdl = "".join([olc_mdl for _, olc_mdl in cut_cols]) + covered_aln = seq.CreateAlignment( + seq.CreateSequence("REF", aln_s_ref), + seq.CreateSequence("MDL", aln_s_mdl) + ) + covered_aln.SetSequenceOffset(0, s_ref_offset) + covered_aln.SetSequenceOffset(1, s_mdl_offset) + # check post assertions (as in docstring) + assert ref_sqe[covered_aln.GetSequenceOffset(0):]\ + .startswith(covered_aln.sequences[0].gapless_string) + if atomseq_aln: + assert mdl_sqe[covered_aln.GetSequenceOffset(1)] \ + == covered_aln.sequences[1].gapless_string[0] + assert mdl_sqe[covered_aln.GetSequenceOffset(1):].replace('-', '')\ + .startswith(covered_aln.sequences[1].gapless_string) + else: + assert covered_aln.sequences[0].gapless_string \ + == ref_sqe[ref_range[0]-1:ref_range[1]] + assert covered_aln.sequences[1].gapless_string \ + == mdl_seqres[mdl_range[0]-1:mdl_range[1]] + assert mdl_seqres[covered_aln.GetSequenceOffset(1):]\ + .startswith(covered_aln.sequences[1].gapless_string) + assert len(mdl_seqres) == len(mdl_sqe) + mdl_sqe_check = mdl_seqres.replace('X', '-') + for olc in gapped_aa: + mdl_sqe_check = mdl_sqe_check.replace(olc, '-') + assert mdl_sqe_check == mdl_sqe.replace('X', '-') + # + return mismatches, ref_range, mdl_range, covered_aln, mdl_seqres + + +def _get_entities(mdl_data, metadata): + """Gather data for the mmCIF (target) entities. 
+ Returns (list of cif_ents, list of issues) + """ + # merge info for matching chains + unique_chains = {} # key = sqe_gaps, value = partial cif_ent + chain_info = {ch["chain"]: { + "up_ac": ch["up_ac"], "up_range": ch["up_range"], "is_synthetic_construct": ch["is_synthetic_construct"] + } for ch in metadata["chains"]} + ost_ent = mdl_data["ent"] + for chn in ost_ent.chains: + pdb_chain_id = _get_ch_name(chn, False) + if pdb_chain_id not in chain_info: + raise RuntimeError( + f"Non-described chain {pdb_chain_id} in " \ + f"{metadata['mdl_id']}" + ) + sqe_gaps = _get_sequence(chn) + cif_ent = { + "pdb_sequence": sqe_gaps, + "pdb_chain_ids": [_get_ch_name(chn, False)], + "up_ac": chain_info[pdb_chain_id]["up_ac"], + # expected up range as parsed in metadata + "exp_up_range": chain_info[pdb_chain_id]["up_range"], + "is_synthetic_construct": chain_info[pdb_chain_id]["is_synthetic_construct"], + } + if sqe_gaps in unique_chains: + other_cif_ent = unique_chains[sqe_gaps] + # sanity checks + for key, value in other_cif_ent.items(): + if key != "pdb_chain_ids" and value != cif_ent[key]: + raise RuntimeError( + f"Inconsistent info {key} for identical chains for " \ + f"chain {pdb_chain_id} vs chains " \ + f"{other_cif_ent['pdb_chain_ids']}." 
+ ) + # add to list of chains + other_cif_ent['pdb_chain_ids'].append(pdb_chain_id) + else: + unique_chains[sqe_gaps] = cif_ent + # sort by model chain name (should ensure same order of chains in mmCIF) + entities = sorted( + unique_chains.values(), + key=lambda x: min(x["pdb_chain_ids"]) + ) + # compare with info from UP and complete data to return + issues = [] + for cif_ent in entities: + sqe_gaps = cif_ent["pdb_sequence"] + if cif_ent["is_synthetic_construct"]: + cif_ent["seqres"] = sqe_gaps + cif_ent["description"] = "Synthetic construct" + continue + up_ac = cif_ent["up_ac"] + up_data = _fetch_upkb_cached(sqe_gaps, up_ac) + + num_extra_ref = len(up_data["up_sequence"]) - (up_data["up_range"][1] - up_data["up_range"][0] + 1) + len_mdl_covered = (up_data["mdl_range"][1] - up_data["mdl_range"][0] + 1) + num_extra_mdl = len(sqe_gaps) - len_mdl_covered + if len(up_data["mismatches"]) > 0 or num_extra_ref > 0 or num_extra_mdl > 0: + # usually ok to cover a subset of UP (e.g. Ubiquitin); the rest is a big issue + if len(up_data["mismatches"]) > 0 or num_extra_mdl > 0: + issue_type = "up_mismatch" + else: + issue_type = "up_extra" + + if cif_ent['exp_up_range'] is None: + cif_up_range = (1, len(up_data["up_sequence"])) + else: + cif_up_range = tuple( + map(int, cif_ent['exp_up_range'].split('-'))) + if (issue_type == "up_extra" and up_data["up_range"] != cif_up_range) or issue_type == "up_mismatch": + chain_names = ",".join(cif_ent["pdb_chain_ids"]) + short_data = ( + mdl_data['mdl_name'], chain_names, up_ac, + len_mdl_covered, len(up_data["mismatches"]), num_extra_ref, num_extra_mdl + ) + long_data = (up_data["mismatches"], up_data["up_range"], up_data["mdl_range"]) + issues.append( + (metadata['mdl_id'], issue_type, short_data, long_data) + ) + # cannot deal with gapped sequences here as we cannot map to scores + if sqe_gaps != up_data["mdl_seqres"]: + issues.append(( + metadata['mdl_id'], + "gapped_seq", + (cif_ent['pdb_chain_ids']), + (sqe_gaps, up_data["mdl_seqres"]) + )) 
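The issue classification in `_get_entities` hinges on simple coverage arithmetic over inclusive 1-indexed ranges. A standalone sketch with a hypothetical helper name:

```python
def coverage_extras(up_seq_len, mdl_seq_len, up_range, mdl_range):
    """Sketch: an inclusive 1-indexed range (start, end) covers
    end - start + 1 residues; anything outside it counts as 'extra'
    on the reference or model side respectively."""
    num_extra_ref = up_seq_len - (up_range[1] - up_range[0] + 1)
    num_extra_mdl = mdl_seq_len - (mdl_range[1] - mdl_range[0] + 1)
    return num_extra_ref, num_extra_mdl
```

In the code above, any extra model residues or mismatches escalate the issue to "up_mismatch", while extra reference residues alone are merely "up_extra".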
+ cif_ent["seqres"] = up_data["mdl_seqres"] + cif_ent["description"] = f"{up_data['up_organism']} {up_data['up_gn']} ({up_data['up_ac']})" + cif_ent.update(up_data) + return entities, issues + + +def _get_cf_config(cf_config, ur30_db_version=None, tpl_db=None, + tpl_db_version=None): + """Define ColabFold setup. + Extra info needed from depositor for DBs used (depend on MMseqs2 server) + - ur30_db_version options: see dict in _get_sequence_dbs + - tpl_db options: None, "PDB70", "PDB100" + - tpl_db_version options: see dict in _get_sequence_dbs + -> can be set to None if DB not used at all (incl. custom tpls) + Note on versions used over time + - first: 2021_03 version of UniRef30, unclear what PDB70 + - after 13.7.22: updated the UniRef30 to 2022_02 and PDB70 to 220313 + - after 12.6.23: UniRef30 2023_02, PDB100 (instead of PDB70) 230517 + - also to define if DB used at all for tpls or custom tpls + - db versions only relevant if actually used + """ + # NOTES: + # - UP-TO-DATE (as of March 2024) generic parser given a config.json dict + # - custom MSA is assumed to be complemented with extra step (as for Jason) + + # keep version indep. 
of params (and add commit since versions are meh) + cf_version = cf_config["version"] + if "commit" in cf_config and cf_config["commit"] is not None: + cf_version += f" ({cf_config['commit'][:7]})" + # drop fields which are not relevant for model building + cf_config = cf_config.copy() + for key in ["num_queries", "commit", "version", "user_agent"]: + if key in cf_config: + del cf_config[key] + + # NOTE: following code from + # https://github.com/sokrypton/ColabFold/blob/main/colabfold/batch.py to + # understand config + # -> should be backward compatible with Tara and Niko sets + # -> see also https://github.com/sokrypton/ColabFold/wiki/v1.5.0 + + # deal with old names (some settings changed name in v1.5) + # -> code taken almost verbatim from https://github.com/sokrypton/ColabFold + old_names = {"MMseqs2 (UniRef+Environmental)": "mmseqs2_uniref_env", + "MMseqs2 (UniRef only)": "mmseqs2_uniref", + "unpaired+paired": "unpaired_paired", + "AlphaFold2-multimer-v1": "alphafold2_multimer_v1", + "AlphaFold2-multimer-v2": "alphafold2_multimer_v2", + "AlphaFold2-multimer-v3": "alphafold2_multimer_v3", + "AlphaFold2-ptm": "alphafold2_ptm", + "AlphaFold2": "alphafold2"} + msa_mode = old_names.get(cf_config["msa_mode"], cf_config["msa_mode"]) + if "pair_mode" in cf_config: + pair_mode = old_names.get(cf_config["pair_mode"], cf_config["pair_mode"]) + model_type = old_names.get(cf_config["model_type"], cf_config["model_type"]) + + # fix v1.5 defaults for num_recycles and recycle_early_stop_tolerance + # -> def. 
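ColabFold v1.5 configs use `null` to mean "use the default", and the default itself depends on the model type. A standalone sketch of the resolution rules handled below (hypothetical function name, mirroring the defaults stated in the comments):

```python
def resolve_defaults(config):
    """Sketch of v1.5 'null means default' resolution: num_recycles
    defaults to 20 only for alphafold2_multimer_v3 (else 3);
    recycle_early_stop_tolerance defaults to 0.5 for multimer models."""
    config = dict(config)  # do not mutate the caller's dict
    model_type = config.get("model_type", "alphafold2_ptm")
    if config.get("num_recycles") is None:
        if "multimer" in model_type and model_type not in (
            "alphafold2_multimer_v1", "alphafold2_multimer_v2"
        ):
            config["num_recycles"] = 20
        else:
            config["num_recycles"] = 3
    if config.get("recycle_early_stop_tolerance") is None:
        config["recycle_early_stop_tolerance"] = (
            0.5 if "multimer" in model_type else 0.0
        )
    return config
```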
(set as "null" in config): + # - num_recycles == 20 if alphafold2_multimer_v3 else 3 + # - recycle_early_stop_tolerance == 0.5 if multimer else 0.0 + # -> valid from 1.5.0 until 1.5.5 (and probably later) + # -> defined in alphafold/model/config.py of steineggerlab/alphafold repo + if "num_recycles" in cf_config and cf_config["num_recycles"] is None: + if "multimer" in model_type and model_type not in [ + "alphafold2_multimer_v1", "alphafold2_multimer_v2" + ]: + cf_config["num_recycles"] = 20 + else: + cf_config["num_recycles"] = 3 + if "recycle_early_stop_tolerance" in cf_config \ + and cf_config["recycle_early_stop_tolerance"] is None: + cf_config["recycle_early_stop_tolerance"] = \ + 0.5 if "multimer" in model_type else 0.0 + + # remove null config entries (ASSUME: None = use default) + cf_config = {k: v for k, v in cf_config.items() if v is not None} + + # fetch relevant data + # -> MSA mode + if msa_mode == "mmseqs2_uniref_env": + seq_dbs = ["UniRef", "Environmental"] + use_mmseqs = True + use_msa = True + elif msa_mode == "mmseqs2_uniref": + seq_dbs = ["UniRef"] + use_mmseqs = True + use_msa = True + elif msa_mode == "single_sequence": + seq_dbs = [] + use_mmseqs = False + use_msa = False + elif msa_mode == "custom": + seq_dbs = [] + use_mmseqs = False + use_msa = True + else: + raise ValueError(f"Unknown msa_mode {cf_config['msa_mode']}") + + # -> model type + if model_type == "alphafold2_multimer_v1": + # AF-Multimer as introduced in AlphaFold v2.1.0 + use_multimer = True + multimer_version = 1 + elif model_type == "alphafold2_multimer_v2": + # AF-Multimer as introduced in AlphaFold v2.2.0 + use_multimer = True + multimer_version = 2 + elif model_type == "alphafold2_multimer_v3": + # AF-Multimer as introduced in AlphaFold v2.3.0 + use_multimer = True + multimer_version = 3 + elif model_type == "alphafold2_ptm": + use_multimer = False + multimer_version = None + else: + raise ValueError(f"Unknown model_type {cf_config['model_type']}") + + # write modeling 
description + mdl_description = f"Model generated using ColabFold v{cf_version}" + if use_multimer: + mdl_description += f" with AlphaFold-Multimer (v{multimer_version})" + else: + mdl_description += " with AlphaFold" + # early stopping feature of ColabFold + upto_mdl = "" + upto_rec = "" + if cf_config.get("stop_at_score", 100) < 100: + upto_mdl = "up to " + upto_rec = "up to " + if cf_config.get("recycle_early_stop_tolerance", 0) > 0: + upto_rec = "up to " + if cf_config.get("num_seeds", 1) > 1: + mdl_str = f"{cf_config['num_models'] * cf_config['num_seeds']} " \ + f"models ({cf_config['num_seeds']} random seeds per " \ + f"parameter set)" + else: + mdl_str = f"{cf_config['num_models']} models" + mdl_description += f" producing {upto_mdl}{mdl_str} with {upto_rec}" \ + f"{cf_config['num_recycles']} recycles each" + if cf_config.get("use_amber", False) or \ + cf_config.get("num_relax", 0) > 0: + mdl_description += ", with AMBER relaxation" + else: + mdl_description += ", without model relaxation" + if cf_config["use_templates"]: + # tpl_db == None meant to mean that custom templates were used + # -> no need to stress it but just visible in search DBs + mdl_description += ", using templates" + else: + mdl_description += ", without templates" + tpl_db = None + tpl_db_version = None + if cf_config["rank_by"] == "plddt": + mdl_description += ", ranked by pLDDT" + elif cf_config["rank_by"] == "ptmscore": + mdl_description += ", ranked by pTM" + elif cf_config["rank_by"] == "multimer": + mdl_description += ", ranked by 80*ipTM+20*pTM" + else: + raise ValueError(f"Unknown rank_by {cf_config['rank_by']}") + if use_msa: + mdl_description += ", starting from" + if use_mmseqs: + msa_type = "MSA" + else: + msa_type = "custom MSA" + if use_multimer: + if pair_mode == "unpaired_paired": + mdl_description += f" paired and unpaired {msa_type}s" + elif pair_mode == "paired": + mdl_description += f" paired {msa_type}s" + elif pair_mode == "unpaired": + mdl_description += f" unpaired 
{msa_type}s" + else: + raise ValueError(f"Unknown pair_mode {cf_config['pair_mode']}") + elif msa_type.startswith('M'): + mdl_description += f" an {msa_type}" + else: + mdl_description += f" a {msa_type}" + if use_mmseqs: + mdl_description += f" from MMseqs2 ({'+'.join(seq_dbs)})" + else: + mdl_description += " without an MSA" + mdl_description += "." + + return { + "params": cf_config, + "version": cf_version, + "seq_dbs": seq_dbs, + "use_mmseqs": use_mmseqs, + "use_msa": use_msa, + "ur30_db_version": ur30_db_version, + "tpl_db": tpl_db, + "tpl_db_version": tpl_db_version, + "use_multimer": use_multimer, + "multimer_version": multimer_version, + "description": mdl_description, + } + + +def _get_mmseqs2_software(version=None): + """Get MMseqs2 as a dictionary, suitable to create a modelcif software + object.""" + return { + "name": "MMseqs2", + "classification": "data collection", + "description": "Many-against-Many sequence searching", + "citation": ihm.citations.mmseqs2, + "location": "https://github.com/soedinglab/mmseqs2", + "type": "package", + "version": version, + } + + +def _get_colabfold_software(version=None): + """Get ColabFold as a dictionary, suitable to create a modelcif software + object.""" + return { + "name": "ColabFold", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.colabfold, + "location": "https://github.com/sokrypton/ColabFold", + "type": "package", + "version": version, + } + + +def _get_af2_software(version=None, is_multimer=False): + """Get AF2 as dictionary, suitable to create a modelcif software object.""" + if is_multimer: + return { + "name": "AlphaFold-Multimer", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.Citation( + pmid=None, + title="Protein complex prediction with " + + "AlphaFold-Multimer.", + journal="bioRxiv", + volume=None, + page_range=None, + year=2021, + authors=[ + "Evans, R.", + "O'Neill, M.", + "Pritzel, A.", 
+ "Antropova, N.", + "Senior, A.", + "Green, T.", + "Zidek, A.", + "Bates, R.", + "Blackwell, S.", + "Yim, J.", + "Ronneberger, O.", + "Bodenstein, S.", + "Zielinski, M.", + "Bridgland, A.", + "Potapenko, A.", + "Cowie, A.", + "Tunyasuvunakool, K.", + "Jain, R.", + "Clancy, E.", + "Kohli, P.", + "Jumper, J.", + "Hassabis, D.", + ], + doi="10.1101/2021.10.04.463034", + ), + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + else: + return { + "name": "AlphaFold", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.alphafold2, + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + + +def _get_protocol_steps_and_software(cf_config, model_selection_step=False, binding_test_result=None, refinement=None): + """Create the list of protocol steps with software and parameters used.""" + protocol = [] + + # build up SW + sw_plus_params = [ + ( + _get_colabfold_software(cf_config["version"]), cf_config["params"] + ) + ] + if cf_config["use_mmseqs"]: + sw_plus_params.append((_get_mmseqs2_software(), {})) + sw_plus_params.append(( + _get_af2_software(is_multimer=cf_config["use_multimer"]), {} + )) + + # modelling step + protocol.append({ + "method_type": "modeling", + "name": None, + "details": cf_config["description"], + "input": "target_sequences_and_ref_DBs", + "output": "model", + "software_plus_params": sw_plus_params, + }) + + # model selection step + if model_selection_step: + step = { + "method_type": "model selection", + "name": None, + "details": "Best model selected according to average interface pLDDT.", + } + step["input"] = "model" + step["output"] = "model" + step["software_plus_params"] = {} + protocol.append(step) + + if binding_test_result == 'nan' or (isinstance(binding_test_result, float) and np.isnan(binding_test_result)): + pass + elif binding_test_result == 'yes': + protocol.append({ + "method_type": 
"other", + "name": None, + "details": "Experimental validation was performed and showed that the two proteins bind", + "input": "model", + "output": "model", + "software_plus_params": {} + }) + + elif binding_test_result == 'no': + protocol.append({ + "method_type": "other", + "name": None, + "details": "Experimental validation was performed and showed that the two proteins do not bind", + "input": "model", + "output": "model", + "software_plus_params": {} + }) + else: + raise RuntimeError("Invalid result for binding experimental validation: " + f"{binding_test_result}") + + if refinement is not None: + if refinement == "cropped_and_relax": + refinement_details = "Selected full-length model cropped to the interface and relaxed with AMBER, using ColabFold's default protocol." + elif refinement == "relax": + refinement_details = "Selected model relaxed with AMBER, using ColabFold's default protocol." + else: + raise RuntimeError( + "Unexpected protocol step for model refinement: " + + f"'{refinement}'" + ) + protocol.append({ + "method_type": "other", + "name": None, + "details": refinement_details, + "input": "model", + "output": "model", + "software_plus_params": {} + }) + return protocol + + +def _get_title(metadata): + """Get a title for this modelling experiment.""" + return metadata["title"].strip() + + +def _get_model_details(metadata): + """Get the model description.""" + return metadata["abstract"].strip() +################################################################################ + + +################################################################################ +# ModelCIF HANDLING +################################################################################ +# pylint: disable=too-few-public-methods +class _GlobalPTM(modelcif.qa_metric.Global, modelcif.qa_metric.PTM): + """Predicted accuracy according to the TM-score in [0,1]""" + + name = "pTM" + software = None + + +class _GlobalIpTM(modelcif.qa_metric.Global, 
modelcif.qa_metric.IpTM): + """Predicted protein-protein interface score based on TM-score in [0,1]""" + + name = "ipTM" + software = None + + +class _GlobalPLDDT(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT): + """Predicted accuracy according to the CA-only lDDT in [0,100]""" + name = "pLDDT" + software = None + + +class _GlobalIpLDDT(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT): + """Average pLDDT for interface residues of binding partner of B55""" + name = "average interface pLDDT" + software = None + + +class _GlobalIPAE(modelcif.qa_metric.Global, modelcif.qa_metric.PAE): + """Median PAE for interface residues between peptide (rows) and receptor (columns)""" + name = "median interface PAE" + software = None + + +class _LocalPLDDT(modelcif.qa_metric.Local, modelcif.qa_metric.PLDDT): + """Predicted accuracy according to the CA-only lDDT in [0,100]""" + name = "pLDDT" + software = None + + +class _LocalPairwisePAE(modelcif.qa_metric.LocalPairwise, modelcif.qa_metric.PAE): + """Predicted aligned error (in Angstroms)""" + name = "PAE" + software = None + + +class _LPeptideAlphabetWithXO(ihm.LPeptideAlphabet): + """Have the default amino acid alphabet plus 'X' for unknown residues + and 'O' as allowed non-def. AA (U already in alphabet).""" + + # extra entry added according to LPeptideAlphabet def. in + # https://python-ihm.readthedocs.io/en/latest/_modules/ihm.html + # and https://files.rcsb.org/view/1NTH.cif for values for 'O'. 
+ + def __init__(self): + """Create the alphabet.""" + super().__init__() + self._comps["X"] = self._comps["UNK"] + self._comps['O'] = ihm.LPeptideChemComp( + "PYL", "O", "O", "PYRROLYSINE", "C12 H21 N3 O3" + ) +# pylint: enable=too-few-public-methods + + +class _OST2ModelCIF(modelcif.model.AbInitioModel): + """Map OST entity elements to ihm.model""" + + def __init__(self, *args, **kwargs): + """Initialise a model""" + for i in ["ost_entity", "asym", "scores_json", "incl_pae"]: + if i not in kwargs: + raise TypeError(f"Required keyword argument '{i}' not found.") + self.ost_entity = kwargs.pop("ost_entity") + self.asym = kwargs.pop("asym") + self.scores_json = kwargs.pop("scores_json") + self.incl_pae = kwargs.pop("incl_pae") + + # use auth IDs for res. nums and chain names + self.use_auth = False + # what accuracy to use for PAE? (writer uses 3 anyway) + self.pae_digits = 3 + + super().__init__(*args, **kwargs) + + def get_atoms(self): + # ToDo [internal]: Take B-factor out since its not a B-factor? + # NOTE: this assumes that _get_res_num maps residue to pos. 
in seqres + within asym + for atm in self.ost_entity.atoms: + yield modelcif.model.Atom( + asym_unit=self.asym[_get_ch_name(atm.chain, self.use_auth)], + seq_id=_get_res_num(atm.residue, self.use_auth), + atom_id=atm.name, + type_symbol=atm.element, + x=atm.pos[0], + y=atm.pos[1], + z=atm.pos[2], + het=atm.is_hetatom, + biso=atm.b_factor, + occupancy=atm.occupancy, + ) + + def add_scores(self): + """Add QA metrics from AF2 scores.""" + # global scores + self.qa_metrics.extend( + ( + _GlobalPLDDT(np.mean(self.scores_json["plddt"])), + ) + ) + if self.scores_json["ptm"] is not None and self.scores_json["ptm"] != "None": + self.qa_metrics.extend( + ( + _GlobalPTM(self.scores_json["ptm"]), + ) + ) + if self.scores_json["iptm"] is not None and self.scores_json["iptm"] != "None": + self.qa_metrics.extend( + ( + _GlobalIpTM(self.scores_json["iptm"]), + ) + ) + if self.scores_json["ipLDDT"] is not None and self.scores_json["ipLDDT"] != "None": + self.qa_metrics.extend( + ( + _GlobalIpLDDT(self.scores_json["ipLDDT"]), + ) + ) + if self.scores_json["iPAE"] is not None and self.scores_json["iPAE"] != "None": + self.qa_metrics.extend( + ( + _GlobalIPAE(self.scores_json["iPAE"]), + ) + ) + + # NOTE: none of the below expected to work if we have unmodelled gaps! 
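The local-score loops that follow map residues, visited chain by chain, onto a flat index into the per-residue pLDDT list and the square PAE matrix. A toy sketch of that flattened indexing using plain lists instead of OST chains (hypothetical helper name):

```python
def collect_pae(chains, pae):
    """Sketch: residues are visited chain by chain and a running counter i
    maps each residue to its row in the (n x n) PAE matrix; an inner pass
    with counter j yields every residue pair."""
    out = {}
    i = 0
    for chain_i in chains:
        for res_i in chain_i:
            j = 0
            for chain_j in chains:
                for res_j in chain_j:
                    out[(res_i, res_j)] = pae[i][j]
                    j += 1
            i += 1
    return out
```

This mirrors why the note above warns about unmodelled gaps: the flat counter assumes the score arrays cover every modelled residue in order.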
+ + # local scores + lpae = [] + i = 0 + for chn_i in self.ost_entity.chains: + ch_name_i = _get_ch_name(chn_i, self.use_auth) + for res_i in chn_i.residues: + # local pLDDT + res_num_i = _get_res_num(res_i, self.use_auth) + self.qa_metrics.append( + _LocalPLDDT( + self.asym[ch_name_i].residue(res_num_i), + self.scores_json["plddt"][i], + ) + ) + + # pairwise alignment error + if self.incl_pae: + j = 0 + for chn_j in self.ost_entity.chains: + ch_name_j = _get_ch_name(chn_j, self.use_auth) + for res_j in chn_j.residues: + res_num_j = _get_res_num(res_j, self.use_auth) + pae_ij = self.scores_json["pae"][i][j] + lpae.append( + _LocalPairwisePAE( + self.asym[ch_name_i].residue(res_num_i), + self.asym[ch_name_j].residue(res_num_j), + round(pae_ij, self.pae_digits), + ) + ) + j += 1 + + i += 1 + + if self.incl_pae: + self.qa_metrics.extend(lpae) + + +def _get_modelcif_entities(target_ents, asym_units, system): + """Create ModelCIF entities and asymmetric units.""" + alphabet = _LPeptideAlphabetWithXO() + for cif_ent in target_ents: + if cif_ent["is_synthetic_construct"]: + references = [] + mdlcif_ent = modelcif.Entity( + cif_ent["seqres"], + description=cif_ent["description"], + alphabet=alphabet, + source=ihm.source.Synthetic(), + references=references, + ) + else: + # collect references + up_ref = modelcif.reference.UniProt( + code=cif_ent["up_id"], + accession=cif_ent["up_ac"], + isoform=cif_ent["up_isoform"], + ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"], + organism_scientific=cif_ent["up_organism"], + sequence_version_date=cif_ent["up_last_mod"], + sequence_crc64=cif_ent["up_crc64"], + sequence=cif_ent["up_sequence"], + ) + # ASSUME: full model covered w/o mismatches + # -> NOTE: sequence passed above is cut based on alignments! 
+ up_ref.alignments.append(modelcif.reference.Alignment( + db_begin=cif_ent["up_range"][0], + db_end=cif_ent["up_range"][1], + entity_begin=1, + entity_end=len(cif_ent["seqres"]), + seq_dif=[ + ihm.reference.SeqDif( + mismatch[1], + alphabet[mismatch[2]], + alphabet[mismatch[3]] + ) for mismatch in cif_ent["mismatches"] + ] + )) + references = [up_ref] + # combine into ModelCIF entity + mdlcif_ent = modelcif.Entity( + cif_ent["seqres"], + description=cif_ent["description"], + alphabet=alphabet, + source=ihm.source.Natural( + ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"], + scientific_name=cif_ent["up_organism"], + ), + references=references, + ) + # NOTE: this assigns (potentially new) alphabetic chain names + for pdb_chain_id in cif_ent["pdb_chain_ids"]: + asym_units[pdb_chain_id] = modelcif.AsymUnit( + mdlcif_ent, strand_id=pdb_chain_id, + ) + system.entities.append(mdlcif_ent) + + +def _get_assoc_pae_file(entry_id, mdl_name): + """Generate an associated file object used to extract PAE into an extra file.""" + return modelcif.associated.LocalPairwiseQAScoresFile( + f"{mdl_name}_local_pairwise_qa.cif", + categories=["_ma_qa_metric_local_pairwise"], + copy_categories=["_ma_qa_metric"], + entry_id=entry_id, + entry_details="This file is an associated file consisting " + + "of local pairwise QA metrics. This is a partial mmCIF " + + "file and can be validated by merging with the main " + + "mmCIF file containing the model coordinates and other " + + "associated data.", + details="Predicted aligned error", + ) + + +def _get_assoc_png_file(fle_path, png_type): + """Generate a modelcif.associated.File object pointing to a PNG file + with content defined by png_type (coverage, plddt, or pae).
+ """ + details = { + "coverage": "PNG file showing number of sequences in the MSA covering " + + "each position in the target sequences", + "plddt": "PNG file showing pLDDT at each residue position for each " + + "of the 5 models produced", + "pae": "PNG file showing the PAE matrices for each of the 5 models " + + "produced", + } + afile = modelcif.associated.File( + fle_path, + details=details[png_type], + ) + # NOTE: file_format can be set to "png" in future ModelCIF versions + # (i.e. when https://github.com/ihmwg/ModelCIF/issues/17 is resolved) + afile.file_format = "other" + afile.file_content = "other" + return afile + + +def _get_associated_files(mdl_name, arc_files): + """Create entry for associated files.""" + # package all into zip file + return modelcif.associated.Repository( + "", + [modelcif.associated.ZipFile(f"{mdl_name}.zip", files=arc_files)], + ) + # NOTE: by convention MA expects zip file with same name as model-cif + + +def _get_sequence_dbs(config_data): + """Get ColabFold seq. 
DBs.""" + # Uses HC list of known DBs used in ColabFold + # -> see also notes in _get_config + db_dict = { + "UniRef_2021_03": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2103.tar.gz", + version="2021_03", + ), + "UniRef_2022_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2202.tar.gz", + version="2022_02", + ), + "UniRef_2023_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2302.tar.gz", + version="2023_02", + ), + "Environmental": modelcif.ReferenceDatabase( + "ColabFold DB", + "https://wwwuser.gwdg.de/~compbiol/colabfold/" + + "colabfold_envdb_202108.tar.gz", + version="2021_08", + ), + "PDB100_230517": modelcif.ReferenceDatabase( + "PDB100", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb100_foldseek_230517.tar.gz", + release_date=datetime.datetime(2023, 5, 17) + ), + "PDB70_211027": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211027.tar.gz", + release_date=datetime.datetime(2021, 10, 27) + ), + "PDB70_211117": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211117.tar.gz", + release_date=datetime.datetime(2021, 11, 17) + ), + "PDB70_220313": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_220313.tar.gz", + release_date=datetime.datetime(2022, 3, 13) + ), + } + # fill list of DBs + seq_dbs = [] + search_keys = [] + for seq_db in config_data["seq_dbs"]: + if seq_db == "UniRef": + if config_data['ur30_db_version'] is None: + raise ValueError("Cannot use UniRef without version") + search_key = f"UniRef_{config_data['ur30_db_version']}" + else: + search_key = seq_db + search_keys.append(search_key) + if 
config_data["tpl_db"] is not None: + if config_data["tpl_db_version"] is None: + raise ValueError("Cannot have tpl DB without version") + search_keys.append( + f"{config_data['tpl_db']}_{config_data['tpl_db_version']}" + ) + for search_key in search_keys: + if search_key not in db_dict: + raise ValueError(f"Unknown seq. DB {search_key}") + seq_dbs.append(db_dict[search_key]) + return seq_dbs + + +def _assemble_modelcif_software(soft_dict, params_dict): + """Create a modelcif.SoftwareWithParameters instance from dictionaries.""" + # create SW object + sw = modelcif.Software( + soft_dict["name"], + soft_dict["classification"], + soft_dict["description"], + soft_dict["location"], + soft_dict["type"], + soft_dict["version"], + citation=soft_dict["citation"], + ) + # assemble parameters + params = [] + for key, val in params_dict.items(): + params.append(modelcif.SoftwareParameter(key, val)) + # put them together + return modelcif.SoftwareWithParameters(sw, params) + + +def _get_modelcif_protocol_software(js_step): + """Assemble software entries for a ModelCIF protocol step.""" + # new setup in python-modelcif (as of late 2023): params with each SW + sw_list = [] + for sw, sw_params in js_step["software_plus_params"]: + sw_list.append(_assemble_modelcif_software(sw, sw_params)) + # group and done... 
+ if sw_list: + return modelcif.SoftwareGroup(sw_list) + else: + return None + + +def _get_modelcif_protocol_data(data_label, target_entities, model, ref_dbs): + """Assemble data for a ModelCIF protocol step.""" + if data_label == "target_sequences_and_ref_DBs": + data = modelcif.data.DataGroup(target_entities) + data.extend(ref_dbs) + elif data_label == "model": + data = model + else: + raise RuntimeError(f"Unknown protocol data: '{data_label}'") + return data + + +def _get_modelcif_protocol(protocol_steps, target_entities, model, ref_dbs): + """Create the protocol for the ModelCIF file.""" + protocol = modelcif.protocol.Protocol() + for js_step in protocol_steps: + sftwre = _get_modelcif_protocol_software(js_step) + input_data = _get_modelcif_protocol_data( + js_step["input"], target_entities, model, ref_dbs + ) + output_data = _get_modelcif_protocol_data( + js_step["output"], target_entities, model, ref_dbs + ) + + protocol.steps.append( + modelcif.protocol.Step( + input_data=input_data, + output_data=output_data, + name=js_step["name"], + details=js_step["details"], + software=sftwre, + ) + ) + protocol.steps[-1].method_type = js_step["method_type"] + return protocol + + +def _compress_cif_file(cif_file): + """Compress cif file and delete original.""" + with open(cif_file, "rb") as f_in: + with gzip.open(cif_file + ".gz", "wb") as f_out: + shutil.copyfileobj(f_in, f_out) + os.remove(cif_file) + + +def _package_associated_files(repo): + """Compress associated files into single zip file and delete original.""" + # zip settings tested for good speed vs compression + for archive in repo.files: + with zipfile.ZipFile(archive.path, "w", zipfile.ZIP_BZIP2) as cif_zip: + for zfile in archive.files: + cif_zip.write(zfile.path, arcname=zfile.path) + os.remove(zfile.path) + + +def _store_as_modelcif(data_json, ost_ent, out_dir, mdl_name, compress, add_pae, add_files): + """Mix all the data into a ModelCIF file.""" + print(" generating ModelCIF objects...", end="") + 
pstart = timer() + # create system to gather all the data + system = modelcif.System( + title=data_json["title"], + id=data_json["mdl_id"].upper(), + model_details=data_json["model_details"], + ) + + # create an asymmetric unit and an entity per target sequence + asym_units = {} + _get_modelcif_entities(data_json["target_entities"], asym_units, system) + + # audit_authors + system.authors.extend(data_json["audit_authors"]) + + # set up the model to produce coordinates + model = _OST2ModelCIF( + assembly=modelcif.Assembly(asym_units.values()), + asym=asym_units, + ost_entity=ost_ent, + scores_json=data_json, + name=data_json["mdl_name"], + incl_pae=add_pae, + ) + print(f" ({timer()-pstart:.2f}s)") + print(" processing QA scores...", end="", flush=True) + pstart = timer() + model.add_scores() + print(f" ({timer()-pstart:.2f}s)") + + model_group = modelcif.model.ModelGroup([model]) + system.model_groups.append(model_group) + + # handle additional files + arc_files = [] + if add_pae: + arc_files.append(_get_assoc_pae_file(system.id, mdl_name)) + arc_files.extend(add_files) + if arc_files: + system.repositories.append(_get_associated_files(mdl_name, arc_files)) + + # get data and steps + ref_dbs = _get_sequence_dbs(data_json["cf_config"]) + protocol = _get_modelcif_protocol( + data_json["protocol"], system.entities, model, ref_dbs, + ) + system.protocols.append(protocol) + + # write modelcif System to file + print(" write to disk...", end="", flush=True) + pstart = timer() + # NOTE: this will dump PAE on path provided in add_scores + # -> hence we cheat by changing path and back while being exception-safe... 
+ oldpwd = os.getcwd() + os.chdir(out_dir) + mdl_fle = f"{mdl_name}.cif" + try: + with open(mdl_fle, "w", encoding="ascii") as mmcif_fh: + modelcif.dumper.write(mmcif_fh, [system]) + if arc_files: + _package_associated_files(system.repositories[0]) + if compress: + _compress_cif_file(mdl_fle) + mdl_fle += ".gz" + finally: + os.chdir(oldpwd) + print(f" ({timer()-pstart:.2f}s)") + return mdl_fle +################################################################################ + + +################################################################################ +# HANDLE FULL DATA SET +################################################################################ + +def generate_json_filepath(pdb_filename, directory, input_data_path): + """Derive the path of the scores JSON file matching a given PDB file.""" + # Split the pdb filename into parts + parts = pdb_filename.split('_') + json_file_path = None + + if directory == "relaxed_figures/": + if parts[-2] == "seed": + parts[-10] = parts[-10].replace('relaxed', 'scores') + parts[-1] = parts[-1].replace('.pdb', '.json') + json_filename = '_'.join(parts) + json_file_path = input_data_path + '/' + directory + json_filename + + if parts[-2] == "only": + parts[3] = parts[3].replace('relaxed', 'scores') + if parts[6] == 'v3': + parts[6] = 'alphafold2_multimer_v3' + else: + parts[6] = 'alphafold2_multimer_v3_' + parts[6] + parts[-2] = parts[-2].replace('only', 'seed') + parts[-1] = parts[-1].replace('pep.pdb', '000.json') + json_filename = '_'.join(parts) + json_file_path = input_data_path + '/' + "screen_256/" + json_filename + elif directory in ("screen_256/", 'design/', 'holoenzyme/'): + json_file_path = os.path.join( + input_data_path, + directory, + pdb_filename.replace("_relaxed_", "_scores_") + .replace("_unrelaxed_", "_scores_") + .replace(".pdb", ".json") + ) + + return json_file_path + + +def _translate2modelcif(metadata, opts, add_files=()): + """Convert a model with its accompanying data to ModelCIF.""" + mdl_id = metadata["mdl_id"] # here for example: ma-osf-ppp2r2a-00x + # skip if done already (done
later here due to info to be returned) + + if opts.compress: + cifext = "cif.gz" + else: + cifext = "cif" + mdl_path = os.path.join(opts.out_dir, f"{mdl_id}.{cifext}") + + # prepare data for models to convert (also gets all issues) + issues = [] + + pdb_file_path = os.path.join( + opts.input_data_path, metadata['directory'], metadata['pdb_file_name'] + ) + with open(pdb_file_path, 'r') as file: + pdb_data = file.read() + + mdl_dict = dict() + mdl_dict["ent"] = io.PDBStrToEntity( + pdb_data, + profile=io.profiles["DEFAULT"], + process=True + ) + file_base, file_ext = os.path.splitext(metadata['pdb_file_name']) + mdl_dict["mdl_name"] = file_base + json_file_path = metadata['json_file_path'] + with open(json_file_path, 'r') as file: + json_data = json.load(file) + json_data["iPAE"] = metadata['iPAE'] + json_data["ipLDDT"] = metadata['ipLDDT'] + + # subset the scores + chain_lengths = [] + for ch in metadata["chains"]: + ost_chain = mdl_dict["ent"].FindChain(ch["chain"]) + if ch["up_range"] is not None and "only_pep" in metadata['pdb_file_name']: + start, end = map(int, ch["up_range"].split('-')) + # NOTE: we assume that the chain with a subset is the last one + previous_chains_length = sum(chain_lengths) + json_data["plddt"] = json_data["plddt"][:previous_chains_length] + \ + json_data["plddt"][(previous_chains_length+start-1) :(previous_chains_length+end)] + json_data["pae"] = json_data["pae"][:previous_chains_length] + \ + json_data["pae"][(previous_chains_length+start-1) :(previous_chains_length+end)] + for i in range(len(json_data["pae"])): + row = json_data["pae"][i] + json_data["pae"][i] = row[:previous_chains_length] + \ + row[(previous_chains_length+start-1) :(previous_chains_length+end)] + json_data['ptm'] = None + json_data['iptm'] = None + continue + chain_lengths.append(ost_chain.residue_count) + + mdl_dict["scores"] = json_data + metadata["plddt"] = json_data["plddt"] + entities, issues = _get_entities(mdl_dict, metadata) +
scores_issues = _check_scores(mdl_dict, metadata) + issues.extend(scores_issues) + + # abort here if already done + if opts.checks_only: + return issues + if os.path.exists(mdl_path): + print(f" {mdl_id} already done...") + return issues + + # now the translation from the single function + mdlcf_json = {} + # the timepoint and the dictionary and the chain_B are the elements determining the config + config_dict = metadata["config"].copy() + cf_config = _get_cf_config( + config_dict, metadata['ur30_db_version'], metadata['tpl_db'], metadata['tpl_db_version']) + + mdlcf_json["audit_authors"] = _get_audit_authors() + if (mdl_dict["scores"]["ipLDDT"] is not None) and (mdl_dict["scores"]["ipLDDT"] != 'None') and (float(mdl_dict["scores"]["ipLDDT"]) > 0): + mdlcf_json["protocol"] = _get_protocol_steps_and_software( + cf_config, model_selection_step=True, binding_test_result=metadata['binding_test_result'], refinement=metadata['refinement']) + else: + mdlcf_json["protocol"] = _get_protocol_steps_and_software( + cf_config, model_selection_step=False, binding_test_result=metadata['binding_test_result'], refinement=metadata['refinement']) + mdlcf_json["cf_config"] = cf_config + mdlcf_json["mdl_id"] = mdl_id # used for entry ID + mdlcf_json["mdl_name"] = file_base + mdlcf_json["target_entities"] = entities + for scores_key in ["plddt", "pae", "ipLDDT", "iPAE"]: + mdlcf_json[scores_key] = mdl_dict["scores"][scores_key] + mdlcf_json["title"] = _get_title(metadata) + mdlcf_json["model_details"] = _get_model_details(metadata) + mdlcf_json["iptm"] = json_data['iptm'] + mdlcf_json["ptm"] = json_data['ptm'] + + print(f" translating {mdl_id}...") + pdb_start = timer() + + file_name = _store_as_modelcif( + data_json=mdlcf_json, + ost_ent=mdl_dict["ent"], + out_dir=opts.out_dir, + mdl_name=mdl_id, + compress=opts.compress, + add_pae=not opts.no_extra_files, + add_files=add_files + ) + + # check if result can be read and has expected seq. 
+ mdl_path = os.path.join(opts.out_dir, file_name) + ent, ss = io.LoadMMCIF(mdl_path, seqres=True) + exp_seqs = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqs += [trg_ent["pdb_sequence"]] * len(trg_ent["pdb_chain_ids"]) + assert ent.chain_count == len(exp_seqs), f"Bad chain count {mdl_id}" + # NOTE: here we expect auth = label IDs + ent_seq = "".join([_get_sequence(chn, False) for chn in ent.chains]) + ent_seq_a = "".join([_get_sequence(chn, True) for chn in ent.chains]) + assert ent_seq == ent_seq_a + assert ent_seq == "".join(exp_seqs), f"Bad seq. {mdl_id}" + ent_seqres = [ss.FindSequence(chn.name).string for chn in ent.chains] + exp_seqres = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqres += [trg_ent["seqres"]] * len(trg_ent["pdb_chain_ids"]) + assert ent_seqres == exp_seqres, f"Bad seqres {mdl_id}" + print(f" ... done with {mdl_id} ({timer()-pdb_start:.2f}s).") + + return issues + + +def _get_metadata(input_data_path, single_model=None): + + metadata_csv = pd.read_csv(os.path.join( + input_data_path, "info_of_submitted_structures.csv"), sep='\t') + metadata_full = [] + + # fetch configs + configs = {} + for directory in metadata_csv['directory'].unique(): + config_path = os.path.join(input_data_path, directory) + configs[directory] = json.load( + open(os.path.join(config_path, "config.json")) + ) + for mdl_idx, metadata in metadata_csv.iterrows(): + mdl_num = mdl_idx + 1 + if single_model is not None and mdl_num != int(single_model): + continue + data = dict() + data['mdl_num'] = mdl_num + data['mdl_id'] = "ma-osf-ppp2r2a-" + f"{mdl_num:03}" + data['title'] = metadata['title'] + + data['abstract'] = metadata['description'] + chains = [] + for chain_id in "ABCD": + if isinstance(metadata["chain_" + chain_id], str): + chain_data = metadata["chain_" + chain_id] + if chain_data == "synthetic construct": + chain = { + "chain": chain_id, + "up_ac": None, + "up_range": None, + "is_synthetic_construct": True + } + else: + if ':' in 
chain_data: + up_ac, up_range = chain_data.split(':', 1) + else: + up_ac, up_range = (chain_data, None) + + chain = { + "chain": chain_id, + "up_ac": up_ac, + "up_range": up_range, + "is_synthetic_construct": False + } + chains.append(chain) + + data['chains'] = chains + + data['ipLDDT'] = metadata['ipLDDT_in_conserved_binding_site'] + data['iPAE'] = metadata['iPAE_in_conserved binding site'] + data['json_file_path'] = generate_json_filepath( + metadata['pdb'], metadata['directory'], input_data_path) + data['pdb_file_name'] = metadata['pdb'] + data['directory'] = metadata['directory'] + if metadata['timepoint'] == "between 13.7.22 and 12.6.23": + data['ur30_db_version'] = "2022_02" + data['tpl_db'] = "PDB70" + data['tpl_db_version'] = "220313" + elif metadata['timepoint'] == "after 12.6.23": + data['ur30_db_version'] = "2023_02" + data['tpl_db'] = "PDB100" + data['tpl_db_version'] = "230517" + else: + raise RuntimeError(f"Unknown databases version for this timepoint :" + f"{metadata['timepoint']}") + data['config'] = configs[metadata['directory']] + + # protocol + if metadata['directory'] == "relaxed_figures/": + if "only_pep" in metadata['pdb']: + data['refinement'] = "cropped_and_relax" + else: + data['refinement'] = "relax" + else: + data['refinement'] = None + data['binding_test_result'] = metadata['binding'] + metadata_full.append(data) + return metadata_full + + +################################################################################ +# HANDLE ONE MODEL IN A NOTEBOOK +################################################################################ + +def _main(): + """Run as script.""" + + # parse/fetch global data + opts = _parse_args() + + # parse/fetch global data + metadata_all = _get_metadata(opts.input_data_path, opts.single_model) + + # iterate over models + print(f"Working on models in {opts.input_data_path}...") + issues = [] + for metadata in metadata_all: + new_issues = _translate2modelcif(metadata, opts) + issues.extend(new_issues) + 
print(f"... done with models in {opts.input_data_path}.") + + if opts.single_model is None: + # dump issues + issues_file_path = os.path.join(opts.out_dir, "issues.json") + json.dump(issues, open(issues_file_path, "w")) + # dump info on which ones to export to 3D-Beacons + to_export_file_path = os.path.join(opts.out_dir, "MA_to_export.json") + to_export = { + metadata["mdl_num"]: ((metadata["binding_test_result"] == "yes") or ( + metadata["directory"] == "relaxed_figures/") or (metadata["directory"] == "holoenzyme/")) + for metadata in metadata_all + } + json.dump(to_export, open(to_export_file_path, "w")) + + +if __name__ == "__main__": + _main() diff --git a/projects/2024-08-ma-dm-hisrep/README.md b/projects/2024-08-ma-dm-hisrep/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8d8984e505bf940562e477755823b65f6848f8b6 --- /dev/null +++ b/projects/2024-08-ma-dm-hisrep/README.md @@ -0,0 +1,29 @@ +# Modelling of histone complexes (structural prediction screen) + +[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-dm-hisrep) (incl. background on project itself) + +Input files for conversion: +- Annotations.json with metadata (incl. 
UniProtKB AC) +- Config_Files directory with config_[X].json files and dates.json with model creation dates +- Zip_Files directory with files named [X]-[NAME].zip for each X listed in the metadata files +- ZIP files are expected to contain the 3 top-ranked models (.pdb) with their respective scores (.json) and .png files for coverage, pLDDT, and PAE plots as produced by ColabFold (all with their original file names) + +Modelling setup: +- Classic ColabFold setup with PDB coordinates, scores JSON, and a link to UniProt + +Special features here compared to the PRC-complexes script and the PP2A-B55-design project: +- UniProt link: handling of sequence subsets and old sequence versions, with a fixed cache of UniProt data; code adapted from the PP2A-B55-design project +- Use of modelcif.reference.SeqDif to handle mismatches between UniProt and entity sequence, improved over PP2A-B55-design to use more of the alignment data (alignment start/end) +- Improved version of the PP2A-B55-design caching mechanism, with model-sequence-specific keys, when fetching UniProt data +- Processing of the model creation dates in dates.json, intentionally provided to determine ColabFold database names and versions, with the _get_cf_db_versions function +- Renaming of chain labels in the .pdb files that have been shifted compared to the metadata ('A'->'B', 'B'->'C', etc.)
+ - Special processing for model 263, to ignore the first 5 aa introduced into the model sequence by the sequence alignment algorithm +- local pLDDT accuracy threshold doubled to 0.011 +- ipTM score check ignored + +Content: +- translate2modelcif.py: script to do the conversion (run in a virtual environment with the same setup as the Docker container here but with OST 2.8 and the latest main branch of python-modelcif and python-ihm from 20.6.2024) +- minimal_example.zip: example input to convert a single complex from this set +- minimal_example_modelcif: output from running the conversion of minimal_example with the command below: + +```python3 translate2modelcif.py ./ModelArchive-ma-dm-hisrep ./modelcif --single-model 3 --no-extra-files``` \ No newline at end of file diff --git a/projects/2024-08-ma-dm-hisrep/minimal_example.zip b/projects/2024-08-ma-dm-hisrep/minimal_example.zip new file mode 100644 index 0000000000000000000000000000000000000000..e367b08507c4d78b899332932a503d7b50da3ed5 Binary files /dev/null and b/projects/2024-08-ma-dm-hisrep/minimal_example.zip differ diff --git a/projects/2024-08-ma-dm-hisrep/minimal_example_modelcif.zip b/projects/2024-08-ma-dm-hisrep/minimal_example_modelcif.zip new file mode 100644 index 0000000000000000000000000000000000000000..bd359580ceb82f55bcf37a3f2ed1ccbc081db2bd Binary files /dev/null and b/projects/2024-08-ma-dm-hisrep/minimal_example_modelcif.zip differ diff --git a/projects/2024-08-ma-dm-hisrep/translate2modelcif.py b/projects/2024-08-ma-dm-hisrep/translate2modelcif.py new file mode 100644 index 0000000000000000000000000000000000000000..aa4efee9af890418938b72e64682e1b44361f0cf --- /dev/null +++ b/projects/2024-08-ma-dm-hisrep/translate2modelcif.py @@ -0,0 +1,1916 @@ +#!
/usr/local/bin/ost +# -*- coding: utf-8 -*- + +"""Translate PRC models for Juntao from PDB + extra data into ModelCIF.""" + +# EXAMPLES for running: +# ost translate2modelcif.py ./ModelArchive-ma-dm-hisrep ./modelcif + +import argparse +import datetime +import gzip +import os +import shutil +import sys +import zipfile + +from timeit import default_timer as timer +import numpy as np +import requests +import ujson as json + +import ihm +import ihm.citations +import modelcif +import modelcif.associated +import modelcif.dumper +import modelcif.model +import modelcif.protocol +import modelcif.reference + +import pandas as pd +from ost import io, seq, mol + +################################################################################ +# GENERAL HELPER FUNCTIONS +################################################################################ +def _abort_msg(msg, exit_code=1): + """Write error message and exit with exit_code.""" + print(f"{msg}\nAborting.", file=sys.stderr) + sys.exit(exit_code) + + +def _warn_msg(msg): + """Write a warning message to stdout.""" + print(f"WARNING: {msg}") + + +def _check_file(file_path): + """Make sure a file exists and is actually a file.""" + if not os.path.exists(file_path): + _abort_msg(f"File not found: '{file_path}'.") + if not os.path.isfile(file_path): + _abort_msg(f"File path does not point to file: '{file_path}'.") + + +def _check_folder(dir_path): + """Make sure a path exists and is actually a directory.""" + if not os.path.exists(dir_path): + _abort_msg(f"Path not found: '{dir_path}'.") + if not os.path.isdir(dir_path): + _abort_msg(f"Path does not point to a directory: '{dir_path}'.") + + +def _check_opts_folder(dir_path): + """Remove trailing '/' (return fixed one) and check if path valid.""" + if dir_path.endswith("/"): + dir_path = dir_path[:-1] + _check_folder(dir_path) + return dir_path + + +def _get_res_num(r, use_auth=False): + """Get res. num. from auth.
IDs if reading from mmCIF files.""" + if use_auth: + return int(r.GetStringProp("pdb_auth_resnum")) + return r.number.num + + +def _get_ch_name(ch, use_auth=False): + """Get chain name from auth. IDs if reading from mmCIF files.""" + if use_auth: + return ch.GetStringProp("pdb_auth_chain_name") + return ch.name + + +def _get_sequence(chn, use_auth=False): + """Get the sequence out of an OST chain incl. '-' for gaps in resnums.""" + # initialise (add gaps if first is not at num. 1) + lst_rn = _get_res_num(chn.residues[0], use_auth) + idx = 1 + sqe = "-" * (lst_rn - 1) + chn.residues[0].one_letter_code + + for res in chn.residues[idx:]: + lst_rn += 1 + while lst_rn != _get_res_num(res, use_auth): + sqe += "-" + lst_rn += 1 + sqe += res.one_letter_code + return sqe + +def shift_letter(letter): + """Move letters backward in the alphabet. 'A' would raise an exception.""" + # Convert letter to its ASCII value + ascii_value = ord(letter) + + # Shift the ASCII value by -1 + shifted_value = ascii_value - 1 + + # Handle 'A' + if shifted_value < ord('A'): + raise RuntimeError( + "Cannot shift the letter 'A' backward in the alphabet" + ) + # Convert back to a character + return chr(shifted_value) +################################################################################ + +################################################################################ +# DATA HANDLING +################################################################################ +def _parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description=__doc__, + ) + + parser.add_argument( + "input_data_path", + type=str, + metavar="<INPUT DATA PATH>", + help="Data as provided by depositors.
Expected to contain files " + + "Annotations.json with metadata, Config_Files " + + "directory with config_[X].json files, dates.json" + + "with model creation dates, and Zip_Files directory with files named " + + "[X]-[NAME].zip for each X listed in the metadata files.", + ) + parser.add_argument( + "out_dir", + type=str, + metavar="<OUTPUT DIR>", + help="Path to directory to store results ([X]-[NAME].* files and " + + "issues.json with any observed issues).", + ) + parser.add_argument( + "--compress", + default=False, + action="store_true", + help="Compress ModelCIF file with gzip.", + ) + parser.add_argument( + "--checks-only", + default=False, + action="store_true", + help="Perform only checks without producing ModelCIF files.", + ) + parser.add_argument( + "--no-extra-files", + default=False, + action="store_true", + help="Skip writing extra models, PNGs, and PAE (for testing).", + ) + parser.add_argument( + "--single-model", + type=str, + #metavar="<PDB WEB PATH>", + default=None, + help="If provided, only the model matching the provided string in the " + + "Annotations.json will be converted.", + ) + + opts = parser.parse_args() + + # check input + opts.input_data_path = _check_opts_folder(opts.input_data_path) + _check_file(os.path.join(opts.input_data_path, "Annotations", "Annotations.json")) + _check_folder(os.path.join(opts.input_data_path, "Configs")) + _check_folder(os.path.join(opts.input_data_path, "Zip_files")) + if opts.out_dir.endswith("/"): + opts.out_dir = opts.out_dir[:-1] + if not os.path.exists(opts.out_dir): + os.makedirs(opts.out_dir, exist_ok=True) + return opts + + +def _get_audit_authors(): + """Return the list of authors that produced this model.""" + return ( + "Yu, Juntao", + "Zhang, Yujie", + "Fang, Yimeng", + "Paulo, Joao A.", + "Yaghoubi, Dadmehr", + "Hua, Xu", + "Shipkovenska, Gergana", + "Toda, Takenori", + "Zhang, Zhiguo", + "Gygi, Steven P.", + "Jia, Songtao", + "Li, Qing", + "Moazed, Danesh", + ) + + +def _zip_file_check(zf, 
file_name, desired_ranks, data_from_zip, exp_relaxed): + """Fill data_from_zip with info from given file in zipped file handle zf if + data is to be added (depends on file_name). + Only ranks from 1 to desired_ranks are parsed. + Return False if the file was not parsed (fine for hidden files; suspicious + otherwise). + """ + file_base_path = os.path.basename(file_name) + file_base, file_ext = os.path.splitext(file_base_path) + if file_base.startswith('.'): + return False + # check for PNGs (do not overwrite already-found ones) + for png_type in ["coverage", "plddt", "pae"]: + if file_name.lower().endswith(f"_{png_type}.png") and (f"{png_type}_png_data" not in data_from_zip): + data_from_zip[f"{png_type}_png_data"] = zf.open(file_name).read() + data_from_zip[f"{png_type}_png_file_name"] = file_base_path + return True + # special case: config.json + if file_name == "config.json": + data_from_zip["config"] = json.load(zf.open(file_name)) + return True + # check for PDB and JSON of given rank + # -> PDB file name example: ..._unrelaxed_rank_001_..._model_1_seed_000.pdb + # -> JSON file name example: ..._scores_rank_001_..._model_1_seed_000.json + if file_ext in [".pdb", ".json"]: + ss = file_base.split('_') + mdl_rank = int(ss[ss.index("rank") + 1]) + if mdl_rank < 1 or mdl_rank > desired_ranks: + return False + mdl_rank_key = f"rank_{mdl_rank}" + if mdl_rank_key not in data_from_zip: + data_from_zip[mdl_rank_key] = {} + mdl_dict = data_from_zip[mdl_rank_key] + if file_ext == ".pdb": + # duplicate = extra unhandled file + if "ent" in mdl_dict or (exp_relaxed and "_unrelaxed_" in file_base): + return False + mdl_dict["mdl_file_base"] = file_base + mdl_dict["mdl_id"] = '_'.join(ss[ss.index("model"):]) + mdl_dict["ent"] = io.PDBStrToEntity( + zf.open(file_name).read(), + profile=io.profiles["DEFAULT"], + process=True + ) + return True + elif file_ext == ".json": + # duplicate = extra unhandled file + if "scores" in mdl_dict: + return False + mdl_dict["scores"] =
json.load(zf.open(file_name)) + return True + return False + + +def _parse_zip_file(zip_file_path, desired_ranks, incl_relaxed=False): + """Parse data in provided ZIP files and checks for desired ranks. + Returns dict. with keys "rank_X" for X from 1 to desired_ranks incl.: + - mdl_file_base: file name of selected PDB file w/o directory and extension + - mdl_id: ID of selected PDB file (e.g. "model_1_seed_000") + - ent: OST entity for PDB file + - scores: dict. loaded from scores JSON + It further contains info on PNG files as: + - [png_type]_png_data: data in file (to be written back into accomp. zip) + - [png_type]_png_file_name: file name w/o directory used in ZIP file + for [png_type] in [coverage, plddt, pae]. + Optionally available keys (if available in zip file): + - config: dict. loaded from config.json + If expected files are not found, an exception is raised. + If additional files are found, a warning is shown. + """ + file_dates = [] + data_from_zip = {} + unparsed_files = [] + with zipfile.ZipFile(zip_file_path) as zf: + # handle case where it's just a zip file in a zip file + nested_zip_files = [f for f in zf.namelist() if f.endswith(".zip")] + if len(zf.namelist()) == 1 and len(nested_zip_files) == 1: + with zf.open(nested_zip_files[0]) as nested_zip: + return _parse_zip_file(nested_zip, desired_ranks, incl_relaxed) + # handle all other cases + for file_name in zf.namelist(): + check = _zip_file_check( + zf, file_name, desired_ranks, data_from_zip, incl_relaxed + ) + # ok/expected check false for unrelaxed in cases with relaxed ones + if not check and not os.path.basename(file_name).startswith('.') \ + and not (incl_relaxed and "_unrelaxed_" in file_name): + unparsed_files.append(file_name) + else: + file_date = zf.getinfo(file_name).date_time + file_dates.append(datetime.datetime(*file_date)) + # check if complete + exp_keys = [f"rank_{num+1}" for num in range(desired_ranks)] + for png_type in ["coverage", "plddt", "pae"]: + exp_keys.extend([ + 
f"{png_type}_png_data", f"{png_type}_png_file_name" + ]) + unmatched_keys = set(exp_keys) - set(data_from_zip) + if len(unmatched_keys) != 0: + _warn_msg( + f"Could not find expected files in {zip_file_path}. " \ + f"Missing {sorted(unmatched_keys)}." + ) + exp_mdl_keys = sorted(["mdl_file_base", "mdl_id", "ent", "scores"]) + for num in range(desired_ranks): + if exp_mdl_keys != sorted(data_from_zip[f"rank_{num+1}"].keys()): + raise RuntimeError( + f"Could not find expected files in {zip_file_path} " \ + f"for rank {num + 1}." + ) + extra_stuff = sorted(set(data_from_zip) - set(exp_keys + ["config"])) + extra_stuff += sorted(unparsed_files) + if len(extra_stuff) != 0: + _warn_msg( + f"Extra unexpected content found in {zip_file_path}: " \ + f"{extra_stuff}" + ) + # add date range + data_from_zip["date_range"] = (min(file_dates), max(file_dates)) + return data_from_zip + + +def _check_scores(mdl_data_from_zip, metadata, mdl_rank): + """Check scores JSON. + Bad issues raise exceptions, minor ones are in returned list + (compatible with list returned by _get_entities) + """ + issues = [] + scores_json = mdl_data_from_zip["scores"] + # NOTE: cannot deal with gapped sequences here as we cannot map + # multiple chains to scores + ost_ent = mdl_data_from_zip["ent"] + exp_len = ost_ent.residue_count + assert "ptm" in scores_json + assert len(scores_json["pae"]) == exp_len + assert len(scores_json["pae"][0]) == exp_len + # check actual scores + # b-factor vs pLDDT in expected range? + ent_plddts = [] + for i, res in enumerate(ost_ent.residues): + b_factors = [a.b_factor for a in res.atoms] + assert len(set(b_factors)) == 1 # must all be equal! 
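Reviewer note on the `0.011` tolerance used by `_check_scores`: the PDB B-factor column stores pLDDT with only 0.01 precision, so the comparison against the full-precision scores JSON needs a small slack. A minimal standalone sketch of that comparison, with hypothetical pLDDT values (not taken from any real model):

```python
# Sketch: compare per-residue pLDDT from a scores JSON against values
# recovered from the PDB B-factor column (2 decimal places).
def plddt_max_diff(scores_plddt, bfactor_plddt):
    """Largest absolute per-residue difference between the two lists."""
    assert len(scores_plddt) == len(bfactor_plddt)
    return max(abs(s - b) for s, b in zip(scores_plddt, bfactor_plddt))

scores = [97.312, 88.456, 45.901]   # hypothetical full-precision values
from_pdb = [97.31, 88.46, 45.90]    # same values truncated to PDB precision
assert plddt_max_diff(scores, from_pdb) <= 0.011  # within tolerance
```

Anything above the tolerance is recorded as a `plddt_vs_bf_mismatch` issue rather than raising, matching the "minor issues go into the returned list" convention of `_check_scores`.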
+ ent_plddts.append(b_factors[0]) + scores_plddts = scores_json["plddt"] + assert len(ent_plddts) == len(scores_plddts) + plddt_max_diff = max([ + abs(s1 - s2) for s1, s2 in zip(ent_plddts, scores_plddts) + ]) + # threshold due to 0.01 accuracy in PDB file + numerical rounding + if plddt_max_diff > 0.011: + issues.append(( + metadata['mdl_id'], + "plddt_vs_bf_mismatch", + (mdl_rank, plddt_max_diff), + () + )) + return issues + + +def _get_n_parse_up_entry(up_ac, up_txt_path): + """Get data for an UniProtKB entry and parse it.""" + # This is a simple parser for UniProtKB txt format, instead of breaking it + # up into multiple functions, we just allow many many branches & statements, + # here. + # pylint: disable=too-many-branches,too-many-statements + data = {} + data["up_organism"] = "" + data["up_sequence"] = "" + data["up_ac"] = up_ac + # check if we read from file or URL + if up_txt_path.startswith("http"): + rspns = requests.get(up_txt_path, timeout=180) + lines = rspns.iter_lines(decode_unicode=True) + else: + lines = open(up_txt_path).readlines() + for line_ in lines: + # need to strip trailing characters if reading from file (doesn't hurt) + line = line_.rstrip() + if line.startswith("ID "): + sline = line.split() + if len(sline) != 5: + raise RuntimeError(f"Unusual UniProtKB ID line found:\n" \ + f"'{line}'") + data["up_id"] = sline[1] + elif line.startswith("OX NCBI_TaxID="): + # Following strictly the UniProtKB format: 'OX NCBI_TaxID=<ID>;' + data["up_ncbi_taxid"] = line[len("OX NCBI_TaxID=") : -1] + data["up_ncbi_taxid"] = data["up_ncbi_taxid"].split("{")[0].strip() + elif line.startswith("OS "): + # multiple lines possible; last one ends in "." 
+ if line[-1] == ".": + data["up_organism"] += line[len("OS ") : -1] + else: + data["up_organism"] += line[len("OS ") :] + " " + elif line.startswith("SQ "): + sline = line.split() + if len(sline) != 8: + raise RuntimeError(f"Unusual UniProtKB SQ line found:\n" \ + f"'{line}'") + data["up_seqlen"] = int(sline[2]) + data["up_crc64"] = sline[6] + elif line.startswith(" "): + sline = line.split() + if len(sline) > 6: + raise RuntimeError( + "Unusual UniProtKB sequence data line " + + f"found:\n'{line}'" + ) + data["up_sequence"] += "".join(sline) + elif line.startswith("DT "): + dt_flds = line[len("DT ") :].split(", ") + if dt_flds[1].upper().startswith("SEQUENCE VERSION "): + data["up_last_mod"] = datetime.datetime.strptime( + dt_flds[0], "%d-%b-%Y" + ) + elif dt_flds[1].upper().startswith("ENTRY VERSION "): + data["up_entry_version"] = dt_flds[1][len("ENTRY VERSION ") :] + if data["up_entry_version"][-1] == ".": + data["up_entry_version"] = data["up_entry_version"][:-1] + data["up_entry_version"] = int(data["up_entry_version"]) + elif line.startswith("GN Name="): + data["up_gn"] = line[len("GN Name=") :].split(";")[0] + data["up_gn"] = data["up_gn"].split("{")[0].strip() + + # in UP isoforms are identified in the AC so no need for this... + # -> in PDB (e.g. 8TRE), we see unset _struct_ref.pdbx_db_isoform in such cases + data["up_isoform"] = None + + # NOTE: no gene names in this set (use provided names instead) + if "up_gn" not in data: + _warn_msg( + f"No gene name found for UniProtKB entry '{up_ac}', using " + + "UniProtKB AC instead." 
+    )
+        data["up_gn"] = up_ac
+    if "up_last_mod" not in data:
+        raise RuntimeError(f"No sequence version found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if "up_crc64" not in data:
+        raise RuntimeError(f"No CRC64 value found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if len(data["up_sequence"]) == 0:
+        raise RuntimeError(f"No sequence found for UniProtKB entry '{up_ac}'.")
+    # check that sequence length and CRC64 are correct
+    if data["up_seqlen"] != len(data["up_sequence"]):
+        raise RuntimeError(
+            "Sequence length of SQ line and sequence data differ for "
+            + f"UniProtKB entry '{up_ac}': {data['up_seqlen']} != "
+            + f"{len(data['up_sequence'])}"
+        )
+
+    if "up_id" not in data:
+        raise RuntimeError(f"No ID found for UniProtKB entry '{up_ac}'.")
+    if "up_ncbi_taxid" not in data:
+        raise RuntimeError(f"No NCBI taxonomy ID found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    if len(data["up_organism"]) == 0:
+        raise RuntimeError(f"No organism species found for UniProtKB entry " \
+                           f"'{up_ac}'.")
+    return data
+
+
+def _fetch_upkb_entry(up_ac):
+    """Get a UniProtKB entry."""
+    return _get_n_parse_up_entry(
+        up_ac, f"https://rest.uniprot.org/uniprotkb/{up_ac}.txt"
+    )
+
+
+def _fetch_unisave_entry(up_ac, version):
+    """Get a UniSave entry; unlike a plain UniProtKB fetch, this allows us
+    to request a specific entry version."""
+    return _get_n_parse_up_entry(
+        up_ac,
+        f"https://rest.uniprot.org/unisave/{up_ac}?format=txt&"
+        + f"versions={version}",
+    )
+
+
+# for cache below
+upkb_entry_cache = {}  # key = (up_ac, up_version, mdl_sequence)
+def _fetch_upkb_cached(sqe, up_ac, up_version=None):
+    """Get the best-matching UniProtKB entry for sequence sqe.
+    Get it from the cache if already fetched.
+    up_version defines the starting version in UP to check.
+    Note that the returned UP entry may be a different version than the one in
+    up_version.
+    Returned UP data includes the result of _align_sequences.
+    If no perfect match is found, a warning is printed.
+ """ + # check if in cache already + cache_key = (up_ac, up_version, sqe) + if cache_key in upkb_entry_cache: + return upkb_entry_cache[cache_key] + # fetch and parse first guess + if up_version is None: + up_data = _fetch_upkb_entry(up_ac) + else: + up_data = _fetch_unisave_entry(up_ac, up_version) + # find best match starting from first guess + min_up_data = None + while True: + mismatches, up_range, mdl_range, covered_aln, mdl_seqres = _align_sequences( + sqe, up_data["up_sequence"], atomseq_aln=False) + + if min_up_data is None or \ + len(mismatches) < len(min_up_data["mismatches"]): + min_up_data = up_data + min_up_data["mismatches"] = mismatches + min_up_data["up_range"] = up_range + min_up_data["mdl_range"] = mdl_range + min_up_data["covered_aln"] = covered_aln + min_up_data["mdl_seqres"] = mdl_seqres + if len(mismatches) == 0: + # found hit; done + break + # fetch next one (skip if exceptions happen) + next_v = up_data["up_entry_version"] - 1 + while next_v > 0: + try: + # note: can fail to parse very old UP versions... + up_data = _fetch_unisave_entry(up_ac, next_v) + # can move on if no exception happened + break + except RuntimeError as ex: + # _warn_msg(f"Error in parsing v{next_v} of {up_ac}:\n{ex}") + # try next one + next_v -= 1 + if next_v == 0: + # warn user about failure to find match and abort + msg = f"Sequences not equal from file: {sqe}, from UniProtKB: " \ + f"{min_up_data['up_sequence']} ({up_ac}), checked entire " \ + f"entry history and best match had following mismatches " \ + f"in v{min_up_data['up_entry_version']} (range " \ + f"{min_up_data['up_range']}): {min_up_data['mismatches']}." + _warn_msg(msg) + upkb_entry_cache[cache_key] = min_up_data + return min_up_data + # keep in cache + upkb_entry_cache[cache_key] = up_data + return min_up_data + + + +def _align_sequences(mdl_sqe, ref_sqe, atomseq_aln=True, ref_fixes=[], + gapped_aa="XOUBJZ"): + """Compare sequence while paying attention on non-standard amino acids. 
+ + Can pass list of tuples for OLCs expected to be changed between ref and mdl. + E.g. Jason set would have ref_fixes=[('B', 'D'), ('J', 'L'), ('Z', 'E')]. + Non-def. AA (listed in gapped_aa) in ref_sqe are assumed to be gaps (-) in + mdl_sqe (this is def. in CF/AF for "XOUBJZ"). + + Returns (mismatches, ref_range, mdl_range, covered_aln, mdl_seqres): + - mismatches = list of (ref_pos, mdl_pos, olc_ref, olc_mdl) + (positions are 1-indexed, None if gap and only if in range) + - ref_range / mdl_range = (start, end) tuples with 1-indexed positions of + start and end of covered range (mdl_range with respect to mdl_sqe!). + Extra non-covered residues in mdl or ref can be counted by comparing + ref_range / mdl_range with lengths of ref_sqe / mdl_sqe. + - covered_aln = alignment (seq. 0 = REF, seq. 1 = MDL) within covered range + (i.e. first and last column have no gaps). If atomseq_aln is True, the + alignment only includes non-gap residues of mdl_sqe. OST seq. offsets are + set with respect to mdl_sqe, ref_sqe (0-indexed). Note that offsets are + only guaranteed to fit ref_range / mdl_range if atomseq_aln is False. + - mdl_seqres = mdl_sqe with gaps (-) replaced with seq. from ref. if + non-def-AA there or with X otherwise (i.e. both have same length!). + Here guaranteed for mdl_seqres to match mdl_sqe if AA in gapped_aa and X + are replaced by gaps (-). + """ + # add fixes if needed + ref_sqe_fixed = ref_sqe + for olc1, olc2 in ref_fixes: + ref_sqe_fixed = ref_sqe_fixed.replace(olc1, olc2) + # put gaps for parts not modelled by AF2 (i.e. 
any non-def-AA) + ref_sqe_fixed_gapped = ref_sqe_fixed + for olc in gapped_aa: + assert olc not in mdl_sqe + ref_sqe_fixed_gapped = ref_sqe_fixed_gapped.replace(olc, '-') + # easy and preferred case: mdl_sqe is subset of ref_sqe + ref_idx = ref_sqe_fixed_gapped.find(mdl_sqe) + if ref_idx >= 0: + mismatches = [] + ref_range = (ref_idx + 1, ref_idx + len(mdl_sqe)) + mdl_range = (1, len(mdl_sqe)) + mdl_seqres = ref_sqe_fixed[ref_idx : ref_idx + len(mdl_sqe)] + # we handle covered_aln afterwards... + aln_s_ref = ref_sqe[ref_idx : ref_idx + len(mdl_sqe)] + aln_s_mdl = mdl_seqres + else: + # align and report mismatches + ref_seq = seq.CreateSequence("REF", ref_sqe_fixed) + # use X as first guess for gaps in model + mdl_seq = seq.CreateSequence("MDL", mdl_sqe.replace('-', 'x')) + aln = seq.alg.SemiGlobalAlign(ref_seq, mdl_seq, seq.alg.BLOSUM62)[0] + # get range + aligned_indices = [i for i, c in enumerate(aln) \ + if c[0] != '-' and c[1] != '-'] + ref_range = ( + aln.GetResidueIndex(0, aligned_indices[0]) + 1, + aln.GetResidueIndex(0, aligned_indices[-1]) + 1, + ) + mdl_range = ( + aln.GetResidueIndex(1, aligned_indices[0]) + 1, + aln.GetResidueIndex(1, aligned_indices[-1]) + 1, + ) + # build up strings as we go + aln_s_ref = "" + aln_s_mdl = "" + mdl_seqres = "" + # collect mismatches and fix seqs as we go + mismatches = [] + for idx, (olc_ref, olc_mdl) in enumerate(aln): + # fix seqres as needed + if olc_mdl == 'x' and olc_ref in gapped_aa: + olc_mdl = olc_ref + if olc_mdl != '-': + mdl_seqres += olc_mdl + if idx >= aligned_indices[0] and idx <= aligned_indices[-1]: + # fill aln_s_x as needed + if olc_ref != '-': + # must fetch from ref_sqe + ref_idx = aln.GetResidueIndex(0, idx) + aln_s_ref += ref_sqe[ref_idx] + ref_pos = ref_idx + 1 + else: + aln_s_ref += '-' + ref_pos = None + if olc_mdl != '-': + # fetch from mdl_seqres here + # (revert back to mdl_sqe afterwards) + mdl_idx = aln.GetResidueIndex(1, idx) + aln_s_mdl += mdl_seqres[mdl_idx] + mdl_pos = mdl_idx + 1 + 
else: + aln_s_mdl += '-' + mdl_pos = None + if olc_ref != olc_mdl: + mismatches.append((ref_pos, mdl_pos, olc_ref, olc_mdl)) + # fix remaining x in mdl_seqres + mdl_seqres = mdl_seqres.replace('x', 'X') + # create covered_aln + s_ref_offset = ref_range[0] - 1 + s_mdl_offset = mdl_range[0] - 1 + covered_aln = seq.CreateAlignment( + seq.CreateSequence("REF", aln_s_ref), + seq.CreateSequence("MDL", aln_s_mdl.replace('x', 'X')) + ) + # cut it once again if needed (only for atomseq_aln) + if atomseq_aln: + # revert + new_cols = [ + ( + olc_ref, + '-' if olc_mdl == 'x' or olc_mdl in gapped_aa else olc_mdl + ) for olc_ref, olc_mdl in zip(aln_s_ref, aln_s_mdl) + ] + aligned_indices = [i for i, c in enumerate(new_cols) \ + if c[0] != '-' and c[1] != '-'] + s_ref_offset += covered_aln.GetResidueIndex(0, aligned_indices[0]) + s_mdl_offset += covered_aln.GetResidueIndex(1, aligned_indices[0]) + cut_cols = new_cols[aligned_indices[0]:aligned_indices[-1]+1] + aln_s_ref = "".join([olc_ref for olc_ref, _ in cut_cols]) + aln_s_mdl = "".join([olc_mdl for _, olc_mdl in cut_cols]) + covered_aln = seq.CreateAlignment( + seq.CreateSequence("REF", aln_s_ref), + seq.CreateSequence("MDL", aln_s_mdl) + ) + covered_aln.SetSequenceOffset(0, s_ref_offset) + covered_aln.SetSequenceOffset(1, s_mdl_offset) + # check post assertions (as in docstring) + assert ref_sqe[covered_aln.GetSequenceOffset(0):]\ + .startswith(covered_aln.sequences[0].gapless_string) + if atomseq_aln: + assert mdl_sqe[covered_aln.GetSequenceOffset(1)] \ + == covered_aln.sequences[1].gapless_string[0] + assert mdl_sqe[covered_aln.GetSequenceOffset(1):].replace('-', '')\ + .startswith(covered_aln.sequences[1].gapless_string) + else: + assert covered_aln.sequences[0].gapless_string \ + == ref_sqe[ref_range[0]-1:ref_range[1]] + assert covered_aln.sequences[1].gapless_string \ + == mdl_seqres[mdl_range[0]-1:mdl_range[1]] + assert mdl_seqres[covered_aln.GetSequenceOffset(1):]\ + .startswith(covered_aln.sequences[1].gapless_string) 
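The 1-indexed, inclusive range convention documented for `_align_sequences` is easiest to see in its "easy and preferred case", where the model sequence is a substring of the (fixed, gapped) reference and the ranges fall out of `str.find`. A standalone sketch with hypothetical sequences (no OST needed):

```python
# Sketch of the substring case: ranges are 1-indexed and inclusive,
# mdl_range is relative to the model sequence itself.
ref_sqe = "MKTAYIAKQRQISFVK"   # hypothetical reference sequence
mdl_sqe = "AYIAKQRQ"           # hypothetical model sequence
ref_idx = ref_sqe.find(mdl_sqe)
assert ref_idx >= 0            # substring case applies
ref_range = (ref_idx + 1, ref_idx + len(mdl_sqe))  # (4, 11)
mdl_range = (1, len(mdl_sqe))                      # (1, 8)
# converting back to 0-indexed slicing recovers the covered stretch
assert ref_sqe[ref_range[0] - 1 : ref_range[1]] == mdl_sqe
```

Extra non-covered residues on either side can then be counted by comparing these ranges with `len(ref_sqe)` / `len(mdl_sqe)`, as the docstring describes.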
+ assert len(mdl_seqres) == len(mdl_sqe) + mdl_sqe_check = mdl_seqres.replace('X', '-') + for olc in gapped_aa: + mdl_sqe_check = mdl_sqe_check.replace(olc, '-') + assert mdl_sqe_check == mdl_sqe.replace('X', '-') + # + return mismatches, ref_range, mdl_range, covered_aln, mdl_seqres + + +def _get_entities(mdl_data_from_zip, metadata): + """Gather data for the mmCIF (target) entities. + Returns (list of cif_ents, list of issues) + """ + issues = [] + # merge info for matching chains + unique_chains = {} # key = sqe_gaps, value = partial cif_ent + chain_info = {ch["chain"]: { + "name": ch["name"], "up_ac": ch["up_ac"], "up_range": ch["up_range"] + } for ch in metadata["chains"]} + ost_ent = mdl_data_from_zip["ent"] + #make sure the chain labels start with A and are consistent with the metadata + if _get_ch_name(ost_ent.chains[0], False) != 'A': + issues.append(( + metadata['mdl_id'], + "shifted_chain_label", + (), + () + )) + editor = ost_ent.EditXCS(mol.EditMode.BUFFERED_EDIT) + for chn in ost_ent.chains: + editor.RenameChain(chn, shift_letter(_get_ch_name(chn, False))) + editor.UpdateICS() + for chn in ost_ent.chains: + pdb_chain_id = _get_ch_name(chn, False) + if pdb_chain_id not in chain_info: + raise RuntimeError( + f"Non-described chain {pdb_chain_id} in " \ + f"{metadata['mdl_id']}/{mdl_data_from_zip['mdl_file_base']}" + ) + sqe_gaps = _get_sequence(chn) + cif_ent = { + "pdb_sequence": sqe_gaps, + "pdb_chain_ids": [_get_ch_name(chn, False)], + "description": chain_info[pdb_chain_id]["name"], + "up_ac": chain_info[pdb_chain_id]["up_ac"], + # expected up range as parsed in metadata + "exp_up_range": chain_info[pdb_chain_id]["up_range"], + } + if sqe_gaps in unique_chains: + other_cif_ent = unique_chains[sqe_gaps] + # sanity checks + for key, value in other_cif_ent.items(): + if key != "pdb_chain_ids" and value != cif_ent[key]: + raise RuntimeError( + f"Inconsistent info {key} for identical chains for " \ + f"chain {pdb_chain_id} vs chains " \ + 
f"{other_cif_ent['pdb_chain_ids']}."
+                    )
+            # add to list of chains
+            other_cif_ent['pdb_chain_ids'].append(pdb_chain_id)
+        else:
+            unique_chains[sqe_gaps] = cif_ent
+    # sort by model chain name (should ensure same order of chains in mmCIF)
+    entities = sorted(
+        unique_chains.values(),
+        key=lambda x: min(x["pdb_chain_ids"])
+    )
+    # compare with info from UP and complete data to return
+    for cif_ent in entities:
+        sqe_gaps = cif_ent["pdb_sequence"]
+        up_ac = cif_ent["up_ac"]
+        if metadata['mdl_id'] == "263-Spombe-Mrc1_like_domain-Mcm2_NTD-Cdc45-H31-H4_tetramer" \
+                and cif_ent["pdb_chain_ids"][0] == 'C':
+            # special case: 5 extra PDB aa in one chain of one model; the
+            # alignment is forced to start at position 6
+            position_shift = 5
+            up_data_original = _fetch_upkb_cached(sqe_gaps[position_shift:], up_ac)
+            up_data = up_data_original.copy()
+            if up_data['mdl_range'][0] == 1:
+                up_data['mdl_range'] = tuple(
+                    pos + position_shift for pos in up_data['mdl_range']
+                )
+            up_data_full = _fetch_upkb_cached(sqe_gaps, up_ac)
+            up_data['mismatches'] = [(854, 209, 'D', 'E')]
+            up_data['mdl_seqres'] = up_data_full['mdl_seqres']
+            _warn_msg("mdl_263 handled with special hardcoded alignment for chain 'C'")
+        else:
+            up_data = _fetch_upkb_cached(sqe_gaps, up_ac)
+        num_extra_ref = len(up_data["up_sequence"]) \
+            - (up_data["up_range"][1] - up_data["up_range"][0] + 1)
+        len_mdl_covered = (up_data["mdl_range"][1] - up_data["mdl_range"][0] + 1)
+        num_extra_mdl = len(sqe_gaps) - len_mdl_covered
+        if len(up_data["mismatches"]) > 0 or num_extra_ref > 0 or num_extra_mdl > 0:
+            # ok to cover subset of UP usually (e.g.
Ubiquitin), rest big issue + if len(up_data["mismatches"]) > 0 or num_extra_mdl > 0: + issue_type = "up_mismatch" + else: + issue_type = "up_extra" + if cif_ent['exp_up_range'] == None: + cif_up_range = (1, len(up_data["up_sequence"])) + else: + cif_up_range = tuple( + map(int, cif_ent['exp_up_range'].split('-'))) + if (issue_type == "up_extra" and up_data["up_range"] != cif_up_range) or issue_type == "up_mismatch": + chain_names = ",".join(cif_ent["pdb_chain_ids"]) + short_data = ( + mdl_data_from_zip['mdl_file_base'], chain_names, up_ac, + len_mdl_covered, len(up_data["mismatches"]), num_extra_ref, num_extra_mdl + ) + long_data = (up_data["mismatches"], up_data["up_range"], up_data["mdl_range"]) + issues.append( + (metadata['mdl_id'], issue_type, short_data, long_data) + ) + # cannot deal with gapped sequences here as we cannot map to scores + if sqe_gaps != up_data["mdl_seqres"]: + issues.append(( + metadata['mdl_id'], + "gapped_seq", + (cif_ent['pdb_chain_ids']), + (sqe_gaps, up_data["mdl_seqres"]) + )) + cif_ent["seqres"] = up_data["mdl_seqres"] + cif_ent.update(up_data) + return entities, issues + + +def _get_cf_config(cf_config, ur30_db_version=None, tpl_db=None, + tpl_db_version=None): + """Define ColabFold setup. + Extra info needed from depositor for DBs used (depend on MMseqs2 server) + - ur30_db_version options: see dict in _get_sequence_dbs + - tpl_db options: None, "PDB70", "PDB100" + - tpl_db_version options: see dict in _get_sequence_dbs + -> can be set to None if DB not used at all (incl. 
custom tpls) + Note on versions used over time + - first: 2021_03 version of UniRef30, unclear what PDB70 + - after 13.7.22: updated the UniRef30 to 2022_02 and PDB70 to 220313 + - after 12.6.23: UniRef30 2023_02, PDB100 (instead of PDB70) 230517 + - also to define if DB used at all for tpls or custom tpls + - db versions only relevant if actually used + """ + # NOTES: + # - UP-TO-DATE (as of March 2024) generic parser given a config.json dict + # - custom MSA is assumed to be complemented with extra step (as for Jason) + + # keep version indep. of params (and add commit since versions are meh) + cf_version = cf_config["version"] + if "commit" in cf_config and cf_config["commit"] is not None: + cf_version += f" ({cf_config['commit'][:7]})" + # drop fields which are not relevant for model building + cf_config = cf_config.copy() + for key in ["num_queries", "commit", "version", "user_agent"]: + if key in cf_config: + del cf_config[key] + + # NOTE: following code from + # https://github.com/sokrypton/ColabFold/blob/main/colabfold/batch.py to + # understand config + # -> should be backward compatible with Tara and Niko sets + # -> see also https://github.com/sokrypton/ColabFold/wiki/v1.5.0 + + # deal with old names (some settings changed name in v1.5) + # -> code taken almost verbatim from https://github.com/sokrypton/ColabFold + old_names = {"MMseqs2 (UniRef+Environmental)": "mmseqs2_uniref_env", + "MMseqs2 (UniRef only)": "mmseqs2_uniref", + "unpaired+paired": "unpaired_paired", + "AlphaFold2-multimer-v1": "alphafold2_multimer_v1", + "AlphaFold2-multimer-v2": "alphafold2_multimer_v2", + "AlphaFold2-multimer-v3": "alphafold2_multimer_v3", + "AlphaFold2-ptm": "alphafold2_ptm", + "AlphaFold2": "alphafold2"} + msa_mode = old_names.get(cf_config["msa_mode"], cf_config["msa_mode"]) + if "pair_mode" in cf_config: + pair_mode = old_names.get(cf_config["pair_mode"], cf_config["pair_mode"]) + model_type = old_names.get(cf_config["model_type"], cf_config["model_type"]) + + # 
fix v1.5 defaults for num_recycles and recycle_early_stop_tolerance + # -> def. (set as "null" in config): + # - num_recycles == 20 if alphafold2_multimer_v3 else 3 + # - recycle_early_stop_tolerance == 0.5 if multimer else 0.0 + # -> valid from 1.5.0 until 1.5.5 (and probably later) + # -> defined in alphafold/model/config.py of steineggerlab/alphafold repo + if "num_recycles" in cf_config and cf_config["num_recycles"] is None: + if "multimer" in model_type and model_type not in [ + "alphafold2_multimer_v1", "alphafold2_multimer_v2" + ]: + cf_config["num_recycles"] = 20 + else: + cf_config["num_recycles"] = 3 + if "recycle_early_stop_tolerance" in cf_config \ + and cf_config["recycle_early_stop_tolerance"] is None: + cf_config["recycle_early_stop_tolerance"] = \ + 0.5 if "multimer" in model_type else 0.0 + + # remove null config entries (ASSUME: None = use default) + cf_config = {k: v for k, v in cf_config.items() if v is not None} + + # fetch relevant data + # -> MSA mode + if msa_mode == "mmseqs2_uniref_env": + seq_dbs = ["UniRef", "Environmental"] + use_mmseqs = True + use_msa = True + elif msa_mode == "mmseqs2_uniref": + seq_dbs = ["UniRef"] + use_mmseqs = True + use_msa = True + elif msa_mode == "single_sequence": + seq_dbs = [] + use_mmseqs = False + use_msa = False + elif msa_mode == "custom": + seq_dbs = [] + use_mmseqs = False + use_msa = True + else: + raise ValueError(f"Unknown msa_mode {cf_config['msa_mode']}") + + # -> model type + if model_type == "alphafold2_multimer_v1": + # AF-Multimer as introduced in AlphaFold v2.1.0 + use_multimer = True + multimer_version = 1 + elif model_type == "alphafold2_multimer_v2": + # AF-Multimer as introduced in AlphaFold v2.2.0 + use_multimer = True + multimer_version = 2 + elif model_type == "alphafold2_multimer_v3": + # AF-Multimer as introduced in AlphaFold v2.3.0 + use_multimer = True + multimer_version = 3 + elif model_type == "alphafold2_ptm": + use_multimer = False + multimer_version = None + else: + raise 
ValueError(f"Unknown model_type {cf_config['model_type']}") + + # write modeling description + mdl_description = f"Model generated using ColabFold v{cf_version}" + if use_multimer: + mdl_description += f" with AlphaFold-Multimer (v{multimer_version})" + else: + mdl_description += " with AlphaFold" + # early stopping feature of ColabFold + upto_mdl = "" + upto_rec = "" + if cf_config.get("stop_at_score", 100) < 100: + upto_mdl = "up to " + upto_rec = "up to " + if cf_config.get("recycle_early_stop_tolerance", 0) > 0: + upto_rec = "up to " + if cf_config.get("num_seeds", 1) > 1: + mdl_str = f"{cf_config['num_models'] * cf_config['num_seeds']} " \ + f"models ({cf_config['num_seeds']} random seeds per " \ + f"parameter set)" + else: + mdl_str = f"{cf_config['num_models']} models" + mdl_description += f" producing {upto_mdl}{mdl_str} with {upto_rec}" \ + f"{cf_config['num_recycles']} recycles each" + if cf_config.get("use_amber", False) or \ + cf_config.get("num_relax", 0) > 0: + mdl_description += ", with AMBER relaxation" + else: + mdl_description += ", without model relaxation" + if cf_config["use_templates"]: + # tpl_db == None meant to mean that custom templates were used + # -> no need to stress it but just visible in search DBs + mdl_description += ", using templates" + else: + mdl_description += ", without templates" + tpl_db = None + tpl_db_version = None + if cf_config["rank_by"] == "plddt": + mdl_description += ", ranked by pLDDT" + elif cf_config["rank_by"] == "ptmscore": + mdl_description += ", ranked by pTM" + elif cf_config["rank_by"] == "multimer": + mdl_description += ", ranked by 80*ipTM+20*pTM" + else: + raise ValueError(f"Unknown rank_by {cf_config['rank_by']}") + if use_msa: + mdl_description += ", starting from" + if use_mmseqs: + msa_type = "MSA" + else: + msa_type = "custom MSA" + if use_multimer: + if pair_mode == "unpaired_paired": + mdl_description += f" paired and unpaired {msa_type}s" + elif pair_mode == "paired": + mdl_description += f" 
paired {msa_type}s" + elif pair_mode == "unpaired": + mdl_description += f" unpaired {msa_type}s" + else: + raise ValueError(f"Unknown pair_mode {cf_config['pair_mode']}") + elif msa_type.startswith('M'): + mdl_description += f" an {msa_type}" + else: + mdl_description += f" a {msa_type}" + if use_mmseqs: + mdl_description += f" from MMseqs2 ({'+'.join(seq_dbs)})" + else: + mdl_description += " without an MSA" + mdl_description += "." + + return { + "params": cf_config, + "version": cf_version, + "seq_dbs": seq_dbs, + "use_mmseqs": use_mmseqs, + "use_msa": use_msa, + "ur30_db_version": ur30_db_version, + "tpl_db": tpl_db, + "tpl_db_version": tpl_db_version, + "use_multimer": use_multimer, + "multimer_version": multimer_version, + "description": mdl_description, + } + + +def _get_mmseqs2_software(version=None): + """Get MMseqs2 as a dictionary, suitable to create a modelcif software + object.""" + return { + "name": "MMseqs2", + "classification": "data collection", + "description": "Many-against-Many sequence searching", + "citation": ihm.citations.mmseqs2, + "location": "https://github.com/soedinglab/mmseqs2", + "type": "package", + "version": version, + } + + +def _get_colabfold_software(version=None): + """Get ColabFold as a dictionary, suitable to create a modelcif software + object.""" + return { + "name": "ColabFold", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.colabfold, + "location": "https://github.com/sokrypton/ColabFold", + "type": "package", + "version": version, + } + + +def _get_af2_software(version=None, is_multimer=False): + """Get AF2 as dictionary, suitable to create a modelcif software object.""" + if is_multimer: + return { + "name": "AlphaFold-Multimer", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.Citation( + pmid=None, + title="Protein complex prediction with " + + "AlphaFold-Multimer.", + journal="bioRxiv", + volume=None, + 
page_range=None, + year=2021, + authors=[ + "Evans, R.", + "O'Neill, M.", + "Pritzel, A.", + "Antropova, N.", + "Senior, A.", + "Green, T.", + "Zidek, A.", + "Bates, R.", + "Blackwell, S.", + "Yim, J.", + "Ronneberger, O.", + "Bodenstein, S.", + "Zielinski, M.", + "Bridgland, A.", + "Potapenko, A.", + "Cowie, A.", + "Tunyasuvunakool, K.", + "Jain, R.", + "Clancy, E.", + "Kohli, P.", + "Jumper, J.", + "Hassabis, D.", + ], + doi="10.1101/2021.10.04.463034", + ), + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + else: + return { + "name": "AlphaFold", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.alphafold2, + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + + +def _get_protocol_steps_and_software(cf_config): + """Create the list of protocol steps with software and parameters used.""" + protocol = [] + + # build up SW + sw_plus_params = [ + ( + _get_colabfold_software(cf_config["version"]), cf_config["params"] + ) + ] + if cf_config["use_mmseqs"]: + sw_plus_params.append((_get_mmseqs2_software(), {})) + sw_plus_params.append(( + _get_af2_software(is_multimer=cf_config["use_multimer"]), {} + )) + + # modelling step + protocol.append({ + "method_type": "modeling", + "name": None, + "details": cf_config["description"], + "input": "target_sequences_and_ref_DBs", + "output": "model", + "software_plus_params": sw_plus_params, + }) + + return protocol + + +def _get_title(metadata): + """Get a title for this modelling experiment.""" + return metadata["title"].strip() + + +def _get_model_details(metadata): + """Get the model description.""" + return metadata["abstract"].strip() +################################################################################ + +################################################################################ +# ModelCIF HANDLING 
+################################################################################ +# pylint: disable=too-few-public-methods +class _GlobalPTM(modelcif.qa_metric.Global, modelcif.qa_metric.PTM): + """Predicted accuracy according to the TM-score score in [0,1]""" + + name = "pTM" + software = None + + +class _GlobalIpTM(modelcif.qa_metric.Global, modelcif.qa_metric.IpTM): + """Predicted protein-protein interface score based on TM-score in [0,1]""" + + name = "ipTM" + software = None + + +class _GlobalPLDDT(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT): + """Predicted accuracy according to the CA-only lDDT in [0,100]""" + name = "pLDDT" + software = None + + +class _LocalPLDDT(modelcif.qa_metric.Local, modelcif.qa_metric.PLDDT): + """Predicted accuracy according to the CA-only lDDT in [0,100]""" + name = "pLDDT" + software = None + + +class _LocalPairwisePAE(modelcif.qa_metric.LocalPairwise, modelcif.qa_metric.PAE): + """Predicted aligned error (in Angstroms)""" + + name = "PAE" + software = None + + +class _LPeptideAlphabetWithXO(ihm.LPeptideAlphabet): + """Have the default amino acid alphabet plus 'X' for unknown residues + and 'O' as allowed non-def. AA (U already in alphabet).""" + + # extra entry added according to LPeptideAlphabet def. in + # https://python-ihm.readthedocs.io/en/latest/_modules/ihm.html + # and https://files.rcsb.org/view/1NTH.cif for values for 'O'. 
+
+    def __init__(self):
+        """Create the alphabet."""
+        super().__init__()
+        self._comps["X"] = self._comps["UNK"]
+        self._comps['O'] = ihm.LPeptideChemComp(
+            "PYL", "O", "O", "PYRROLYSINE", "C12 H21 N3 O3"
+        )
+# pylint: enable=too-few-public-methods
+
+
+class _OST2ModelCIF(modelcif.model.AbInitioModel):
+    """Map OST entity elements to ihm.model"""
+
+    def __init__(self, *args, **kwargs):
+        """Initialise a model"""
+        for i in ["ost_entity", "asym", "scores_json", "incl_pae"]:
+            if i not in kwargs:
+                raise TypeError(f"Required keyword argument '{i}' not found.")
+        self.ost_entity = kwargs.pop("ost_entity")
+        self.asym = kwargs.pop("asym")
+        self.scores_json = kwargs.pop("scores_json")
+        self.incl_pae = kwargs.pop("incl_pae")
+
+        # use auth IDs for res. nums and chain names
+        self.use_auth = False
+        # what accuracy to use for PAE? (writer uses 3 anyway)
+        self.pae_digits = 3
+
+        super().__init__(*args, **kwargs)
+
+    def get_atoms(self):
+        # ToDo [internal]: Take B-factor out since it's not a B-factor?
+        # NOTE: this assumes that _get_res_num maps residue to pos. in seqres
+        # within asym
+        for atm in self.ost_entity.atoms:
+            yield modelcif.model.Atom(
+                asym_unit=self.asym[_get_ch_name(atm.chain, self.use_auth)],
+                seq_id=_get_res_num(atm.residue, self.use_auth),
+                atom_id=atm.name,
+                type_symbol=atm.element,
+                x=atm.pos[0],
+                y=atm.pos[1],
+                z=atm.pos[2],
+                het=atm.is_hetatom,
+                biso=atm.b_factor,
+                occupancy=atm.occupancy,
+            )
+
+    def add_scores(self):
+        """Add QA metrics from AF2 scores."""
+        # global scores
+        self.qa_metrics.extend(
+            (
+                _GlobalPLDDT(self.scores_json["plddt_global"]),
+                _GlobalPTM(self.scores_json["ptm"]),
+            )
+        )
+        if "iptm" in self.scores_json:
+            self.qa_metrics.extend(
+                (
+                    _GlobalIpTM(self.scores_json["iptm"]),
+                )
+            )
+
+        # NOTE: none of the below is expected to work if we have unmodelled gaps!
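The local-score loops that follow index the flat `plddt` list and the square `pae` matrix with a running counter over all residues of all chains. A minimal standalone sketch of that indexing scheme (the chain and score data here are made-up stand-ins, not the OST/AF2 objects used in the script):

```python
# Sketch of the flat per-residue indexing used for local pLDDT and PAE.
# chains: chain name -> number of residues (hypothetical example data)
chains = {"A": 2, "B": 1}

# flat pLDDT: one value per residue, concatenated over chains in order
plddt = [90.1, 88.5, 77.3]
# PAE: square matrix over the same flattened residue index
pae = [[0.5, 1.2, 7.0],
       [1.1, 0.4, 6.5],
       [7.2, 6.9, 0.3]]

local_plddt = {}  # (chain, res_num) -> pLDDT
local_pae = {}    # ((chain_i, res_i), (chain_j, res_j)) -> PAE
i = 0
for ch_i, n_res_i in chains.items():
    for res_i in range(1, n_res_i + 1):
        local_plddt[(ch_i, res_i)] = plddt[i]
        j = 0
        for ch_j, n_res_j in chains.items():
            for res_j in range(1, n_res_j + 1):
                local_pae[((ch_i, res_i), (ch_j, res_j))] = pae[i][j]
                j += 1
        i += 1

print(local_plddt[("B", 1)])            # 77.3
print(local_pae[(("A", 2), ("B", 1))])  # 6.5
```

This is also why the NOTE above matters: a gap in the modelled residues would shift the flat index against the seqres numbering.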
+ + # local scores + lpae = [] + i = 0 + for chn_i in self.ost_entity.chains: + ch_name_i = _get_ch_name(chn_i, self.use_auth) + for res_i in chn_i.residues: + # local pLDDT + res_num_i = _get_res_num(res_i, self.use_auth) + self.qa_metrics.append( + _LocalPLDDT( + self.asym[ch_name_i].residue(res_num_i), + self.scores_json["plddt"][i], + ) + ) + + # pairwise alignment error + if self.incl_pae: + j = 0 + for chn_j in self.ost_entity.chains: + ch_name_j = _get_ch_name(chn_j, self.use_auth) + for res_j in chn_j.residues: + res_num_j = _get_res_num(res_j, self.use_auth) + pae_ij = self.scores_json["pae"][i][j] + lpae.append( + _LocalPairwisePAE( + self.asym[ch_name_i].residue(res_num_i), + self.asym[ch_name_j].residue(res_num_j), + round(pae_ij, self.pae_digits), + ) + ) + j += 1 + + i += 1 + + if self.incl_pae: + self.qa_metrics.extend(lpae) + + +def _get_modelcif_entities(target_ents, asym_units, system): + """Create ModelCIF entities and asymmetric units.""" + alphabet = _LPeptideAlphabetWithXO() + for cif_ent in target_ents: + # collect references + up_ref = modelcif.reference.UniProt( + code=cif_ent["up_id"], + accession=cif_ent["up_ac"], + isoform=cif_ent["up_isoform"], + ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"], + organism_scientific=cif_ent["up_organism"], + sequence_version_date=cif_ent["up_last_mod"], + sequence_crc64=cif_ent["up_crc64"], + sequence=cif_ent["up_sequence"], + ) + # ASSUME: full model covered w/o mismatches + # -> NOTE: sequence passed above is cut based on alignments! 
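The `cif_ent["mismatches"]` tuples consumed below (via `mismatch[1]`, `mismatch[2]`, `mismatch[3]`) can be produced by a simple pairwise scan of the aligned sequences. A sketch under the assumption that each tuple is (chain, 1-based position, DB residue, entity residue) — the exact tuple layout is inferred for illustration, not confirmed by the source:

```python
def find_mismatches(db_seq, ent_seq, chain="A"):
    """Collect positions where two aligned, equal-length sequences differ.

    Returns tuples (chain, seq_id, db_res, ent_res) with 1-based seq_id;
    this tuple layout is an assumption for illustration.
    """
    assert len(db_seq) == len(ent_seq)
    return [
        (chain, pos + 1, db_res, ent_res)
        for pos, (db_res, ent_res) in enumerate(zip(db_seq, ent_seq))
        if db_res != ent_res
    ]

print(find_mismatches("MKTA", "MSTA"))  # [('A', 2, 'K', 'S')]
```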
+        up_ref.alignments.append(modelcif.reference.Alignment(
+            db_begin=cif_ent["up_range"][0],
+            db_end=cif_ent["up_range"][1],
+            entity_begin=cif_ent["mdl_range"][0],
+            entity_end=cif_ent["mdl_range"][1],
+            seq_dif=[
+                ihm.reference.SeqDif(
+                    mismatch[1],
+                    alphabet[mismatch[2]],
+                    alphabet[mismatch[3]],
+                ) for mismatch in cif_ent["mismatches"]
+            ]
+        ))
+        #
+        references = [up_ref]
+        # combine into ModelCIF entity
+        mdlcif_ent = modelcif.Entity(
+            cif_ent["seqres"],
+            description=cif_ent["description"],
+            alphabet=alphabet,
+            source=ihm.source.Natural(
+                ncbi_taxonomy_id=cif_ent["up_ncbi_taxid"],
+                scientific_name=cif_ent["up_organism"],
+            ),
+            references=references,
+        )
+        # NOTE: this assigns (potentially new) alphabetic chain names
+        for pdb_chain_id in cif_ent["pdb_chain_ids"]:
+            asym_units[pdb_chain_id] = modelcif.AsymUnit(
+                mdlcif_ent, strand_id=pdb_chain_id,
+            )
+        system.entities.append(mdlcif_ent)
+
+
+def _get_assoc_pae_file(entry_id, mdl_name):
+    """Generate an associated file object to extract PAE to an extra file."""
+    return modelcif.associated.LocalPairwiseQAScoresFile(
+        f"{mdl_name}_local_pairwise_qa.cif",
+        categories=["_ma_qa_metric_local_pairwise"],
+        copy_categories=["_ma_qa_metric"],
+        entry_id=entry_id,
+        entry_details="This file is an associated file consisting "
+        + "of local pairwise QA metrics. This is a partial mmCIF "
+        + "file and can be validated by merging with the main "
+        + "mmCIF file containing the model coordinates and other "
+        + "associated data.",
+        details="Predicted aligned error",
+    )
+
+
+def _get_assoc_png_file(fle_path, png_type):
+    """Generate a modelcif.associated.File object pointing to PNG file
+    with content defined by png_type (coverage, plddt, or pae).
+ """ + details = { + "coverage": "PNG file showing number of sequences in the MSA covering " + + "each position in the target sequences", + "plddt": "PNG file showing pLDDT at each residue position for each " + + "of the 5 models produced", + "pae": "PNG file showing the PAE matrices for each of the 5 models " + + "produced", + } + afile = modelcif.associated.File( + fle_path, + details=details[png_type], + ) + # NOTE: file_format can be set to "png" in future ModelCIF versions + # (i.e. when https://github.com/ihmwg/ModelCIF/issues/17 is resolved) + afile.file_format = "other" + afile.file_content = "other" + return afile + + +def _get_assoc_mdl_file(fle_path, data_json): + """Generate a modelcif.associated.File object that looks like a CIF file. + The dedicated CIFFile functionality in modelcif would also try to write it. + """ + details=f"#{data_json['mdl_rank']} ranked model; " + f"pLDDT {round(data_json['plddt_global'], 1)}, " + f"pTM {round(data_json['ptm'], 3)}" + if 'iptm' in data_json: + details = details + f", ipTM {round(data_json['iptm'], 3)}" + cfile = modelcif.associated.File(fle_path, details) + cfile.file_format = "cif" + return cfile + + +def _get_assoc_zip_file(fle_path, data_json): + """Create a modelcif.associated.File object that looks like a ZIP file. + This is NOT the archive ZIP file for the PAEs but to store that in the + ZIP archive of the selected model.""" + zfile = modelcif.associated.File( + fle_path, + details="archive with multiple files for " + + f"#{data_json['mdl_rank']} ranked model", + ) + zfile.file_format = "other" + return zfile + + +def _get_associated_files(mdl_name, arc_files): + """Create entry for associated files.""" + # package all into zip file + return modelcif.associated.Repository( + "", + [modelcif.associated.ZipFile(f"{mdl_name}.zip", files=arc_files)], + ) + # NOTE: by convention MA expects zip file with same name as model-cif + + +def _get_sequence_dbs(config_data): + """Get ColabFold seq. 
DBs.""" + # Uses HC list of known DBs used in ColabFold + # -> see also notes in _get_config + db_dict = { + "UniRef_2021_03": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2103.tar.gz", + version="2021_03", + ), + "UniRef_2022_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2202.tar.gz", + version="2022_02", + ), + "UniRef_2023_02": modelcif.ReferenceDatabase( + "UniRef30", + "https://wwwuser.gwdg.de/~compbiol/colabfold/uniref30_2302.tar.gz", + version="2023_02", + ), + "Environmental": modelcif.ReferenceDatabase( + "ColabFold DB", + "https://wwwuser.gwdg.de/~compbiol/colabfold/" + + "colabfold_envdb_202108.tar.gz", + version="2021_08", + ), + "PDB100_230517": modelcif.ReferenceDatabase( + "PDB100", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb100_foldseek_230517.tar.gz", + release_date=datetime.datetime(2023, 5, 17) + ), + "PDB70_211027": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211027.tar.gz", + release_date=datetime.datetime(2021, 10, 27) + ), + "PDB70_211117": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_211117.tar.gz", + release_date=datetime.datetime(2021, 11, 17) + ), + "PDB70_220313": modelcif.ReferenceDatabase( + "PDB70", + "https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/pdb70_from_mmcif_220313.tar.gz", + release_date=datetime.datetime(2022, 3, 13) + ), + } + # fill list of DBs + seq_dbs = [] + search_keys = [] + for seq_db in config_data["seq_dbs"]: + if seq_db == "UniRef": + if config_data['ur30_db_version'] is None: + raise ValueError("Cannot use UniRef without version") + search_key = f"UniRef_{config_data['ur30_db_version']}" + else: + search_key = seq_db + search_keys.append(search_key) + if 
config_data["tpl_db"] is not None: + if config_data["tpl_db_version"] is None: + raise ValueError("Cannot have tpl DB without version") + search_keys.append( + f"{config_data['tpl_db']}_{config_data['tpl_db_version']}" + ) + for search_key in search_keys: + if search_key not in db_dict: + raise ValueError(f"Unknown seq. DB {search_key}") + seq_dbs.append(db_dict[search_key]) + return seq_dbs + + +def _assemble_modelcif_software(soft_dict, params_dict): + """Create a modelcif.SoftwareWithParameters instance from dictionaries.""" + # create SW object + sw = modelcif.Software( + soft_dict["name"], + soft_dict["classification"], + soft_dict["description"], + soft_dict["location"], + soft_dict["type"], + soft_dict["version"], + citation=soft_dict["citation"], + ) + # assemble parameters + params = [] + for key, val in params_dict.items(): + params.append(modelcif.SoftwareParameter(key, val)) + # put them together + return modelcif.SoftwareWithParameters(sw, params) + + +def _get_modelcif_protocol_software(js_step): + """Assemble software entries for a ModelCIF protocol step.""" + # new setup in python-modelcif (as of late 2023): params with each SW + sw_list = [] + for sw, sw_params in js_step["software_plus_params"]: + sw_list.append(_assemble_modelcif_software(sw, sw_params)) + # group and done... 
+ if sw_list: + return modelcif.SoftwareGroup(sw_list) + else: + return None + + +def _get_modelcif_protocol_data(data_label, target_entities, model, ref_dbs): + """Assemble data for a ModelCIF protocol step.""" + if data_label == "target_sequences_and_ref_DBs": + data = modelcif.data.DataGroup(target_entities) + data.extend(ref_dbs) + elif data_label == "model": + data = model + else: + raise RuntimeError(f"Unknown protocol data: '{data_label}'") + return data + + +def _get_modelcif_protocol(protocol_steps, target_entities, model, ref_dbs): + """Create the protocol for the ModelCIF file.""" + protocol = modelcif.protocol.Protocol() + for js_step in protocol_steps: + sftwre = _get_modelcif_protocol_software(js_step) + input_data = _get_modelcif_protocol_data( + js_step["input"], target_entities, model, ref_dbs + ) + output_data = _get_modelcif_protocol_data( + js_step["output"], target_entities, model, ref_dbs + ) + + protocol.steps.append( + modelcif.protocol.Step( + input_data=input_data, + output_data=output_data, + name=js_step["name"], + details=js_step["details"], + software=sftwre, + ) + ) + protocol.steps[-1].method_type = js_step["method_type"] + return protocol + + +def _compress_cif_file(cif_file): + """Compress cif file and delete original.""" + with open(cif_file, "rb") as f_in: + with gzip.open(cif_file + ".gz", "wb") as f_out: + shutil.copyfileobj(f_in, f_out) + os.remove(cif_file) + + +def _package_associated_files(repo): + """Compress associated files into single zip file and delete original.""" + # zip settings tested for good speed vs compression + for archive in repo.files: + with zipfile.ZipFile(archive.path, "w", zipfile.ZIP_BZIP2) as cif_zip: + for zfile in archive.files: + cif_zip.write(zfile.path, arcname=zfile.path) + os.remove(zfile.path) + + +def _store_as_modelcif(data_json, ost_ent, out_dir, mdl_name, compress, + add_pae, add_pngs, add_files): + """Mix all the data into a ModelCIF file.""" + print(" generating ModelCIF objects...", 
end="") + pstart = timer() + # create system to gather all the data + system = modelcif.System( + title=data_json["title"], + id=data_json["mdl_id"].upper(), + model_details=data_json["model_details"], + ) + + # create an asymmetric unit and an entity per target sequence + asym_units = {} + _get_modelcif_entities(data_json["target_entities"], asym_units, system) + + # audit_authors + system.authors.extend(data_json["audit_authors"]) + + # set up the model to produce coordinates + model = _OST2ModelCIF( + assembly=modelcif.Assembly(asym_units.values()), + asym=asym_units, + ost_entity=ost_ent, + scores_json=data_json, + name=data_json["mdl_name"], + incl_pae=add_pae, + ) + print(f" ({timer()-pstart:.2f}s)") + print(" processing QA scores...", end="", flush=True) + pstart = timer() + model.add_scores() + print(f" ({timer()-pstart:.2f}s)") + + model_group = modelcif.model.ModelGroup([model]) + system.model_groups.append(model_group) + + # handle additional files + arc_files = [] + if add_pae: + arc_files.append(_get_assoc_pae_file(system.id, mdl_name)) + if add_pngs: + for png_type in ["coverage", "plddt", "pae"]: + arc_files.append(_get_assoc_png_file( + data_json[f"{png_type}_png_file_name"], png_type + )) + arc_files.extend(add_files) + if arc_files: + system.repositories.append(_get_associated_files(mdl_name, arc_files)) + + # get data and steps + ref_dbs = _get_sequence_dbs(data_json["cf_config"]) + protocol = _get_modelcif_protocol( + data_json["protocol"], system.entities, model, ref_dbs, + ) + system.protocols.append(protocol) + + # write modelcif System to file + print(" write to disk...", end="", flush=True) + pstart = timer() + # NOTE: this will dump PAE on path provided in add_scores + # -> hence we cheat by changing path and back while being exception-safe... 
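The chdir-and-restore dance implemented below with try/finally can equivalently be wrapped in a context manager (Python 3.11+ ships `contextlib.chdir` for exactly this); a small sketch of the same exception-safe pattern:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def pushd(path):
    """Temporarily change the working directory, restoring it on exit
    even if an exception is raised (same guarantee as the try/finally)."""
    oldpwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(oldpwd)


with tempfile.TemporaryDirectory() as tmp:
    before = os.getcwd()
    with pushd(tmp):
        pass  # files written with relative paths land in tmp here
    assert os.getcwd() == before  # cwd restored after the block
```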
+ oldpwd = os.getcwd() + os.chdir(out_dir) + mdl_fle = f"{mdl_name}.cif" + try: + with open(mdl_fle, "w", encoding="ascii") as mmcif_fh: + modelcif.dumper.write(mmcif_fh, [system]) + if add_pngs: + for png_type in ["coverage", "plddt", "pae"]: + with open(data_json[f"{png_type}_png_file_name"], "wb") as fh: + fh.write(data_json[f"{png_type}_png_data"]) + if arc_files: + _package_associated_files(system.repositories[0]) + if compress: + _compress_cif_file(mdl_fle) + mdl_fle += ".gz" + finally: + os.chdir(oldpwd) + print(f" ({timer()-pstart:.2f}s)") + assoc_files = [_get_assoc_mdl_file(mdl_fle, data_json)] + if arc_files: + assoc_files.append( + _get_assoc_zip_file(system.repositories[0].files[0].path, data_json) + ) + return assoc_files +################################################################################ + +################################################################################ +# HANDLE FULL DATA SET +################################################################################ +def _get_cf_db_versions(dt): + """Returns the DB versions and name used to generate the model in colabfold for a given date""" + # logic: newest first, tuple with ur30_db_version, tpl_db, tpl_db_version + # -> set to UNK if +-1 day or for unknown tpl_db_version + switch_dates = [ + (datetime.datetime(2023, 6, 12), ("2023_02", "PDB100", "230517")), + (datetime.datetime(2022, 7, 13), ("2022_02", "PDB70", "220313")), + (datetime.datetime(2021, 1, 1), ("2021_03", "PDB70", "UNK")), + ] + for switch_dt, dbs in switch_dates: + dd = (dt - switch_dt).days + if abs(dd) <= 1: + return ("UNK", "UNK", "UNK") + elif dd > 1: + return dbs + +def _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, entities, mdl_rank, add_files=[] +): + """Convert a single model with its accompanying data to ModelCIF.""" + mdl_data_from_zip = data_from_zip[f"rank_{mdl_rank}"] + mdl_af_id = mdl_data_from_zip["mdl_id"] + if mdl_rank > 1: + mdl_id += f"_rank_{mdl_rank}_{mdl_af_id}" + + 
print(f" translating {mdl_id}...") + pdb_start = timer() + + # gather data into JSON-like structure + print(" preparing data...", end="") + pstart = timer() + + mdlcf_json = {} + config_dict = metadata["config_dict"].copy() + ur30_db_version = _get_cf_db_versions(metadata["Final_date"]) + cf_config = _get_cf_config(config_dict, ur30_db_version[0]) + mdlcf_json["audit_authors"] = _get_audit_authors() + mdlcf_json["protocol"] = _get_protocol_steps_and_software(cf_config) + mdlcf_json["cf_config"] = cf_config + mdlcf_json["mdl_id"] = mdl_id # used for entry ID + mdlcf_json["mdl_rank"] = mdl_rank + if mdl_rank == 1: + mdlcf_json["mdl_name"] = f"Top ranked model ({mdl_af_id})" + else: + mdlcf_json["mdl_name"] = f"#{mdl_rank} ranked model ({mdl_af_id})" + mdlcf_json["target_entities"] = entities + for scores_key in ["plddt", "pae"]: + mdlcf_json[scores_key] = mdl_data_from_zip["scores"][scores_key] + # override global scores with higher accuracy ones in metadata + mdlcf_json["plddt_global"] = np.mean(mdl_data_from_zip["scores"][f"plddt"]) + mdlcf_json["ptm"] = mdl_data_from_zip["scores"][f"ptm"] + if "iptm" in mdl_data_from_zip["scores"]: + mdlcf_json["iptm"] = mdl_data_from_zip["scores"][f"iptm"] + # + mdlcf_json["title"] = _get_title(metadata) + if mdl_rank != 1: + mdlcf_json["title"] += f" (#{mdl_rank} ranked model)" + mdlcf_json["model_details"] = _get_model_details(metadata) + # fill PNG data + for png_type in ["coverage", "plddt", "pae"]: + mdlcf_json[f"{png_type}_png_file_name"] = f"{mdl_id}_{png_type}.png" + mdlcf_json[f"{png_type}_png_data"] = data_from_zip[ + f"{png_type}_png_data" + ] + + print(f" ({timer()-pstart:.2f}s)") + + # save ModelCIF + assoc_files = _store_as_modelcif( + data_json=mdlcf_json, + ost_ent=mdl_data_from_zip["ent"], + out_dir=opts.out_dir, + mdl_name=mdl_id, + compress=(mdl_rank == 1 and opts.compress), + add_pae=(mdl_rank == 1 and not opts.no_extra_files), + add_pngs=(mdl_rank == 1 and not opts.no_extra_files), + add_files=add_files + ) + 
+ # check if result can be read and has expected seq. + mdl_path = os.path.join(opts.out_dir, assoc_files[0].path) + ent, ss = io.LoadMMCIF(mdl_path, seqres=True) + exp_seqs = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqs += [trg_ent["pdb_sequence"]] * len(trg_ent["pdb_chain_ids"]) + assert ent.chain_count == len(exp_seqs), f"Bad chain count {mdl_id}" + # NOTE: here we expect auth = label IDs + ent_seq = "".join([_get_sequence(chn, False) for chn in ent.chains]) + ent_seq_a = "".join([_get_sequence(chn, True) for chn in ent.chains]) + assert ent_seq == ent_seq_a + assert ent_seq == "".join(exp_seqs), f"Bad seq. {mdl_id}" + ent_seqres = [ss.FindSequence(chn.name).string for chn in ent.chains] + exp_seqres = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqres += [trg_ent["seqres"]] * len(trg_ent["pdb_chain_ids"]) + assert ent_seqres == exp_seqres, f"Bad seqres {mdl_id}" + + print(f" ... done with {mdl_id} ({timer()-pdb_start:.2f}s).") + + return assoc_files + + +def _translate2modelcif(metadata, opts, desired_ranks=3): + """Convert a model with its accompanying data to ModelCIF.""" + mdl_id = metadata["mdl_id"] + # skip if done already (done later here due to info to be returned) + if opts.compress: + cifext = "cif.gz" + else: + cifext = "cif" + mdl_path = os.path.join(opts.out_dir, f"{mdl_id}.{cifext}") + + # prepare data for models to convert (also gets all issues) + issues = [] + ent_dict = {} + data_from_zip = _parse_zip_file( + metadata["zip_file_path"], desired_ranks=desired_ranks + ) + for num in range(desired_ranks): + mdl_rank = num + 1 + mdl_data_from_zip = data_from_zip[f"rank_{mdl_rank}"] + entities, ent_issues = _get_entities(mdl_data_from_zip, metadata) + issues.extend(ent_issues) + ent_dict[mdl_rank] = entities + scores_issues = _check_scores(mdl_data_from_zip, metadata, mdl_rank) + issues.extend(scores_issues) + + # abort here if already done + if opts.checks_only: + return issues + if os.path.exists(mdl_path): + print(f" 
{mdl_id} already done...") + return issues + + # convert models if needed starting from lower ranked ones + add_files = [] + if not opts.no_extra_files: + for num in range(1, desired_ranks): + mdl_rank = num + 1 + entities = ent_dict[mdl_rank] + assoc_files = _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, entities, mdl_rank + ) + add_files.extend(assoc_files) + # do top ranked one with assoc. files + _translate2modelcif_single( + metadata, opts, mdl_id, data_from_zip, ent_dict[1], + mdl_rank=1, add_files=add_files + ) + return issues + + +def _get_metadata(input_data_path, single_model=None): + """Read various metadata files and prepare for next steps. + Returns dict with key = mdl_num (int) and value = dict with: + - mdl_id, zip_file_path, abstract, title, config_dict + - chains (list of dict with chain, name, up_ac) + """ + # fetch data of models generation (used for DB version) + dates_json_path = os.path.join(input_data_path, "Annotations", "dates.json") + dates_data = json.load(open(dates_json_path)) + # fetch and check JSON data + json_path = os.path.join(input_data_path, "Annotations", "Annotations.json") + metadata_json_dict = json.load(open(json_path)) + num_models = len(metadata_json_dict) + exp_mdl_nums = list(range(1, num_models + 1)) + assert [int(k) for k in metadata_json_dict] == exp_mdl_nums + for key, data in metadata_json_dict.items(): + # sanity checks on data + if sorted(data.keys()) != ["abstract", "chains", "title"]: + raise RuntimeError(f"Wrong dict-keys observed for mdl {key}") + for chain in data["chains"]: + if sorted(chain.keys()) != ["chain", "name", "up_ac"]: + raise RuntimeError( + f"Wrong dict-keys observed in chain for mdl {key}" + ) + # If the model is a subset of UP sequence, parse the range + paren_index = chain['up_ac'].find('(') + if paren_index != -1: + # Split the string into accession code and range + chain['up_range'] = chain['up_ac'][paren_index+1:-1].strip() # Remove parentheses + chain['up_ac'] = 
chain['up_ac'][:paren_index].strip() + else: + # No range present; the whole string is the accession code + chain['up_range'] = None + data.update({"Final_date": datetime.datetime.strptime(dates_data[key]['final_date'], '%Y-%m-%d')}) + # fetch and check zip file paths + zip_files = {} # key = int(X) in [X]-[NAME].zip, value = dict + zip_path = os.path.join(input_data_path, "Zip_files") + for f_name in sorted(os.listdir(zip_path)): + if f_name.endswith(".zip"): + # assume name [X]-[NAME].zip + mdl_id = os.path.splitext(f_name)[0] + ss = mdl_id.split("-", 1) + zip_dict = { + "mdl_id": mdl_id, + "zip_file_path": os.path.join(zip_path, f_name) + } + mdl_num = int(ss[0]) + # sanity check to make sure mdl_id works for alphabetic sorting + assert f"{mdl_num:03d}" == ss[0] + assert mdl_num not in zip_files + zip_files[mdl_num] = zip_dict + # check that there are no gaps (assures we can safely regenerate numbers during import) + # get configs + configs_path = os.path.join(input_data_path, "Configs") + config_dicts = {} # key = mdl_num, value = JSON content + # (default to mdl_num=1 for ones not in dict) + for f_name in sorted(os.listdir(configs_path)): + if f_name.endswith(".json"): + # file names: config_[NUM].json or config_[NUM1]_[NUM2].json + nums = os.path.splitext(f_name)[0].split('_')[1:] + config_data = json.load(open(os.path.join(configs_path, f_name))) + for num in nums: + assert int(num) not in config_dicts + assert num in metadata_json_dict + config_dicts[int(num)] = config_data + # combine data + metadata_all = {} # key = mdl_num (int), value = dict + for key, data in metadata_json_dict.items(): + # restrict if needed + if single_model is not None and key != single_model: + continue + mdl_num = int(key) + metadata_all[mdl_num] = data + data.update(zip_files[mdl_num]) + data["config_dict"] = config_dicts.get(mdl_num, config_dicts[1]) + return metadata_all + +def _main(): + """Run as script.""" + + # parse/fetch global data + opts = _parse_args() + metadata_all = 
_get_metadata(opts.input_data_path, opts.single_model) + + # iterate over models + print(f"Working on models in {opts.input_data_path}...") + issues = [] + for metadata in metadata_all: + new_issues = _translate2modelcif(metadata_all[metadata], opts) + issues.extend(new_issues) + print(f"... done with models in {opts.input_data_path}.") + + if opts.single_model is None: + # dump issues + issues_file_path = os.path.join(opts.out_dir, "issues.json") + json.dump(issues, open(issues_file_path, "w")) + # dump info on which ones to export to 3D-Beacons + to_export_file_path = os.path.join(opts.out_dir, "MA_to_export.json") + to_export = { + metadata_all[metadata]["mdl_id"]: ("known interaction" in metadata_all[metadata]["abstract"]) \ + for metadata in metadata_all + } + json.dump(to_export, open(to_export_file_path, "w")) + + # TEST: to judge res. needed on cluster + import resource + print('mem', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000) + + +if __name__ == "__main__": + _main() diff --git a/projects/2024-09-ma-taas/README.md b/projects/2024-09-ma-taas/README.md new file mode 100644 index 0000000000000000000000000000000000000000..51f99dcd9328ba2c363e3d8a0d5fc137a1905262 --- /dev/null +++ b/projects/2024-09-ma-taas/README.md @@ -0,0 +1,44 @@ +# Modelling of full-length trimeric autotransporter adhesins (TAAs) + +[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-taas) (incl. 
background on project itself)
+
+Modelling setup:
+- Step 1: AlphaFold-Multimer (AF) using v2.1.2 or v2.3.1 with different parameters for overlapping ranges of the full-length sequence
+- Step 2 (if needed): AF models truncated, superposed, and assembled using MODELLER
+- Step 3 (if multiple models combined): custom QE calculated combining pLDDT and the MODELLER energy score
+- Step 4 (for assemblies): domain annotation performed using a combination of DSSP, SamCC-Turbo, Foldseek, and US-align
+- Single-domain models only included step 1 and a truncation step (usually a subset of another model in the same set)
+- Full-length sequences taken from NCBI.
+
+Input files for conversion:
+- Single ZIP file for each model
+- Each ZIP file contains:
+  - assembly_info.json or domain_info.json with metadata on the model
+  - assembly.cif or domain.cif with the model coordinates
+  - accompanying_data.zip or an accompanying_data sub-folder with additional files as listed in the metadata
+  - image.png with an overview image for use in ModelArchive
+
+Special features here compared to past projects:
+- NCBI data loaded in bulk before doing the conversion (helpful to process a large set in parallel).
+- Generic `ZipFileHandler` class to simplify processing ZIP files.
+- Fills `not_modeled_residue_ranges`, which was recently added to python-modelcif.
+- Enables packing of associated files without having to dump data to disk (avoids having to extract files from the provided ZIP files).
+- Used software includes AlphaFold-Multimer, MODELLER, DSSP, Foldseek, SamCC-Turbo, and US-align.
+- QE metrics linked to the software used (fetched from protocol steps to avoid duplicated entries in ModelCIF).
+- Complex modelling pipeline with different variants of the same software (AF) being used to produce distinct intermediate files. Global definition of objects was needed to avoid duplicated entries in ModelCIF. Data input/output for protocol steps handled with lists of keywords.
+- Provided assembly.cif files contain additional mmCIF data to keep (categories pdbx_domain, pdbx_feature_domain, pdbx_domain_range kept and added using gemmi). +- Intermediate models kept as accompanying data in PDB format. AF-models additionally with scores in JSON format (processed from pickle files into more portable JSON) and PAE matrix as PNG file. +- Use of modelcif.reference.SeqDif to handle mismatches between NCBI and entity sequence (simplified compared to 2024-08-ma-dm-hisrep). +- Similar checking for issues as in 2024-04-ma-dm-prc. + +Content: +- translate2modelcif.py : script to do conversion (run in virtual environment with same setup as Docker container here; using OST 2.8, python-modelcif 1.1 and python-ihm 1.6) +- fetch_ncbi_entries.py : script used to obtain info from NCBI stored in ncbi_data.json +- ncbi_data.json.gz : version of ncbi_data.json used for this conversion (from 24.9.2024); to be unpacked to use translate2modelcif.py +- ma-taas-0272.zip : example for a small assembly in the set +- ma-taas-9037.zip : example for a small domain in the set +- example_modelcif.zip : output from running conversion of examples with the commands below: +``` +ost translate2modelcif.py ma-taas-0272.zip modelcif +ost translate2modelcif.py ma-taas-9037.zip modelcif +``` diff --git a/projects/2024-09-ma-taas/example_modelcif.zip b/projects/2024-09-ma-taas/example_modelcif.zip new file mode 100644 index 0000000000000000000000000000000000000000..6c3992766e4ab6c36e0203299fb94be184308f27 Binary files /dev/null and b/projects/2024-09-ma-taas/example_modelcif.zip differ diff --git a/projects/2024-09-ma-taas/fetch_ncbi_entries.py b/projects/2024-09-ma-taas/fetch_ncbi_entries.py new file mode 100644 index 0000000000000000000000000000000000000000..16bdf577ccd021dd0f8d8cbaac0b6a337b7e479b --- /dev/null +++ b/projects/2024-09-ma-taas/fetch_ncbi_entries.py @@ -0,0 +1,335 @@ +# get info from NCBI entries for the full set + +import requests, json, os, re, zipfile, time 
+import pandas as pd
+import numpy as np
+from xml.etree import ElementTree as ET
+from ost import io
+
+
+# HC setup
+out_json_ncbi = "./ncbi_data.json"
+out_json_data = "./summary_data.json"  # collection of all assembly/domain.json for testing
+assemblies_path = "./data_assemblies/taa_export_MA_180724"
+domains_path = "./data_domains/domain_export_merged_200624"
+#
+
+def _warn_msg(msg):
+    """Write a warning message to stdout."""
+    print(f"WARNING: {msg}")
+
+
+def _get_ncbi_base_url():
+    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
+
+
+def _get_ncbi_docsums(params):
+    # params = parameters to pass to esummary
+    response = requests.get(_get_ncbi_base_url() + "esummary.fcgi",
+                            params=params)
+    root = ET.fromstring(response.text)
+    docsums = root.findall(".//DocSum")
+    ncbi_dicts = []
+    for docsum in docsums:
+        ncbi_dict = {}
+        for cn in docsum:
+            if cn.tag == "Item":
+                cn_name = cn.get("Name")
+                cn_type = cn.get("Type")
+                if cn.text:
+                    d = cn.text
+                    if cn_type == "String":
+                        ncbi_dict[cn_name] = d
+                    elif cn_type == "Integer":
+                        ncbi_dict[cn_name] = int(d)
+                    elif cn_type == "Date":
+                        # kept as string
+                        ncbi_dict[cn_name] = d
+                    else:
+                        raise RuntimeError(
+                            f"Unknown type {cn_type} for {cn_name}"
+                        )
+                else:
+                    ncbi_dict[cn_name] = None
+        ncbi_dicts.append(ncbi_dict)
+    return ncbi_dicts
+
+
+def _get_ncbi_data(ncbi_acs):
+    """Fetch data from NCBI in bulk."""
+    max_num_ids = 2000  # recommended to do max.
5000 at once + ncbi_seq = {} + ncbi_info = {} + sel_ids = list(ncbi_acs) + print(f"FETCHING NCBI INFO FOR {len(sel_ids)} PROTEIN ACs") + while len(sel_ids) > 0: + # SOURCE: https://www.ncbi.nlm.nih.gov/books/NBK25501/ + db = "protein" + # Upload the list of IDs using the epost endpoint + sel_ids = sel_ids[:max_num_ids] + params = {"db": db, "id": ",".join(sel_ids)} + response = requests.post(_get_ncbi_base_url() + "epost.fcgi", + data=params) + xml_string = response.text + root = ET.fromstring(xml_string) + query_key = root.find(".//QueryKey").text + web_env = root.find(".//WebEnv").text + # Fetch all sequences using the efetch endpoint via epost info + params = {"db": db, "query_key": query_key, "WebEnv": web_env, + "rettype": "fasta", "retmode": "text"} + response = requests.get(_get_ncbi_base_url() + "efetch.fcgi", + params=params) + ss = io.SequenceListFromString(response.text, "fasta") + for s in ss: + ncbi_ac = s.name.split()[0] + if ncbi_ac not in sel_ids: + # assume that we store them without version + ncbi_ac = ncbi_ac.rsplit('.', 1)[0] + assert ncbi_ac in sel_ids + ncbi_seq[ncbi_ac] = s + # Fetch all infos using the esummary endpoint via epost info + params = {"db": db, "query_key": query_key, "WebEnv": web_env} + for ncbi_dict in _get_ncbi_docsums(params): + ncbi_ac = ncbi_dict["AccessionVersion"] + if ncbi_ac not in sel_ids: + # assume that we store them without version + ncbi_ac = ncbi_ac.rsplit('.', 1)[0] + assert ncbi_ac in sel_ids + ncbi_info[ncbi_ac] = ncbi_dict + assert ncbi_info.keys() == ncbi_seq.keys() + # what's left? 
+ # (random ones get dropped; reason unknown; so we try until all done) + sel_ids = [ncbi_ac for ncbi_ac in ncbi_acs \ + if ncbi_ac not in ncbi_info] + print(f"- {len(ncbi_info)} DONE; {len(sel_ids)} TODO") + assert sorted(ncbi_acs) == sorted(ncbi_info) + + # combine nicely for further use + ncbi_data = { + ncbi_ac: { + "seq_name": ncbi_seq[ncbi_ac].name, + "seq_str": ncbi_seq[ncbi_ac].string, + "info": ncbi_info[ncbi_ac] + } \ + for ncbi_ac in ncbi_acs + } + return ncbi_data + + +def _get_ncbi_data_cached(ncbi_acs, ncbi_metadata_file): + """Fetch dict with info for all NCBI proteins from file or web.""" + if ncbi_metadata_file: + if os.path.exists(ncbi_metadata_file): + return json.load(open(ncbi_metadata_file)) + else: + ncbi_data = _get_ncbi_data(ncbi_acs) + json.dump(ncbi_data, open(ncbi_metadata_file, "w")) + return ncbi_data + else: + return _get_ncbi_data(ncbi_acs) + + +def _get_ncbi_tax_info(tax_ids): + """Fetch data from NCBI species in bulk.""" + max_num_ids = 2000 # recommended to do max. 5000 at once + ncbi_info = {} + sel_ids = list(tax_ids) + print(f"FETCHING NCBI INFO FOR {len(sel_ids)} TAX IDs") + while len(sel_ids) > 0: + # keep track of TODOs to abort if no progress made + last_num_todo = len(sel_ids) + # SOURCE: https://www.ncbi.nlm.nih.gov/books/NBK25501/ + db = "taxonomy" + # Upload the list of IDs using the epost endpoint + sel_ids = sel_ids[:max_num_ids] + params = {"db": db, "id": ",".join(sel_ids)} + response = requests.post(_get_ncbi_base_url() + "epost.fcgi", + data=params) + xml_string = response.text + root = ET.fromstring(xml_string) + query_key = root.find(".//QueryKey").text + web_env = root.find(".//WebEnv").text + # Fetch all infos using the esummary endpoint via epost info + params = {"db": db, "query_key": query_key, "WebEnv": web_env} + for ncbi_dict in _get_ncbi_docsums(params): + tax_id = str(ncbi_dict["TaxId"]) + assert tax_id in sel_ids + ncbi_info[tax_id] = ncbi_dict + # what's left? 
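The batch-and-retry pattern shared by both fetchers (take up to `max_num_ids` IDs, re-derive the TODO list from what is still missing, stop when a pass makes no progress) can be isolated as a pure function; a sketch with a hypothetical flaky fetcher standing in for the E-utilities calls:

```python
def fetch_in_batches(ids, fetch_batch, batch_size=2000):
    """Fetch up to batch_size IDs per pass until all are resolved,
    aborting if a pass makes no progress (mirrors the loop above)."""
    done = {}
    todo = list(ids)
    while todo:
        before = len(todo)
        done.update(fetch_batch(todo[:batch_size]))
        todo = [i for i in ids if i not in done]
        if len(todo) == before:  # no progress -> give up
            break
    return done

# toy fetcher that silently "drops" one ID on the first pass
calls = []
def flaky(batch):
    calls.append(list(batch))
    drop = "B" if len(calls) == 1 else None
    return {i: i.lower() for i in batch if i != drop}

result = fetch_in_batches(["A", "B", "C"], flaky)
```

The second pass picks up the dropped ID, so two calls suffice here.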
+ # (random ones get dropped; reason unknown; so we try until all done) + sel_ids = [tax_id for tax_id in tax_ids \ + if tax_id not in ncbi_info] + print(f"- {len(ncbi_info)} DONE; {len(sel_ids)} TODO") + if last_num_todo == len(sel_ids): + print(f"ABORTING...failed to get {sel_ids}") + return ncbi_info + assert sorted(tax_ids) == sorted(ncbi_info) + return ncbi_info + + +def _check_ncbi_data(ncbi_data): + """Run sanity checks on fetched data.""" + non_live = [] + outdated = [] # (AC, replaced-by) + for ncbi_ac, ncbi_item in ncbi_data.items(): + ncbi_info = ncbi_item["info"] + if ncbi_info["Status"] != "live": + non_live.append(ncbi_ac) + if ncbi_info["ReplacedBy"]: + outdated.append((ncbi_ac, ncbi_info['ReplacedBy'])) + if ncbi_info["AccessionVersion"] != ncbi_ac: + ncbi_info_ac = ncbi_info["AccessionVersion"].rsplit('.', 1)[0] + if ncbi_info_ac != ncbi_ac: + raise RuntimeError(f"NCBI AC is not AC") + mmcif_regex = "[][ \t_(),.;:\"&<>/\\\\{}'`~!@#$%?+=*A-Za-z0-9|^-]*" + description = ncbi_info["Title"] + if not re.fullmatch(mmcif_regex, description): + raise RuntimeError(f"Illegal characters found in title of " \ + f"{ncbi_ac}: {description}") + tax_id = str(ncbi_info["TaxId"]) + organism_scientific = ncbi_info["SpeciesName"] + if len(non_live) > 0: + msg = f"{len(non_live)} NCBI entries not live" + if len(non_live) < 20: + msg += f": {non_live}" + _warn_msg(msg) + if len(outdated) > 0: + ncbi_acs = [v[0] for v in outdated] + ncbi_rep = [v[1] for v in outdated] + msg = f"{len(ncbi_acs)} outdated NCBI entries" + if len(outdated) < 20: + msg += f": {ncbi_acs} replaced by {ncbi_rep}" + _warn_msg(msg) + return non_live, outdated + + +def _check_summary_data(json_data): + # collect info for further large scale checks + json_dicts = {} + df_dict_list = [] + for item in json_data: + # keep JSON dict for easy access + json_dicts[item["ma_id"]] = item["json_dict"] + # prepare data for DataFrame + keys_to_keep = [ + "ma_id", "mdl_type", "has_accompanying_data", "has_cif", 
"has_image" + ] + new_item = {k: item[k] for k in keys_to_keep} + new_item["ncbi_ac"] = item["json_dict"]["source_sequence_RefSeq_ID"] + new_item["no_unparsed_files"] = len(item["unparsed_files"]) == 0 + df_dict_list.append(new_item) + # keep data in DataFrame for further checks + df = pd.DataFrame(df_dict_list) + assert len(set(df.ma_id)) == len(df.ma_id) + df.set_index("ma_id", inplace=True) + for k, v in df.isna().any().to_dict().items(): + if v: + _warn_msg(f"TO CHECK: NA found in summary data key {k}") + for k, v in df.all().to_dict().items(): + if not v: + _warn_msg(f"TO CHECK: False values found in summary data key {k}") + + +def _parse_zip_file(zip_file_path): + data_from_zip = { + "ma_id": os.path.splitext(os.path.basename(zip_file_path))[0], + "zip_file_path": zip_file_path, + "mdl_type": None, + "json_dict": None, + "has_accompanying_data": False, + "has_cif": False, + "has_image": False, + "unparsed_files": [], + } + with zipfile.ZipFile(zip_file_path) as zf: + for file_name in sorted(zf.namelist()): + base_name = os.path.basename(file_name) + if base_name == "": + # we ignore entries for paths + continue + elif base_name == "accompanying_data.zip" and not data_from_zip["has_accompanying_data"]: + data_from_zip["has_accompanying_data"] = True + elif "accompanying_data/" in file_name: + data_from_zip["has_accompanying_data"] = True + elif base_name == "assembly_info.json" and data_from_zip["json_dict"] is None and data_from_zip["mdl_type"] in [None, "assembly"]: + data_from_zip["mdl_type"] = "assembly" + data_from_zip["json_dict"] = json.load(zf.open(file_name)) + elif base_name == "assembly.cif" and not data_from_zip["has_cif"] and data_from_zip["mdl_type"] in [None, "assembly"]: + data_from_zip["mdl_type"] = "assembly" + data_from_zip["has_cif"] = True + elif base_name == "domain_info.json" and data_from_zip["json_dict"] is None and data_from_zip["mdl_type"] in [None, "domain"]: + data_from_zip["mdl_type"] = "domain" + data_from_zip["json_dict"] = 
json.load(zf.open(file_name)) + elif base_name == "domain.cif" and not data_from_zip["has_cif"] and data_from_zip["mdl_type"] in [None, "domain"]: + data_from_zip["mdl_type"] = "domain" + data_from_zip["has_cif"] = True + elif base_name == "image.png" and not data_from_zip["has_image"]: + data_from_zip["has_image"] = True + else: + data_from_zip["unparsed_files"].append(file_name) + return data_from_zip + + +def _parse_path(path, exp_type): + t0 = time.time() + data_from_zips = [] + for file_name in sorted(os.listdir(path)): + data_from_zip = _parse_zip_file(os.path.join(path, file_name)) + if data_from_zip["mdl_type"] != exp_type: + print("WARNING:", path, file_name, data_from_zip["mdl_type"], exp_type) + data_from_zips.append(data_from_zip) + if len(data_from_zips) % 200 == 0: + print("DONE ZIP", path, len(data_from_zips), time.time() - t0) + return data_from_zips + + +def main(): + # get all JSONs + if os.path.exists(out_json_data): + data_from_zips = json.load(open(out_json_data)) + else: + data_from_zips = _parse_path(assemblies_path, "assembly") \ + + _parse_path(domains_path, "domain") + json.dump(data_from_zips, open(out_json_data, "w")) + + # collect all NCBI ACs + ncbi_acs = [item["json_dict"]["source_sequence_RefSeq_ID"] \ + for item in data_from_zips] + unique_acs = sorted(set(ncbi_acs)) + # fetch NCBI data for each + ncbi_data = _get_ncbi_data_cached(unique_acs, out_json_ncbi) + + # fetch all needed tax. infos (excluding overridden ones) + tax_ids = sorted(set( + str(ncbi_item["info"]["TaxId"]) \ + for ncbi_item in ncbi_data.values() + )) + ncbi_tax_infos = _get_ncbi_tax_info(tax_ids) + # some checks (expect to have all active tax. 
ids) + for item, exp_vals in [("Status", ["active"])]: + observed_values = sorted(set( + ncbi_tax_info[item] \ + for ncbi_tax_info in ncbi_tax_infos.values() + )) + if observed_values != exp_vals: + print(f"UNEXPECTED observed '{item}' values: {observed_values}") + # extract/keep scientific names + ncbi_species_names = { + ncbi_tax_id: ncbi_tax_info["ScientificName"] \ + for ncbi_tax_id, ncbi_tax_info in ncbi_tax_infos.items() + } + + # apply to data + for ncbi_ac, ncbi_item in ncbi_data.items(): + tax_id = str(ncbi_item["info"]["TaxId"]) + ncbi_item["info"]["SpeciesName"] = ncbi_species_names[tax_id] + # dump file + json.dump(ncbi_data, open(out_json_ncbi, "w")) + + # do some checks at the end + _check_summary_data(data_from_zips) + _check_ncbi_data(ncbi_data) + + +if __name__ == "__main__": + main() diff --git a/projects/2024-09-ma-taas/ma-taas-0272.zip b/projects/2024-09-ma-taas/ma-taas-0272.zip new file mode 100644 index 0000000000000000000000000000000000000000..a2990dbee6630dca60eef8eae6876ac9568e7bee Binary files /dev/null and b/projects/2024-09-ma-taas/ma-taas-0272.zip differ diff --git a/projects/2024-09-ma-taas/ma-taas-9037.zip b/projects/2024-09-ma-taas/ma-taas-9037.zip new file mode 100644 index 0000000000000000000000000000000000000000..42630098690d245ea4d8f5ab5f8f138057742813 Binary files /dev/null and b/projects/2024-09-ma-taas/ma-taas-9037.zip differ diff --git a/projects/2024-09-ma-taas/ncbi_data.json.gz b/projects/2024-09-ma-taas/ncbi_data.json.gz new file mode 100644 index 0000000000000000000000000000000000000000..eaa31245f2e0008b5aed9b721f60dfbf03355bc7 Binary files /dev/null and b/projects/2024-09-ma-taas/ncbi_data.json.gz differ diff --git a/projects/2024-09-ma-taas/translate2modelcif.py b/projects/2024-09-ma-taas/translate2modelcif.py new file mode 100644 index 0000000000000000000000000000000000000000..ca6b41bc1248c0353e5206ed504d4b3492f68a5e --- /dev/null +++ b/projects/2024-09-ma-taas/translate2modelcif.py @@ -0,0 +1,2327 @@ +#! 
/usr/local/bin/ost +# -*- coding: utf-8 -*- + +"""Translate TAA models for ma-taas. + +Example for running: +ost translate2modelcif.py ./sample_files/ma-taas-0272.zip ./modelcif +-> due to size of the set, each model in the set is converted individually +-> needs ncbi_data.json from 2_fetch_ncbi_entries.py + +Expected output in ./modelcif for example above: +- ma-taas-0272.cif as ModelCIF file +- ma-taas-0272.zip with accompanying data +- ma-taas-0272-image.png as image to use in ModelArchive +- ma-taas-0272-issues.json listing issues for that conversion (if any) +""" + +import argparse +import datetime +import gzip +import os +import shutil +import sys +import zipfile +import pickle +import re +import traceback + +from io import StringIO +from timeit import default_timer as timer +import numpy as np +import requests +import ujson as json + +import ihm +import ihm.citations + +import modelcif +import modelcif.associated +import modelcif.dumper +import modelcif.model +import modelcif.protocol +import modelcif.reference +import gemmi + +import pandas as pd +from ost import io, seq + +################################################################################ +# GENERAL HELPER FUNCTIONS +################################################################################ +def _abort_msg(msg, exit_code=1): + """Write error message and exit with exit_code.""" + print(f"{msg}\nAborting.", file=sys.stderr) + sys.exit(exit_code) + + +def _warn_msg(msg): + """Write a warning message to stdout.""" + print(f"WARNING: {msg}") + + +def _check_file(file_path): + """Make sure a file exists and is actually a file.""" + if not os.path.exists(file_path): + _abort_msg(f"File not found: '{file_path}'.") + if not os.path.isfile(file_path): + _abort_msg(f"File path does not point to file: '{file_path}'.") + + +def _check_folder(dir_path): + """Make sure a path exists and is actually a directory.""" + if not os.path.exists(dir_path): + _abort_msg(f"Path not found: '{dir_path}'.") + if not 
os.path.isdir(dir_path): + _abort_msg(f"Path does not point to a directory: '{dir_path}'.") + + +def _check_opts_folder(dir_path): + """Remove trailing '/' (return fixed one) and check if path valid.""" + if dir_path.endswith("/"): + dir_path = dir_path[:-1] + _check_folder(dir_path) + return dir_path + + +def _get_res_num(r, use_auth=False): + """Get res. num. from auth. IDs if reading from mmCIF files.""" + if use_auth: + return int(r.GetStringProp("pdb_auth_resnum")) + return r.number.num + + +def _get_ch_name(ch, use_auth=False): + """Get chain name from auth. IDs if reading from mmCIF files.""" + if use_auth: + return ch.GetStringProp("pdb_auth_chain_name") + return ch.name + + +def _get_sequence(chn, use_auth=False): + """Get the sequence out of an OST chain incl. '-' for gaps in resnums.""" + # initialise (add gaps if first is not at num. 1) + lst_rn = _get_res_num(chn.residues[0], use_auth) + idx = 1 + sqe = "-" * (lst_rn - 1) + chn.residues[0].one_letter_code + + for res in chn.residues[idx:]: + lst_rn += 1 + while lst_rn != _get_res_num(res, use_auth): + sqe += "-" + lst_rn += 1 + sqe += res.one_letter_code + return sqe + + +class ZipFileHandler: + """ + A class to handle ZIP files, including nested ZIP files, allowing for + listing files and extracting or reading files on demand. + File names are represented as tuples where all besides the last item + are expected to be nested paths to ZIP files. + + Attributes: + zip_path (str): The path to the main ZIP file. + file_list (list): List of tuples representing the hierarchical file structure. + """ + + def __init__(self, zip_path): + """ + Initializes the ZipFileHandler with the path to the main ZIP file. + + Args: + zip_path (str): The path to the main ZIP file. + """ + self.zip_path = zip_path + self.file_list = self._build_file_list() + + def _build_file_list(self, zip_file=None, base_path=()): + """ + Recursively builds a list representing the hierarchical file structure + within the ZIP file. 
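The gap handling in `_get_sequence` (leading `-` padding plus `-` for skipped residue numbers) can be mimicked without OST; a sketch using plain `(resnum, one_letter_code)` tuples as a hypothetical stand-in for OST residues:

```python
def sequence_with_gaps(residues):
    """residues: sorted (resnum, one_letter_code) pairs; returns the
    sequence with '-' inserted for missing residue numbers."""
    first_num = residues[0][0]
    sqe = "-" * (first_num - 1)  # pad if numbering does not start at 1
    last = first_num - 1
    for num, olc in residues:
        sqe += "-" * (num - last - 1) + olc
        last = num
    return sqe

gapped = sequence_with_gaps([(2, "M"), (3, "A"), (6, "G")])
```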
+ + Args: + zip_file (zipfile.ZipFile, optional): The current ZIP file object being processed. + If None, the main ZIP file is processed. Default is None. + base_path (tuple): The base path within the hierarchical structure. + + Returns: + list of tuples representing the hierarchical file structure. + """ + file_list = [] + if zip_file is None: + with zipfile.ZipFile(self.zip_path, 'r') as zip_ref: + file_list.extend(self._build_file_list(zip_ref)) + else: + for file_info in zip_file.infolist(): + file_name = file_info.filename + if file_name.endswith('/'): + continue # Skip directories + if file_name.endswith('.zip'): + with zip_file.open(file_name) as nested_zip: + with zipfile.ZipFile(nested_zip) as nested_zip_ref: + nested_base_path = base_path + (file_name,) + file_list.extend(self._build_file_list(nested_zip_ref, nested_base_path)) + else: + file_list.append(base_path + (file_name,)) + return file_list + + def _read_all_files(self, zip_file=None, base_path=()): + """ + Recursively builds a list representing the hierarchical file structure + within the ZIP file and reads all files. + + Args: + zip_file (zipfile.ZipFile, optional): The current ZIP file object being processed. + If None, the main ZIP file is processed. Default is None. + base_path (tuple): The base path within the hierarchical structure. + + Returns: + dictionary with key matching tuples in self.file_list and + value being raw read content of file as in self.read_file. 
+ """ + file_list = {} + if zip_file is None: + with zipfile.ZipFile(self.zip_path, 'r') as zip_ref: + file_list.update(self._read_all_files(zip_ref)) + else: + for file_name in zip_file.namelist(): + if file_name.endswith('/'): + continue # Skip directories + if file_name.endswith('.zip'): + with zip_file.open(file_name) as nested_zip: + with zipfile.ZipFile(nested_zip) as nested_zip_ref: + nested_base_path = base_path + (file_name,) + file_list.update(self._read_all_files(nested_zip_ref, nested_base_path)) + else: + file_list[base_path + (file_name,)] = zip_file.open(file_name).read() + return file_list + + def _zip_parser(self, current_zip, filename_tuple, callback): + """ + Hierarchical parsing of ZIP file through filename_tuple. + Calls callback(zip_ref, name) at final nesting level with + zip_ref being the open ZipFile and name the filename within it. + """ + with zipfile.ZipFile(current_zip, 'r') as zip_ref: + name = filename_tuple[0] + if len(filename_tuple) == 1: + return callback(zip_ref, name) + else: + with zip_ref.open(name) as nested_zip: + return self._zip_parser( + nested_zip, filename_tuple[1:], callback + ) + + def extract_file(self, filename_tuple, extract_to='.'): + """ + Extracts a specific file from the ZIP file based on the hierarchical path. + Note that the extracted file path will be according to filename_tuple[-1] + within the extract_to folder. + + Args: + filename_tuple (tuple): A tuple representing the hierarchical path to the file. + extract_to (str): The directory to extract the file to. Default is current directory ('.'). + + Raises: + FileNotFoundError: If the specified file is not found in the ZIP file. + """ + self._zip_parser( + self.zip_path, + filename_tuple, + lambda zip_ref, name: zip_ref.extract(name, path=extract_to) + ) + + def extract_file_to_path(self, filename_tuple, output_path): + """ + Extracts a specific file from the ZIP file based on the hierarchical path and + saves it directly to the specified output path. 
+ + Args: + filename_tuple (tuple): A tuple representing the hierarchical path to the file. + output_path (str): The desired output path for the extracted file. + + Raises: + FileNotFoundError: If the specified file is not found in the ZIP file. + """ + def _extract_to_path(zip_ref, name): + with zip_ref.open(name) as source, open(output_path, 'wb') as target: + shutil.copyfileobj(source, target) + + self._zip_parser(self.zip_path, filename_tuple, _extract_to_path) + + def read_file(self, filename_tuple): + """ + Reads and returns the content of a specific file from the ZIP file based on the hierarchical path. + + Args: + filename_tuple (tuple): A tuple representing the hierarchical path to the file. + + Returns: + bytes: The content of the specified file. + + Raises: + FileNotFoundError: If the specified file is not found in the ZIP file. + """ + return self._zip_parser( + self.zip_path, + filename_tuple, + lambda zip_ref, name: zip_ref.open(name).read() + ) + + def read_all_files(self): + """ + Recursively builds a list representing the hierarchical file structure + within the ZIP file and reads all files. + + NOTE: for large zip files with data in nested zip file, this is much, + much faster than calling read_file individually! + + Returns: + dictionary with key matching tuples in self.file_list and + value being raw read content of file as in self.read_file. + """ + return self._read_all_files() + + def process_file(self, filename_tuple, func): + """ + Opens a specific file from the ZIP file based on the hierarchical path + and returns the result of func(fo) where fo is a file-like object. + + Args: + filename_tuple (tuple): A tuple representing the hierarchical path to the file. + func (function): Function to call on file-like object. + + Returns: + Result of func. + + Raises: + FileNotFoundError: If the specified file is not found in the ZIP file. 
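The nested-ZIP traversal that `ZipFileHandler` performs can be exercised entirely in memory; `read_nested` below is a simplified, hypothetical analogue of `_zip_parser`/`read_file`, following a tuple path where every element but the last names a nested ZIP:

```python
import io
import zipfile

def read_nested(buf, path):
    """Follow a tuple path through nested ZIPs and return file bytes."""
    with zipfile.ZipFile(buf) as zf:
        if len(path) == 1:
            return zf.read(path[0])
        # descend into the nested ZIP named by the first path element
        return read_nested(io.BytesIO(zf.read(path[0])), path[1:])

# build outer.zip containing inner.zip containing hello.txt
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("hello.txt", "hi")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("inner.zip", inner.getvalue())

data = read_nested(outer, ("inner.zip", "hello.txt"))
```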
+ """ + return self._zip_parser( + self.zip_path, + filename_tuple, + lambda zip_ref, name: func(zip_ref.open(name)) + ) +################################################################################ + +################################################################################ +# DATA HANDLING +################################################################################ +def _parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description=__doc__, + ) + + parser.add_argument( + "input_zip_path", + type=str, + metavar="<INPUT ZIP PATH>", + help="Path to ZIP file provided by depositors. Expected to contain " + + "assembly_info.json or domain_info.json with metadata on model " \ + + "assembly.cif or domain.cif with the model coordinates " \ + + "accompanying_data.zip with additional files as listed in the " \ + + " metadata and image.png with an image for ModelArchive.", + ) + parser.add_argument( + "out_dir", + type=str, + metavar="<OUTPUT DIR>", + help="Path to directory to store results. For a ZIP file named X.zip " \ + + "the output will contain: X.cif[.gz] as ModelCIF file, X.zip with " \ + + "accompanying data, X-image.png as image to use in ModelArchive, " \ + + "and X-issues.json listing issues for that conversion." 
+ ) + parser.add_argument( + "--ncbi-json", + type=str, + default="./ncbi_data.json", + help="Pre-fetched NCBI entries (from 2_fetch_ncbi_entries.py).", + ) + parser.add_argument( + "--compress", + default=False, + action="store_true", + help="Compress ModelCIF file with gzip.", + ) + parser.add_argument( + "--checks-only", + default=False, + action="store_true", + help="Only check for issues without producing ModelCIF files.", + ) + parser.add_argument( + "--no-extra-files", + default=False, + action="store_true", + help="Skip writing accompanying data (for testing).", + ) + + opts = parser.parse_args() + + # check input + _check_file(opts.input_zip_path) + if opts.out_dir.endswith("/"): + opts.out_dir = opts.out_dir[:-1] + if not os.path.exists(opts.out_dir): + os.makedirs(opts.out_dir, exist_ok=True) + _check_file(opts.ncbi_json) + return opts + + +def _get_audit_authors(): + """Return the list of authors that produced this model.""" + return ( + "Dobbelstein, Adrian", + "Alva, Vikram", + ) + + +def _get_file_list_dict(file_list): + """ + Get dictionary for access to files for conversion. + + Args: + file_list (list): As returned by ZipFileHandler. 
+ + Returns: + dict with following keys / values: + - "mdl_type" / "assembly" or "domain" + - "json_file" / filename_tuple for access in ZipFileHandler + - "cif_file" / filename_tuple for access in ZipFileHandler + - "image_file" / filename_tuple for access in ZipFileHandler + - "accompanying_data" / dict with + filename without path as key and + filename_tuple for access in ZipFileHandler as value + - "unparsed_files" / list of filename_tuple + """ + result = { + "mdl_type": None, + "json_file": None, + "cif_file": None, + "image_file": None, + "accompanying_data": {}, + "unparsed_files": [], + } + for filename_tuple in file_list: + basename = os.path.basename(filename_tuple[-1]) + fs = filename_tuple[0].split(os.path.sep) + if fs[-2] == "accompanying_data" or fs[-1] == "accompanying_data.zip": + assert basename not in result["accompanying_data"] + result["accompanying_data"][basename] = filename_tuple + elif basename == "assembly_info.json": + assert result["mdl_type"] in [None, "assembly"] + assert result["json_file"] is None + result["mdl_type"] = "assembly" + result["json_file"] = filename_tuple + elif basename == "assembly.cif": + assert result["mdl_type"] in [None, "assembly"] + assert result["cif_file"] is None + result["mdl_type"] = "assembly" + result["cif_file"] = filename_tuple + elif basename == "domain_info.json": + assert result["mdl_type"] in [None, "domain"] + assert result["json_file"] is None + result["mdl_type"] = "domain" + result["json_file"] = filename_tuple + elif basename == "domain.cif": + assert result["mdl_type"] in [None, "domain"] + assert result["cif_file"] is None + result["mdl_type"] = "domain" + result["cif_file"] = filename_tuple + elif basename == "image.png": + assert result["image_file"] is None + result["image_file"] = filename_tuple + else: + result["unparsed_files"].append(filename_tuple) + for k, v in result.items(): + has_content = bool(v) + if k != "unparsed_files" and not has_content: + raise RuntimeError(f"{k} not set 
in zip file.") + return result + + +def _get_zip_content(zip_file_path): + """Same as _get_file_list_dict but reads zip file content.""" + zip_handler = ZipFileHandler(zip_file_path) + zip_dict = _get_file_list_dict(zip_handler.file_list) + zip_content = zip_handler.read_all_files() + for k, v in zip_dict.items(): + if k == "accompanying_data": + v = {ak: zip_content[av] for ak, av in v.items()} + elif k in ["json_file", "cif_file", "image_file"]: + v = zip_content[v] + zip_dict[k] = v + return zip_dict + + +def _get_modeller_software(version=None): + """Get MODELLER as a dictionary, suitable to create a modelcif software + object.""" + # Note: depositor had suggested use of Webb and Sali 2016 but decided + # against in given input from Ben Webb (1993 citation is still the canoncial + # one: https://salilab.org/modeller/manual/node8.html). + return { + "name": "MODELLER", + "classification": "model building", + "description": "Comparative modeling by satisfaction of spatial " \ + "restraints", + "citation": ihm.citations.modeller, + "location": "https://salilab.org/modeller/", + "type": "program", + "version": version, + } + + +# global definition to avoid duplicated entries in ModelCIF +ihm.citations.af2_multimer = ihm.Citation( + pmid=None, + title="Protein complex prediction with AlphaFold-Multimer.", + journal="bioRxiv", + volume=None, + page_range=None, + year=2021, + authors=[ + "Evans, R.", + "O'Neill, M.", + "Pritzel, A.", + "Antropova, N.", + "Senior, A.", + "Green, T.", + "Zidek, A.", + "Bates, R.", + "Blackwell, S.", + "Yim, J.", + "Ronneberger, O.", + "Bodenstein, S.", + "Zielinski, M.", + "Bridgland, A.", + "Potapenko, A.", + "Cowie, A.", + "Tunyasuvunakool, K.", + "Jain, R.", + "Clancy, E.", + "Kohli, P.", + "Jumper, J.", + "Hassabis, D.", + ], + doi="10.1101/2021.10.04.463034", +) + + +def _get_af2_software(version=None, is_multimer=False): + """Get AF2 as dictionary, suitable to create a modelcif software object.""" + if is_multimer: + return { + 
"name": "AlphaFold-Multimer", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.af2_multimer, + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + else: + return { + "name": "AlphaFold", + "classification": "model building", + "description": "Structure prediction", + "citation": ihm.citations.alphafold2, + "location": "https://github.com/deepmind/alphafold", + "type": "package", + "version": version, + } + + +def _get_dssp_software(version=None): + """Get DSSP SW as dictionary, suitable to create a modelcif software + object.""" + # NOTE: this is using the old repo and old citation (ok for v2.3) + return { + "name": "DSSP", + "classification": "data collection", + "description": "Secondary structure assignment", + "citation": ihm.Citation( + pmid="6667333", + title="Dictionary of protein secondary structure: pattern " + + "recognition of hydrogen-bonded and geometrical features.", + journal="Biopolymers", + volume=22, + page_range=(2577, 2637), + year=1983, + authors=['Kabsch, W.', 'Sander, C.'], + doi="10.1002/bip.360221211", + ), + "location": "https://github.com/cmbi/dssp", + "type": "program", + "version": version, + } + + +def _get_foldseek_software(version=None): + """Get Foldseek SW as dictionary, suitable to create a modelcif software + object.""" + return { + "name": "Foldseek", + "classification": "data collection", + "description": "Protein structure search", + "citation": ihm.Citation( + pmid="37156916", + title="Fast and accurate protein structure search with Foldseek.", + journal="Nat Biotechnol", + volume=42, + page_range=(243, 246), + year=2024, + authors=[ + 'van Kempen, M.', 'Kim, S.S.', 'Tumescheit, C.', 'Mirdita, M.', + 'Lee, J.', 'Gilchrist, C.L.M.', 'Soeding, J.', 'Steinegger, M.' 
+ ], + doi="10.1038/s41587-023-01773-0", + ), + "location": "https://search.foldseek.com/search", + "type": "package", + "version": version, + } + + +def _get_sam_cc_software(version=None): + """Get SamCC-Turbo SW as dictionary, suitable to create a modelcif software + object.""" + return { + "name": "SamCC-Turbo", + "classification": "data collection", + "description": "Detection and measurement of coiled coils", + "citation": ihm.Citation( + pmid="33325494", + title="A library of coiled-coil domains: from regular bundles to " + + "peculiar twists.", + journal="Bioinformatics", + volume=36, + page_range=(5368, 5376), + year=2020, + authors=[ + 'Szczepaniak, K.', 'Bukala, A.', 'da Silva Neto, A.M.', + 'Ludwiczak, J.', 'Dunin-Horkawicz, S.' + ], + doi="10.1093/bioinformatics/btaa1041", + ), + "location": "https://github.com/labstructbioinf/samcc_turbo", + "type": "package", + "version": version, + } + + +def _get_us_align_software(version=None): + """Get US-align SW as dictionary, suitable to create a modelcif software + object.""" + return { + "name": "US-align", + "classification": "data collection", + "description": "Universal Structural alignment of macromolecules", + "citation": ihm.Citation( + pmid="36038728", + title="US-align: universal structure alignments of proteins, " + + "nucleic acids, and macromolecular complexes.", + journal="Nat Methods", + volume=19, + page_range=(1109, 1115), + year=2022, + authors=['Zhang, C.', 'Shine, M.', 'Pyle, A.M.', 'Zhang, Y.'], + doi="10.1038/s41592-022-01585-1", + ), + "location": "https://zhanggroup.org/US-align/", + "type": "package", + "version": version, + } + + +def _get_af2_description(pred): + """Get description text based on parameters used.""" + if pred["version"] == "2.1.2": + num_models_str = "producing 5 models with 3 recycles" + elif pred["version"] == "2.3.1": + num_models_str = "producing 10 models (2 random seeds per parameter " \ + "set) with up to 20 recycles" + else: + raise RuntimeError(f"Unexpected 
AF2 version: '{pred['version']}'") + return f"using AlphaFold v{pred['version']} {num_models_str} and " \ + f"{pred['num_ensemble']} ensemble each, with AMBER relaxation, " \ + f"using templates up to date {pred['tmplate_cutoff_date']}, " \ + f"ranked by ipTM*0.8+pTM*0.2, starting from MSAs with reduced_dbs " \ + f"setting" + + +def _get_af2_merge_dict(pred): + """Get N/C-terminal padding lengths for artificial coiled-coils.""" + pred_name = os.path.splitext(pred["prediction_p"])[0] + ps = pred_name.split('_') + pred_merge = {"N": 0, "C": 0} + if "mergeN" in ps: + pred_merge["N"] = int(ps[ps.index("mergeN") + 1]) + if "mergeC" in ps: + pred_merge["C"] = int(ps[ps.index("mergeC") + 1]) + return pred_merge + + +def _get_af2_range_string(pred): + """Get string repr. of modelled range incl. coiled-coil padding.""" + range_str = f"{pred['prediction_start']}-{pred['prediction_end']}" + pred_merge = _get_af2_merge_dict(pred) + if pred_merge["N"] or pred_merge["C"]: + if pred_merge["N"] and pred_merge["C"]: + seq_str = "sequences" + else: + seq_str = "sequence" + pad_strs = [] + if pred_merge["N"]: + pad_strs.append(f"{pred_merge['N']} N-terminal residues") + if pred_merge["C"]: + pad_strs.append(f"{pred_merge['C']} C-terminal residues") + pad_str = ", ".join(pad_strs) + range_str += f" padded with artificial coiled-coil {seq_str} " \ + f"({pad_str}) to improve oligomerization" + return range_str + + +def _get_preds_and_parts(json_dict, mdl_type): + """Get list of AF2 pred. and trucated parts in a consistent way. + For domains, the final model is the truncated part and gets "model" + as truncated_prediction_path. 
+ """ + if mdl_type == "assembly": + af2_preds = json_dict["AF_predictions"] + trunc_parts = json_dict["assembly_parts"] + elif mdl_type == "domain": + af2_pred = json_dict["source_AF_prediction"] + af2_preds = [af2_pred] + trunc_parts = [{ + "part_start": json_dict["domain_sequence_start"], + "part_end": json_dict["domain_sequence_end"], + "truncated_prediction_path": "model", + "source_prediction": os.path.splitext(af2_pred["prediction_p"])[0] + }] + else: + raise RuntimeError(f"Unexpected mdl_type: '{mdl_type}'") + # sort them by range + af2_preds.sort( + key=lambda pred: ( + int(pred["prediction_start"]), int(pred["prediction_end"]) + ) + ) + trunc_parts.sort( + key=lambda part: (part["part_start"], part["part_end"]) + ) + return af2_preds, trunc_parts + + +def _get_af2_files(af2_path): + """Return names for PDB file, PKL file, and PAE file.""" + af_base = os.path.splitext(af2_path)[0] + return af2_path, f"{af_base}_info.pkl", f"PAE_{af_base}.png" + + +def _get_protocol_steps_and_software(json_dict, mdl_type, mdl_id): + """Create the list of protocol steps with software and parameters used.""" + protocol = [] + + # LOGIC: + # - First steps are AF2 predictions with different settings + # -> each setting has a number of ranges + # -> if only one AF pred., the text gets simplified + # - Next step: models truncated (and superposed) into parts + # -> one step for all ranges + # -> no superpos. if only one part + # - For assemblies with more than one part: + # -> MODELLER step + # -> QE step combining pLDDT and MODELLER energy score + # Note: sanity checks all done in _do_sanity_checks and not here! 
+ + # collect info on predictions and parts + af2_preds, trunc_parts = _get_preds_and_parts(json_dict, mdl_type) + + # group AF2 predictions by SW config + af2_config_preds = {} # dict = version, tmplate_cutoff_date, num_ensemble + for pred in af2_preds: + config_tuple = ( + pred["version"], pred["tmplate_cutoff_date"], pred["num_ensemble"] + ) + if config_tuple not in af2_config_preds: + af2_config_preds[config_tuple] = [pred] + else: + af2_config_preds[config_tuple].append(pred) + + # create AF2 steps + for config_tuple in sorted(af2_config_preds): + version, max_template_date, num_ensemble = config_tuple + sw_params = { + "model_preset": "multimer", + "db_preset": "reduced_dbs", + "max_template_date": max_template_date, + "num_ensemble": num_ensemble, + } + if version == "2.3.1": + sw_params["num_multimer_predictions_per_model"] = 2 + sw_plus_params = [( + _get_af2_software(version=version, is_multimer=True), sw_params + )] + # build up text + cur_preds = af2_config_preds[config_tuple] + ranges_txt = ", ".join([ + _get_af2_range_string(pred) for pred in cur_preds + ]) + af_txt = _get_af2_description(cur_preds[0]) # all cur_preds the same + # shortened text for single pred. + if len(af2_preds) == 1: + details = f"AlphaFold-Multimer prediction {af_txt} for range " \ + f"{ranges_txt}." + else: + details = f"AlphaFold-Multimer prediction of overlapping, " \ + f"partial sequences {af_txt}. The following residue " \ + f"ranges were predicted: {ranges_txt}." 
+ # collect output files + output = [] + for pred in cur_preds: + output.extend(_get_af2_files(pred["prediction_p"])) + protocol.append({ + "method_type": "modeling", + "name": "AF2", + "details": details, + "input": [f"target_sequences_{mdl_id}", f"ref_dbs_AF2_{version}"], + "output": output, + "software_plus_params": sw_plus_params, + }) + + # create truncation step + ranges_txt = ", ".join([ + f"{part['part_start']}-{part['part_end']}" for part in trunc_parts + ]) + if len(trunc_parts) == 1: + details = f"Model truncated to predicted domain range {ranges_txt}." + else: + details = f"Models truncated and superposed based on the following " \ + f"overlapping residue ranges: {ranges_txt}." + trunc_data = [part["truncated_prediction_path"] for part in trunc_parts] + # add MDL ID suffix if needed + trunc_data = [ + (f"model_{mdl_id}" if td == "model" else td) for td in trunc_data + ] + protocol.append({ + "method_type": "other", + "name": "Truncation", + "details": details, + "input": sorted(set([ + f"{part['source_prediction']}.pdb" for part in trunc_parts + ])), + "output": trunc_data, + "software_plus_params": [], + }) + + # create other steps to combine truncated parts + if len(trunc_parts) > 1: + # MODELLER step + protocol.append({ + "method_type": "modeling", + "name": "MODELLER", + "details": "Assembly of the truncated models using MODELLER " \ + "executed using the \"AutoModel\" configuration.", + "input": trunc_data, + "output": [f"model_{mdl_id}"], + "software_plus_params": [(_get_modeller_software("10.4"), {})], + }) + # QE step + protocol.append({ + "method_type": "other", + "name": "QE", + "details": "Calculation of combined per-residue confidence " \ + "metric (range 0-100, higher is better) based on " \ + "per-residue AlphaFold confidence score (pLDDT) " \ + "(range 0-100, higher is better, for residues with " \ + "overlapping predictions, the higher pLDDT score is " \ + "used) and per-residue MODELLER energy score (>0, " \ + "lower is better) using 
the following formula: " \ + "confidence = clip(0,100, pLDDT - 60 / (1 + exp((150 " \ + "- energy)/20))). This score reduces the pLDDT score " \ + "by max. 60 (for energy values >> 150). For energy " \ + "values < 100, the score approximately follows the " \ + "pLDDT score.", + "input": [f"model_{mdl_id}"], + "output": [f"model_{mdl_id}"], + "software_plus_params": [], + }) + if "software_used_for_domain_annotation" in json_dict: + # domain annotation step + protocol.append({ + "method_type": "other", + "name": "domain annotation", + "details": "Domain annotation performed using a combination of " \ + "the following tools: DSSP, SamCC-Turbo, Foldseek, " \ + "US-align.", + "input": [f"model_{mdl_id}"], + "output": [f"model_{mdl_id}"], + "software_plus_params": [ + (_get_dssp_software("2.3.0"), {}), + (_get_sam_cc_software(), {}), + (_get_foldseek_software(), {}), + (_get_us_align_software(), {}) + ], + }) + + return protocol + + +def _process_ent(ent): + """Helper to process OST entities for sanity checks. + Returns: + - ch_names: list of chain names in order as appearing in file + - mdlsqe: atomseq (no gaps) for model + -> incl. assertion that all chains have same seq. + - resnum_range: (min. res. num., max. res. num.) + -> incl. assertion that res. num. are continuous and w/o gaps + - ent_bfs: b-factor for each residue + -> incl. assertion that all atoms of res. have same + -> list of length len(ch_names) * len(mdlsqe) + """ + # chain names in order + ch_names = [ch.name for ch in ent.chains] + # sequence + unique_atomseqs = sorted(set([ + "".join(res.one_letter_code for res in ch.residues) \ + for ch in ent.chains + ])) + assert len(unique_atomseqs) == 1 + mdlsqe = unique_atomseqs[0] + # res. 
nums (required to be continuous) + resnum_ranges = [] + for ch in ent.chains: + res_nums = [res.number.num for res in ch.residues] + assert res_nums == list(range(min(res_nums), max(res_nums) + 1)) + resnum_ranges.append((min(res_nums), max(res_nums))) + unique_resnum_ranges = sorted(set(resnum_ranges)) + assert len(unique_resnum_ranges) == 1 + resnum_range = unique_resnum_ranges[0] + # b-factors + ent_bfs = [] + for res in ent.residues: + b_factors = [a.b_factor for a in res.atoms] + assert len(set(b_factors)) == 1 # must all be equal! + ent_bfs.append(b_factors[0]) + assert len(ent_bfs) == len(ch_names) * len(mdlsqe) + return ch_names, mdlsqe, resnum_range, ent_bfs + + +def _load_cif_file(zip_dict): + """Read CIF file for given entry. + Returns OST entity and dictionary of categories to put back into resulting + ModelCIF file with gemmi (categories not supported by python-modelcif). + """ + # fetch file and fix for OST + cif_str = zip_dict["cif_file"] + doc = gemmi.cif.read_string(cif_str) + block = doc.sole_block() + cat_dict = block.get_mmcif_category('_struct_ref.') + if cat_dict and "db_code" not in cat_dict: + if "pdbx_db_accession" in cat_dict: + cat_dict["db_code"] = cat_dict["pdbx_db_accession"] + else: + lengths = [len(v) for v in cat_dict.values()] + assert len(set(lengths)) == 1 + cat_dict["db_code"] = [""] * lengths[0] + block.set_mmcif_category('_struct_ref.', cat_dict) + # kept mmCIF data: to check consistency later + kept_cif_cats = {} + for cat in ["_pdbx_domain.", "_pdbx_feature_domain.", "_pdbx_domain_range."]: + cat_dict = block.get_mmcif_category(cat) + if cat_dict: + kept_cif_cats[cat] = cat_dict + # fix data + if len(kept_cif_cats) == 3: + # fix IDs if possible + pd_ids = kept_cif_cats["_pdbx_domain."]["id"] + pfd_ids = kept_cif_cats["_pdbx_feature_domain."]["id"] + pfd_dom_ids = kept_cif_cats["_pdbx_feature_domain."]["domain_id"] + pdr_dom_ids = kept_cif_cats["_pdbx_domain_range."]["domain_id"] + if pdr_dom_ids != pd_ids and pfd_ids == 
pd_ids \ + and pfd_dom_ids == pdr_dom_ids: + kept_cif_cats["_pdbx_feature_domain."]["domain_id"] = pd_ids + kept_cif_cats["_pdbx_domain_range."]["domain_id"] = pd_ids + # fix characters in expected places + exp_cat_items = ['_pdbx_domain.details', '_pdbx_feature_domain.feature'] + for cat, cat_dict in kept_cif_cats.items(): + for item in cat_dict.keys(): + if f"{cat}{item}" in exp_cat_items: + cat_dict[item] = [ + value.replace('°-', ' degree ') \ + .replace('°', ' degree') \ + .replace('αβ', 'alpha/beta') \ + .replace('α', 'alpha') \ + .replace('β', 'beta') \ + .replace('“', '"') \ + .replace('→', ' to ') \ + for value in cat_dict[item] + ] + # get OST ent. + cif_ent, info, ss = io.MMCifStrToEntity( + doc.as_string(), profile=io.profiles["DEFAULT"], process=True + ) + return cif_ent, kept_cif_cats + + +def _do_sanity_checks(mdl_id, json_dict, zip_dict, cif_ent, kept_cif_cats): + """Check if everything in order and return issues for weird cases.""" + issues = [] + + # get some general info depending on model type + mdl_type = zip_dict["mdl_type"] + plddt_len = len(json_dict["pLDDT"]) + if mdl_type == "assembly": + mdl_seq = json_dict["predicted_sequence"] + seq_range = ( + json_dict["predicted_sequence_start"], + json_dict["predicted_sequence_end"], + ) + ent_desc = json_dict["source_sequence_description"] + assert "software_used_for_domain_annotation" in json_dict + assert len(json_dict["per_res_conf"]) == plddt_len + else: + assert "predicted_sequence" not in json_dict + mdl_seq = json_dict["domain_sequence"] + seq_range = ( + json_dict["domain_sequence_start"], + json_dict["domain_sequence_end"], + ) + label = f'domain/motif "{json_dict["domain_name"]}"' + assert label in json_dict["abstract"] + ent_desc = f'{json_dict["source_sequence_description"]} ({label})' + assert "merge" not in json_dict["source_AF_prediction"]["prediction_p"] + assert "software_used_for_domain_annotation" not in json_dict + assert "per_res_conf" not in json_dict + + # global checks + 
assert sum((c in "XOUBJZ") for c in mdl_seq) == 0 + src_seq = json_dict["source_sequence"] + assert mdl_seq == src_seq[seq_range[0] - 1:seq_range[1]] + if len(mdl_seq) != plddt_len: + short_data = (len(mdl_seq), plddt_len) + long_data = (mdl_seq, json_dict["pLDDT"]) + issues.append(("plddt_size_mismatch", short_data, long_data)) + assert seq_range[1] > seq_range[0] + if seq_range[1] > len(src_seq): + short_data = (seq_range, len(src_seq)) + issues.append(("range_source_mismatch", short_data, [])) + for key in ["percent_confident_residues", "mean_per_res_conf"]: + assert key in json_dict + extra_keys = set(json_dict.keys()) - set([ + 'AF_predictions', + 'MODELLER_energy_score', + 'abstract', + 'assembly_parts', + 'domain_name', + 'domain_sequence', + 'domain_sequence_end', + 'domain_sequence_start', + 'mean_per_res_conf', + 'pLDDT', + 'per_res_conf', + 'percent_confident_residues', + 'predicted_sequence', + 'predicted_sequence_end', + 'predicted_sequence_start', + 'software_used_for_domain_annotation', + 'software_used_for_prediction', + 'source_AF_prediction', + 'source_sequence', + 'source_sequence_RefSeq_ID', + 'source_sequence_description', + 'title', + 'source_sequence_download_date', + 'domain_id', + 'coiled_coil_annot', + ]) + if len(extra_keys) > 0: + issues.append(("extra_keys", sorted(extra_keys), [])) + # unused/unknown coiled_coil_annot (expect to be None if there) + assert json_dict.get("coiled_coil_annot") is None + + # for valid mmCIF... + mmcif_regex_ent = "[][ \t_(),.;:\"&<>/\\\\{}'`~!@#$%?+=*A-Za-z0-9|^-]*" + mmcif_regex_desc = "[][ \n\t()_,.;:\"&<>/\\\\{}'`~!@#$%?+=*A-Za-z0-9|^-]*" + assert bool(re.fullmatch(mmcif_regex_ent, ent_desc)) + assert bool(re.fullmatch(mmcif_regex_desc, json_dict["title"])) + assert bool(re.fullmatch(mmcif_regex_desc, json_dict["abstract"])) + + # collect info on predictions and parts + af2_preds, trunc_parts = _get_preds_and_parts(json_dict, mdl_type) + + # check AF pred. 
+ exp_pred_keys = set([ + "prediction_start", "prediction_end", "prediction_p", "version", + "tmplate_cutoff_date", "num_ensemble", "pTM", "ipTM" + ]) + opt_pred_keys = set([ + "merged_GCN4linker_len_N", "merged_GCN4linker_len_C" + ]) + pred_ranges = {} + sw_kws_to_check = set() + for pred in af2_preds: + # check keys + extra_keys = set(pred.keys()) - exp_pred_keys - opt_pred_keys + assert len(extra_keys) == 0 + noncovered_keys = exp_pred_keys - pred.keys() + assert len(noncovered_keys) == 0 + assert pred["version"] in pred["prediction_p"] + # keep track of files and ranges for checks of parts + pred_name = os.path.splitext(pred["prediction_p"])[0] + assert pred_name not in pred_ranges + pred_range = ( + int(pred["prediction_start"]), int(pred["prediction_end"]) + ) + pred_ranges[pred_name] = pred_range + # keep track of AF2 versions + sw_kws_to_check.add(("AlphaFold", pred["version"])) + # check merge stuff + pred_merge = _get_af2_merge_dict(pred) + dict_pred_merge = {"N": 0, "C": 0} + if "merged_GCN4linker_len_N" in pred.keys(): + dict_pred_merge["N"] = pred["merged_GCN4linker_len_N"] + if "merged_GCN4linker_len_C" in pred.keys(): + dict_pred_merge["C"] = pred["merged_GCN4linker_len_C"] + assert pred_merge == dict_pred_merge + # check acc. file content + pdb_file, pkl_file, pae_file = _get_af2_files(pred["prediction_p"]) + pdb_ent = io.PDBStrToEntity( + zip_dict["accompanying_data"][pdb_file], + profile=io.profiles["DEFAULT"], + process=True + ) + pkl_data = pickle.loads(zip_dict["accompanying_data"][pkl_file]) + ( + pdb_ch_names, pdb_mdlsqe, pdb_resnum_range, pdb_ent_bfs + ) = _process_ent(pdb_ent) + # pdb_ch_names can have random order and we don't care... + # we ignore pdb_resnum_range and just check seq. 
+ exp_mdlsqe = src_seq[pred_range[0] - 1:pred_range[1]] + cut_end = len(pdb_mdlsqe) - pred_merge["C"] + pdb_mdlsqe_cut = pdb_mdlsqe[pred_merge["N"]:cut_end] + assert pdb_mdlsqe_cut == exp_mdlsqe + # check QE + exp_qe_len = len(pdb_ch_names) * len(pdb_mdlsqe) + assert len(pkl_data["pLDDT"]) == exp_qe_len + assert len(pkl_data["PAE"]) == exp_qe_len + assert len(pkl_data["PAE"][0]) == exp_qe_len + assert "pTM" in pkl_data + assert "ipTM" in pkl_data + qe_max_diff = np.max(abs(np.asarray(pkl_data["pLDDT"]) - pdb_ent_bfs)) + if qe_max_diff > 0.01: + # 2nd option: chain names in PDB were reordered compared to pkl + bfs_arr = np.asarray(pdb_ent_bfs).reshape(len(pdb_ch_names), -1) + ch_indices = [pdb_ch_names.index(ch) for ch in ['A', 'C', 'B']] + bfs_new = bfs_arr[ch_indices,:].flatten() + qe_max_diff = np.max(abs(np.asarray(pkl_data["pLDDT"]) - bfs_new)) + if qe_max_diff > 0.01: + short_data = [qe_max_diff, pkl_file] + long_data = [list(pkl_data["pLDDT"]), pdb_ent_bfs] + issues.append(("pkl_plddt_diff", short_data, long_data)) + # check redundant data + assert str(pred["prediction_start"]) == pkl_data["pred_start"] + assert str(pred["prediction_end"]) == pkl_data["pred_end"] + assert pred["tmplate_cutoff_date"] == pkl_data["tmpl_max_date"] + assert pred["version"] == pkl_data["version"] + assert pred["num_ensemble"] == pkl_data["num_ensemble"] + if pred["pTM"] != pkl_data["pTM"]: + long_data = [pred["pTM"], float(pkl_data["pTM"])] + short_data = [abs(long_data[0] - long_data[1]), pkl_file] + issues.append(("pkl_ptm_diff", short_data, long_data)) + if pred["ipTM"] != pkl_data["ipTM"]: + long_data = [pred["ipTM"], float(pkl_data["ipTM"])] + short_data = [abs(long_data[0] - long_data[1]), pkl_file] + issues.append(("pkl_iptm_diff", short_data, long_data)) + if "merged_GCN4linker_len_N" in pred: + if "merge_len_N" in pkl_data.keys(): + other_len = pkl_data["merge_len_N"] + else: + # HACK for 0898 + assert "merge_len" in pkl_data.keys() + other_len = pkl_data["merge_len"] + 
assert pred["merged_GCN4linker_len_N"] == other_len + if "merged_GCN4linker_len_C" in pred: + assert pred["merged_GCN4linker_len_C"] == pkl_data["merge_len_C"] + # check expected fixed data + assert pkl_data["max_msa_size"] == 10000 + assert pkl_data["db_preset"] == "reduced_dbs" + assert pkl_data["amber_relax"] == True + + # check truncated parts + if len(trunc_parts) > 1: + assert len(json_dict["MODELLER_energy_score"]) == plddt_len + sw_kws_to_check.add(("MODELLER", "10.4")) + else: + assert "MODELLER_energy_score" not in json_dict + exp_part_keys = sorted([ + "part_start", "part_end", "truncated_prediction_path", + "source_prediction" + ]) + part_names = [] + source_predictions = [] + for part in trunc_parts: + # check metadata + assert sorted(part.keys()) == exp_part_keys + source_predictions.append(part["source_prediction"]) + part_names.append(part["truncated_prediction_path"]) + pred_range = pred_ranges[part["source_prediction"]] + assert part["part_start"] >= pred_range[0] + assert part["part_end"] <= pred_range[1] + # check acc. 
file content + part_path = part["truncated_prediction_path"] + if part_path != "model": + pdb_ent = io.PDBStrToEntity( + zip_dict["accompanying_data"][part_path], + profile=io.profiles["DEFAULT"], + process=True + ) + ( + pdb_ch_names, pdb_mdlsqe, pdb_resnum_range, pdb_ent_bfs + ) = _process_ent(pdb_ent) + part_range = (part["part_start"], part["part_end"]) + assert pdb_resnum_range == part_range # not really important + exp_mdlsqe = src_seq[part_range[0] - 1:part_range[1]] + assert exp_mdlsqe == pdb_mdlsqe + + # check CIF file + ( + cif_ch_names, cif_mdlsqe, cif_resnum_range, cif_ent_bfs + ) = _process_ent(cif_ent) + assert seq_range == cif_resnum_range # NOTE: critical for kept_cif_cats + assert cif_mdlsqe == mdl_seq + if cif_ch_names != ['A', 'B', 'C']: + issues.append(("cif_ch_names", cif_ch_names, [])) + # check b-factors (assume to match average or first chain) + if mdl_id == "ma-taas-2867": + # known to have faulty b-factors in CIF file; should use ones from PDB + bfs_arr = np.asarray(pdb_ent_bfs).reshape(len(cif_ch_names), -1) + bfs_avg = bfs_arr[0,:] + else: + bfs_arr = np.asarray(cif_ent_bfs).reshape(len(cif_ch_names), -1) + bfs_avg = bfs_arr.mean(axis=0) + assert len(bfs_avg) == len(cif_mdlsqe) + if "per_res_conf" in json_dict: + # for assemblies + jd_sc = json_dict["per_res_conf"] + max_diff = np.max(abs(bfs_avg - jd_sc)) + if max_diff >= 0.01: + long_data = [bfs_avg.tolist(), jd_sc] + issues.append(("per_res_conf", max_diff, long_data)) + else: + # for domains + jd_sc = json_dict["pLDDT"] + max_diff = np.max(abs(bfs_arr[0,:] - jd_sc)) + # b-factors known to be faulty for some models... 
+ if max_diff >= 0.01 and mdl_id[-4:] not in ["9293", "9344"]: + long_data = [bfs_arr[0,:].tolist(), jd_sc] + issues.append(("cif_b_factors", max_diff, long_data)) + + # make sure prediction covers everything + min_start = min(part["part_start"] for part in trunc_parts) + max_end = max(part["part_end"] for part in trunc_parts) + if min_start != seq_range[0] or max_end != seq_range[1]: + short_data = [(min_start, max_end), seq_range] + issues.append(("pred_range_mismatch", short_data, [])) + assert len(set(part_names)) == len(trunc_parts) + assert sorted(set(source_predictions)) == sorted(pred_ranges) + + # all files there? + exp_files = [] + for pred in af2_preds: + exp_files.extend(_get_af2_files(pred["prediction_p"])) + for part in trunc_parts: + if part["truncated_prediction_path"] != "model": + exp_files.append(part["truncated_prediction_path"]) + acc_files = sorted(zip_dict["accompanying_data"].keys()) + assert len(set(exp_files) - set(acc_files)) == 0 + extra_files = set(acc_files) - set(exp_files) + if len(extra_files) > 0: + long_data = [sorted(exp_files), acc_files] + issues.append(("extra_acc_files", sorted(extra_files), long_data)) + + # check SW + sw_checked = set() + claimed_sw = json_dict["software_used_for_prediction"] + for kw_to_check in sw_kws_to_check: + matching_sw = [ + sw for sw in claimed_sw \ + if all(kw in sw for kw in kw_to_check) + ] + assert len(matching_sw) == 1 + sw_checked.add(matching_sw[0]) + assert sorted(sw_checked) == sorted(claimed_sw) + if "software_used_for_domain_annotation" in json_dict: + claimed_sw = json_dict["software_used_for_domain_annotation"] + exp_sw = [ + 'DSSP 2.3.0 (Kabsch and Sander 1983)', + 'Foldseek (van Kempen et al. 2024)', + 'SamCC-Turbo (Szczepaniak et al. 2020)', + 'US-align (Zhang et al. 
2022)' + ] + assert sorted(claimed_sw) == exp_sw + + # QE checks + plddts = json_dict["pLDDT"] + plddt_range = (min(plddts), max(plddts)) + if plddt_range[0] < 0 or plddt_range[1] > 100: + issues.append(("plddt_range_mismatch", plddt_range, plddts)) + if "MODELLER_energy_score" in json_dict: + energy = json_dict["MODELLER_energy_score"] + if min(energy) <= 0: + issues.append(("energy_range_mismatch", min(energy), energy)) + if "per_res_conf" in json_dict: + per_res_confs = json_dict["per_res_conf"] + prc_range = (min(per_res_confs), max(per_res_confs)) + if prc_range[0] < 0 or prc_range[1] > 100: + issues.append(("prc_range_mismatch", prc_range, per_res_confs)) + if "MODELLER_energy_score" not in json_dict \ + and "per_res_conf" in json_dict: + v1 = np.asarray(plddts) + v2 = np.asarray(per_res_confs) + qe_max_diff = np.max(abs(v1 - v2)) + if qe_max_diff > 0.05: + long_data = [plddts, per_res_confs] + issues.append(("prc_plddt_diff", qe_max_diff, long_data)) + + # check domains + if mdl_type == "assembly": + exp_num_kept_cif_cats = 3 + else: + exp_num_kept_cif_cats = 0 + if exp_num_kept_cif_cats != len(kept_cif_cats): + short_data = (exp_num_kept_cif_cats, sorted(kept_cif_cats.keys())) + issues.append(("num_kept_cif_cats", short_data, kept_cif_cats)) + # check categories + if len(kept_cif_cats) == 3: + # the following should all match after fixes applied in _load_cif_file + pd_ids = sorted(kept_cif_cats["_pdbx_domain."]["id"]) + if len(set(pd_ids)) != len(pd_ids): + issues.append(("non_unique_pd_ids", [], pd_ids)) + pfd_ids = kept_cif_cats["_pdbx_feature_domain."]["id"] + if len(set(pfd_ids)) != len(pfd_ids): + issues.append(("non_unique_pfd_ids", [], pfd_ids)) + pfd_dom_ids = sorted( + kept_cif_cats["_pdbx_feature_domain."]["domain_id"] + ) + if pd_ids != pfd_dom_ids: + issues.append(("pfd_dom_ids", [], [pd_ids, pfd_dom_ids])) + pdr_dom_ids = sorted(kept_cif_cats["_pdbx_domain_range."]["domain_id"]) + if pd_ids != pdr_dom_ids: + issues.append(("pdr_dom_ids", [],
[pd_ids, pdr_dom_ids])) + # special characters? + for cat, cat_dict in kept_cif_cats.items(): + for item, values in cat_dict.items(): + for value in values: + # I use the most permissive one + if not re.fullmatch(mmcif_regex_desc, str(value)): + invalid_chars = re.sub(mmcif_regex_desc, '', str(value)) + issues.append(( + "cif_invalid_chars", + (f"{cat}{item}", invalid_chars), + value + )) + # matches the range? + cat_dict = kept_cif_cats["_pdbx_domain_range."] + res_list = list(zip( + cat_dict["beg_label_asym_id"], + cat_dict["beg_label_comp_id"], + cat_dict["beg_label_seq_id"], + )) + list(zip( + cat_dict["end_label_asym_id"], + cat_dict["end_label_comp_id"], + cat_dict["end_label_seq_id"], + )) + for ch_name, res_name, res_num in res_list: + res = cif_ent.FindResidue(ch_name, int(res_num)) + if not res.IsValid() or not res.name == res_name: + issues.append(( + "invalid_dom_range_res", [ch_name, res_name, res_num], [] + )) + + return issues + + +def _get_entities(json_dict, ncbi_data, cif_ent, mdl_type): + """Gather data for the mmCIF (target) entities. + Returns (list of target entities, list of issues) + """ + issues = [] + # get NCBI data + ncbi_ac = json_dict["source_sequence_RefSeq_ID"] + ncbi_item = ncbi_data[ncbi_ac] + ncbi_info = ncbi_item["info"] + assert ncbi_ac == ncbi_info["AccessionVersion"] + ncbi_seq = ncbi_item["seq_str"] + # get json_dict data + if mdl_type == "assembly": + ent_desc = json_dict["source_sequence_description"] + # full seq. + seqres = json_dict["source_sequence"] + ncbi_range = (1, len(ncbi_seq)) + mdl_range = ncbi_range + else: + label = f'domain/motif "{json_dict["domain_name"]}"' + ent_desc = f'{json_dict["source_sequence_description"]} ({label})' + # cut seq. 
+ seqres = json_dict["domain_sequence"] + ncbi_range = ( + json_dict["domain_sequence_start"], + json_dict["domain_sequence_end"], + ) + mdl_range = (1, len(seqres)) + auth_seq_id_map = { + pdb_idx: ncbi_idx for pdb_idx, ncbi_idx in zip( + range(mdl_range[0], mdl_range[1] + 1), + range(ncbi_range[0], ncbi_range[1] + 1) + ) + } + # compare with NCBI + ncbi_seq_cut = ncbi_seq[ncbi_range[0] - 1:ncbi_range[1]] + if seqres != ncbi_seq_cut: + # one case exists with a single mismatch + assert len(seqres) == len(ncbi_seq_cut) + mismatches = [ + (idx + 1, c1, c2) \ + for idx, (c1, c2) in enumerate(zip(seqres, ncbi_seq_cut)) \ + if c1 != c2 + ] + issues.append(("mismatches", mismatches, [seqres, ncbi_seq_cut])) + else: + mismatches = [] + # fill data + cif_ch_names, cif_mdlsqe, _, _ = _process_ent(cif_ent) + entity = { + "seqres": seqres, + "ncbi_seq": ncbi_seq, + "sqe_no_gaps": cif_mdlsqe, + "mismatches": mismatches, + "ncbi_range": ncbi_range, + "mdl_range": mdl_range, + "auth_seq_id_map": auth_seq_id_map, + "pdb_chain_ids": cif_ch_names, + "description": ent_desc, + "ncbi_acv": ncbi_info["AccessionVersion"], + "ncbi_gi": str(ncbi_info["Gi"]), + "ncbi_taxid": str(ncbi_info["TaxId"]), + "ncbi_organism": ncbi_info["SpeciesName"], + "ncbi_last_mod": datetime.datetime.strptime( + ncbi_info["UpdateDate"], "%Y/%m/%d" + ) + } + return [entity], issues +################################################################################ + +################################################################################ +# ModelCIF HANDLING +################################################################################ +# pylint: disable=too-few-public-methods +class _GlobalPCR(modelcif.qa_metric.Global, modelcif.qa_metric.NormalizedScore): + """Percentage of confident (score > 70) residues in [0,1]""" + name = "percent confident residues" + software = None + + +class _GlobalPRC(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT): + """Average per-residue confidence (pLDDT-like) in 
[0,1]""" + name = "average per-residue confidence (pLDDT-like)" + software = None + + +class _GlobalPRC_pLDDT(modelcif.qa_metric.Global, modelcif.qa_metric.PLDDT): + """Average per-residue confidence (pLDDT) in [0,1]""" + name = "average per-residue confidence (pLDDT)" + software = None + + +class _LocalPLDDT(modelcif.qa_metric.Local, modelcif.qa_metric.PLDDT): + """Predicted accuracy according to the CA-only lDDT in [0,100]""" + name = "pLDDT" + software = None + + +class _LocalMOD(modelcif.qa_metric.Local, modelcif.qa_metric.Energy): + """Per-residue MODELLER energy score (>0, lower is better).""" + name = "MODELLER energy score" + software = None + + +class _LocalPRC(modelcif.qa_metric.Local, modelcif.qa_metric.PLDDT): + """Per-residue confidence score in [0,100].""" + name = "confidence score (pLDDT-like)" + software = None + + +class _NcbiTrgRef(modelcif.reference.TargetReference): + """NCBI as target reference.""" + name = "NCBI" + other_details = None + + +class _LPeptideAlphabetWithXO(ihm.LPeptideAlphabet): + """Have the default amino acid alphabet plus 'X' for unknown residues + and 'O' as allowed non-def. AA (U already in alphabet).""" + + # extra entry added according to LPeptideAlphabet def. in + # https://python-ihm.readthedocs.io/en/latest/_modules/ihm.html + # and https://files.rcsb.org/view/1NTH.cif for values for 'O'. 
+ + def __init__(self): + """Create the alphabet.""" + super().__init__() + self._comps["X"] = self._comps["UNK"] + self._comps['O'] = ihm.LPeptideChemComp( + "PYL", "O", "O", "PYRROLYSINE", "C12 H21 N3 O3" + ) +# pylint: enable=too-few-public-methods + + +# TEST: _get_not_modeled_residues from modelcif/util/make_mmcif.py +def _get_not_modeled_residues(model): + """Yield NotModeledResidueRange objects for all residue ranges in the + Model that are not referenced by Atom, Sphere, or pre-existing + NotModeledResidueRange objects""" + for assem in model.assembly: + asym = assem.asym if hasattr(assem, 'asym') else assem + if not asym.entity.is_polymeric(): + continue + # Make a set of all residue indices of this asym "handled" either + # by being modeled (with Atom or Sphere objects) or by being + # explicitly marked as not-modeled + handled_residues = set() + for rr in model.not_modeled_residue_ranges: + if rr.asym_unit is asym: + for seq_id in range(rr.seq_id_begin, rr.seq_id_end + 1): + handled_residues.add(seq_id) + for atom in model.get_atoms(): + if atom.asym_unit is asym: + handled_residues.add(atom.seq_id) + # Convert set to a list of residue ranges + handled_residues = ihm.util._make_range_from_list( + sorted(handled_residues)) + # Return not-modeled for each non-handled range + for r in ihm.util._invert_ranges(handled_residues, + end=assem.seq_id_range[1], + start=assem.seq_id_range[0]): + yield modelcif.model.NotModeledResidueRange(asym, r[0], r[1]) + + +class _OST2ModelCIF(modelcif.model.AbInitioModel): + """Map OST entity elements to ihm.model""" + + def __init__(self, *args, **kwargs): + """Initialise a model""" + # process kwargs + for i in ["ost_entity", "asym", "scores_json"]: + if i not in kwargs: + raise TypeError(f"Required keyword argument '{i}' not found.") + self.ost_entity = kwargs.pop("ost_entity") + self.asym = kwargs.pop("asym") + self.scores_json = kwargs.pop("scores_json") + + # call parent init (needs to pop new kwargs first!) 
+ super().__init__(*args, **kwargs) + + # use auth IDs for res. nums and chain names + self.use_auth = False + + # fetch b-factors for all residues (for get_atoms) + self.res_bfactors = { + r.qualified_name: self.scores_json["b_factors"][i] \ + for i, r in enumerate(self.ost_entity.residues) + } + + # need reverse mapping from chain name + res num to seq. id + self.seq_id_map = { + ch_id: {v: k for k, v in asym_unit.auth_seq_id_map.items()} \ + for ch_id, asym_unit in self.asym.items() + } + + # explicitly add parts which were not modelled + for ch in self.ost_entity.chains: + # get covered range + asym_unit = self.asym[_get_ch_name(ch, self.use_auth)] + res_nums = [res.number.num for res in ch.residues] + res_first = min(res_nums) + res_last = max(res_nums) + # assertion true in this set (no gaps in modelled res. nums) + assert res_nums == list(range(res_first, res_last + 1)) + # compare with ent seq. + mdl_seq_first = self.seq_id_map[ch.name][res_first] + mdl_seq_last = self.seq_id_map[ch.name][res_last] + ent_seq_first, ent_seq_last = asym_unit.seq_id_range + if mdl_seq_first != ent_seq_first: + assert mdl_seq_first > ent_seq_first + self.not_modeled_residue_ranges.append( + modelcif.model.NotModeledResidueRange( + asym_unit, ent_seq_first, mdl_seq_first - 1,\ + ) + ) + # NOTE: the case below doesn't actually happen in this set... + if mdl_seq_last != ent_seq_last: + assert mdl_seq_last < ent_seq_last + self.not_modeled_residue_ranges.append( + modelcif.model.NotModeledResidueRange( + asym_unit, mdl_seq_last + 1, ent_seq_last,\ + ) + ) + # -> note: could be auto-filled as in modelcif/util/make_mmcif.py + # (see commented _get_not_modeled_residues TEST here) + # -> see https://github.com/ihmwg/python-modelcif/issues/37 + + + def get_atoms(self): + # ToDo [internal]: Take B-factor out since it's not a B-factor? + # NOTE: this assumes that _get_res_num maps residue to pos.
in seqres + # within asym + for atm in self.ost_entity.atoms: + yield modelcif.model.Atom( + asym_unit=self.asym[_get_ch_name(atm.chain, self.use_auth)], + seq_id=self.seq_id_map[atm.chain.name][atm.residue.number.num], + atom_id=atm.name, + type_symbol=atm.element, + x=atm.pos[0], + y=atm.pos[1], + z=atm.pos[2], + het=atm.is_hetatom, + biso=self.res_bfactors[atm.residue.qualified_name], + occupancy=atm.occupancy, + ) + + def add_scores(self): + """Add QA metrics from AF2 scores.""" + # global scores + for score, score_class in [ + ("percent_confident_residues", _GlobalPCR), + ("mean_per_res_conf", _GlobalPRC), + ("mean_per_res_conf_plddt", _GlobalPRC_pLDDT) + ]: + if score in self.scores_json: + self.qa_metrics.append(score_class(self.scores_json[score])) + + # local scores + for score, score_class in [ + ("pLDDT", _LocalPLDDT), + ("MODELLER_energy_score", _LocalMOD), + ("per_res_conf", _LocalPRC), + ]: + if score in self.scores_json: + i = 0 + for chn in self.ost_entity.chains: + for res in chn.residues: + seq_id = self.seq_id_map[chn.name][res.number.num] + asym = self.asym[chn.name].residue(seq_id) + self.qa_metrics.append( + score_class(asym, self.scores_json[score][i]) + ) + i += 1 + + +def _get_modelcif_entities(target_ents, asym_units, system): + """Create ModelCIF entities and asymmetric units.""" + alphabet = _LPeptideAlphabetWithXO() + for cif_ent in target_ents: + # collect references + ncbi_ref = _NcbiTrgRef( + code=cif_ent["ncbi_acv"], + accession=cif_ent["ncbi_gi"], + ncbi_taxonomy_id=cif_ent["ncbi_taxid"], + organism_scientific=cif_ent["ncbi_organism"], + sequence_version_date=cif_ent["ncbi_last_mod"], + sequence=cif_ent["ncbi_seq"], + ) + # add alignment incl. 
mismatches + ncbi_ref.alignments.append(modelcif.reference.Alignment( + db_begin=cif_ent["ncbi_range"][0], + db_end=cif_ent["ncbi_range"][1], + entity_begin=cif_ent["mdl_range"][0], + entity_end=cif_ent["mdl_range"][1], + seq_dif=[ + ihm.reference.SeqDif( + seq_id, + alphabet[olc_db], + alphabet[olc] + ) for seq_id, olc, olc_db in cif_ent["mismatches"] + ] + )) + # + references = [ncbi_ref] + # combine into ModelCIF entity + mdlcif_ent = modelcif.Entity( + cif_ent["seqres"], + description=cif_ent["description"], + alphabet=alphabet, + source=ihm.source.Natural( + ncbi_taxonomy_id=cif_ent["ncbi_taxid"], + scientific_name=cif_ent["ncbi_organism"], + ), + references=references, + ) + # NOTE: this assigns (potentially new) alphabetic chain names + for pdb_chain_id in cif_ent["pdb_chain_ids"]: + asym_units[pdb_chain_id] = modelcif.AsymUnit( + mdlcif_ent, strand_id=pdb_chain_id, + auth_seq_id_map=cif_ent["auth_seq_id_map"] + ) + system.entities.append(mdlcif_ent) + + +def _get_assoc_af2_pdb_data(pred): + """Generate a modelcif.data.Data object for an AF2 prediction.""" + return modelcif.data.Data( + name=f"AlphaFold-Multimer (v{pred['version']}) model for range " + + f"{_get_af2_range_string(pred)}; pTM {pred['pTM']}, " + + f"ipTM {pred['ipTM']}", + ) + + +def _get_assoc_af2_json_data(af2_path): + """Generate a modelcif.data.Data object for JSON file + with extra QE for PDB file in af2_path. + """ + return modelcif.data.Data( + name=f"Detailed quality estimates (pLDDT, PAE) for {af2_path}", + ) + + +def _get_assoc_pae_png_data(af2_path): + """Generate a modelcif.data.Data object for PNG file + with PAE plot for PDB file in af2_path. + """ + return modelcif.data.Data( + name=f"Plot showing PAE matrix for {af2_path}", + ) + + +def _get_assoc_trunc_pdb_data(part, is_single): + """Generate a modelcif.data.Data object for the PDB file + of a truncated model from an AF2 prediction.
+ """ + if is_single: + mdl_txt = "model" + else: + mdl_txt = "and superposed model" + return modelcif.data.Data( + name=f"Truncated {mdl_txt} for range {part['part_start']}-" + + f"{part['part_end']} derived from AlphaFold-Multimer model " + + f"{part['source_prediction']}.pdb", + ) + + +def _get_associated_file(fle_path, data, file_format="other", + file_content="other"): + """Generate a modelcif.associated.File object for given data.""" + afile = modelcif.associated.File( + fle_path, + details=data.name, + data=data, + ) + afile.file_format = file_format + afile.file_content = file_content + return afile + + +def _get_associated_files(mdl_name, arc_files): + """Create entry for associated files.""" + # package all into zip file + return modelcif.associated.Repository( + "", + [modelcif.associated.ZipFile(f"{mdl_name}.zip", files=arc_files)], + ) + # NOTE: by convention MA expects zip file with same name as model-cif + + +global_ref_dbs = {} +def _get_ref_db_object(name, url, version=None, release_date=None): + """Cached access to modelcif.ReferenceDatabase objects. + Needed to remove duplicates in ModelCIF. + """ + key = (name, url, version, release_date) + if key not in global_ref_dbs: + global_ref_dbs[key] = modelcif.ReferenceDatabase( + name, url, version, release_date + ) + return global_ref_dbs[key] + + +def _get_sequence_dbs(config_data): + """Get AF seq. DBs.""" + # hard coded UniProt release (see https://www.uniprot.org/release-notes) + # (TO BE UPDATED FOR EVERY DEPOSITION!) 
+ pdb_rel_date = config_data["pdb_rel_date"] + up_version = config_data["up_version"] + up_version_dates = { + "2021_03": datetime.datetime(2021, 6, 2), + "2022_05": datetime.datetime(2022, 12, 14), + } + up_rel_date = up_version_dates[up_version] + # fill list of DBs + seq_dbs = [] + if config_data["use_small_bfd"]: + seq_dbs.append(_get_ref_db_object( + "Reduced BFD", + "https://storage.googleapis.com/alphafold-databases/" + + "reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz" + )) + else: + seq_dbs.append(_get_ref_db_object( + "BFD", + "https://storage.googleapis.com/alphafold-databases/" + + "casp14_versions/" + + "bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz", + version="6a634dc6eb105c2e9b4cba7bbae93412", + )) + if config_data["af_version"] < "2.3.0": + seq_dbs.append(_get_ref_db_object( + "MGnify", + "https://storage.googleapis.com/alphafold-databases/" + + "casp14_versions/mgy_clusters_2018_12.fa.gz", + version="2018_12", + release_date=datetime.datetime(2018, 12, 6), + )) + seq_dbs.append(_get_ref_db_object( + "Uniclust30", + "https://storage.googleapis.com/alphafold-databases/" + + "casp14_versions/uniclust30_2018_08_hhsuite.tar.gz", + version="2018_08", + release_date=None, + )) + else: + # NOTE: release date according to https://ftp.ebi.ac.uk/pub/databases/metagenomics/peptide_database/2022_05/ + seq_dbs.append(_get_ref_db_object( + "MGnify", + "https://storage.googleapis.com/alphafold-databases/" + + "v2.3/mgy_clusters_2022_05.fa.gz", + version="2022_05", + release_date=datetime.datetime(2022, 5, 6), + )) + seq_dbs.append(_get_ref_db_object( + "UniRef30", + "https://storage.googleapis.com/alphafold-databases/" + + "v2.3/UniRef30_2021_03.tar.gz", + version="2021_03", + release_date=None, + )) + if config_data["use_multimer"]: + seq_dbs.append(_get_ref_db_object( + "TrEMBL", + "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/" + + "knowledgebase/complete/uniprot_trembl.fasta.gz", + version=up_version, + 
release_date=up_rel_date, + )) + seq_dbs.append(_get_ref_db_object( + "Swiss-Prot", + "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/" + + "knowledgebase/complete/uniprot_sprot.fasta.gz", + version=up_version, + release_date=up_rel_date, + )) + seq_dbs.append(_get_ref_db_object( + "UniRef90", + "ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/" + + "uniref90.fasta.gz", + version=up_version, + release_date=up_rel_date, + )) + if config_data["use_templates"]: + if config_data["use_multimer"]: + # uses whatever is latest set of PDB sequences + # see AF2 scripts/download_pdb_seqres.sh + seq_dbs.append(_get_ref_db_object( + "PDB seqres", + "https://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt", + release_date=pdb_rel_date + )) + else: + # fixed version used in AF2 scripts/download_pdb70.sh + seq_dbs.append(_get_ref_db_object( + "PDB70", + "http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/" + + "hhsuite_dbs/old-releases/pdb70_from_mmcif_200401.tar.gz", + release_date=datetime.datetime(2020, 4, 1) + )) + return seq_dbs + + +def _assemble_modelcif_software(soft_dict, params_dict): + """Create a modelcif.SoftwareWithParameters instance from dictionaries.""" + # create SW object + sw = modelcif.Software( + soft_dict["name"], + soft_dict["classification"], + soft_dict["description"], + soft_dict["location"], + soft_dict["type"], + soft_dict["version"], + citation=soft_dict["citation"], + ) + # assemble parameters + params = [] + for key, val in params_dict.items(): + params.append(modelcif.SoftwareParameter(key, val)) + # put them together + return modelcif.SoftwareWithParameters(sw, params) + + +def _get_modelcif_protocol_software(js_step): + """Assemble software entries for a ModelCIF protocol step.""" + # new setup in python-modelcif (as of late 2023): params with each SW + sw_list = [] + for sw, sw_params in js_step["software_plus_params"]: + sw_list.append(_assemble_modelcif_software(sw, sw_params)) + # group and done... 
+ if sw_list: + return modelcif.SoftwareGroup(sw_list) + else: + return None + + +data_group_cache = {} +def _get_modelcif_protocol_data(data_labels, target_entities, model, acc_data): + """Assemble data for a ModelCIF protocol step. + Cached access to objects needed to remove duplicates in ModelCIF. + """ + cache_key = '|'.join(sorted(data_labels)) + if cache_key in data_group_cache: + return data_group_cache[cache_key] + data = modelcif.data.DataGroup() + for data_label in data_labels: + if data_label.startswith("target_sequences_"): + data.extend(target_entities) + elif data_label == "ref_dbs_AF2_2.1.2": + # HC for that version + data.extend(_get_sequence_dbs({ + "pdb_rel_date": datetime.datetime(2021, 11, 5), + "up_version": "2021_03", + "use_small_bfd": True, + "af_version": "2.1.2", + "use_multimer": True, + "use_templates": True, + })) + elif data_label == "ref_dbs_AF2_2.3.1": + # HC for that version + data.extend(_get_sequence_dbs({ + "pdb_rel_date": datetime.datetime(2022, 12, 9), + "up_version": "2022_05", + "use_small_bfd": True, + "af_version": "2.3.1", + "use_multimer": True, + "use_templates": True, + })) + elif data_label.startswith("model_"): + data.append(model) + elif data_label in acc_data: + data.append(acc_data[data_label]) + else: + raise RuntimeError(f"Unknown protocol data: '{data_label}'") + data_group_cache[cache_key] = data + return data + + +def _get_modelcif_protocol(protocol_steps, target_entities, model, acc_data): + """Create the protocol for the ModelCIF file.""" + protocol = modelcif.protocol.Protocol() + for js_step in protocol_steps: + sftwre = _get_modelcif_protocol_software(js_step) + input_data = _get_modelcif_protocol_data( + js_step["input"], target_entities, model, acc_data + ) + output_data = _get_modelcif_protocol_data( + js_step["output"], target_entities, model, acc_data + ) + + protocol.steps.append( + modelcif.protocol.Step( + input_data=input_data, + output_data=output_data, + name=js_step["name"], + 
details=js_step["details"], + software=sftwre, + ) + ) + protocol.steps[-1].method_type = js_step["method_type"] + return protocol + + +def _compress_cif_file(cif_file): + """Compress cif file and delete original.""" + with open(cif_file, "rb") as f_in: + with gzip.open(cif_file + ".gz", "wb") as f_out: + shutil.copyfileobj(f_in, f_out) + os.remove(cif_file) + + +def _package_associated_files(repo, dict_data): + """Compress associated files into single zip file and delete original. + Data can either be on disk or available in dict_data (key = path). + """ + # zip settings tested for good speed vs compression + for archive in repo.files: + with zipfile.ZipFile(archive.path, "w", zipfile.ZIP_BZIP2) as cif_zip: + for zfile in archive.files: + zpath = zfile.path + if zpath in dict_data: + cif_zip.writestr(zpath, dict_data[zpath]) + else: + cif_zip.write(zpath, arcname=zpath) + os.remove(zpath) + + +def _pkl_to_json(pkl_data, pae_handling="keep"): + """Make pkl_data JSON writable. + Options for pae_handling: + - keep = values kept as they are + - round = values kept as closest integers + - drop = drop whole table + """ + pkl_json_data = {} + for k, v in pkl_data.items(): + if k == "PAE": + if pae_handling == "keep": + pkl_json_data["PAE"] = pkl_data["PAE"].tolist() + elif pae_handling == "round": + pae = pkl_data["PAE"].round().astype(int).tolist() + pkl_json_data["PAE"] = pae + elif pae_handling == "drop": + pass + else: + raise RuntimeError( + f"Unknown pae_handling value {pae_handling}" + ) + elif k != "pLDDT_chains": + if type(v) == np.ndarray: + # note: scalars become scalars! 
+ v = v.tolist() + pkl_json_data[k] = v + return pkl_json_data + + +def _get_sw_for_qe(steps, step_name): + """Fetch suitable SW objects from protocol steps to use in QE.""" + # to maximally reduce duplicates we reuse single groups + # otherwise new group created using same SoftwareWithParameters objects + sw_groups = [step.software for step in steps if step.name == step_name] + if len(sw_groups) == 0: + return None + elif len(sw_groups) == 1: + return sw_groups[0] + else: + # each sw_group is a list (SoftwareGroup) of SoftwareWithParameters + # ...and we remove duplicates...just in case + sw_dict = {} + for sw_group in sw_groups: + sw_dict.update({hash(swp): swp for swp in sw_group}) + return modelcif.SoftwareGroup(sw_dict.values()) + + +def _store_as_modelcif(data_json, ost_ent, zip_dict, kept_cif_cats, out_dir, + mdl_name, compress, add_extra_files): + """Mix all the data into a ModelCIF file.""" + print(" generating ModelCIF objects...", end="") + pstart = timer() + # create system to gather all the data + system = modelcif.System( + title=data_json["title"], + id=data_json["mdl_id"].upper(), + model_details=data_json["model_details"], + ) + + # create an asymmetric unit and an entity per target sequence + asym_units = {} + _get_modelcif_entities(data_json["target_entities"], asym_units, system) + + # audit_authors + system.authors.extend(data_json["audit_authors"]) + + # set up the model to produce coordinates + model = _OST2ModelCIF( + assembly=modelcif.Assembly(asym_units.values()), + asym=asym_units, + ost_entity=ost_ent, + scores_json=data_json["scores"], + name=None, + ) + # TEST + # model.not_modeled_residue_ranges.extend(_get_not_modeled_residues(model)) + # + print(f" ({timer()-pstart:.2f}s)") + print(" processing QA scores...", end="", flush=True) + pstart = timer() + model.add_scores() + print(f" ({timer()-pstart:.2f}s)") + + model_group = modelcif.model.ModelGroup([model]) + system.model_groups.append(model_group) + + # handle additional files + 
acc_data = {} + acc_files = [] + dict_data = {} + for pred in data_json["af2_preds"]: + # get data objects + pdb_file, pkl_file, pae_file = _get_af2_files(pred["prediction_p"]) + acc_data[pdb_file] = _get_assoc_af2_pdb_data(pred) + # NOTE: we keep data object for mismatching PKL file even though we + # do not dump file itself + acc_data[pkl_file] = _get_assoc_af2_json_data(pdb_file) + acc_data[pae_file] = _get_assoc_pae_png_data(pdb_file) + if add_extra_files: + # get file objects + acc_files.append(_get_associated_file(pdb_file, acc_data[pdb_file])) + if pkl_file not in data_json["pkl_files_to_skip"]: + json_file = f"{os.path.splitext(pkl_file)[0]}.json" + acc_files.append(_get_associated_file( + json_file, acc_data[pkl_file], file_format="json" + )) + acc_files.append(_get_associated_file(pae_file, acc_data[pae_file])) + # get file content + dict_data[pdb_file] = zip_dict["accompanying_data"][pdb_file] + if pkl_file not in data_json["pkl_files_to_skip"]: + pkl_data = pickle.loads(zip_dict["accompanying_data"][pkl_file]) + pkl_json_data = _pkl_to_json(pkl_data, pae_handling="round") + dict_data[json_file] = json.dumps(pkl_json_data) + dict_data[pae_file] = zip_dict["accompanying_data"][pae_file] + for part in data_json["trunc_parts"]: + part_path = part["truncated_prediction_path"] + if part_path != "model": + acc_data[part_path] = _get_assoc_trunc_pdb_data( + part, len(data_json["trunc_parts"]) == 1 + ) + if add_extra_files: + acc_files.append(_get_associated_file( + part_path, acc_data[part_path] + )) + dict_data[part_path] = zip_dict["accompanying_data"][part_path] + if acc_files: + system.repositories.append(_get_associated_files(mdl_name, acc_files)) + + # get data and steps + protocol = _get_modelcif_protocol( + data_json["protocol"], system.entities, model, acc_data, + ) + system.protocols.append(protocol) + + # set SW for QE + _LocalPLDDT.software = _get_sw_for_qe(protocol.steps, "AF2") + _GlobalPRC_pLDDT.software = _LocalPLDDT.software + 
_LocalMOD.software = _get_sw_for_qe(protocol.steps, "MODELLER") + + # write modelcif System to file + print(" write to disk...", end="", flush=True) + pstart = timer() + # NOTE: this will dump PAE on path provided in add_scores + # -> hence we cheat by changing path and back while being exception-safe... + oldpwd = os.getcwd() + os.chdir(out_dir) + mdl_fle = f"{mdl_name}.cif" + try: + # dump to string and fix with gemmi + with StringIO() as mmcif_fh: + modelcif.dumper.write(mmcif_fh, [system]) + modelcif_str = mmcif_fh.getvalue() + doc = gemmi.cif.read_string(modelcif_str) + block = doc.sole_block() + # HACK: set all label_alt_id to '-' + # -> see https://github.com/wwpdb-dictionaries/mmcif_pdbx/issues/60 + if kept_cif_cats: + cat_dict = block.get_mmcif_category("_atom_site.") + cat_dict["label_alt_id"] = ['-'] * len(cat_dict["label_alt_id"]) + block.set_mmcif_category("_atom_site.", cat_dict) + cat_dict = kept_cif_cats["_pdbx_domain_range."] + new_alt_ids = ['-'] * len(cat_dict["beg_label_alt_id"]) + cat_dict["beg_label_alt_id"] = new_alt_ids + new_alt_ids = ['-'] * len(cat_dict["end_label_alt_id"]) + cat_dict["end_label_alt_id"] = new_alt_ids + # + for cat, cat_dict in kept_cif_cats.items(): + block.set_mmcif_category(cat, cat_dict) + doc.write_file(mdl_fle) + # extra files + with open(f"{mdl_name}-image.png", "wb") as fh: + fh.write(zip_dict["image_file"]) + if acc_files: + _package_associated_files(system.repositories[0], dict_data) + if compress: + _compress_cif_file(mdl_fle) + mdl_fle += ".gz" + finally: + os.chdir(oldpwd) + print(f" ({timer()-pstart:.2f}s)") +################################################################################ + +################################################################################ +# HANDLE FULL DATA SET +################################################################################ +def _translate2modelcif(ncbi_data, opts): + """Convert a model with its accompanying data to ModelCIF.""" + # get main names + mdl_id 
= os.path.splitext(os.path.basename(opts.input_zip_path))[0] + if opts.compress: + cifext = "cif.gz" + else: + cifext = "cif" + mdl_path = os.path.join(opts.out_dir, f"{mdl_id}.{cifext}") + # skip if done already (disabled here) + # if os.path.exists(mdl_path): + # print(f" {mdl_id} already done...") + + # prepare data for model to convert (also gets all issues) + issues = [] + abort_after_checks = opts.checks_only + zip_dict = _get_zip_content(opts.input_zip_path) + mdl_type = zip_dict["mdl_type"] + if zip_dict["unparsed_files"]: + file_list = zip_dict["unparsed_files"] + issues.append(("unparsed_files", len(file_list), file_list)) + json_dict = json.loads(zip_dict["json_file"]) + # apply hard-coded fixes for some models + if mdl_id == "ma-taas-2330": + # find entry to fix + for pred in json_dict["AF_predictions"]: + if pred["prediction_p"] == "AFv2.3.1_389-515_mergeN_21.pdb": + pred["prediction_p"] = "AFv2.3.1_389-515.pdb" + del pred["merged_GCN4linker_len_N"] + del pred["merged_GCN4linker_len_C"] + elif mdl_id == "ma-taas-9424": + # len(json_dict["pLDDT"]) was 53 instead of 52 + # -> also in wrong abstract... 
+ json_dict["pLDDT"] = json_dict["pLDDT"][:52] + json_dict["abstract"] = json_dict["abstract"].replace( + "53 residues", "52 residues" + ) + cif_ent, kept_cif_cats = _load_cif_file(zip_dict) + try: + issues.extend(_do_sanity_checks( + mdl_id, json_dict, zip_dict, cif_ent, kept_cif_cats + )) + entities, ent_issues = _get_entities( + json_dict, ncbi_data, cif_ent, mdl_type + ) + issues.extend(ent_issues) + except Exception as ex: + short_txt = f"{type(ex).__name__}: {ex}" + long_txt = traceback.format_exc() + issues.append(("exception", short_txt, long_txt)) + abort_after_checks = True + + # dump issues + issues_file_path = os.path.join(opts.out_dir, f"{mdl_id}-issues.json") + json.dump(issues, open(issues_file_path, "w")) + + # abort if needed + if abort_after_checks: + return + + # do the actual conversion + print(f" translating {mdl_id}...") + pdb_start = timer() + + # gather data into JSON-like structure + print(" preparing data...", end="") + pstart = timer() + af2_preds, trunc_parts = _get_preds_and_parts(json_dict, mdl_type) + mdlcf_json = { + "title": json_dict["title"].strip(), + "model_details": json_dict["abstract"].strip(), + "audit_authors": _get_audit_authors(), + "af2_preds": af2_preds, + "trunc_parts": trunc_parts, + "protocol": _get_protocol_steps_and_software( + json_dict, mdl_type, mdl_id + ), + "mdl_id": mdl_id, # used for entry ID + "target_entities": entities, + "scores": { + score: json_dict[score] \ + for score in [ + "pLDDT", "MODELLER_energy_score", "per_res_conf", + "percent_confident_residues", "mean_per_res_conf" + ] if score in json_dict + }, + "pkl_files_to_skip": [], + } + # fix percentage to [0,1] + mdlcf_json["scores"]["percent_confident_residues"] /= 100.0 + # enlarge local scores to be for each residue + for score in ["pLDDT", "MODELLER_energy_score", "per_res_conf"]: + if score in mdlcf_json["scores"]: + mdlcf_json["scores"][score] *= cif_ent.chain_count + # note: we overwrite b-factors from file! 
+ # -> also choose which global score to use for mean_per_res_conf + if "per_res_conf" in mdlcf_json["scores"]: + mdlcf_json["scores"]["b_factors"] = mdlcf_json["scores"]["per_res_conf"] + else: + mdlcf_json["scores"]["b_factors"] = mdlcf_json["scores"]["pLDDT"] + mdlcf_json["scores"]["mean_per_res_conf_plddt"] = \ + mdlcf_json["scores"].pop("mean_per_res_conf") + # check for inconsistent pkl files + for issue_type, short_data, long_data in issues: + if issue_type.startswith("pkl_") and issue_type.endswith("_diff"): + mdlcf_json["pkl_files_to_skip"].append(short_data[1]) + print(f" ({timer()-pstart:.2f}s)") + + # save ModelCIF + _store_as_modelcif( + data_json=mdlcf_json, + ost_ent=cif_ent, + zip_dict=zip_dict, + kept_cif_cats=kept_cif_cats, + out_dir=opts.out_dir, + mdl_name=mdl_id, + compress=opts.compress, + add_extra_files=(not opts.no_extra_files), + ) + + # check if result can be read and has expected seq. + ent, ss = io.LoadMMCIF(mdl_path, seqres=True) + ent_seqres = [ss.FindSequence(chn.name).string for chn in ent.chains] + exp_seqres = [] + for trg_ent in mdlcf_json["target_entities"]: + exp_seqres += [trg_ent["seqres"]] * len(trg_ent["pdb_chain_ids"]) + assert ent_seqres == exp_seqres, f"Bad seqres {mdl_id}" + # check model itself + ch_names, mdlsqe, resnum_range, ent_bfs = _process_ent(ent) + assert ch_names == ['A', 'B', 'C'] + target_entity = mdlcf_json["target_entities"][0] + assert mdlsqe == target_entity["sqe_no_gaps"] + assert max(resnum_range) <= len(target_entity["seqres"]) + if mdl_type == "assembly": + seq_range = ( + json_dict["predicted_sequence_start"], + json_dict["predicted_sequence_end"], + ) + assert resnum_range == seq_range + else: + assert resnum_range == (1, len(mdlsqe)) + exp_bfs = mdlcf_json["scores"]["b_factors"] + bfs_max_diff = np.max(abs(np.asarray(ent_bfs) - np.asarray(exp_bfs))) + assert bfs_max_diff < 0.01 + + print(f" ... 
done with {mdl_id} ({timer()-pdb_start:.2f}s).") + + +def _main(): + """Run as script.""" + + # parse/fetch global data + opts = _parse_args() + ncbi_data = json.load(open(opts.ncbi_json)) + + # handle single entry (print statements kept consistent with other sets) + print(f"Working on models...") + _translate2modelcif(ncbi_data, opts) + print(f"... done with models.") + + # TEST: to judge res. needed on cluster + import resource + print('mem', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000) + + +if __name__ == "__main__": + _main() diff --git a/projects/2024-12-ma-denv/README.md b/projects/2024-12-ma-denv/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d4dbcf32848824fd2b7ce939dfc56a10feda56d1 --- /dev/null +++ b/projects/2024-12-ma-denv/README.md @@ -0,0 +1,19 @@ +# Modelling AVITHRAID + +[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/ma-denv) (incl. background on project itself and link to the publication) + +Modelling setup: +- SWISS-MODEL web server +- all jobs ran in auto-mode + +Input files for conversion: +- SWISS-MODEL ModelCIF files + +Special features here: +- None, just plain SWISS-MODEL ModelCIF files + +Content: +- translate2modelcif.py: Script to enrich ModelCIF files +- uniprotkb.py: Module to deal with UNP entries +- macif.py: Module to aid editing ModelCIF files +- example*: Example files diff --git a/projects/2024-12-ma-denv/example_index.csv.gz b/projects/2024-12-ma-denv/example_index.csv.gz new file mode 100755 index 0000000000000000000000000000000000000000..23d5092b7f3babf8a77f1c6d7fa886c7adaed01c Binary files /dev/null and b/projects/2024-12-ma-denv/example_index.csv.gz differ diff --git a/projects/2024-12-ma-denv/example_result.cif.gz b/projects/2024-12-ma-denv/example_result.cif.gz new file mode 100644 index 0000000000000000000000000000000000000000..1d078534efb68ebe6516ac4f3f61fe0e7a050c6a Binary files /dev/null and b/projects/2024-12-ma-denv/example_result.cif.gz differ 
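Both conversion scripts in this diff end by gzip-compressing their output CIF files (`_compress_cif_file` in the script above, `MABlock.write_file` in `macif.py` below). A minimal stdlib sketch of that compress-and-replace step, with a round-trip check on a throwaway file (the file name and CIF content here are invented for illustration):

```python
import gzip
import os
import shutil
import tempfile


def compress_cif_file(cif_file):
    """Gzip-compress cif_file and delete the original (as _compress_cif_file does)."""
    with open(cif_file, "rb") as f_in:
        with gzip.open(cif_file + ".gz", "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    os.remove(cif_file)


with tempfile.TemporaryDirectory() as tmp_dir:
    cif_path = os.path.join(tmp_dir, "example.cif")
    with open(cif_path, "w", encoding="ascii") as fh:
        fh.write("data_example\n_entry.id example\n")
    compress_cif_file(cif_path)
    # original is gone, the .gz holds the identical text
    assert not os.path.exists(cif_path)
    with gzip.open(cif_path + ".gz", "rt") as fh:
        assert fh.read() == "data_example\n_entry.id example\n"
```

`MABlock.write_file` uses the same idea but writes through `gzip.open(..., mode="wt")` directly instead of compressing after the fact.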
diff --git a/projects/2024-12-ma-denv/example_swissmodelcif.cif.gz b/projects/2024-12-ma-denv/example_swissmodelcif.cif.gz new file mode 100755 index 0000000000000000000000000000000000000000..f76354699f4fe37b517a5ffe546a363b1071f9de Binary files /dev/null and b/projects/2024-12-ma-denv/example_swissmodelcif.cif.gz differ diff --git a/projects/2024-12-ma-denv/example_unp_data.json.gz b/projects/2024-12-ma-denv/example_unp_data.json.gz new file mode 100644 index 0000000000000000000000000000000000000000..35af190837c95979968b3e726939782ce26d4d9b Binary files /dev/null and b/projects/2024-12-ma-denv/example_unp_data.json.gz differ diff --git a/projects/2024-12-ma-denv/macif.py b/projects/2024-12-ma-denv/macif.py new file mode 100644 index 0000000000000000000000000000000000000000..da5b9c46cd2e8a773d717bb46fc38a23c6571e14 --- /dev/null +++ b/projects/2024-12-ma-denv/macif.py @@ -0,0 +1,186 @@ +"""mmCIF editing for ModelArchive depositions. + +Supposed to look like gemmi.cif but with some convenience on top. Not meant to +be high-performance but to help prepare ModelCIF files for an MA deposition. +Use for 'one time' jobs, not as a frequently run tool in a service/ pipeline.""" + +import gzip + +# For whatever reason, 'no-name-in-module' can not be silenced by config atm +# pylint: disable=no-name-in-module +from gemmi import cif + +# pylint: enable=no-name-in-module +import gemmi + + +def _gemmi_quote(value): + """Quote string values when necessary.""" + if ( + isinstance(value, str) + and " " in value + and not (value.startswith("'") and value.endswith("'")) + and not (value.startswith('"') and value.endswith('"')) + ): + return cif.quote(value) + return value + + +class MABlock: + """gemmi.cif wrapper that skips the document layer and jumps right into + gemmi.cif.Block objects.
You have all the gemmi.cif.Block functionality + available plus our own convenience functions on top.""" + + def __init__(self, model_data): + """Initialise a single Block from a file.""" + self.source = model_data + self.doc = cif.read(model_data) + self.block = self.doc.sole_block() + + self._targets = None + self._polymer_targets = None + + def __getattr__(self, name): + """If an attribute is not found, try self.block before exception.""" + # The catch here is: when asking for self.foo, + # self.__getattribute__(self, "foo") is executed first. If "foo" is + # not found in self, self.__getattr__(self, "foo") is called. So here + # we already know that "foo" is not there and we can check the + # original block. + if hasattr(self.block, name): + return getattr(self.block, name) + raise AttributeError( + f"'{type(self).__name__}' object has no attribute '{name}'" + ) + + @property + def targets(self): + """Info about targets.""" + if self._targets is not None: + return self._targets + self._targets = {} + table = self.find("_ma_target_entity.", ["entity_id"]) + for row in table: + if row["entity_id"] in self._targets: + raise RuntimeError( + f"Target with entity_id '{row['entity_id']}' is duplicated." + ) + self._targets[row["entity_id"]] = { + "entity_id": row["entity_id"], # makes life easier for singles + "sequence": self.get_sequence(row["entity_id"]), + } + table = self.find("_entity.", ["id", "type"]) + for row in table: + self._targets[row["id"]]["type"] = row["type"] + + return self._targets + + @property + def polymer_targets(self): + """Only targets of entity type 'polymer'.""" + if self._polymer_targets is not None: + return self._polymer_targets + self._polymer_targets = [] + for target in self.targets.values(): + if target["type"] == "polymer": + self._polymer_targets.append(target) + + return self._polymer_targets + + def find(self, name, columns): + """Get a table with defined columns.
Throws an exception if table is not + found.""" + table = self.block.find(name, columns) + if len(table) == 0: + raise RuntimeError( + f"""Table '{name}' with columns '{"', '".join(columns)}' """ + + "not found." + ) + + return table + + def get_sequence(self, entity): + """Get the sequence of a 'polymer' entity. `entity` is the numeric ID + of the entity. + Reading sequences is inefficient atm, it reads the whole table for + every sequence w/o any caching.""" + table = self.find("_entity_poly_seq.", ["entity_id", "num", "mon_id"]) + sequence = "" + num = 0 + for row in table: + if row["entity_id"] != entity: + continue + num += 1 + assert int(row["num"]) == num + sequence += gemmi.find_tabulated_residue( + row["mon_id"] + ).one_letter_code + + return sequence + + def write_file(self, filename, compress=False, style=cif.Style.Simple): + """Write ModelCIF file to disk, compress upon request. + Will compress anyways if file ends with '.gz'.""" + if compress or filename.endswith(".gz"): + if not filename.endswith(".gz"): + filename += ".gz" + with gzip.open(filename, mode="wt", compresslevel=9) as gfh: + gfh.write(self.doc.as_string(style)) + else: + self.doc.write_file(filename, style) + + def add_to_category(self, category, match=None, **kwargs): + """Add item values to a category. + Keyword arguments are reserved for item names.""" + if category[0] != "_": + category = "_" + category + if category[-1] != ".": + category += "." + items = list(kwargs.keys()) + row = None + if match is not None: + table = self.find(category, items + [match[0]]) + for row in table: + if row[match[0]] == match[1]: + break + if row is None: + raise RuntimeError( + f"No item {match[0]}=={match[1]} found in category " + + f"{category}." 
+ ) + else: + table = self.find(category, items) + assert len(table) == 1 + row = table[0] + for itm, val in kwargs.items(): + if row[itm] not in [".", "?"]: + print( + f" replacing '{cif.as_string(row[itm])}' with " + + f"'{val}' ({itm})" + ) + row[itm] = _gemmi_quote(val) + + def add_category(self, category, after=None, **kwargs): + """Add a new category to the block with only 1 set of values (1 row, + thinking of categories as tables). kwargs are reserved for item/ value + pairs. after is a special keyword parameter to locate the new category + inside the block.""" + if not category.startswith("_"): + category = "_" + category + # handle quoting + for values in kwargs.values(): + for i, val in enumerate(values): + values[i] = _gemmi_quote(val) + self.block.set_mmcif_category(category, kwargs, raw=True) + + if after is None: + return + if not after.startswith("_"): + after = "_" + after + if not after.endswith("."): + after += "." + table = self.block.find_mmcif_category(after) + idx = self.block.get_index(table.tags[-1]) + # be careful with move_item: loops are 1 item and move as a whole, + # single line values move per item/ value pair. + self.block.move_item(-1, idx + 1) diff --git a/projects/2024-12-ma-denv/translate2modelcif.py b/projects/2024-12-ma-denv/translate2modelcif.py new file mode 100644 index 0000000000000000000000000000000000000000..5215a1034b2af18841f577a37dc9c5f3f79332f7 --- /dev/null +++ b/projects/2024-12-ma-denv/translate2modelcif.py @@ -0,0 +1,293 @@ +# -*- coding: utf-8 -*- + +"""Check & extend SWISS-MODEL models for ma-denv + +Example for running: +python translate2modelcif.py --unp-json example_unp_data.json \ + example_swissmodelcif.cif.gz \ + example_index.csv \ + done + +This call uses the example data from this directory. You may need to decompress +some of the files. The resulting './done/example_swissmodelcif.cif.gz' should +be identical to './example_swissmodelcif.cif.gz'.
+ +Expected output in ./done for example above: +- example_swissmodelcif.cif.gz as ModelCIF file with extended annotation +""" + +from datetime import date +import argparse +import csv +import os +import sys + +from macif import MABlock +from uniprotkb import UniProtKBEntryCache + + +################################################################################ +# GENERAL HELPER FUNCTIONS +################################################################################ +def _abort_msg(msg, exit_code=1): + """Write error message and exit with exit_code.""" + print(f"{msg}\nAborting.", file=sys.stderr) + sys.exit(exit_code) + + +def _check_file(file_path): + """Make sure a file exists and is actually a file.""" + if not os.path.exists(file_path): + _abort_msg(f"File not found: '{file_path}'.") + if not os.path.isfile(file_path): + _abort_msg(f"File path does not point to file: '{file_path}'.") + + +def _check_folder(dir_path, create): + """Make sure a folder exists and is actually a directory.""" + if not os.path.exists(dir_path): + if not create: + _abort_msg(f"Path not found: '{dir_path}'.") + os.makedirs(dir_path, exist_ok=True) + if not os.path.isdir(dir_path): + _abort_msg(f"Path does not point to a directory: '{dir_path}'.") + + +def _check_opts_folder(dir_path, create=False): + """Remove trailing '/' (return fixed one) and check if path valid.""" + if dir_path.endswith("/"): + dir_path = dir_path[:-1] + _check_folder(dir_path, create=create) + return dir_path + + +################################################################################ + + +################################################################################ +# DATA HANDLING +################################################################################ +def _parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + formatter_class=argparse.RawDescriptionHelpFormatter, + description=__doc__, + ) + + parser.add_argument( + "modelcif", + type=str, + 
metavar="<INPUT MODELCIF FILE>", + help="Path to a SWISS-MODEL ModelCIF file provided by depositors.", + ) + parser.add_argument( + "model_info_csv", + type=str, + metavar="<MODEL INFO CSV FILE>", + help="Path to a CSV file with information about each model. Format: " + + "<FILENAME>,<UNP AC>,<UNP ALIGNMENT START>,<UNP ALIGNMENT END>," + + "<ENTITY DESCRIPTION>,<MODEL TITLE>,<MODEL DESCRIPTION>", + ) + parser.add_argument( + "out_dir", + type=str, + metavar="<OUTPUT DIR>", + help="Path to directory to store results. The updated ModelCIF file " + + "will be stored in this directory. If the ModelCIF file already " + + "exists in the output directory, it will not be overwritten and an " + + "error will be thrown.", + ) + parser.add_argument( + "--unp-json", + type=str, + default="./unp_data.json", + help="UniProtKB entry data. Information for each UNP entry fetched " + + "will be stored here. Serves as a cache. To update (e.g. after an " + + "UNP release), just delete the file and it will be recreated.", + ) + parser.add_argument( + "--compress", + "-c", + default=False, + action="store_true", + help="Compress ModelCIF file with gzip.", + ) + parser.add_argument( + "--checks-only", + default=False, + action="store_true", + help="Only check for issues without producing ModelCIF files.", + ) + opts = parser.parse_args() + + # check input + _check_file(opts.modelcif) + _check_file(opts.model_info_csv) + _check_opts_folder(opts.out_dir, create=True) + # check if the ModelCIF file already exists in out_dir + out_file = os.path.join( + opts.out_dir, + f"{os.path.basename(opts.modelcif)}{'.gz' if opts.compress else ''}", + ) + if os.path.exists(out_file): + _abort_msg(f"'{out_file}' already exists, will *not* be overwritten.") + # check if UNP is a file if it already exists + if os.path.exists(opts.unp_json): + _check_file(opts.unp_json) + + return opts + + +def _read_csv(mdl_file, mdl_info_csv): + """Get info of a specific model from the CSV file.""" + mdl_file = 
os.path.basename(mdl_file) + + mdl_info = {} + with open(mdl_info_csv, newline="", encoding="ascii") as csvh: + info_r = csv.reader(csvh) + for row in info_r: + assert len(row) == 7 + if row[0].endswith(mdl_file): + # <UNP AC> + mdl_info["unp_ac"] = row[1] + # <UNP ALIGNMENT START> + mdl_info["unp_start"] = int(row[2]) + # <UNP ALIGNMENT END> + mdl_info["unp_end"] = int(row[3]) + # <ENTITY DESCRIPTION> + mdl_info["protein_desc"] = row[4] + # <MODEL TITLE> + mdl_info["entry_title"] = row[5] + # <MODEL DESCRIPTION> + mdl_info["entry_desc"] = row[6] + break + if len(mdl_info) == 0: + _abort_msg(f"Could not find '{mdl_file}' in '{mdl_info_csv}'.") + + return mdl_info + + +################################################################################ + + +################################################################################ +# HANDLE FULL DATA SET +################################################################################ +def _update_modelcif(mdl_file, mdl_info, unp_json_file, out_dir, compress): + """Update ModelCIF file with model info and verify UNP related data.
+ Caution: This is for updates BEFORE deposition, this update does not do + a CIF-style revision history for you!""" + + block = MABlock(mdl_file) + assert len(block.polymer_targets) == 1 + target = block.polymer_targets[0] + unp = UniProtKBEntryCache(unp_json_file) + # print(mdl_info) + unp, db_aln, seq_aln = unp.match_sequence( + mdl_info["unp_ac"], + target["sequence"], + mdl_info["unp_start"], + mdl_info["unp_end"], + ) + + # update ModelCIF data + block.add_to_category( + "struct", + title=mdl_info["entry_title"], + pdbx_model_details=mdl_info["entry_desc"], + ) + block.add_to_category( + "entity", + pdbx_description=mdl_info["protein_desc"], + match=("id", target["entity_id"]), + ) + block.add_to_category( + "entity", + src_method="nat", + match=("id", target["entity_id"]), + ) + block.add_category( + "audit_author", + name=[ + "Guarnetti Prandi, Ingrid", + "Chillemi, Giovanni", + "Talarico, Carmine", + ], + pdbx_ordinal=[1, 2, 3], + after="ma_software_group", + ) + # Update ModelCIF files with UniProtKB info + struct_ref_id = "1" + block.add_category( + "struct_ref", + id=[struct_ref_id], + entity_id=[target["entity_id"]], + db_name=["UNP"], + db_code=[unp.unp_id], + pdbx_db_accession=[unp.unp_ac], + pdbx_align_begin=[mdl_info["unp_start"]], + pdbx_seq_one_letter_code=[unp.unp_seq], + details=[unp.unp_details_full], + after="entity", + ) + block.add_category( + "struct_ref_seq", + align_id=["1"], + ref_id=[struct_ref_id], + seq_align_beg=[seq_aln[0]], + seq_align_end=[seq_aln[1]], + db_align_beg=[db_aln[0]], + db_align_end=[db_aln[1]], + after="struct_ref", + ) + block.add_category( + "ma_target_ref_db_details", + target_entity_id=[target["entity_id"]], + db_name=["UNP"], + db_name_other_details=[False], + db_code=[unp.unp_id], + db_accession=[unp.unp_ac], + seq_db_isoform=[False], + seq_db_align_begin=[db_aln[0]], + seq_db_align_end=[db_aln[1]], + ncbi_taxonomy_id=[unp.ncbi_taxid], + organism_scientific=[unp.organism_species], + 
seq_db_sequence_version_date=[date.isoformat(unp.last_seq_change)], + seq_db_sequence_checksum=[unp.unp_crc64], + after="struct_ref_seq", + ) + # write, if requested write compressed + if out_dir is not None: + block.write_file( + os.path.join(out_dir, os.path.basename(mdl_file)), compress + ) + + +################################################################################ + + +def _main(): + """Run as script.""" + opts = _parse_args() + + # read model info from CSV + mdl_info = _read_csv(opts.modelcif, opts.model_info_csv) + print(f"Working on model {opts.modelcif}...") + _update_modelcif( + opts.modelcif, + mdl_info, + opts.unp_json, + opts.out_dir if not opts.checks_only else None, + opts.compress, + ) + print(f"... done with model {opts.modelcif}.") + + # TEST: to judge res. needed on cluster + # import resource + # print("mem", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000) + + +if __name__ == "__main__": + _main() + +# LocalWords: gemmi diff --git a/projects/2024-12-ma-denv/uniprotkb.py b/projects/2024-12-ma-denv/uniprotkb.py new file mode 100644 index 0000000000000000000000000000000000000000..d934eabc278c83018fa84b9d689e2dc0342c4d4b --- /dev/null +++ b/projects/2024-12-ma-denv/uniprotkb.py @@ -0,0 +1,495 @@ +"""Functions to retrieve data for UniProtKB entries. 
+""" + +# Copyright (c) 2023, SIB - Swiss Institute of Bioinformatics and +# Biozentrum - University of Basel +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +from datetime import datetime +import os +import requests + +try: + # pylint: disable="bare-except" + import ujson as json +except: + import json + +import parasail + +# Name of UniProtKB as _ma_target_ref_db_details.db_name in a ModelCIF file, +# but 'pythonified' (no leading '_', no '.'). +MA_TARGET_REF_DB_DETAILS_DB_NAME = "UNP" + + +def translate_upkb_date_string(date_string): + """UniProtKB uses 3-letter month codes, which do not fly in other languages + (locales) than English, e.g. if the locale is set to de_DE, strptime() + expects 'MAI' instead of 'MAY'... switching the locale temporarily is also + not so easy (threads, not setting it back upon exceptions...). Hence, we + make the month a numeric value, here. 
+ """ + for i, mon in enumerate( + [ + "JAN", + "FEB", + "MAR", + "APR", + "MAY", + "JUN", + "JUL", + "AUG", + "SEP", + "OCT", + "NOV", + "DEC", + ], + start=1, + ): + if mon in date_string: + return date_string.replace(mon, f"{i:02}") + + raise RuntimeError( + "Unrecognised UniProtKB date string found: " + f"'{date_string}'." + ) + + +class UniProtKBEntry: + """Deal with UniProtKB entries.""" + + # We ignore PEP8 on the number of attributes per class, here. The objective + # of UniProtKBEntry is to... represent a UniProtKB entry with all its + # data/ attributes. By the nature of an entry's meta-data, breaking it up + # into sub-classes seems counter-intuitive. + # Since this is more a data class than a functional class, we allow having + # too few public methods for now. + # pylint: disable=too-many-instance-attributes, too-few-public-methods + + def __init__(self, unp_ac, entry_version=None, json_data=None): + """Create a new UniProtKB entry object. + + UniProtKB API will be queried immediately on creation. + + :param unp_ac: Accession code of the UniProtKB entry to be fetched. + :type unp_ac: :class:`str` + :param entry_version: Version of the UniProtKB entry to be fetched (not + to be mixed up with the UniProtKB release). + :type entry_version: :class:`str` or :class:`int` + :param json_data: Init object from JSON object, ignores unp_ac.
+ :type json_data: :class:`dict` + """ + if json_data is None: + self.unp_ac = unp_ac + # filled by self._fetch() + self.entry_status = None + self.entry_version = ( + int(entry_version) if entry_version is not None else None + ) + self.first_appearance = None + self.last_change = None + self.last_seq_change = None + self.ncbi_taxid = None + self.organism_species = "" + self.seq_version = None + self.seqlen = None + self.unp_crc64 = None + self.unp_details_full = None + self.unp_id = None + self.unp_seq = "" + self._fetch() + assert len(self.unp_seq) == self.seqlen + else: + self.entry_status = json_data["entry_status"] + self.entry_version = int(json_data["entry_version"]) + self.first_appearance = ( + datetime.fromisoformat(json_data["first_appearance"]) + if json_data["first_appearance"] is not None + else None + ) + self.last_change = ( + datetime.fromisoformat(json_data["last_change"]) + if json_data["last_change"] is not None + else None + ) + self.last_seq_change = ( + datetime.fromisoformat(json_data["last_seq_change"]) + if json_data["last_seq_change"] is not None + else None + ) + self.ncbi_taxid = json_data["ncbi_taxid"] + self.organism_species = json_data["organism_species"] + self.seq_version = json_data["seq_version"] + self.seqlen = json_data["seqlen"] + self.unp_ac = json_data["ac"] + self.unp_crc64 = json_data["crc64"] + self.unp_details_full = json_data["details_full"] + self.unp_id = json_data["id"] + self.unp_seq = json_data["seq"] + + def __str__(self): + """Print object as string.""" + return ( + f"<{__name__}.{type(self).__name__} AC={self.unp_ac} " + + f"version={self.entry_version}>" + ) + + def _parse_id_line(self, line): + """Parse a UniProtKB TXT format's ID line. 
+ + Should support some older format versions, too.""" + sline = line.split() + if len(sline) != 5: + # Some old formats that are easy to fix + if len(sline) == 6 and sline[3] == "PRT;": + sline.pop(3) + else: + raise RuntimeError( + "ID line not conforming to 'ID EntryName " + + f"Status; SequenceLength.', found: {line}" + ) + self.unp_id = sline[1] + self.entry_status = sline[2][:-1].upper() + self.seqlen = int(sline[3]) + + def _parse_dt_line(self, line): + """Parse a UniProtKB TXT format's DT line. + + Should support some older format versions, too.""" + sline = line.split() + sline[1] = translate_upkb_date_string(sline[1]) + if sline[2] == "(Rel.": # old format + if sline[4] == "Created)": + self.first_appearance = datetime.strptime(sline[1], "%d-%m-%Y") + self.entry_version = int(sline[3][:-1]) + elif sline[5] == "sequence": + self.last_seq_change = datetime.strptime(sline[1], "%d-%m-%Y") + self.seq_version = int(sline[3][:-1]) + elif sline[5] == "annotation": + self.last_change = datetime.strptime(sline[1], "%d-%m-%Y") + return + if sline[2] == "integrated": + self.first_appearance = datetime.strptime(sline[1], "%d-%m-%Y,") + elif sline[2] == "sequence": + self.last_seq_change = datetime.strptime(sline[1], "%d-%m-%Y,") + self.seq_version = int(sline[4][:-1]) + elif sline[2] == "entry": + self.last_change = datetime.strptime(sline[1], "%d-%m-%Y,") + self.entry_version = int(sline[4][:-1]) + + def _parse_de_line(self, line): + """Parse a UniProtKB TXT format's DE line(s).""" + sline = line.split() + if sline[1] == "RecName:": + if self.unp_details_full is None: + if sline[2].startswith("Full="): + self.unp_details_full = sline[2][len("Full=") :] + for i in sline[3:]: + if i.startswith("{"): + break + self.unp_details_full += f" {i}" + if self.unp_details_full.endswith(";"): + self.unp_details_full = self.unp_details_full[:-1] + break + + def _parse_os_line(self, line): + """Parse a UniProtKB TXT format's OS line(s).""" + osl = len("OS ") + if line[-1] == ".": + 
self.organism_species += line[osl:-1] + else: + self.organism_species += line[osl:-1] + " " + + def _parse_ox_line(self, line): + """Parse a UniProtKB TXT format's OX line.""" + # Taxonomy codes only come from NCBI atm. + sline = line.split("=") + self.ncbi_taxid = sline[-1][:-1] + self.ncbi_taxid = self.ncbi_taxid.split()[0] + + def _parse_sq_line(self, line): + """Parse a UniProtKB TXT format's SQ line.""" + sline = line.split() + self.unp_crc64 = sline[6] + + def _parse_sequence(self, line): + """Parse the sequence out of the UniProtKB TXT format.""" + sline = line.split() + self.unp_seq += "".join(sline) + + def _fetch(self): + """Retrieve information for a single UniProtKB entry.""" + if self.entry_version is None: + query_url = f"https://rest.uniprot.org/uniprotkb/{self.unp_ac}.txt" + else: + query_url = ( + f"https://rest.uniprot.org/unisave/{self.unp_ac}?format=txt&" + + f"versions={self.entry_version}" + ) + + rspns = requests.get(query_url, timeout=180) + if not rspns.ok: + raise RuntimeError( + f"UniProtKB entry with AC '{self.unp_ac}' not retrieved for " + + f"URL '{query_url}'" + ) + for line in rspns.iter_lines(decode_unicode=True): + # Check here to learn about UniProtKB's flat file format: + # https://web.expasy.org/docs/userman.html + # ID EntryName Status; SequenceLength. + if line.startswith("ID "): + self._parse_id_line(line) + # DT DD-MMM-YYYY, integrated into UniProtKB/database_name. + # DT DD-MMM-YYYY, sequence version x. + # DT DD-MMM-YYYY, entry version x. + elif line.startswith("DT "): + self._parse_dt_line(line) + # DE RecName: Full=Highly reducing ... tstA {ECO:...|PubMed:...}; + # DE Short=HR-PKS phiA {ECO:0000303|PubMed:26558485}; + # DE EC=2.3.1.- {ECO:0000269|PubMed:26558485}; + # DE AltName: Full=Phomoidride ... protein A {ECO:...|PubMed:...}; + # OS Avian leukosis RSA (RSV-SRA) (Rous sarcoma virus (strain + # OS Schmidt-Ruppin A)). 
+ elif line.startswith("DE "): + self._parse_de_line(line) + elif line.startswith("OS "): + self._parse_os_line(line) + # OX Taxonomy_database_Qualifier=Taxonomic code; + elif line.startswith("OX NCBI_TaxID="): + self._parse_ox_line(line) + # SQ SEQUENCE 3392 AA; 378905 MW; BBD894175578E164 CRC64; + elif line.startswith("SQ "): + self._parse_sq_line(line) + # Sequence is stored specially, w/o prefix and multiline + elif line.startswith(" "): + self._parse_sequence(line) + # print(line) + + def to_json(self): + """Return entry as JSON object.""" + return { + "ac": self.unp_ac, + "entry_version": self.entry_version, + "organism_species": self.organism_species, + "entry_status": self.entry_status, + "first_appearance": ( + datetime.isoformat(self.first_appearance) + if self.first_appearance is not None + else None + ), + "last_change": ( + datetime.isoformat(self.last_change) + if self.last_change is not None + else None + ), + "last_seq_change": ( + datetime.isoformat(self.last_seq_change) + if self.last_seq_change is not None + else None + ), + "ncbi_taxid": self.ncbi_taxid, + "seq_version": self.seq_version, + "seqlen": self.seqlen, + "details_full": self.unp_details_full, + "id": self.unp_id, + "crc64": self.unp_crc64, + "seq": self.unp_seq, + } + + +class UniProtKBEntryCache: + """Cached retrieval of UniProtKB entries. + + To avoid calling the UniProtKB API for the same UniProtKB AC multiple times, + use this cache. Also helps with keeping code cleaner since you do not need + to carry UniProtKBEntry instances around, just call the cached entry. + + Be careful about UniProtKB entry versions. In theory, when specifying no + version for the cache, the real entry may change at UniProtKB while + running... but this is also an issue when loading the entry live from + UniProtKB multiple times.
+ + Also be aware that the cache has no size restrictions; it does not get swept + over its whole lifetime.""" + + # The cache serves as a data-dump, storing and handing out instances. No + # need for a lot of public methods, so we ignore PEP, here. + # pylint: disable=too-few-public-methods + def __init__(self, json_cache_file=None): + """Set up the cache.""" + self._cache = {} + self._cache_file = json_cache_file + + # try to fill the cache from file + if ( + self._cache_file is not None + and os.path.exists(self._cache_file) + and os.stat(self._cache_file).st_size != 0 + ): + with open(self._cache_file, encoding="utf8") as jfh: + self._cache = self._from_json(json.load(jfh)) + + def get(self, unp_ac, entry_version=None): + """Get a UniProtKBEntry from the cache. + + If the entry cannot be found, it will be fetched from the UniProtKB + API.""" + try: + return self._cache[unp_ac][entry_version] + except KeyError: + unp = UniProtKBEntry(unp_ac, entry_version=entry_version) + if unp_ac not in self._cache: + self._cache[unp_ac] = {} + self._cache[unp_ac][entry_version] = unp + # if we end up here, store the cache on disk + if self._cache_file is not None: + with open(self._cache_file, "w", encoding="utf8") as jfh: + json.dump(self.to_json(), jfh) + + return self._cache[unp_ac][entry_version] + + def to_json(self): + """Turn the cache into a JSON object.""" + data = {} + for acc, versions in self._cache.items(): + data[acc] = {} + for version, entry in versions.items(): + data[acc][version] = entry.to_json() + + return data + + def _from_json(self, data): + """Used to initialise the cache from a JSON object.""" + cache = {} + for acc, versions in data.items(): + cache[acc] = {} + for version, entry in versions.items(): + version = int(version) if version != "null" else None + cache[acc][version] = UniProtKBEntry(None, json_data=entry) + + return cache + + def match_sequence(self, unp_ac, sequence, start, end): + """Match a sequence with the sequence of a UNP entry.
+ + As the UNP sequence can change over time, walk through the versions + until exact match or return the best match. + + :param unp_ac: UniProtKB Accession. + :type unp_ac: :class:`str` + :param sequence: Target sequence of the model. + :type sequence: :class:`str` + :param start: Start residue of the alignment, 1-based. + :type start: :class:`int` + :param end: End residue of the alignment 1-based. + :type end: :class:`int` + """ + + # This function was first written for a bulk deposition that came with + # alignment range in the UNP sequence, UNP AC but no UNP entry version. + # So this function works with this for now. Later use cases should be + # incorporated as they occur. E.g. having no range (start, end would be + # None) should search for the best alignment and use the range from + # this in the ModelCIF file. + + def aln(unp, trg, start, end): + unp = unp[start:end] + # trg = parasail's query + # unp = parasail's ref + # Since we have a fixed range, we expect a global alignment, use + # parasail's NW variant. 
+ alignment = parasail.nw_trace_scan_sat( + trg, unp, 5, 2, parasail.blosum62 + ) + # get aln boundaries - assuming a NW (global) alignment + db_aln_start = ( + start + len(unp) - len(alignment.traceback.query.lstrip("-")) + ) + db_aln_end = db_aln_start + len( + alignment.traceback.query.rstrip("-") + ) + seq_aln_start = len(trg) - len(alignment.traceback.ref.lstrip("-")) + seq_aln_end = seq_aln_start + len( + alignment.traceback.query.replace("-", "") + ) + # print("TRG", alignment.traceback.query) + # print("UNP", alignment.traceback.ref) + # print(f"TRG {seq_aln_start} {seq_aln_end}") + return ( + db_aln_start + 1, + db_aln_end, + seq_aln_start + 1, + seq_aln_end, + alignment.traceback.ref, + ) + + entry = self.get(unp_ac) + # depositor has provided alignment range, so we only align that region + db_aln_start, db_aln_end, seq_aln_start, seq_aln_end, aln_unp = aln( + entry.unp_seq, sequence, start - 1, end + ) + # print( + # f"{entry.entry_version} - start: {start}/ {db_aln_start+1} end: " + # + f"{end}/ {db_aln_end} len: {len(sequence)} " + # + f"({end-start})" + # ) + # Check that the alignment fits... make sure the aligned UNP sequence + # does not contain gaps, gaps in the UNP sequence would mean deleted + # residues in the target sequence. 
+ if ( + db_aln_start == start + and db_aln_end == end + and aln_unp.find("-") == -1 + ): + return ( + entry, + (db_aln_start, db_aln_end), + (seq_aln_start, seq_aln_end), + ) + version = entry.entry_version + while version > 1: + version -= 1 + entry = self.get(unp_ac, entry_version=version) + db_aln_start, db_aln_end, seq_aln_start, seq_aln_end, aln_unp = ( + aln(entry.unp_seq, sequence, start - 1, end) + ) + # print( + # f"{version} - start: {start}/ {db_aln_start+1} end: {end}/ " + # + f"{db_aln_end} len: {len(sequence)} ({end-start})" + # ) + if ( + db_aln_start == start + and db_aln_end == end + and aln_unp.find("-") == -1 + ): + return ( + entry, + (db_aln_start, db_aln_end), + (seq_aln_start, seq_aln_end), + ) + # print("FOUND", entry.unp_seq.find(sequence)) + # if we end up here, there was no nice alignment + # ToDo: use _abort_msg here, once there is a proper Python module + raise RuntimeError( + f"Could not find a proper alignment in region {start}-{end} for " + + f"{unp_ac}." + ) + + +# LocalWords: de strptime UniProtKBEntry diff --git a/projects/README.md b/projects/README.md index 8e5fb91c4454b4ab8681172b6269663d0d924cc7..e7302614d997d2852c1981056f019a30f547d11e 100644 --- a/projects/README.md +++ b/projects/README.md @@ -2,7 +2,7 @@ This directory tree contains tools/ code from past modelling projects converted to [ModelCIF](https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Index/) for [MA](https://www.modelarchive.org). -Each project should come with a small README explaining the modelling project. This will help you checking if your project is of same kind. +Each project should come with a small README explaining the modelling project. This will help you check if your project is of the same kind. The project folders are named as [YYYY-MM]-[MA-ID] according to the year and month when the conversion was done and the ID in ModelArchive. The [`docker`](projects/docker/) directory does not host a modelling project.
It keeps the set up of a [Docker](https://www.docker.com) image that can be used to run the converter tools from the various projects. diff --git a/projects/README.template b/projects/README.template index d648de995c3bed69dfee51943d6cd39fcd21ea81..eb86ee4c9baea676ced99d0e6cc502de59ab7cc6 100644 --- a/projects/README.template +++ b/projects/README.template @@ -1,17 +1,15 @@ # Modelling ... -<link to project in MA> +[Link to project in ModelArchive](https://www.modelarchive.org/doi/10.5452/...) (incl. background on project itself and link to the publication) -<link to project publication> +Modelling setup: +... -<hint if this is part of a multi-model-type project> +Input files for conversion: +... -<very concise project description> +Special features here: +... -<how are the ModelCIF files created using this software> - -These models qualify as "<model type>". - -## Tips & Tricks for writing a protject's README.md - -- when mentioning taxa, put them italic using '\*' like \*Taxon\* +Content: +... diff --git a/projects/docker/Dockerfile b/projects/docker/Dockerfile index 193d8f689358986c745716237fad96b0ee990624..44a04ed077cf8a2308f33d8629e7bb72ee8259ff 100644 --- a/projects/docker/Dockerfile +++ b/projects/docker/Dockerfile @@ -1,4 +1,4 @@ -ARG VERSION_OST="2.3.0" +ARG VERSION_OST="latest" FROM registry.scicore.unibas.ch/schwede/openstructure:${VERSION_OST} ## We need to declare ARGs again which were declared before the build stage ## (FROM directive), otherwise they won't be available in this stage. @@ -24,7 +24,7 @@ COPY docker/requirements.txt ${SRC_DIR} WORKDIR ${SRC_DIR} RUN set -e pipefail; \ apt-get update -y; \ - apt-get install -y git pip; \ + apt-get install -y git pip swig; \ pip install -r requirements.txt; \ ## use the python-ihm latest git clone https://github.com/ihmwg/python-ihm.git ihm.git; \
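The `match_sequence` method added in `uniprotkb.py` walks backwards through UniProtKB entry versions until the deposited alignment range fits the model sequence. A minimal sketch of that idea, assuming exact substring comparison in place of the parasail alignment the real code uses (`match_model_region` and the `versions` dict are hypothetical stand-ins):

```python
def match_model_region(versions, model_seq, start, end):
    """Return the newest entry version whose sequence region [start, end]
    (1-based, inclusive) equals the model sequence.

    versions: dict mapping version number -> full database sequence; a
    stand-in for live UniProtKB lookups. Exact matching here replaces the
    parasail-based alignment of the real implementation.
    """
    for version in sorted(versions, reverse=True):  # newest version first
        if versions[version][start - 1 : end] == model_seq:
            return version
    raise RuntimeError(f"No entry version matches region {start}-{end}.")
```

For example, with `versions = {1: "MKTAYIAKQR", 2: "MKTPYIAKQR"}`, a model sequence `"TAYI"` over region 3-6 only matches version 1, mirroring how the converter falls back to older entry versions when the current UniProtKB sequence has changed since deposition.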
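The `add_category` helper in the `MABlock` class runs every value through `_gemmi_quote` before handing raw strings to gemmi. A rough sketch of what such mmCIF value quoting involves, assuming a simplified subset of the CIF syntax (`quote_cif_value` is a made-up name, not the project's actual `_gemmi_quote`):

```python
def quote_cif_value(val):
    """Quote a single value for raw mmCIF output (simplified sketch)."""
    val = str(val)
    if val == "":
        return "."  # mmCIF placeholder for "inapplicable"
    if "\n" in val or ("'" in val and '"' in val):
        # newlines or both quote characters need a ;-delimited text field
        return f"\n;{val}\n;\n"
    if any(c.isspace() for c in val) or val[0] in "_#$[];":
        # whitespace or special leading characters require quoting
        quote = '"' if "'" in val else "'"
        return f"{quote}{val}{quote}"
    return val
```

Plain tokens pass through unchanged; anything with whitespace gets single or double quotes, and values mixing both quote characters fall back to a `;`-delimited text field.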