diff --git a/modules/io/doc/mmcif.rst b/modules/io/doc/mmcif.rst index 85d8ac445ba4dfee707c104188206a06d393204d..2791add55d297f1478c5080ae3495ff0e41525d6 100644 --- a/modules/io/doc/mmcif.rst +++ b/modules/io/doc/mmcif.rst @@ -1678,6 +1678,9 @@ significant impact on how chains are assigned to mmCIF entities, chain names and residue numbers. Ideally, the input is *mmcif_conform* which is the case when loading a structure from a valid mmCIF file with :func:`ost.io.LoadMMCIF`. +Behaviour when *mmcif_conform* is True +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" + Expected properties when *mmcif_conform* is enabled: * The residues in a chain all represent the same mmCIF entity. That is for @@ -1696,7 +1699,7 @@ Expected properties when *mmcif_conform* is enabled: type "branched". There, a subtype such as CHAINTYPE_OLIGOSACCHARIDE is expected. * The residue numbers in "polymer" chains must match the SEQRES of the - underlying entity with 1-based indexing.Insertion codes are not allowed + underlying entity with 1-based indexing. Insertion codes are not allowed and raise an error. * Each residue must have a valid chem class assigned (available as :func:`ost.mol.ResidueHandle.GetChemClass`). Even though this information @@ -1743,6 +1746,54 @@ a few special cases: _atom_site.pdbx_PDB_ins_code +Behaviour when *mmcif_conform* is False +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" + +If *mmcif_conform* is not enabled, the only expectation is that chem classes +(available as :func:`ost.mol.ResidueHandle.GetChemClass`) are set. OpenStructure +delegates this to the :class:`ost.conop.Processor` and thus requires a valid +:class:`ost.conop.CompoundLib` when reading a structure. There will be +significant preprocessing involving the split of chains which is purely based +on the set chem classes. 
Each chain gets split with the following rules: + +* separate chain of _entity.type "non-polymer" for each residue with chem class + :class:`NON_POLYMER`/:class:`UNKNOWN`. +* if any residue has chem class :class:`WATER`, all of them are collected + into one separate chain with _entity.type "water". +* if any residue is a saccharide, i.e. has chem class + :class:`SACCHARIDE`/:class:`L_SACCHARIDE`/:class:`D_SACCHARIDE`, all of them + are collected into one separate chain of _entity.type "branched" and + _pdbx_entity_branch.type "oligosaccharide". +* if any residue has chem class :class:`RNA_LINKING`, all of them are collected + into one separate chain of _entity.type "polymer" and + _entity_poly.type "polyribonucleotide". +* if any residue has chem class :class:`DNA_LINKING`, all of them are collected + into one separate chain of _entity.type "polymer" and + _entity_poly.type "polydeoxyribonucleotide". +* if any residue is peptide linking, all of them are collected into one separate + chain of _entity.type "polymer" and _entity_poly.type + "polypeptide(L)"/"polypeptide(D)". Only the following + combinations of chem classes are allowed: either + :class:`L_PEPTIDE_LINKING`/:class:`PEPTIDE_LINKING` or + :class:`D_PEPTIDE_LINKING`/:class:`PEPTIDE_LINKING`. Mixing + :class:`L_PEPTIDE_LINKING` and :class:`D_PEPTIDE_LINKING` raises an error. + +Chain names are generated by iterating over +"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz", starting with +AA, AB, AC etc. once the first cycle is through. There can therefore be as many +chains as needed. The mmCIF entities are built the same way as for +*mmcif_conform* with two differences: 1) the extracted SEQRES of a chain is the +ATOMSEQ, i.e. the exact sequence of its residues, and 2) entity matching happens +through exact matches of SEQRES and is independent of residue numbers. 
As a +consequence, the residue numbers written as _atom_site.label_seq_id no longer +correspond to the actual residue numbers but refer to the location in +ATOMSEQ. + +Once chains are split and new chain names are assigned, the rest is straightforward. +The special cases listed above (_atom_site.auth_asym_id, +_pdbx_poly_seq_scheme.pdb_strand_id, _atom_site.auth_seq_id etc.) are +treated the same as if *mmcif_conform* were True. + .. class:: MMCifWriterEntity Defines mmCIF entity which will be written in :class:`MMCifWriter` @@ -1752,7 +1803,7 @@ a few special cases: Static constructor function for entities of type "polymer" - :param entity_poly_type: Entity poly type from restricted alphabet for + :param entity_poly_type: Entity poly type from restricted vocabulary for `_entity_poly.type <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.type.html>`_ :type entity_poly_type: :class:`str` :param mon_ids: Full names of all compounds defining the SEQRES of that @@ -1806,7 +1857,7 @@ a few special cases: .. method:: SetStructure(ent, mmcif_conform=True, entity_info=list()) Extracts mmCIF categories/attributes based on the description above. - An object of type :class:`MMCifWriter` can only be associated to one + An object of type :class:`MMCifWriter` can only be associated with one Structure. Calling this function more than once raises an error. :param ent: The structure to write