mmCIF File Format

The mmCIF file format is an alternative container for structural entities, also provided by the PDB. Here we describe how to load mmCIF files and how to work with the information they provide beyond what the common PDB format offers (:class:`MMCifInfo`, :class:`MMCifInfoCitation`, :class:`MMCifInfoTransOp`, :class:`MMCifInfoBioUnit`, :class:`MMCifInfoStructDetails`, :class:`MMCifInfoObsolete`, :class:`MMCifInfoStructRef`, :class:`MMCifInfoStructRefSeq`, :class:`MMCifInfoStructRefSeqDif`, :class:`MMCifInfoRevisions`).

Loading mmCIF Files

Categories Available

The following categories of an mmCIF file are considered by the reader:

  • atom_site: Used to build the :class:`~ost.mol.EntityHandle`
  • entity: Involved in setting :class:`~ost.mol.ChainType` of chains
  • entity_poly: Involved in setting :class:`~ost.mol.ChainType` of chains
  • citation: Goes into :class:`MMCifInfoCitation`
  • citation_author: Goes into :class:`MMCifInfoCitation`
  • exptl: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.method`.
  • refine: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.resolution`, :attr:`~MMCifInfo.r_free` and :attr:`~MMCifInfo.r_work`.
  • pdbx_struct_assembly: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_assembly_gen: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_oper_list: Used for :class:`MMCifInfoBioUnit`.
  • struct: Details about a structure, stored in :class:`MMCifInfoStructDetails`.
  • struct_conf: Stores secondary structure information (in practice, helices) in the :class:`~ost.mol.EntityHandle`
  • struct_sheet_range: Stores secondary structure information for sheets in the :class:`~ost.mol.EntityHandle`
  • pdbx_database_PDB_obs_spr: Verbose information on obsoleted/superseded entries, stored in :class:`MMCifInfoObsolete`
  • struct_ref: Stored in :class:`MMCifInfoStructRef`
  • struct_ref_seq: Stored in :class:`MMCifInfoStructRefSeq`
  • struct_ref_seq_dif: Stored in :class:`MMCifInfoStructRefSeqDif`
  • database_pdb_rev (mmCIF dictionary version < 5): Stored in :class:`MMCifInfoRevisions`
  • pdbx_audit_revision_history and pdbx_audit_revision_details (mmCIF dictionary version >= 5): Used to fill :class:`MMCifInfoRevisions`
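
As a rough illustration of where the exptl and refine categories end up, the sketch below collects the four :class:`MMCifInfo` attributes named in the list into a dictionary. The helper names ``summarize_info`` and ``load_with_info`` are hypothetical, and the call ``io.LoadMMCIF(path, info=True)`` returning an ``(entity, info)`` pair is an assumption about the loader's signature that should be checked against the installed OpenStructure version. A stand-in object is used so the example runs without OpenStructure.

```python
from types import SimpleNamespace


def summarize_info(info):
    """Collect experiment and refinement fields from an MMCifInfo-like object.

    Only the attributes named in the category list are read:
    method (exptl) and resolution, r_free, r_work (refine).
    """
    return {
        "method": info.method,
        "resolution": info.resolution,
        "r_free": info.r_free,
        "r_work": info.r_work,
    }


def load_with_info(path):
    """Hypothetical helper: load an mmCIF file together with its MMCifInfo.

    Assumption: OpenStructure is installed and io.LoadMMCIF accepts
    info=True, returning an (entity, info) tuple. The import is kept
    local so that this module can be imported without OpenStructure.
    """
    from ost import io
    ent, info = io.LoadMMCIF(path, info=True)
    return ent, info


# Stand-in for a real MMCifInfo object; the values are made up for the demo.
demo_info = SimpleNamespace(
    method="X-RAY DIFFRACTION",
    resolution=1.9,
    r_free=0.22,
    r_work=0.18,
)
summary = summarize_info(demo_info)
```

With OpenStructure available, ``summarize_info(load_with_info("file.cif")[1])`` would be the intended use; ``"file.cif"`` is a placeholder path.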

Notes:

  • Structures in mmCIF format can have two chain names. The "new" chain name, extracted from atom_site.label_asym_id, is used to name the chains in the :class:`~ost.mol.EntityHandle`. The "old" (author-provided) chain name is extracted from atom_site.auth_asym_id for the first atom of the chain and is added as a string property named "pdb_auth_chain_name" to the :class:`~ost.mol.ChainHandle`. The mapping is also stored in :class:`MMCifInfo`, accessible via :meth:`~MMCifInfo.GetMMCifPDBChainTr` and :meth:`~MMCifInfo.GetPDBMMCifChainTr`, but only if SEQRES records are read in :func:`~ost.io.LoadMMCIF` and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).
  • Molecular entities in mmCIF are identified by an entity.id. The entity ID of each chain is stored in :class:`MMCifInfo` and can be retrieved with :meth:`~MMCifInfo.GetMMCifEntityIdTr`.