
mmCIF File Format

The mmCIF file format is a container for structural entities provided by the PDB. This section describes how to load those files and how to work with the information they provide beyond the legacy PDB format (:class:`MMCifInfo`, :class:`MMCifInfoCitation`, :class:`MMCifInfoTransOp`, :class:`MMCifInfoBioUnit`, :class:`MMCifInfoStructDetails`, :class:`MMCifInfoObsolete`, :class:`MMCifInfoStructRef`, :class:`MMCifInfoStructRefSeq`, :class:`MMCifInfoStructRefSeqDif`, :class:`MMCifInfoRevisions`, :class:`MMCifInfoEntityBranchLink`).

Loading mmCIF Files

Categories Available

The following categories of an mmCIF file are considered by the reader:

  • atom_site: Used to build the :class:`~ost.mol.EntityHandle`
  • entity: Involved in setting :class:`~ost.mol.ChainType` of chains
  • entity_poly: Involved in setting :class:`~ost.mol.ChainType` of chains
  • citation: Goes into :class:`MMCifInfoCitation`
  • citation_author: Goes into :class:`MMCifInfoCitation`
  • exptl: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.method`.
  • refine: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.resolution`, :attr:`~MMCifInfo.r_free` and :attr:`~MMCifInfo.r_work`.
  • pdbx_struct_assembly: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_assembly_gen: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_oper_list: Used for :class:`MMCifInfoBioUnit`.
  • struct: Details about a structure, stored in :class:`MMCifInfoStructDetails`.
  • struct_conf: Stores secondary structure information (in practice, helices) in the :class:`~ost.mol.EntityHandle`
  • struct_sheet_range: Stores secondary structure information for sheets in the :class:`~ost.mol.EntityHandle`
  • pdbx_database_PDB_obs_spr: Verbose information on obsoleted/superseded entries, stored in :class:`MMCifInfoObsolete`
  • struct_ref: Stored in :class:`MMCifInfoStructRef`
  • struct_ref_seq: Stored in :class:`MMCifInfoStructRefSeq`
  • struct_ref_seq_dif: Stored in :class:`MMCifInfoStructRefSeqDif`
  • database_pdb_rev (mmCIF dictionary version < 5) stored in :class:`MMCifInfoRevisions`
  • pdbx_audit_revision_history and pdbx_audit_revision_details (mmCIF dictionary version >= 5) used to fill :class:`MMCifInfoRevisions`
  • pdbx_entity_branch and pdbx_entity_branch_link: Used for :class:`MMCifInfoEntityBranchLink`; the list of links is available via :meth:`~MMCifInfo.GetEntityBranchLinks`
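
As a rough usage sketch of how some of the categories above surface in the API, the function below loads a file and reads the attributes named in the list. It assumes that :func:`~ost.io.LoadMMCIF` is called with the ``info=True`` keyword so that it returns the info object alongside the entity, and that the entity exposes its chains as ``ent.chains``. The helper name ``summarize_mmcif`` is hypothetical and not part of OpenStructure.

```python
def summarize_mmcif(path):
    """Load an mmCIF file and collect a few annotation fields (sketch)."""
    # Imported here so that defining the function does not require
    # OpenStructure to be installed.
    from ost import io

    # Assumption: with info=True the loader returns (entity, MMCifInfo).
    ent, info = io.LoadMMCIF(path, info=True)
    return {
        "chains": [ch.name for ch in ent.chains],  # named by label_asym_id
        "method": info.method,                     # from the exptl category
        "resolution": info.resolution,             # from the refine category
        "r_free": info.r_free,
        "r_work": info.r_work,
    }
```

Calling ``summarize_mmcif("1abc.cif")`` (the file name is a placeholder) would return a plain dictionary, which keeps the annotation values easy to print or compare across entries.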

Notes:

  • Structures in mmCIF format can have two chain names. The "new" chain name, extracted from atom_site.label_asym_id, is used to name the chains in the :class:`~ost.mol.EntityHandle`. The "old" (author-provided) chain name is extracted from atom_site.auth_asym_id of the first atom of the chain and added as a string property named "pdb_auth_chain_name" to the :class:`~ost.mol.ChainHandle`. The mapping is also stored in :class:`MMCifInfo`, accessible via :meth:`~MMCifInfo.GetMMCifPDBChainTr` and :meth:`~MMCifInfo.GetPDBMMCifChainTr`, provided that SEQRES records are read in :func:`~ost.io.LoadMMCIF` and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).
  • Molecular entities in mmCIF are identified by an entity.id. Each chain is mapped to its entity ID in :class:`MMCifInfo`, accessible via :meth:`~MMCifInfo.GetMMCifEntityIdTr`.
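
The chain naming rule in the first note can be illustrated without OpenStructure. The following is a minimal sketch in plain Python, not the reader's actual implementation: it takes hypothetical (label_asym_id, auth_asym_id) pairs in atom_site order and lets the first atom of each chain decide the author chain name. The function name and the sample rows are made up for illustration.

```python
def chain_name_mapping(atom_rows):
    """Map each label_asym_id to the auth_asym_id of the chain's first atom."""
    cif_to_pdb = {}
    for label_asym_id, auth_asym_id in atom_rows:
        # setdefault keeps the value from the first atom seen for a chain
        # and ignores all later atoms of the same chain.
        cif_to_pdb.setdefault(label_asym_id, auth_asym_id)
    return cif_to_pdb


# Hypothetical (label_asym_id, auth_asym_id) pairs, in atom_site order.
rows = [("A", "H"), ("A", "H"), ("B", "L"), ("C", "H"), ("C", "H")]
mapping = chain_name_mapping(rows)
print(mapping)
```

In this made-up input, the mmCIF chains "A" and "C" share the author chain name "H", so the author name alone does not identify a chain. This is one way to see why restricting the stored mapping to chains with a non-empty SEQRES record, as described above, is useful.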

Info Classes

Information from mmCIF files that goes beyond structural data is kept in a special container, the :class:`MMCifInfo` class. The available annotation is described in detail below.

:class:`MMCifInfo` is the container for all non-molecular data pulled from an mmCIF file.

:class:`MMCifInfoCitation` stores citation information from an input file.

:class:`MMCifInfoTransOp` stores the operations needed to transform an :class:`~ost.mol.EntityHandle` into a bio unit.

:class:`MMCifInfoBioUnit` stores information on how a structure is to be assembled to form the bio unit.
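
To make the idea of a transformation operation concrete, here is a minimal sketch in plain Python, assuming that an operation consists of a 3x3 rotation matrix and a translation vector, as in the pdbx_struct_oper_list category. It does not use the OpenStructure classes; the function name and the sample operation are hypothetical.

```python
def apply_operation(coords, rotation, translation):
    """Return the coordinates transformed as x' = R * x + t."""
    transformed = []
    for x, y, z in coords:
        # One output component per matrix row, plus the matching shift.
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


# Hypothetical operation: 180 degree rotation about z, then a shift of 10 along x.
rotation = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [10.0, 0.0, 0.0]
coords = [(1.0, 2.0, 3.0)]
print(apply_operation(coords, rotation, translation))
```

Building a bio unit then amounts to applying each listed operation to a copy of the relevant chains and collecting the results.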

:class:`MMCifInfoStructDetails` holds details about the structure.

:class:`MMCifInfoObsolete` holds details on obsolete/superseded structures. The data is available both in the obsolete and in the replacement entries.

:class:`MMCifInfoStructRef` holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and expression tags.

:class:`MMCifInfoStructRefSeq` describes an aligned range of residues between a sequence in a reference database and the deposited sequence.

:class:`MMCifInfoStructRefSeqDif` describes a particular difference between the deposited sequence and the sequence in the database.

:class:`MMCifInfoRevisions` holds the revision history of a PDB entry. If you find a '?' somewhere, this means 'not set'.

:class:`MMCifInfoEntityBranchLink` holds data from pdbx_entity_branch, most specifically pdbx_entity_branch_link. That is connectivity information for branched entities, e.g. carbohydrates/oligosaccharides. :class:`Conop Processors <ost.conop.Processor>` cannot easily connect them, so this information is used in :func:`~ost.io.LoadMMCIF` to do that.