
mmCIF File Format

The mmCIF file format is an alternative container for structural entities, also provided by the PDB. Here we describe how to load those files and how to deal with information that goes beyond the common PDB format (:class:`MMCifInfo`, :class:`MMCifInfoCitation`, :class:`MMCifInfoTransOp`, :class:`MMCifInfoBioUnit`, :class:`MMCifInfoStructDetails`, :class:`MMCifInfoObsolete`, :class:`MMCifInfoStructRef`, :class:`MMCifInfoStructRefSeq`, :class:`MMCifInfoStructRefSeqDif`, :class:`MMCifInfoRevisions`).

Loading mmCIF Files

Categories Available

The following categories of an mmCIF file are considered by the reader:

  • atom_site: Used to build the :class:`~ost.mol.EntityHandle`
  • entity: Involved in setting :class:`~ost.mol.ChainType` of chains
  • entity_poly: Involved in setting :class:`~ost.mol.ChainType` of chains
  • citation: Goes into :class:`MMCifInfoCitation`
  • citation_author: Goes into :class:`MMCifInfoCitation`
  • exptl: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.method`.
  • refine: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.resolution`, :attr:`~MMCifInfo.r_free` and :attr:`~MMCifInfo.r_work`.
  • pdbx_struct_assembly: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_assembly_gen: Used for :class:`MMCifInfoBioUnit`.
  • pdbx_struct_oper_list: Used for :class:`MMCifInfoBioUnit`.
  • struct: Details about a structure, stored in :class:`MMCifInfoStructDetails`.
  • struct_conf: Stores secondary structure information (in practice, helices) in the :class:`~ost.mol.EntityHandle`
  • struct_sheet_range: Stores secondary structure information for sheets in the :class:`~ost.mol.EntityHandle`
  • pdbx_database_PDB_obs_spr: Verbose information on obsolete/superseded entries, stored in :class:`MMCifInfoObsolete`
  • struct_ref: Stored in :class:`MMCifInfoStructRef`
  • struct_ref_seq: Stored in :class:`MMCifInfoStructRefSeq`
  • struct_ref_seq_dif: Stored in :class:`MMCifInfoStructRefSeqDif`
  • database_pdb_rev (mmCIF dictionary version < 5): Stored in :class:`MMCifInfoRevisions`
  • pdbx_audit_revision_history and pdbx_audit_revision_details (mmCIF dictionary version >= 5): Used to fill :class:`MMCifInfoRevisions`
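
To get a quick idea of which categories in a given file fall into the list above, the data names can be scanned with plain Python. This is a rough sketch and not how the reader itself works: the category set is copied by hand from the list, ``split_categories`` is a hypothetical helper, and the scan ignores mmCIF subtleties such as multi-line text fields and case-insensitive names.

```python
import re

# Categories named in the list above (copied by hand, illustrative only).
READER_CATEGORIES = {
    "atom_site", "entity", "entity_poly", "citation", "citation_author",
    "exptl", "refine", "pdbx_struct_assembly", "pdbx_struct_assembly_gen",
    "pdbx_struct_oper_list", "struct", "struct_conf", "struct_sheet_range",
    "pdbx_database_PDB_obs_spr", "struct_ref", "struct_ref_seq",
    "struct_ref_seq_dif", "database_pdb_rev", "pdbx_audit_revision_history",
    "pdbx_audit_revision_details",
}

# A data name has the form "_category.item" at the start of a line.
_DATA_NAME = re.compile(r"^_([A-Za-z0-9_\[\]]+)\.", re.MULTILINE)


def split_categories(cif_text):
    """Return (considered, ignored) sets of category names found in cif_text."""
    found = set(_DATA_NAME.findall(cif_text))
    return found & READER_CATEGORIES, found - READER_CATEGORIES


# Tiny hand-written fragment, not a valid deposited entry.
example = """data_EXAMPLE
_exptl.method 'X-RAY DIFFRACTION'
_refine.ls_d_res_high 1.80
_cell.length_a 50.0
loop_
_atom_site.group_PDB
_atom_site.label_asym_id
ATOM A
"""

considered, ignored = split_categories(example)
print(sorted(considered))
print(sorted(ignored))
```

Categories that end up in the second set are simply not picked up by the categories listed here; the sketch makes no statement about other ways of accessing them.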

Notes:

  • Structures in mmCIF format can have two chain names. The "new" chain name extracted from atom_site.label_asym_id is used to name the chains in the :class:`~ost.mol.EntityHandle`. The "old" (author-provided) chain name is extracted from atom_site.auth_asym_id for the first atom of the chain. It is added as a string property named "pdb_auth_chain_name" to the :class:`~ost.mol.ChainHandle`. The mapping is also available from :class:`MMCifInfo` via :meth:`~MMCifInfo.GetMMCifPDBChainTr` and :meth:`~MMCifInfo.GetPDBMMCifChainTr` if SEQRES records are read in :func:`~ost.io.LoadMMCIF` and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).
  • Molecular entities in mmCIF are identified by an entity.id. Each chain is mapped to an ID in :class:`MMCifInfo`, available via :meth:`~MMCifInfo.GetMMCifEntityIdTr`.
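
The chain naming described in the notes can be inspected with a few lines of Python. The following is a minimal sketch under some assumptions: ``chain_name_table`` and ``load_chain_names`` are hypothetical helper names, and the return order of :func:`~ost.io.LoadMMCIF` with both ``seqres`` and ``info`` enabled is assumed to be entity, SEQRES list, info object.

```python
def chain_name_table(ent):
    """Map each chain name (label_asym_id) to its author-provided name.

    `ent` is expected to behave like an ost.mol.EntityHandle: it exposes
    `chains`, and every chain offers `name`, `HasProp` and `GetStringProp`.
    Chains without the property are skipped.
    """
    table = {}
    for chain in ent.chains:
        if chain.HasProp("pdb_auth_chain_name"):
            table[chain.name] = chain.GetStringProp("pdb_auth_chain_name")
    return table


def load_chain_names(path):
    """Load an mmCIF file and return (chain name table, MMCifInfo)."""
    # Imported here so chain_name_table() stays usable without OpenStructure.
    from ost import io

    # Assumed return order for seqres=True and info=True.
    ent, seqres, info = io.LoadMMCIF(path, seqres=True, info=True)
    return chain_name_table(ent), info
```

With the returned info object, ``info.GetMMCifPDBChainTr(name)`` and ``info.GetMMCifEntityIdTr(name)`` can then be queried for chains that have a non-empty SEQRES record.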

Info Classes

Information from mmCIF files that goes beyond structural data is kept in a special container, the :class:`MMCifInfo` class. Here is a detailed description of the available annotation.

:class:`MMCifInfo`: The container for all bits of non-molecular data pulled from an mmCIF file.

:class:`MMCifInfoCitation`: Stores citation information from an input file.

:class:`MMCifInfoTransOp`: Stores the operations needed to transform an :class:`~ost.mol.EntityHandle` into a bio unit.

:class:`MMCifInfoBioUnit`: Stores information on how a structure is to be assembled to form the bio unit.

:class:`MMCifInfoStructDetails`: Holds details about the structure.

:class:`MMCifInfoObsolete`: Holds details on obsolete/superseded structures. The data is available both in the obsolete and in the replacement entries.

:class:`MMCifInfoStructRef`: Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and expression tags.

:class:`MMCifInfoStructRefSeq`: An aligned range of residues between a sequence in a reference database and the deposited sequence.

:class:`MMCifInfoStructRefSeqDif`: A particular difference between the deposited sequence and the sequence in the database.

:class:`MMCifInfoRevisions`: Revision history of a PDB entry. A '?' in any field means 'not set'.
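
As an illustration of how the info classes might be used together, the sketch below collects a few values from an info object into a dictionary. It rests on assumptions: ``summarize_info`` and ``or_none`` are hypothetical helpers, the info object is assumed to expose ``method``, ``resolution``, ``citations`` (with ``title`` and ``year`` per citation) and ``revisions``, and ``GetLastDate()`` is assumed to be available on the revisions object.

```python
def or_none(value):
    """Translate the '?' placeholder ('not set') into None."""
    return None if value == "?" else value


def summarize_info(info):
    """Collect a few annotation values from an MMCifInfo-like object."""
    return {
        "method": info.method,
        "resolution": info.resolution,
        # One (title, year) pair per citation found in the file.
        "citations": [(cit.title, cit.year) for cit in info.citations],
        # Assumed accessor; a '?' date is reported as None.
        "last_revision_date": or_none(info.revisions.GetLastDate()),
    }
```

An info object would typically come from :func:`~ost.io.LoadMMCIF` called with ``info=True``; the helper itself only relies on the attributes named above.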