mmCIF File Format
The mmCIF file format is a container for structural entities provided by the PDB. Here we describe how to load those files and how to deal with the information they provide beyond what the legacy PDB format offers (:class:`MMCifInfo`, :class:`MMCifInfoCitation`, :class:`MMCifInfoTransOp`, :class:`MMCifInfoBioUnit`, :class:`MMCifInfoStructDetails`, :class:`MMCifInfoObsolete`, :class:`MMCifInfoStructRef`, :class:`MMCifInfoStructRefSeq`, :class:`MMCifInfoStructRefSeqDif`, :class:`MMCifInfoRevisions`, :class:`MMCifInfoEntityBranchLink`).
Loading mmCIF Files
Categories Available
The following categories of an mmCIF file are considered by the reader:
- ``atom_site``: Used to build the :class:`~ost.mol.EntityHandle`
- ``entity``: Involved in setting :class:`~ost.mol.ChainType` of chains
- ``entity_poly``: Involved in setting :class:`~ost.mol.ChainType` of chains
- ``citation``: Goes into :class:`MMCifInfoCitation`
- ``citation_author``: Goes into :class:`MMCifInfoCitation`
- ``exptl``: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.method`
- ``refine``: Goes into :class:`MMCifInfo` as :attr:`~MMCifInfo.resolution`, :attr:`~MMCifInfo.r_free` and :attr:`~MMCifInfo.r_work`
- ``pdbx_struct_assembly``: Used for :class:`MMCifInfoBioUnit`
- ``pdbx_struct_assembly_gen``: Used for :class:`MMCifInfoBioUnit`
- ``pdbx_struct_oper_list``: Used for :class:`MMCifInfoBioUnit`
- ``struct``: Details about a structure, stored in :class:`MMCifInfoStructDetails`
- ``struct_conf``: Stores secondary structure information (practically helices) in the :class:`~ost.mol.EntityHandle`
- ``struct_sheet_range``: Stores secondary structure information for sheets in the :class:`~ost.mol.EntityHandle`
- ``pdbx_database_PDB_obs_spr``: Verbose information on obsoleted/superseded entries, stored in :class:`MMCifInfoObsolete`
- ``struct_ref``: Stored in :class:`MMCifInfoStructRef`
- ``struct_ref_seq``: Stored in :class:`MMCifInfoStructRefSeq`
- ``struct_ref_seq_dif``: Stored in :class:`MMCifInfoStructRefSeqDif`
- ``database_pdb_rev`` (mmCIF dictionary version < 5): Stored in :class:`MMCifInfoRevisions`
- ``pdbx_audit_revision_history`` and ``pdbx_audit_revision_details`` (mmCIF dictionary version >= 5): Used to fill :class:`MMCifInfoRevisions`
- ``pdbx_entity_branch`` and ``pdbx_entity_branch_link``: Used for :class:`MMCifInfoEntityBranchLink`; a list of links is available via :meth:`~MMCifInfo.GetEntityBranchLinks`
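To check in advance which of the categories listed above a given file actually contains, a quick scan of the item names can help. The following is a minimal sketch using only the Python standard library; it is not part of OpenStructure. The helper name ``categories_in`` and the sample data are made up for illustration, and the scan assumes that item names start a line (multi-line text fields are not handled).

```python
# Categories considered by the reader, as listed above (lower-cased,
# since mmCIF category names are compared case-insensitively here).
READER_CATEGORIES = {
    "atom_site", "entity", "entity_poly", "citation", "citation_author",
    "exptl", "refine", "pdbx_struct_assembly", "pdbx_struct_assembly_gen",
    "pdbx_struct_oper_list", "struct", "struct_conf", "struct_sheet_range",
    "pdbx_database_pdb_obs_spr", "struct_ref", "struct_ref_seq",
    "struct_ref_seq_dif", "database_pdb_rev", "pdbx_audit_revision_history",
    "pdbx_audit_revision_details", "pdbx_entity_branch",
    "pdbx_entity_branch_link",
}


def categories_in(cif_text):
    """Return the lower-cased category names of all '_category.item' lines."""
    found = set()
    for line in cif_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("_") and "." in parts[0]:
            # '_atom_site.id' -> 'atom_site'
            found.add(parts[0][1:].split(".", 1)[0].lower())
    return found


# Hypothetical, heavily truncated mmCIF content for demonstration only.
CATEGORY_SAMPLE = """data_demo
_exptl.method 'X-RAY DIFFRACTION'
_cell.length_a 10.0
loop_
_atom_site.id
_atom_site.label_asym_id
1 A
"""

present = categories_in(CATEGORY_SAMPLE)
print("considered by the reader:", sorted(present & READER_CATEGORIES))
print("not in the list above:", sorted(present - READER_CATEGORIES))
```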
Notes:
- Structures in mmCIF format can have two chain names. The "new" chain name, extracted from ``atom_site.label_asym_id``, is used to name the chains in the :class:`~ost.mol.EntityHandle`. The "old" (author-provided) chain name is extracted from ``atom_site.auth_asym_id`` for the first atom of the chain. It is added as a string property named "pdb_auth_chain_name" to the :class:`~ost.mol.ChainHandle`. The mapping is also stored in :class:`MMCifInfo` and is accessible via :meth:`~MMCifInfo.GetMMCifPDBChainTr` and :meth:`~MMCifInfo.GetPDBMMCifChainTr` if SEQRES records are read in :func:`~ost.io.LoadMMCIF` and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).
- Molecular entities in mmCIF are identified by an ``entity.id``. Each chain is mapped to an ID in :class:`MMCifInfo`, accessible via :meth:`~MMCifInfo.GetMMCifEntityIdTr`.
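The relation between the two chain names can be illustrated without OpenStructure. The sketch below is a simplified stand-in, not the reader's implementation: it walks an ``atom_site`` loop and records, for each ``label_asym_id``, the ``auth_asym_id`` of the first atom seen. The function name ``chain_name_map`` and the sample data are hypothetical, and rows are assumed to be plain whitespace-separated values without quoted strings containing spaces.

```python
def chain_name_map(cif_text):
    """Map each label_asym_id to the auth_asym_id of the chain's first atom."""
    label_key = "_atom_site.label_asym_id"
    auth_key = "_atom_site.auth_asym_id"
    columns = []   # item names of the loop currently being read
    mapping = {}
    for line in cif_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            columns = []          # blank line or comment ends the loop
        elif tokens[0] == "loop_":
            columns = []          # a new loop starts with fresh columns
        elif tokens[0].startswith("_"):
            columns.append(tokens[0])
        elif (label_key in columns and auth_key in columns
              and len(tokens) == len(columns)):
            row = dict(zip(columns, tokens))
            # setdefault keeps the value of the first atom of each chain
            mapping.setdefault(row[label_key], row[auth_key])
    return mapping


# Hypothetical atom_site excerpt: chains A and C share author chain "H".
CHAIN_SAMPLE = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_asym_id
_atom_site.auth_asym_id
ATOM 1 N A H
ATOM 2 CA A H
ATOM 3 N B L
HETATM 4 O C H
"""

print(chain_name_map(CHAIN_SAMPLE))
```

In this sample two mmCIF chains carry the same author chain name, which is why the name is attached to each chain as a property rather than used as the chain name itself.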
Info Classes
Information from mmCIF files that goes beyond structural data is kept in a special container, the :class:`MMCifInfo` class. Here is a detailed description of the available annotation.
:class:`MMCifInfo` is the container for all bits of non-molecular data pulled from an mmCIF file.
:class:`MMCifInfoCitation` stores citation information from an input file.
:class:`MMCifInfoTransOp` stores operations needed to transform an :class:`~ost.mol.EntityHandle` into a bio unit.
:class:`MMCifInfoBioUnit` stores information on how a structure is to be assembled to form the bio unit.
:class:`MMCifInfoStructDetails` holds details about the structure.
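As a small illustration of reading the experiment-level attributes named in the category list (``method``, ``resolution``, ``r_free``, ``r_work``), the sketch below collects them into a dictionary. The helper ``summarize_info`` is hypothetical, and a plain stand-in object replaces a real :class:`MMCifInfo` so that the example runs without OpenStructure; the values are invented. The commented lines at the end show how the info object would presumably be obtained, assuming :func:`~ost.io.LoadMMCIF` is called with ``info=True`` and a file named ``example.cif`` exists.

```python
from types import SimpleNamespace


def summarize_info(info):
    """Collect a few experiment-level attributes from an MMCifInfo-like object."""
    return {
        "method": info.method,
        "resolution": info.resolution,
        "r_free": info.r_free,
        "r_work": info.r_work,
    }


# Stand-in with invented values; a real MMCifInfo would come from the reader.
demo_info = SimpleNamespace(
    method="X-RAY DIFFRACTION", resolution=1.8, r_free=0.21, r_work=0.18
)
print(summarize_info(demo_info))

# With OpenStructure installed, something along these lines is expected
# to work (file name is a placeholder):
#   from ost import io
#   ent, info = io.LoadMMCIF("example.cif", info=True)
#   print(summarize_info(info))
```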
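Conceptually, each operation from ``pdbx_struct_oper_list`` consists of a rotation matrix and a translation vector, and a transformed coordinate is obtained as rotation times coordinate plus translation. The sketch below shows that arithmetic in plain Python. It does not use the OpenStructure classes; the function name ``apply_operation`` and the example values are made up for illustration.

```python
def apply_operation(rotation, translation, point):
    """Return rotation * point + translation for a 3x3 matrix and 3-vectors."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Invented example: a 180 degree rotation about the z axis followed by a
# shift of 10 along x, applied to a single atom position.
ROT_Z_180 = [
    [-1, 0, 0],
    [0, -1, 0],
    [0, 0, 1],
]
SHIFT = (10, 0, 0)

print(apply_operation(ROT_Z_180, SHIFT, (1, 2, 3)))  # -> (9, -2, 3)
```

Building a bio unit then amounts to applying such operations to the chains selected for the assembly and collecting the transformed copies.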
:class:`MMCifInfoObsolete` holds details on obsolete/superseded structures. The data is available both in the obsolete and in the replacement entries.
:class:`MMCifInfoStructRef` holds the information of the ``struct_ref`` category.
The category describes the link of polymers in the mmCIF file to sequences
stored in external databases such as UniProt. The related categories
``struct_ref_seq`` and ``struct_ref_seq_dif`` also list differences between the
sequences of the deposited structure and the sequences in the database. Two
prominent examples of such differences are point mutations and expression tags.
:class:`MMCifInfoStructRefSeq` describes an aligned range of residues between a sequence in a reference database and the deposited sequence.
:class:`MMCifInfoStructRefSeqDif` describes a particular difference between the deposited sequence and the sequence in the database.
:class:`MMCifInfoRevisions` holds the revision history of a PDB entry. A '?' in any field means 'not set'.
:class:`MMCifInfoEntityBranchLink` holds data from ``pdbx_entity_branch``, most
specifically ``pdbx_entity_branch_link``. That is connectivity information for
branched entities, e.g. carbohydrates/oligosaccharides.
:class:`Conop Processors <ost.conop.Processor>` cannot easily connect them, so
we use this information in :func:`~ost.io.LoadMMCIF` to do that.