
:mod:`mol.alg <ost.mol.alg>` -- Algorithms for Structures

Submodules

  • :doc:`chain_mapping`
  • :doc:`contact_score`
  • :doc:`dockq`
  • :doc:`helix_kinks`
  • :doc:`ligand_scoring`
  • :doc:`qsscore`
  • :doc:`scoring`
  • :doc:`stereochemistry`
  • :doc:`structure_analysis`
  • :doc:`trajectory_analysis`

Local Distance Test scores (lDDT, DRMSD)

Note

This is a new implementation of lDDT, introduced in OpenStructure 2.4 with a focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The :doc:`previous lDDT code <lddt_deprecated>` that comes with Mariani et al. is considered deprecated.

Note

:class:`lddt.lDDTScorer` provides the raw Python API to compute lDDT, but stereochemistry checks as described in Mariani et al. must be done separately. You may want to check out the compare-structures action (:ref:`ost compare structures`) to compute lDDT with pre-processing and support for quaternary structures.

GDT - Global Distance Test

Implements the GDT score, i.e. identifies the largest number of positions that can be superposed within a given distance threshold. The final GDT score is this number divided by the total number of reference positions. The algorithm is similar to what is described for the LGA tool but simpler. Therefore, the fractions reported by OpenStructure tend to be systematically lower. For benchmarking, we computed the full GDT_TS, i.e. the average GDT for distance thresholds [1, 2, 4, 8], on all CASP15 TS models. 96.5% of the differences to the LGA results from the predictioncenter are within 2 GDT points and 99.2% are within 3 GDT points. The maximum difference is 7.39 GDT points.

The algorithm expects two position lists of the same length and applies a sliding window of a specified length to define a subset of position pairs as the starting point for iterative superposition. Each iterative superposition applies the following steps:

  • Compute minimal RMSD superposition on subset of position pairs
  • Apply superposition on all model positions
  • Compute pairwise distances of all model positions and reference positions
  • Define new subset of position pairs: pairs within distance threshold
  • Stop if subset doesn't change anymore

The largest subset observed in any of the iterations is stored.

This is done for each sliding window position and the largest subset ever observed is reported. To avoid long runtimes for large problem sizes, the sliding window is not applied at every possible position; the number of window placements is capped. If the number of positions exceeds this cap N, the sliding window is only applied at N equidistant locations.
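The procedure described above can be sketched with NumPy as follows. This is only an illustrative sketch, not the OpenStructure implementation: the function names, the parameter names ``window_size``, ``max_windows`` and ``max_iter``, their default values, the comparison against the number of window placements, and the stop condition for subsets with fewer than three pairs are all assumptions made for this example. The sketch returns fractions in the range [0, 1] and assumes at least ``window_size`` positions.

```python
import numpy as np


def kabsch(mobile, target):
    """Return (R, t) such that mobile @ R.T + t has minimal RMSD to target."""
    mob_c = mobile.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Covariance matrix of the centered coordinate sets
    H = (mobile - mob_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against improper rotations (reflections)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ mob_c
    return R, t


def window_starts(n_pos, window_size, max_windows):
    """Window start indices; capped to max_windows equidistant locations."""
    n_windows = n_pos - window_size + 1
    if n_windows <= max_windows:
        return list(range(n_windows))
    starts = np.linspace(0, n_windows - 1, max_windows).astype(int)
    return sorted(set(starts.tolist()))


def gdt_sketch(model, reference, threshold,
               window_size=7, max_windows=1000, max_iter=50):
    """Fraction of reference positions superposable within threshold."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(reference)
    best = 0
    for start in window_starts(n, window_size, max_windows):
        # Initial subset: the position pairs covered by the window
        subset = np.zeros(n, dtype=bool)
        subset[start:start + window_size] = True
        for _ in range(max_iter):
            # Minimal RMSD superposition on the current subset
            R, t = kabsch(model[subset], reference[subset])
            # Apply the superposition to all model positions
            moved = model @ R.T + t
            # Pairwise distances between model and reference positions
            dist = np.linalg.norm(moved - reference, axis=1)
            # New subset: pairs within the distance threshold
            new_subset = dist <= threshold
            best = max(best, int(new_subset.sum()))
            # Stop if the subset no longer changes (or becomes too small)
            if new_subset.sum() < 3 or np.array_equal(new_subset, subset):
                break
            subset = new_subset
    return best / n


def gdt_ts_sketch(model, reference):
    """Average GDT fraction over the thresholds 1, 2, 4 and 8."""
    thresholds = (1.0, 2.0, 4.0, 8.0)
    return float(np.mean([gdt_sketch(model, reference, t)
                          for t in thresholds]))
```

Calling ``gdt_sketch(model, reference, 1.0)`` with two ``(N, 3)`` arrays gives the fraction for a single threshold. Multiplying the result of ``gdt_ts_sketch`` by 100 gives a value on the scale of the GDT points quoted above.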

Steric Clashes

The following function detects steric clashes in atomic structures. Two atoms are clashing if their Euclidean distance is smaller than a threshold value (minus a tolerance offset).
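The clash criterion stated above amounts to a single comparison. The following plain Python sketch illustrates it; ``is_clashing`` and its parameter names are hypothetical and not part of the OpenStructure API, and the threshold is assumed to be supplied by the caller.

```python
import math


def is_clashing(pos_a, pos_b, threshold, tolerance=0.0):
    """True if two atom positions are closer than threshold - tolerance."""
    # Euclidean distance between the two 3D positions
    distance = math.dist(pos_a, pos_b)
    return distance < threshold - tolerance
```

A larger tolerance makes the check more permissive: a pair that clashes with ``tolerance=0.0`` may no longer be reported once the tolerance is subtracted from the threshold.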

This object is returned by the :func:`FilterClashes` function, and contains information about the clashes detected by the function.

This object contains all the information relative to a single clash detected by the :func:`FilterClashes` function

This object is returned by the :func:`CheckStereoChemistry` function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimum number of standard deviations (defined when the :func:`CheckStereoChemistry` function is called) are reported.

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width

Object containing information about clashing distances between non-bonded atoms

Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
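The reporting criterion of the stereo-chemistry check, i.e. divergence from a tabulated mean by a minimum number of standard deviations, can be illustrated as follows. This is a sketch only: ``is_violation`` and its parameter names are hypothetical, and treating a divergence of exactly ``n_sigma`` standard deviations as a violation is an assumption of this example.

```python
def is_violation(observed, mean, std, n_sigma):
    """True if observed diverges from mean by at least n_sigma * std."""
    return abs(observed - mean) >= n_sigma * std
```

The same test applies to bond lengths and planar angle widths; only the tabulated mean and standard deviation differ per item.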

Superposing structures

Algorithms on Structures

The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:

NACCESS, DSSP

Result object for the membrane detection algorithm described below

Trajectory Analysis

This is a set of functions used for basic trajectory analysis such as extracting positions, distances, angles and RMSDs. The organization is such that most functions have their counterpart at the individual :class:`frame level <ost.mol.CoordFrame>` so that they can also be called on one frame instead of the whole trajectory.

All these functions have a "stride" argument, defaulting to stride=1, which is used to skip frames in the analysis.
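The effect of the stride argument can be pictured as selecting every stride-th frame. The helper below is hypothetical and only illustrates which frame indices would be visited, assuming the analysis starts at the first frame.

```python
def frame_indices(n_frames, stride=1):
    """Indices of the frames visited when skipping with the given stride."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(range(0, n_frames, stride))
```

With the default ``stride=1`` every frame is analyzed; larger values trade temporal resolution for runtime.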

Mapping functions

The following functions help to convert one residue into another by reusing as many of the present atoms as possible. They are mainly meant to map from standard amino acids to other standard amino acids or from modified amino acids to standard amino acids.

Molecular Checker (Molck)

Programmatic usage

Molecular Checker (Molck) can be called directly from Python code using the :func:`Molck` function:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck

from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")

# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
                   rm_non_std=True,
                   rm_hyd_atoms=True,
                   rm_oxt_atoms=True,
                   rm_zero_occ_atoms=False,
                   colored=False,
                   map_nonstd_res=False,
                   assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")

It can also be split into subsequent commands for greater control:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
                         CleanUpElementColumn)
from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False

# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
    MapNonStandardResidues(lib=lib, ent=ent)

RemoveAtoms(lib=lib,
            ent=ent,
            rm_unk_atoms=True,
            rm_non_std=True,
            rm_hyd_atoms=True,
            rm_oxt_atoms=True,
            rm_zero_occ_atoms=False,
            colored=False)

CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")

API

Warning

The functions in this API modify the passed structure ent in place. If this is not desired, work on a copy of the structure.

Biounits

Biological assemblies, i.e. biounits, are an integral part of mmCIF files and their construction is fully defined in :class:`ost.io.MMCifInfoBioUnit`. :func:`ost.io.MMCifInfoBioUnit.PDBize` provides one way to construct such biounits with compatibility with the PDB format in mind, i.e. single-character chain names, all ligands dumped into one chain, etc. Here we provide a more mmCIF-style way of constructing biounits. This can be done starting from either a :class:`ost.io.MMCifInfoBioUnit` or the derived :class:`ost.mol.alg.BUInfo`. The latter is a minimalistic representation of :class:`ost.io.MMCifInfoBioUnit` and can be serialized to a byte string.

Preprocesses data from :class:`ost.io.MMCifInfoBioUnit` that are required to construct a biounit from an asymmetric unit. Can be serialized.

:param mmcif_buinfo: Biounit definition
:type mmcif_buinfo: :class:`ost.io.MMCifInfoBioUnit`