
Structural Data

The structural database serves as a container for structural backbone and sequence data. Custom accessor objects can be implemented that relate arbitrary features to structural data. Examples provided by ProMod3 include access by matching stem geometry (see: :class:`FragDB`) or by sequence features (see: :class:`Fragger`). Besides backbone and sequence data, derived features such as sequence profiles or secondary structure information can optionally be stored. Optional data includes:

  • The phi/psi dihedral angles
  • The secondary structure state as defined by dssp
  • The solvent accessibility in square Angstrom
  • The amino acid frequencies as given by an input sequence profile
  • The residue depth - The residue depth is defined as the minimum distance of a residue to any of the exposed residues. Distances are calculated using CB positions (artificially constructed in case of glycine), and a residue counts as exposed if its relative solvent accessibility is > 25% and at least one of its atoms is exposed to the OUTER surface. To determine whether an atom is part of that outer surface, the full structure is placed into a 3D grid and a flood fill algorithm identifies the atoms of interest. This approach excludes internal cavities. It is a simplified version of the residue depth discussed in [chakravarty1999]_ and is calculated directly when structural information is added to the StructureDB.
  • The amino acid frequency derived from structural alignments as described in [zhou2005]_ - Since calculating such a profile already requires a StructureDB, this is a chicken-and-egg problem. When structural information is added to the StructureDB, the corresponding memory is only allocated and set to zero. Using this information is therefore only meaningful if you calculate these profiles and set them manually (or load the provided default database).
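The flood fill idea behind the residue depth can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not ProMod3's implementation: the names ``outer_cells`` and ``residue_depths`` are hypothetical, the structure is assumed to be already discretized into a set of occupied grid cells, and the grid is assumed to be padded so that the corner cell is empty.

```python
from collections import deque
import math

# The six face neighbours of a grid cell.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def outer_cells(occupied, shape):
    """Return the empty grid cells reachable from the corner (0, 0, 0).

    Assumes the corner cell is empty (padded grid). Empty cells that are
    NOT returned cannot be reached from outside, i.e. they are cavities.
    """
    nx, ny, nz = shape
    start = (0, 0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS:
            cell = (x + dx, y + dy, z + dz)
            inside = 0 <= cell[0] < nx and 0 <= cell[1] < ny and 0 <= cell[2] < nz
            if inside and cell not in occupied and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    return seen


def residue_depths(cb_positions, exposed):
    """Minimum CB-CB distance of every residue to any exposed residue.

    Assumes at least one residue is flagged as exposed.
    """
    exposed_positions = [p for p, e in zip(cb_positions, exposed) if e]
    return [min(math.dist(p, q) for q in exposed_positions) for p in cb_positions]
```

With this definition an exposed residue has a depth of zero, since its closest exposed residue is itself.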

Defining Chains and Fragments

The CoordInfo is generated automatically when new chains are added to the structural database. It contains internal information about how a connected stretch of residues is stored in the database.

The FragmentInfo defines any fragment in the structural database. If you implement your own accessor object, this is the information you want to store.

param chain_index: Fills :attr:`chain_index`
param offset: Fills :attr:`offset`
param length: Fills :attr:`length`
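To make the role of the three attributes concrete, here is a minimal sketch in plain Python. It does not use ProMod3: ``FragmentRef`` and ``fragment_sequence`` are hypothetical stand-ins, and the "database" is assumed to be a simple list of chain sequences. The point is only that a fragment is fully described by a chain, a start offset and a length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentRef:
    """Hypothetical stand-in for a fragment reference."""
    chain_index: int  # which chain in the database
    offset: int       # index of the first residue within that chain
    length: int       # number of residues in the fragment


def fragment_sequence(chains, frag):
    """Extract the residues a FragmentRef points to from a list of chains."""
    chain = chains[frag.chain_index]
    return chain[frag.offset:frag.offset + frag.length]
```

Because such a reference stores three integers instead of copied data, a custom accessor can keep many of them cheaply and resolve them against the database only when needed.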

The Structure Database

The following code example demonstrates how to create a structural database and fill it with content.

Calculating the structural profiles is expensive and heavily depends on the size of the database used as source. If you want to do this for a larger database, you might want to consider two things:

  1. Use a database of limited size to generate the actual profiles (between 5000 and 10000 non-redundant chains is enough)
  2. Use the :class:`ost.seq.ProfileDB` to gather profiles produced from jobs running in parallel

The StructureDBDataType enum has to be passed at initialization of a StructureDB to define what data you want to store in addition to backbone coordinates and sequence. To store all possible data, use All. If you only want a subset, combine the respective data types with a bitwise or operation (see the example script for StructureDB). One important note: if you enable AAFrequenciesStruct, the actual information is not assigned automatically. Only the corresponding memory is allocated and set to zero; the actual information must be assigned manually (see the example script again).

All, Dihedrals, SolventAccessibilities, ResidueDepths, DSSP, AAFrequencies, AAFrequenciesStruct
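Combining flags with a bitwise or can be illustrated with Python's standard ``enum.IntFlag``. This is a sketch of the general technique only: the class ``DataType`` and its numeric values are assumptions made for the example and do not reflect how ProMod3 defines :class:`StructureDBDataType` internally.

```python
from enum import IntFlag


class DataType(IntFlag):
    """Hypothetical flag enum; the numeric values are assumptions."""
    Dihedrals = 1
    SolventAccessibilities = 2
    ResidueDepths = 4
    DSSP = 8
    AAFrequencies = 16
    AAFrequenciesStruct = 32
    All = 63  # every bit above set


# Request only dihedral angles and DSSP states.
wanted = DataType.Dihedrals | DataType.DSSP

# Individual flags can be tested with the "in" operator.
stores_dssp = DataType.DSSP in wanted
stores_depths = DataType.ResidueDepths in wanted
```

Each data type occupies its own bit, so any subset can be expressed as a single value.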

Generates an empty StructureDB that can be filled with content through :func:`AddCoordinates`. The information extracted there is defined by data_to_store. Have a look at the :class:`StructureDBDataType` documentation and at the example script for details.

param data_to_store: Specifies what data to store in the database, several flags can be combined with a bitwise or operator.
type data_to_store: :class:`StructureDBDataType`

Finding Fragments based on Geometric Features

The fragment database lets you organize, search and access the information stored in a structural database (:class:`StructureDB`). In its current form it groups fragments into bins according to their length (incl. stems) and the geometry of their N-stem and C-stem (described by 4 angles and the distance between the N-stem C atom and the C-stem N atom). It can therefore be searched for fragments matching a certain geometry of N and C stems. The bins are accessed through a hash table, so a search only has to look up the matching bin instead of scanning the whole database.

This example illustrates how to create a custom FragDB based on a StructureDB:

param dist_bin_size: Size of the distance parameter binning in Angstrom
param angle_bin_size: Size of the angle parameter binning in degrees
type dist_bin_size: :class:`float`
type angle_bin_size: :class:`int`
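The binning by length and stem geometry described in this section can be sketched with a plain dictionary acting as the hash table. This is an illustration under assumptions, not the FragDB implementation: ``bin_key`` and ``GeometryIndex`` are hypothetical names, angles are assumed to be given in degrees, and the defaults of 1.0 and 20 for the bin sizes are arbitrary example values.

```python
from collections import defaultdict


def bin_key(length, stem_dist, angles, dist_bin_size=1.0, angle_bin_size=20):
    """Map fragment length, stem distance and 4 stem angles to a hashable bin key."""
    dist_bin = int(stem_dist // dist_bin_size)
    # Wrap angles into [0, 360) so that e.g. -170 and 190 share a bin.
    angle_bins = tuple(int((a % 360) // angle_bin_size) for a in angles)
    return (length, dist_bin) + angle_bins


class GeometryIndex:
    """Hypothetical index grouping fragment ids by binned stem geometry."""

    def __init__(self, dist_bin_size=1.0, angle_bin_size=20):
        self.dist_bin_size = dist_bin_size
        self.angle_bin_size = angle_bin_size
        self._bins = defaultdict(list)

    def _key(self, length, stem_dist, angles):
        return bin_key(length, stem_dist, angles,
                       self.dist_bin_size, self.angle_bin_size)

    def add(self, fragment_id, length, stem_dist, angles):
        self._bins[self._key(length, stem_dist, angles)].append(fragment_id)

    def search(self, length, stem_dist, angles):
        # A single dictionary lookup; no other bin is inspected.
        return list(self._bins.get(self._key(length, stem_dist, angles), []))
```

One consequence of this simple scheme is that two fragments with nearly identical geometry can end up in neighbouring bins when they lie close to a bin boundary, so smaller bins give more specific hits at the cost of fewer candidates per query.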

Finding Fragments based on Sequence Features

In some cases you might want to use the :class:`StructureDB` to search for fragments that possibly represent the structural conformation of interest. The :class:`Fragger` searches a :class:`StructureDB` for n fragments that maximize a certain score and gathers a set of fragments with a guaranteed structural diversity based on an rmsd_threshold. You can use the :class:`Fragger` wrapped in a full-fledged pipeline implemented in :class:`~promod3.modelling.FraggerHandle` or search for fragments from scratch using an arbitrary linear combination of scores:

  • SeqID: Calculates the fraction of amino acids being identical when comparing a potential fragment from the :class:`StructureDB` and the target sequence
  • SeqSim: Calculates the avg. substitution matrix based sequence similarity of amino acids when comparing a potential fragment from the :class:`StructureDB` and the target sequence
  • SSAgree: Calculates the avg. agreement of the predicted secondary structure by PSIPRED [Jones1999]_ and the dssp [kabsch1983]_ assignment stored in the :class:`StructureDB`. The Agreement term is based on a probabilistic approach also used in HHSearch [soding2005]_.
  • TorsionProbability: Calculates the avg. probability of observing the phi/psi dihedral angles of a potential fragment from the :class:`StructureDB` given the target sequence. The probabilities are extracted from the :class:`TorsionSampler` class.
  • SequenceProfile: Calculates the avg. profile score between the amino acid frequencies of a potential fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment between them. The scores are calculated as L1 distances between the profile columns.
  • StructureProfile: Calculates the avg. profile score between the amino acid frequencies of a potential fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment between them. The scores are calculated as L1 distances between the profile columns. In this case, the amino acid frequencies extracted from structural alignments are used.
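Two of the scores listed above, and their linear combination, can be sketched in plain Python. The function names are hypothetical and the code is not taken from ProMod3. In particular, the source only states that profile scores are based on L1 distances between columns; how a distance is turned into a score (for example its sign) is an assumption here, expressed through a negative weight.

```python
def seq_id(fragment_seq, target_seq):
    """Fraction of identical residues in a gap-free comparison."""
    if len(fragment_seq) != len(target_seq):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(fragment_seq, target_seq))
    return matches / len(target_seq)


def avg_l1_distance(frag_profile, target_profile):
    """Average L1 distance between corresponding profile columns.

    Each profile is a list of columns, each column a list of frequencies.
    """
    if len(frag_profile) != len(target_profile):
        raise ValueError("profiles must have equal length")
    total = 0.0
    for frag_col, target_col in zip(frag_profile, target_profile):
        total += sum(abs(f - t) for f, t in zip(frag_col, target_col))
    return total / len(target_profile)


def combined_score(scores, weights):
    """Linear combination of named score terms."""
    return sum(weights[name] * value for name, value in scores.items())
```

A fragment could then be ranked with, for instance, ``combined_score({"SeqID": 0.75, "SequenceProfile": 1.0}, {"SeqID": 1.0, "SequenceProfile": -0.5})``, where the negative weight penalizes a large profile distance.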

A Fragger object to search a :class:`StructureDB` for fragments with seq as target sequence. You need to add some score components before you can finally call the Fill function.

param seq: Sequence of fragments to be searched
type seq: :class:`str`
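One way to picture how a fixed number of high-scoring yet structurally diverse fragments can be gathered is a greedy selection. The sketch below is an assumption about a plausible strategy, not a description of what the Fill function actually does: ``rmsd`` and ``select_diverse`` are hypothetical names, and the RMSD is computed without superposition to keep the example short.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate lists (no superposition)."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def select_diverse(candidates, max_fragments, rmsd_threshold):
    """Greedily pick the best-scoring candidates that differ enough.

    candidates is a list of (score, coords) tuples. A candidate is kept
    only if its RMSD to every already kept fragment is at least
    rmsd_threshold.
    """
    selected = []
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for score, coords in ranked:
        if len(selected) >= max_fragments:
            break
        if all(rmsd(coords, kept) >= rmsd_threshold for _, kept in selected):
            selected.append((score, coords))
    return selected
```

In this scheme a larger threshold yields a more diverse but possibly lower-scoring set, since near-duplicates of an already selected fragment are skipped.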

A simple storable map of Fragger objects. The idea is that one can use the map to cache fragger lists that have already been generated.

You can use :meth:`Contains` to check if an item with a given key (:class:`int`) already exists and access items with the [] operator (see :meth:`__getitem__` and :meth:`__setitem__`).

Serialization is meant to be temporary and is not guaranteed to be portable.
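The caching pattern the map is meant for can be sketched with a small dictionary wrapper. ``FraggerCache`` and ``get_or_build`` are hypothetical names used for illustration only. The sketch mirrors the documented interface (a ``Contains`` check plus the [] operator) but stores arbitrary Python objects and omits serialization.

```python
class FraggerCache:
    """Hypothetical integer-keyed cache mimicking the described interface."""

    def __init__(self):
        self._items = {}

    def Contains(self, key):
        return key in self._items

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        self._items[key] = value


def get_or_build(cache, key, build):
    """Return the cached item for key, calling build() only on a miss."""
    if not cache.Contains(key):
        cache[key] = build()
    return cache[key]
```

The expensive search runs once per key; later requests for the same key return the stored result.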

The PsipredPrediction class

A container for the secondary structure prediction by Psipred.