


Structural Data
The structural database serves as a container for structural backbone and
sequence data. Custom accessor objects can be implemented that relate
arbitrary features to structural data. Examples provided by ProMod3 include
access using matching stem geometry (see: :class:`FragDB`) or sequence
features (see: :class:`Fragger`).
Besides backbone and sequence data, derived features, e.g. sequence profiles
or secondary structure information, can optionally be stored.
Optional data includes:


* The phi/psi dihedral angles
* The secondary structure state as defined by DSSP
* The solvent accessibility in square Angstrom
* The amino acid frequencies as given by an input sequence profile
* The residue depth: the minimum distance of a residue to any of the exposed
  residues. Distances are calculated using CB positions (artificially
  constructed in case of glycine), and a residue counts as exposed if its
  relative solvent accessibility is > 25% and at least one of its atoms is
  exposed to the *outer* surface. To determine whether an atom is part of that
  outer surface, the full structure is placed into a 3D grid and a flood fill
  algorithm is used to determine the atoms of interest. This approach excludes
  internal cavities. It is a simplified version of the residue depth as
  discussed in [chakravarty1999]_ and is calculated directly when structural
  information is added to the StructureDB.
* The amino acid frequencies derived from structural alignments as described
  in [zhou2005]_. Since calculating such a profile already requires a
  StructureDB, there is a chicken-and-egg problem here: when structural
  information is added to the StructureDB, the corresponding memory is only
  allocated and set to zero. Using this information is therefore only
  meaningful if you calculate these profiles and set them manually (or load
  the provided default database).
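The residue depth definition above can be sketched in a few lines of Python. This is a minimal illustration, not the ProMod3 implementation: it assumes CB positions are already available as (x, y, z) tuples and that the outer-surface test (grid plus flood fill) has already been done, so each residue only carries a boolean flag telling whether one of its atoms touches the outer surface. The helper names `is_exposed` and `residue_depths` are hypothetical.

```python
import math


def is_exposed(rel_accessibility, touches_outer_surface):
    """Hypothetical helper implementing the exposure criterion from the text:
    relative solvent accessibility above 25% (given here as a fraction) and
    at least one atom on the outer surface (assumed to be precomputed)."""
    return rel_accessibility > 0.25 and touches_outer_surface


def residue_depths(cb_positions, exposed_flags):
    """Hypothetical helper: for every residue, the minimum CB-CB distance to
    any exposed residue. Exposed residues therefore get a depth of 0.0."""
    exposed_cbs = [p for p, e in zip(cb_positions, exposed_flags) if e]
    if not exposed_cbs:
        raise ValueError("at least one exposed residue is required")
    return [min(math.dist(p, q) for q in exposed_cbs) for p in cb_positions]


# Usage: three residues, only the first one is exposed.
cbs = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
flags = [is_exposed(0.40, True), is_exposed(0.10, True), is_exposed(0.60, False)]
print(residue_depths(cbs, flags))  # [0.0, 5.0, 10.0]
```

The flood fill itself is deliberately left out; the sketch only shows how the depth follows once exposure is known.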


Defining Chains and Fragments
The CoordInfo is generated automatically when new chains are added to
the structural database. It contains internal information about how a
connected stretch of residues is stored in the database.
The FragmentInfo defines any fragment in the structural database. If you
implement your own accessor object, this is the information you want to store.


param chain_index:
Fills :attr:`chain_index`


param offset:
Fills :attr:`offset`


param length:
Fills :attr:`length`
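To make the role of the FragmentInfo concrete, the sketch below stores (chain_index, offset, length) triplets in a toy accessor that indexes fragments by their exact k-mer sequence. Everything here is hypothetical: `FragmentRef` only mirrors the three attributes documented above, and `KmerAccessor` is an illustrative custom accessor, not a ProMod3 class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentRef:
    """Hypothetical stand-in for FragmentInfo."""
    chain_index: int  # which chain of the structural database
    offset: int       # start position of the fragment within that chain
    length: int       # number of residues in the fragment


class KmerAccessor:
    """Hypothetical accessor relating exact k-mer sequences to fragments."""

    def __init__(self, k):
        self.k = k
        self._index = defaultdict(list)

    def add_chain(self, chain_index, sequence):
        # Register every k-residue window of the chain under its sequence.
        for offset in range(len(sequence) - self.k + 1):
            kmer = sequence[offset:offset + self.k]
            self._index[kmer].append(FragmentRef(chain_index, offset, self.k))

    def search(self, kmer):
        # Return a copy so callers cannot modify the index by accident.
        return list(self._index.get(kmer, []))


acc = KmerAccessor(3)
acc.add_chain(0, "ACDAC")
acc.add_chain(1, "CDA")
print(acc.search("CDA"))
```

An accessor only needs to hand out such lightweight references; the coordinates and derived data stay in the structural database.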


The Structure Database
The following code example demonstrates how to create a structural database
and fill it with content.
Calculating the structural profiles is expensive and depends heavily on
the size of the database used as source. If you want to do this for a larger
database, consider two things:

* Use a database of limited size to generate the actual profiles (somewhere
  between 5000 and 10000 non-redundant chains is enough)
* Use the :class:`ost.seq.ProfileDB` to gather profiles produced by jobs
  running in parallel

The StructureDBDataType enum has to be passed at initialization of a
StructureDB in order to define what data you want to store in addition
to backbone coordinates and sequence.
If you want to store all possible data, use All. If you only want a subset,
you can combine some of the data types with a bitwise or operation
(see the example script for StructureDB). One important note:
If you enable AAFrequenciesStruct, the actual information is not assigned
automatically. Only the corresponding memory is allocated and set to zero; the
actual information must be assigned manually (see the example script).
Available values: All, Dihedrals, SolventAccessibilities, ResidueDepths, DSSP,
AAFrequencies, AAFrequenciesStruct
Generates an empty StructureDB that can be filled with content through
:func:`AddCoordinates`. The information extracted there is defined by
*data_to_store*. Have a look at the :class:`StructureDBDataType`
documentation and at the example script.


param data_to_store:
Specifies what data to store in the database; several
flags can be combined with a bitwise or operator.


type data_to_store:
:class:`StructureDBDataType`
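Combining data types with a bitwise or works like any other bit-flag enum. The Python sketch below uses the standard `enum.IntFlag` as a hypothetical mirror of :class:`StructureDBDataType`; the member names are taken from the list above, but the numeric bit values are assumptions and need not match ProMod3's.

```python
from enum import IntFlag


class DataTypeSketch(IntFlag):
    """Hypothetical mirror of StructureDBDataType (bit values are assumed)."""
    Dihedrals = 1
    SolventAccessibilities = 2
    ResidueDepths = 4
    DSSP = 8
    AAFrequencies = 16
    AAFrequenciesStruct = 32
    All = 63  # every flag above set


def stores(data_to_store, flag):
    """True if all bits of `flag` are set in `data_to_store`."""
    return (data_to_store & flag) == flag


# Store dihedrals and DSSP states in addition to backbone and sequence.
selection = DataTypeSketch.Dihedrals | DataTypeSketch.DSSP
print(stores(selection, DataTypeSketch.DSSP))           # True
print(stores(selection, DataTypeSketch.ResidueDepths))  # False
```

With the real enum, the combined value is what you pass as *data_to_store*.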


Finding Fragments based on Geometric Features
The fragment database allows you to organize, search and access the information
stored in a structural database (:class:`StructureDB`). In its current form it
groups fragments in bins according to their length (incl. stems) and the
geometry of their N-stem and C-stem (described by 4 angles and the distance
between the N-stem C atom and the C-stem N atom). It can therefore be searched
for fragments matching a certain geometry of N and C stems. The bins are
accessed through a hash table, which makes searching the database very fast.
This example illustrates how to create a custom FragDB based on a StructureDB:


param dist_bin_size:
Size of the distance parameter binning in Angstrom


param angle_bin_size:
Size of the angle parameter binning in degrees


type dist_bin_size:
:class:`float`


type angle_bin_size:
:class:`int`
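The binning described above can be illustrated with a small hash-table sketch. It assumes the stem geometry is already reduced to a distance (in Angstrom) and four angles (in degrees), and it simply floors each value by the corresponding bin size. The names `bin_key` and `GeometryBins` are hypothetical, and ProMod3's actual FragDB may discretize and look up bins differently; for instance, this sketch only queries the exact bin, not neighbouring ones.

```python
import math
from collections import defaultdict


def bin_key(length, distance, angles_deg, dist_bin_size, angle_bin_size):
    """Hypothetical: turn fragment length and stem geometry into a hashable key."""
    dist_bin = math.floor(distance / dist_bin_size)
    # math.floor keeps negative angles in the correct bin (-30 -> -2 for 20 degree bins).
    angle_bins = tuple(math.floor(a / angle_bin_size) for a in angles_deg)
    return (length, dist_bin) + angle_bins


class GeometryBins:
    """Hypothetical fragment store keyed by binned stem geometry."""

    def __init__(self, dist_bin_size, angle_bin_size):
        self.dist_bin_size = dist_bin_size
        self.angle_bin_size = angle_bin_size
        self._bins = defaultdict(list)

    def _key(self, length, distance, angles_deg):
        return bin_key(length, distance, angles_deg,
                       self.dist_bin_size, self.angle_bin_size)

    def add(self, fragment_id, length, distance, angles_deg):
        self._bins[self._key(length, distance, angles_deg)].append(fragment_id)

    def search(self, length, distance, angles_deg):
        # A single dictionary lookup, independent of the number of stored fragments.
        return list(self._bins.get(self._key(length, distance, angles_deg), []))


db = GeometryBins(dist_bin_size=1.0, angle_bin_size=20)
db.add("frag_a", 5, 6.3, (10.0, -30.0, 45.0, 170.0))
db.add("frag_b", 5, 6.9, (15.0, -25.0, 59.0, 179.0))
print(db.search(5, 6.5, (0.0, -39.0, 40.0, 160.0)))  # ['frag_a', 'frag_b']
```

Smaller bin sizes give more specific matches but fewer fragments per bin; larger ones trade precision for more candidates.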


Finding Fragments based on Sequence Features
In some cases you might want to use the :class:`StructureDB` to search
for fragments that possibly represent the structural conformation of interest.
The :class:`Fragger` searches a :class:`StructureDB` for the n fragments
that maximize a certain score and gathers a set of fragments with guaranteed
structural diversity based on an rmsd_threshold. You can use the :class:`Fragger`
wrapped in a full-fledged pipeline implemented in
:class:`~promod3.modelling.FraggerHandle` or search for fragments from scratch
using an arbitrary linear combination of scores:


SeqID:
Calculates the fraction of identical amino acids when comparing
a potential fragment from the :class:`StructureDB` with the target sequence.

SeqSim:
Calculates the average substitution-matrix-based sequence similarity of amino acids
when comparing a potential fragment from the :class:`StructureDB` with the target
sequence.

SSAgree:
Calculates the average agreement between the secondary structure predicted by PSIPRED [Jones1999]_
and the DSSP [kabsch1983]_ assignment stored in the :class:`StructureDB`.
The agreement term is based on a probabilistic approach also used in HHSearch [soding2005]_.

TorsionProbability:
Calculates the average probability of observing the phi/psi dihedral angles of a potential
fragment from the :class:`StructureDB` given the target sequence. The probabilities are
extracted from the :class:`TorsionSampler` class.

SequenceProfile:
Calculates the average profile score between the amino acid frequencies of a potential
fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment
between them. The scores are calculated as L1 distances between the profile columns.

StructureProfile:
Calculates the average profile score between the amino acid frequencies of a potential
fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment
between them. The scores are calculated as L1 distances between the profile columns.
In this case, the amino acid frequencies extracted from structural alignments are used.
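Two of the scores above, and their linear combination, can be sketched as plain functions. This is an illustrative assumption, not ProMod3 code: profiles are represented as lists of columns of amino acid frequencies, fragment and target are compared position by position (gap-free), and because an L1 distance is better when smaller, the sketch gives it a negative weight so that a larger combined score is always better. The weighting and sign conventions actually used by :class:`Fragger` are not covered here.

```python
def seq_id(fragment_seq, target_seq):
    """Fraction of identical amino acids in a gap-free comparison."""
    if len(fragment_seq) != len(target_seq):
        raise ValueError("sequences must have equal length")
    identical = sum(a == b for a, b in zip(fragment_seq, target_seq))
    return identical / len(target_seq)


def avg_profile_l1(fragment_profile, target_profile):
    """Average L1 distance between corresponding profile columns."""
    if len(fragment_profile) != len(target_profile):
        raise ValueError("profiles must have equal length")
    total = 0.0
    for f_col, t_col in zip(fragment_profile, target_profile):
        total += sum(abs(f - t) for f, t in zip(f_col, t_col))
    return total / len(target_profile)


def combined_score(components, weights):
    """Linear combination of named score components (hypothetical helper)."""
    return sum(weights[name] * value for name, value in components.items())


# Toy example with 2-letter "alphabets" to keep the columns short.
components = {
    "seq_id": seq_id("ACDE", "ACDF"),  # 3 of 4 identical -> 0.75
    "profile": avg_profile_l1([[0.5, 0.5], [1.0, 0.0]],
                              [[0.5, 0.5], [0.0, 1.0]]),  # (0 + 2) / 2 = 1.0
}
weights = {"seq_id": 1.0, "profile": -0.5}
print(combined_score(components, weights))  # 0.25
```

Candidate fragments would then be ranked by this combined value; the RMSD-based diversity filter described above is a separate step not shown here.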

A Fragger object to search a :class:`StructureDB` for fragments with *seq*
as the target sequence. You need to add some score components before you can
call the Fill function.


param seq:
Target sequence for which fragments are searched


type seq:
:class:`str`


A simple storable map of Fragger objects. The idea is that one can use the map
to cache fragger lists that have already been generated.
You can use :meth:`Contains` to check if an item with a given key
(:class:`int`) already exists and access items with the [] operator (see
:meth:`__getitem__` and :meth:`__setitem__`).
Serialization is meant to be temporary and is not guaranteed to be portable.
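The caching idea behind the map can be sketched with an ordinary dictionary: check for a key with `Contains` and only compute (and store) a fragment list when it is missing. The class `FraggerCacheSketch` and the helper `get_or_compute` are hypothetical; only the method names `Contains`, `__getitem__` and `__setitem__` are taken from the description above, and serialization is left out.

```python
class FraggerCacheSketch:
    """Hypothetical stand-in for a map from int keys to cached fragment lists."""

    def __init__(self):
        self._items = {}

    def Contains(self, key):
        return key in self._items

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        if not isinstance(key, int):
            raise TypeError("keys must be of type int")
        self._items[key] = value


def get_or_compute(cache, key, compute):
    """Return the cached entry for `key`, computing it only on a cache miss."""
    if not cache.Contains(key):
        cache[key] = compute(key)
    return cache[key]


calls = []


def expensive_search(position):
    # Placeholder for an expensive fragment search at a given position.
    calls.append(position)
    return ["fragment_%d" % position]


cache = FraggerCacheSketch()
get_or_compute(cache, 7, expensive_search)
get_or_compute(cache, 7, expensive_search)  # served from the cache
print(calls)  # [7]
```

Using integer keys (for instance a residue position) keeps lookups cheap and unambiguous.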

The PsipredPrediction class
A container for the secondary structure prediction by Psipred.