Studer Gabriel authored
Structural Data
The structural database serves as a container for structural backbone and profile data. It can be filled with chains of PDB structures and their corresponding profiles as produced by the HHSuite tools [soding2005]. Structural and profile data are complemented with additional information. The following features are stored on a per-residue basis:
- The amino acid one letter code
- The coordinates of the backbone atoms (N,CA,C,O)
- The phi/psi dihedral angles
- The secondary structure state as defined by dssp
- The solvent accessibility in square Angstrom
- The residue depth, defined as the average distance from all atoms of a residue to the closest surface vertex as calculated by msms [sanner1996]. This is a simplified version of the residue depth discussed in [chakravarty1999] and is calculated directly when structural information is added to the StructureDB.
- The amino acid frequencies as given by an input sequence profile
- The amino acid frequencies derived from structural alignments as described in [zhou2005]. Since calculating such a profile already requires a StructureDB, this is a chicken-and-egg problem: when structural information is added to the StructureDB, the corresponding memory is only allocated and set to zero. Using this information is therefore only meaningful if you calculate these profiles and set them manually (or load the provided default database).
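As an illustration of how the phi/psi dihedral angles in the list above follow from the stored backbone coordinates, here is a minimal sketch in plain Python. It uses the standard atan2 formulation of a dihedral angle. The function names `dihedral` and `phi_psi` are hypothetical and are not part of the StructureDB API.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form: numerically stable and yields the sign of the angle
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """phi is C(i-1)-N-CA-C, psi is N-CA-C-N(i+1)."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)

# Four points with a quarter turn around the central bond
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Note that phi is undefined for the first residue of a chain and psi for the last one, since the neighbouring atom is missing.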
Defining Chains and Fragments
The CoordInfo is generated automatically when new chains are added to the structural database. It contains internal information on how the corresponding chain is stored in the database.
The FragmentInfo defines a fragment in the structural database.
:param chain_index: Fills :attr:`chain_index`
:param offset: Fills :attr:`offset`
:param length: Fills :attr:`length`
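To make the role of the three attributes concrete, the following sketch models a fragment as a (chain_index, offset, length) triple that addresses a stretch of a stored chain. It is a plain-Python stand-in: `FragmentRef` and `fragment_sequence` are hypothetical names, and the chains are represented as simple strings rather than as a real StructureDB.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentRef:
    """Hypothetical stand-in for a fragment definition."""
    chain_index: int  # which chain in the database
    offset: int       # index of the first residue within that chain
    length: int       # number of residues in the fragment

def fragment_sequence(chains, frag):
    """Return the residues addressed by frag from a list of chain sequences."""
    chain = chains[frag.chain_index]
    end = frag.offset + frag.length
    if frag.offset < 0 or frag.length <= 0 or end > len(chain):
        raise ValueError("fragment exceeds chain bounds")
    return chain[frag.offset:end]

chains = ["MKTAYIAKQR", "GSHMLEDPVA"]
frag = FragmentRef(chain_index=1, offset=2, length=4)
print(fragment_sequence(chains, frag))  # HMLE
```

Because the fragment stores only indices, the same triple can be used to look up any per-residue feature (coordinates, dihedrals, profiles) of the addressed stretch.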
The Structure Database
The following code example demonstrates how to create a structural database and fill it with content.
Calculating the structural profiles is computationally expensive, and the cost depends heavily on the size of the database used as source. If you want to do this for a larger database, consider two things:
- Use a database of limited size as structural source (between 5000 and 10000 non-redundant chains is enough)
- Use the :class:`ost.seq.ProfileDB` to gather profiles produced from jobs running in parallel
Finding Fragments based on Geometric Features
The fragment database lets you organize, search and access the information stored in a structural database (:class:`StructureDB`). In its current form it groups fragments in bins according to their length (including stems) and the geometry of their N-stem and C-stem (described by 4 angles and the distance between the N-stem C atom and the C-stem N atom). It can therefore be searched for fragments matching a certain geometry of N and C stems. The bins are accessed through a hash table, which makes searching the database very fast.
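The binning idea can be sketched with a plain dictionary: the fragment length, the discretized stem distance and the four discretized angles form a hash key, and a query only inspects the bin with the matching key. This is an illustrative sketch, not the FragDB implementation. `stem_key` and `GeometryIndex` are hypothetical names, angles are assumed to be in degrees, and the default bin sizes are arbitrary.

```python
import math
from collections import defaultdict

def stem_key(length, dist, angles, dist_bin_size=1.0, angle_bin_size=20):
    """Discretize the stem geometry into a hashable bin key."""
    dist_bin = int(math.floor(dist / dist_bin_size))
    # Map angles to [0, 360) first so that negative angles bin consistently
    angle_bins = tuple(int(math.floor((a % 360.0) / angle_bin_size))
                       for a in angles)
    return (length, dist_bin) + angle_bins

class GeometryIndex:
    """Hash table from geometry bins to fragment identifiers."""

    def __init__(self, dist_bin_size=1.0, angle_bin_size=20):
        self.dist_bin_size = dist_bin_size
        self.angle_bin_size = angle_bin_size
        self._bins = defaultdict(list)

    def _key(self, length, dist, angles):
        return stem_key(length, dist, angles,
                        self.dist_bin_size, self.angle_bin_size)

    def add(self, fragment_id, length, dist, angles):
        self._bins[self._key(length, dist, angles)].append(fragment_id)

    def search(self, length, dist, angles):
        # One dictionary lookup, independent of the number of stored fragments
        return list(self._bins.get(self._key(length, dist, angles), []))

index = GeometryIndex()
index.add("frag_a", 8, 10.2, (-60.0, 100.0, 45.0, 170.0))
index.add("frag_b", 8, 10.7, (-55.0, 105.0, 50.0, 175.0))
index.add("frag_c", 8, 14.0, (-60.0, 100.0, 45.0, 170.0))
print(index.search(8, 10.5, (-58.0, 102.0, 47.0, 172.0)))  # ['frag_a', 'frag_b']
```

A consequence of exact-bin lookup is that two geometries close to a bin boundary can end up in different bins, so the bin sizes control the trade-off between selectivity and recall.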
This example illustrates how to create a custom FragDB based on a StructureDB:
:param dist_bin_size: Size of the distance parameter binning in Å
:param angle_bin_size: Size of the angle parameter binning in degrees
:type dist_bin_size: :class:`float`
:type angle_bin_size: :class:`int`
Finding Fragments based on Sequence Features
In some cases you might want to use the :class:`StructureDB` to search for fragments that possibly represent the structural conformation of interest. The :class:`Fragger` searches a :class:`StructureDB` for n fragments that maximize a certain score and gathers a set of fragments with a guaranteed structural diversity based on an rmsd_threshold. You can use the :class:`Fragger` wrapped in a full-fledged pipeline implemented in :class:`~promod3.modelling.FraggerHandle`, or search for fragments from scratch using an arbitrary linear combination of scores:
- SeqID: Calculates the fraction of amino acids being identical when comparing a potential fragment from the :class:`StructureDB` and the target sequence
- SeqSim: Calculates the average substitution-matrix-based sequence similarity of amino acids when comparing a potential fragment from the :class:`StructureDB` and the target sequence
- SSAgree: Calculates the average agreement between the secondary structure predicted by PSIPRED [Jones1999] and the dssp [kabsch1983] assignment stored in the :class:`StructureDB`. The agreement term is based on a probabilistic approach also used in HHSearch [soding2005].
- TorsionProbability: Calculates the average probability of observing the phi/psi dihedral angles of a potential fragment from the :class:`StructureDB` given the target sequence. The probabilities are extracted from the :class:`TorsionSampler` class.
- SequenceProfile: Calculates the average profile score between the amino acid frequencies of a potential fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment between them. The scores are calculated as L1 distances between the profile columns.
- StructureProfile: Calculates the average profile score between the amino acid frequencies of a potential fragment from the :class:`StructureDB` and a target profile, assuming a gap-free alignment between them. The scores are calculated as L1 distances between the profile columns. In this case, the amino acid frequencies extracted from structural alignments are used.
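Two of the score components above and their linear combination can be sketched in a few lines of plain Python. The functions `seq_id`, `profile_l1` and `combine` are hypothetical and do not reproduce the Fragger internals. In particular, negating the L1 distance so that a higher score is better is an assumption made here, since the text above does not specify how the distance enters the maximized score.

```python
def seq_id(fragment_seq, target_seq):
    """Fraction of identical amino acids in a gap-free comparison."""
    if len(fragment_seq) != len(target_seq):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(fragment_seq, target_seq))
    return matches / len(target_seq)

def profile_l1(fragment_profile, target_profile):
    """Average L1 distance between aligned profile columns."""
    if len(fragment_profile) != len(target_profile):
        raise ValueError("profiles must have equal length")
    total = 0.0
    for col_a, col_b in zip(fragment_profile, target_profile):
        total += sum(abs(p - q) for p, q in zip(col_a, col_b))
    return total / len(target_profile)

def combine(scores, weights):
    """Linear combination of named score components."""
    return sum(weights[name] * scores[name] for name in weights)

# Toy profiles over a two-letter alphabet to keep the example short
frag_prof = [[1.0, 0.0], [0.5, 0.5]]
target_prof = [[0.0, 1.0], [0.5, 0.5]]

scores = {
    "seq_id": seq_id("HMLE", "HMAE"),
    # Assumption: a smaller distance is better, so the distance is negated
    "seq_profile": -profile_l1(frag_prof, target_prof),
}
weights = {"seq_id": 1.0, "seq_profile": 0.5}
print(combine(scores, weights))  # 0.25
```

Since the components live on different scales, the weights determine how strongly each one influences the final ranking.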
A Fragger object searches a :class:`StructureDB` for fragments with seq as target sequence. You need to add score components before you can call the Fill function.
:param seq: Sequence of fragments to be searched
:type seq: :class:`str`
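One simple way to obtain n high-scoring fragments with a guaranteed structural diversity is a greedy selection: visit candidates in order of decreasing score and keep a candidate only if its RMSD to every fragment already kept reaches the threshold. The sketch below illustrates that idea and is not necessarily what the Fill function does. `rmsd` and `select_diverse` are hypothetical names, and the RMSD is computed without superposition to keep the example short.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate lists (no superposition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    sq_sum = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))

def select_diverse(candidates, n, rmsd_threshold):
    """Greedily pick up to n (score, coords) pairs that are pairwise diverse."""
    selected = []
    for score, coords in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Keep the candidate only if it differs enough from all kept ones
        if all(rmsd(coords, kept) >= rmsd_threshold for _, kept in selected):
            selected.append((score, coords))
        if len(selected) == n:
            break
    return selected

candidates = [
    (0.9, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (0.8, [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]),  # near-duplicate of the first
    (0.7, [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]),
]
picked = select_diverse(candidates, n=2, rmsd_threshold=0.5)
print([score for score, _ in picked])  # [0.9, 0.7]
```

With this scheme a lower-scoring but structurally distinct fragment can displace a higher-scoring near-duplicate, which is the intended effect of the threshold.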
A simple storable map of Fragger objects. The idea is that one can use the map to cache fragger lists that have already been generated.
You can use :meth:`Contains` to check if an item with a given key (:class:`int`) already exists and access items with the [] operator (see :meth:`__getitem__` and :meth:`__setitem__`).
Serialization is meant to be temporary and is not guaranteed to be portable.
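The caching pattern described above can be sketched with a small dictionary wrapper. This is a plain-Python analogue with hypothetical names (`ResultCache`, `contains`, `save`, `load`), not the map class documented here. It uses `pickle`, which shares the caveat mentioned above: the files are suited for temporary storage and are not a portable exchange format.

```python
import os
import pickle
import tempfile

class ResultCache:
    """Hypothetical storable map from integer keys to cached results."""

    def __init__(self):
        self._items = {}

    def contains(self, key):
        return key in self._items

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        self._items[key] = value

    def save(self, path):
        with open(path, "wb") as handle:
            pickle.dump(self._items, handle)

    @classmethod
    def load(cls, path):
        cache = cls()
        with open(path, "rb") as handle:
            cache._items = pickle.load(handle)
        return cache

cache = ResultCache()
key = 42
if not cache.contains(key):
    # Placeholder for an expensive fragment search
    cache[key] = ["fragment_1", "fragment_2"]

# Round trip through a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cache.dat")
    cache.save(path)
    restored = ResultCache.load(path)

print(restored[key])  # ['fragment_1', 'fragment_2']
```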
The PsipredPrediction class
A container for the secondary structure prediction by Psipred.
Represents a list of :class:`PsipredPrediction` objects
[soding2005] Söding J (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21 (7): 951–960.
[sanner1996] Sanner M, Olson AJ, Spehner JC (1996). Reduced Surface: an Efficient Way to Compute Molecular Surfaces. Biopolymers 38 (3): 305–320.
[chakravarty1999] Chakravarty S, Varadarajan R (1999). Residue depth: a novel parameter for the analysis of protein structure and stability. Structure 7 (7): 723–732.
[zhou2005] Zhou H, Zhou Y (2005). Fold Recognition by Combining Sequence Profiles Derived From Evolution and From Depth-Dependent Structural Alignment of Fragments. Proteins 58 (2): 321–328.
[Jones1999] Jones DT (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195–202.
[kabsch1983] Kabsch W, Sander C (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.