algorithms.rst



Modelling Algorithms
A collection of algorithms that can be useful in modelling

El Enumerador
In the modelling process you typically have to generate loops at several
locations in the model. They're potentially close enough to interact. If you
simply iterate over to locations and set the loop at one location after the
other, the scorer cannot "see" loops at locations that will be processed later
on. A naive approach to tackle this problem would be to generate an enumeration
with all possible loop combinations:
For every possible combination of loops:

Set the according loop from every location in the environment
Calculate the scores from all loops given that environment
Keep the best overall score

Assuming a small modelling problem with a total of 5 locations with 10
candidates each, this already gives 100000 set loop scoring environment calls
and 100000 loop scoring calls. ElEnumerador tries to solve the problem in
a more clever approach. It first generates an interaction graph of which
locations actually have candidates that interact (If any of the candidates
have a CA distances closer than 10A). This graph gets then decomposed into
a minimal width tree and solved in a similar manner as in the sidechain
modelling problem. This typically massively reduces the problem size by solving
small subproblems and allows to tackle larger problems in a feasible amount
of computing time.

Rigid Blocks
RMSD is a typical measure for similarity of two structures. Given an atom atom
mapping between two structures, the minimum RMSD and the according superposition
can efficiently be calculated using an approach based on singular value
decomposition. This approach is problematic if there are very dissimilar regions
or when domain movement events occur. We can therefore implement an iterative
superposition. The two structures undergo an initial superposition. For every
iteration we then select a subset of atoms that are within a certain distance
threshold that serve as input for the next superposition. This iterative
superpostion typically converges to the largest common subpart but is
non-deterministic since it depends on the initial superposition.
The RigidBlocks algorithm is based on only the CA positions and performs this
iterative superposition multiple times by using a sliding window to select the
initial subset and gathers all unique results. These results can be very
similar and only differ by single positions. The algorithm therefore reduces
the amount of solutions by merging them based on a threshold of similarity.
If the sum of matching positions within the distance threshold divided by
the maximum length of the two solutions is above a cluster thresh, the two
solutions get merged by producing a common solution containing the shared
positions. As a final result, the algorithm therefore detects common rigid
subsets of positions.