-
Studer Gabriel authoredStuder Gabriel authored
Modelling Algorithms
A collection of algorithms that can be useful in modelling
El Enumerador
In the modelling process you typically have to generate loops at several locations in the model. They're potentially close enough to interact. If you simply iterate over to locations and set the loop at one location after the other, the scorer cannot "see" loops at locations that will be processed later on. A naive approach to tackle this problem would be to generate an enumeration with all possible loop combinations:
For every possible combination of loops:
- Set the according loop from every location in the environment
- Calculate the scores from all loops given that environment
- Keep the best overall score
Assuming a small modelling problem with a total of 5 locations with 10 candidates each, this already gives 100000 set loop scoring environment calls and 100000 loop scoring calls. ElEnumerador tries to solve the problem in a more clever approach. It first generates an interaction graph of which locations actually have candidates that interact (If any of the candidates have a CA distances closer than 10A). This graph gets then decomposed into a minimal width tree and solved in a similar manner as in the sidechain modelling problem. This typically massively reduces the problem size by solving small subproblems and allows to tackle larger problems in a feasible amount of computing time.
Rigid Blocks
RMSD is a typical measure for similarity of two structures. Given an atom atom mapping between two structures, the minimum RMSD and the according superposition can efficiently be calculated using an approach based on singular value decomposition. This approach is problematic if there are very dissimilar regions or when domain movement events occur. We can therefore implement an iterative superposition. The two structures undergo an initial superposition. For every iteration we then select a subset of atoms that are within a certain distance threshold that serve as input for the next superposition. This iterative superpostion typically converges to the largest common subpart but is non-deterministic since it depends on the initial superposition. The RigidBlocks algorithm is based on only the CA positions and performs this iterative superposition multiple times by using a sliding window to select the initial subset and gathers all unique results. These results can be very similar and only differ by single positions. The algorithm therefore reduces the amount of solutions by merging them based on a threshold of similarity. If the sum of matching positions within the distance threshold divided by the maximum length of the two solutions is above a cluster thresh, the two solutions get merged by producing a common solution containing the shared positions. As a final result, the algorithm therefore detects common rigid subsets of positions.