Next: Parallel Processing, Previous: Sidechain Overview, Up: Overview of Conformational Search

The process of conformational search can be viewed as a search of tree, where a tree means a graph with no cycles. See the following figure:

o <-- Root /|\ / | \ / | \ / | \ / | \ o o o Degree of Freedom 1 /| |\ \ / | | \ \ / | | \ \ / | | \ \ / | | \ \ o o o o o Degree of Freedom 2

In this analysis, the root of the tree represents the system at the beginning of the search process, where no degrees of freedom have been sampled. The next level represents the search process after the first degree of freedom has been sampled. In the above figure, there are three nodes at this level which signifies that the good conformations were found. The third level in the tree represents the effects of sampling the second degree of freedom, and so on.

Each node in the tree represents the result of a particular set of samplings for the degrees of freedom. Leaf nodes are those nodes which exist at the very end of the tree, and represent complete conformations, where every degree of freedom has been completely sampled. Nodes in between the root and leaves are partial conformations. Any node which has been sampled is called an expanded node, and any node which has not been sampled is called an open node.

The goal of a conformational search is to find the lowest energy leaf node. An exhaustive search of the tree can be programmed easily by simply following down the leftmost branch of the tree until either a leaf is hit, or a conformation which is blocked because of bad contacts or other problems. The program then backs up to the previous level, and examines the next node over, and again searches down towards the leaves. Effectively, the search proceeds from left to right across the entire tree, and every leaf is eventually generated. This is a depth-first search. Because each degree of freedom increases the number the nodes by an approximately constant factor, the time for an exhaustive search is exponential in the number of degrees of freedom.

In principle, knowledge of the path through the search tree to the best leaf would make it theoretically possible to find the best leaf in time linear with the number of degrees of freedom. Such knowledge is generally not available in advance. However, in some situations, information about the partial conformations can provide a guide towards finding the best leaf. For example, if CONGEN is used to reconstruct the backbone of a protein from the alpha carbon coordinates, then the RMS deviation of the partial conformations from the known alpha carbon coordinates is an excellent guide to a good quality fit.

The use of information about partial conformations is the basis for the directed search options. When these options are used, energies or RMS deviations for conformations at each node in the tree are calculated. The program maintains a sorted index of the open nodes in the tree, and it samples degrees of freedom in an order that depends on the order of the energies or RMS deviations.

The evaluations of the open nodes can be affected by the use of the
`EINHERIT` and `EIMMEDIATE` keywords for particular degrees of freedom.
`EINHERIT` causes all nodes expanded for a particular degree of freedom
to inherit the evaluation of their father.
`EIMMEDIATE` sets the evaluation of nodes at this level to the smallest
possible value which forces the program to expand nodes at this level ahead
of all others. This is useful when you do not want the energies or RMS's of
this degree of freedom to influence the directed search.

Three selection strategies exist at present. The first strategy, the
evaluation strategy which is
selected by the `EVAL` keyword, always selects the lowest energy
or RMS deviation node. It has a serious drawback when a degree of
freedom results in conformers whose energy or RMS deviation is raised.
In that case, the program will first expand all nodes before that degree
of freedom before moving on towards the leaves.

The second strategy, the deepening evaluation strategy which is selected
by the `DEVAL` keyword, is intended to circumvent the drawback of
the `EVAL` strategy. Here, CONGEN maintains a sorted index of open
nodes for each degree of freedom. When the search begins, all the
indices are empty except for the first degree of freedom which contains
just the root node. The program then cycles through each degree of
freedom, and selects the lowest energy or RMS deviation node at that
level for expansion. If there are no open nodes available, or if the
search reaches the leaves, then it cycles back to the first degree of
freedom. This cycling forces the program to make progress toward the
leaves.

The third strategy, the mixed method selected by the `MIX` keyword,
is a combination of `EVAL` and `DEVAL`. The `MIX` strategy
maintains both types of indices, and simply alternates between each
rule for the selection of nodes to expand. So far, it is the best directed
search strategy.

It is possible to use the directed search methods to generate random
structures. The trick here is to use random numbers for the evaluation
of nodes rather than energy values. There are two possible approaches
to the generation of random structures, generating multiple
conformations in a single CONGEN run or making multiple runs of CONGEN
with each generating one conformation. In the first case, the `EVAL`
search option generally leads to more variation in the structures,
although there is substantial overlap of structures. In the second case,
the `DEVAL` option should be used because it is faster for each run.

The directed search capabilities are still under active development, and suggestions for improvement are always welcome.

When the directed search methods are used, the program maintains a search
tree in memory. Since each node requires several hundred bytes of memory,
it is impractical to keep the entire search tree. Thus, there is a tree
pruning mechanism which can be tailored to periodically eliminate nodes
that are either unnecessary or unlikely to be sampled in a reasonable amount
of time given the search strategy in use. See Global Options for Conformational Search, for a description of the `TREE` options.