Introduction to Analysis - Congen V2.2.3327

16.1 An Overview of the Analysis Facility

CONGEN provides a facility for analyzing the results of any calculation and for comparing those results with the results of any other calculation. This facility is general: it works with arbitrary residues and any set of parameters, and it permits a broad range of comparisons. It provides all the features of Bruce Gelin's ANC program plus many others.

Two aspects of the design of this facility are important to its users. First, the facility provides a small number of simple commands which can be combined to do a variety of tasks. Second, the program is well adapted to the job of comparing results, regardless of whether the results were obtained on the same system or on a homologous one. This permits comparative studies that were previously impossible, such as comparing the dynamics of hemoglobin and myoglobin in homologous regions or comparing the results obtained from the explicit hydrogen and extended atom models.

These two design considerations dictate a great deal about the analysis facility's operation. The first consideration, being able to combine commands, requires that the facility store the results of one operation so that they can be used in another. There are two data structures (see Glossary) that the facility uses to store such results, and they are important to understand.

The major data structure is the table. In analyzing the large amounts of data inherent in a macromolecule, we need a method for organizing it. Consider, for example, the 631 bond angles in bovine pancreatic trypsin inhibitor. Without a good ordering of these angles, it would be difficult for a person to see any relationships among them. However, since a structure in CONGEN consists of a number of segments which, in turn, consist of a number of residues, which themselves consist of a number of atoms or internal coordinates, we can organize the data along these lines.

Therefore, a table contains a list of segments which are identified by their segment identifiers as specified in the GENERATE command. Each segment contains a list of residues. The residues are named (GLY, ALA, etc.) and have identifiers as well. The identifier is the character form of the sequence number of the residue. Each residue in the table contains a list of data arrays where every array is “tagged”. “Tagged” means that each array element has associated with it a character string which serves to identify it. The tags are easily constructed. For example, the tag for a bond is the IUPAC name for each atom in the bond separated by a dash. Each element of the array contains a property of an atom or of an internal coordinate. For example, the minimum energy and average length of bonds during a dynamics run are properties of bonds. The table also contains a title which identifies the entire table and is printed along with it.
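The layout described above can be pictured with a small Python sketch. This is only an illustration of the segment, residue, and tagged-array hierarchy; it is not CONGEN's internal representation, and every class name, field name, segment identifier, and numeric value in it is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaggedArray:
    """One property (e.g. average bond length) with one tag per element."""
    tags: List[str]
    values: List[float]


@dataclass
class Residue:
    name: str        # residue name, e.g. "GLY"
    identifier: str  # character form of the sequence number, e.g. "12"
    arrays: Dict[str, TaggedArray] = field(default_factory=dict)


@dataclass
class Segment:
    segid: str       # segment identifier (hypothetical value below)
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Table:
    title: str       # identifies the whole table and is printed with it
    segments: List[Segment] = field(default_factory=list)


def bond_tag(atom_a: str, atom_b: str) -> str:
    """Tag for a bond: the two atom names separated by a dash."""
    return f"{atom_a}-{atom_b}"


# Illustrative data only; the lengths are made-up numbers, not CONGEN output.
gly = Residue(name="GLY", identifier="12")
gly.arrays["average_length"] = TaggedArray(
    tags=[bond_tag("N", "CA"), bond_tag("CA", "C")],
    values=[1.45, 1.52],
)
table = Table(title="Illustrative bond table",
              segments=[Segment(segid="SEG1", residues=[gly])])
```

Because every element carries its own tag, a value can be located by segment, residue, and tag rather than by a bare array index, which is what makes comparisons between homologous systems practical.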

Many operations can be performed on these tables. First, the BUILD command will generate a table. Currently, there are dozens of different tables which can be generated. Tables can be printed in several different ways using the PRINT command. Simple statistical information can be added to them using the ADD command. The DELETE command may be used to delete data from them so that one can more easily study a subset of the data. Finally, the SELECT command may be used to select data from a table and record the results in the second major data structure, the selection.

The selection is a collection of data which is less organized than the table. The selection consists of an array of numbers where every number has associated with it its position in the table as well as the residue to which it belonged. Two selections are provided in the analysis facility, and the following operations are supported. First, data may be selected from the table using the SELECT command. Second, a histogram of the selected data may be made using the HISTO command. Third, using the PLOT command, the data in the selection may be plotted against its position in the table or against the residue number of the residue to which each data point belongs. Finally, the two selections may be plotted together using the 2DPLOT command to yield a scatter plot. We can therefore make a scatter plot of any two sets of numbers; we are not limited to phi-psi plots.
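As a rough illustration of the selection and the operations on it, the following Python sketch models a selection as a flat list of values that remember their table position and residue. The function names, the fixed-width binning, and the pairing of two selections by order are all assumptions made for this example; the actual behavior of SELECT, HISTO, and 2DPLOT is defined by the commands themselves.

```python
from collections import Counter
from dataclasses import dataclass
from math import floor
from typing import Callable, Dict, List, Tuple


@dataclass
class SelectedValue:
    value: float
    position: int  # position of the value in the source table
    residue: str   # identifier of the residue the value came from


def select(rows: List[Tuple[str, float]],
           predicate: Callable[[float], bool]) -> List[SelectedValue]:
    """Keep the (residue, value) rows whose value satisfies the predicate."""
    return [SelectedValue(v, i, r)
            for i, (r, v) in enumerate(rows) if predicate(v)]


def histogram(selection: List[SelectedValue],
              bin_width: float) -> Dict[int, int]:
    """Count values per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    counts = Counter(floor(item.value / bin_width) for item in selection)
    return dict(sorted(counts.items()))


def scatter_pairs(sel_x: List[SelectedValue],
                  sel_y: List[SelectedValue]) -> List[Tuple[float, float]]:
    """Pair two selections element by element (assumed pairing rule)."""
    return [(x.value, y.value) for x, y in zip(sel_x, sel_y)]


# Made-up dihedral angles in degrees, one per residue, in table order.
rows_phi = [("1", -60.0), ("2", -120.0), ("3", -65.0)]
rows_psi = [("1", -45.0), ("2", 130.0), ("3", -40.0)]

first = select(rows_phi, lambda v: True)
second = select(rows_psi, lambda v: True)
points = scatter_pairs(first, second)  # input for a scatter plot
bins = histogram(first, 30.0)          # counts per 30-degree bin
```

Keeping the position and residue alongside each number is what allows the same selection to be plotted against either table position or residue number.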

In addition to analyzing the static properties of the structure which are maintained in CONGEN, the analysis facility can analyze the results from a dynamics calculation. Properties of the internal coordinates and atoms which are fairly easy to calculate can be built into a table. The ACCUM and COMBINE commands are used for preparing the data for inclusion into a table. Correlation functions may also be calculated using the CORREL command.
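To make the idea of a correlation function concrete, here is a minimal Python sketch of a normalized time autocorrelation function for a single property sampled along a trajectory. It shows one common textbook definition only; which correlation functions CORREL computes, and how it normalizes them, is not specified here, so the function name and the normalization are assumptions.

```python
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Normalized autocorrelation C(lag)/C(0) of the fluctuations of a series."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    fluct = [x - mean for x in series]        # deviations from the mean
    variance = sum(d * d for d in fluct) / n  # C(0)
    if variance == 0.0:
        raise ValueError("series has no fluctuations to correlate")
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag                       # number of products at this lag
        c = sum(fluct[i] * fluct[i + lag] for i in range(pairs)) / pairs
        result.append(c / variance)
    return result
```

A series that alternates in sign at every step, for example, is perfectly anticorrelated at lag 1 and perfectly correlated again at lag 2.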

Complementing the above commands, there are commands which perform more isolated functions. The analysis facility has READ and WRITE commands for reading and writing the data structures which are peculiar to it. There is a close contact search command, SEARCH, which searches for close contacts of atoms to other atoms or to spatial positions. There is a DRAW command which prepares input for the PLT2 plotting program and for MOLD, a molecule drawing program. The SET command may be used to change I/O units and the size of the page. Finally, there is a command, DELIM, which changes the command delimiter.