3.1.3 Reading coordinates

Coordinates are read with the READ COOR command, which has several options (these may change in future versions).

There are four possible file formats that can be used to read in coordinates. They are coordinate binary files, dynamics coordinate trajectories, coordinate card images, and Brookhaven Protein Data Bank files.

For all formats, a subset of the atoms in the PSF may be selected using the standard atom selection syntax. For binary files, this is a risky maneuver, and warning messages are given when this is attempted. Only coordinates of selected atoms may be modified. When reading binary files, or using the IGNORE keyword, coordinate values are mapped into the selected atoms sequentially (NO checking is done!).

The reading of the first two file formats is specified with the FILE option. The program reads the file header to tell which format it is dealing with. The coordinate binary files have a file header of COOR and contain only one set of coordinates. These are created with a WRIT COOR FILE command. The dynamics coordinate trajectories have a file header of CORD and have multiple coordinate sets. These files are created by the dynamics function of the program. The IFILE option specifies which coordinate set in the trajectory is to be read, given as the position of that set within the file. The default value for this option will cause the first coordinate set to be read.

For binary files, the APPEnd command will 'deselect' all atoms up to the highest one with a known position. This is done in addition to the normal atom selection. This is useful for structures with several distinct segments where it is desirable to keep separate coordinate modules.

The CARD file format is the standard means in CONGEN for providing a human readable and writable coordinate file. The format is as follows:

              title
              NATOM (I5)
              ATOMNO RESNO   RES  TYPE  X     Y     Z  (repeated NATOM times)
                I5    I5  1X A4 1X A4 F10.5 F10.5 F10.5

The title describes the coordinates (see Syntactic Glossary). Next comes the number of coordinates. If this number is zero or too large, the entire file will be read. Finally, there is one line for each coordinate. The coordinate lines, but not the initial lines, may be separated by blank lines for readability.
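The fixed-column layout above can be illustrated with a short Python sketch. This is not CONGEN's reader, only one reading of the format line (I5, I5, 1X, A4, 1X, A4, 3F10.5). The names `CardAtom`, `parse_card_record` and `parse_card_body` are hypothetical, and the title is assumed to have been consumed by the caller, because its syntax is defined elsewhere in the manual.

```python
from typing import List, NamedTuple


class CardAtom(NamedTuple):
    atomno: int      # ATOMNO (the manual says this is ignored on reading)
    resno: int       # RESNO, relative to the first residue in the PSF
    res: str         # residue name
    atom_type: str   # IUPAC atom name (the TYPE column)
    x: float
    y: float
    z: float


def parse_card_record(line: str) -> CardAtom:
    """Split one coordinate line using the I5,I5,1X,A4,1X,A4,3F10.5 layout."""
    return CardAtom(
        atomno=int(line[0:5]),
        resno=int(line[5:10]),
        res=line[11:15].strip(),        # column 10 is the 1X pad
        atom_type=line[16:20].strip(),  # column 15 is the 1X pad
        x=float(line[20:30]),
        y=float(line[30:40]),
        z=float(line[40:50]),
    )


def parse_card_body(lines: List[str]) -> List[CardAtom]:
    """Parse the NATOM line and the coordinate records that follow it.

    `lines` must start at the NATOM line (title already removed).
    """
    natom = int(lines[0][0:5])
    # Blank lines are allowed between coordinate records.
    records = [ln for ln in lines[1:] if ln.strip()]
    # A count of zero, or one larger than the file, means "read everything".
    if natom <= 0 or natom > len(records):
        natom = len(records)
    return [parse_card_record(ln) for ln in records[:natom]]
```

The slices follow directly from the field widths, so a file that does not respect the columns will raise a `ValueError` instead of being silently misread.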

ATOMNO gives the number of the atom in the file. It is ignored on reading. RESNO gives the residue number of the atom. It must be specified relative to the first residue in the PSF. The OFFSet option should be specified if one wishes to read coordinates into other positions. The APPEnd option adds an additional offset which points to the residue just beyond the highest one with known positions. This option also `deselects' all atoms below this residue (inclusive). For example, if one is reading in coordinates for the second segment of a two chain protein using two card files, and the APPEnd option is used, RESNO must start at 1 in both files for the file reading to work correctly.

It should also be remembered that for card images, residues are identified by residue number. This will change someday. As a consequence, if one wishes to read coordinates from an extended atom (PROT) RTF into a structure using an explicit hydrogen (HPRO) RTF, the OFFSet keyword MUST be used to shift the residue numbers by one (to make room for the NTER) so that the residues will line up. If the reverse process is required, an OFFSet value of -1 is called for.
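The residue-number arithmetic described in the last two paragraphs can be written out as a small Python sketch. It assumes that the OFFSet value and the APPEnd offset are simply added to RESNO, as the text suggests. The function and parameter names are hypothetical and do not correspond to anything in CONGEN itself.

```python
def target_residue(resno: int, offset: int = 0, append_base: int = 0) -> int:
    """Map a RESNO from a card file onto a residue number in the PSF.

    resno       -- residue number as written in the file
    offset      -- value given with OFFSet (0 when the option is absent)
    append_base -- with APPEnd, the highest residue that already has known
                   positions; 0 when APPEnd is not used (assumed semantics)
    """
    return resno + offset + append_base


# Second chain of a two chain protein, first chain occupying residues 1-50:
# RESNO 1 in the second file lands on residue 51.
second_chain_first = target_residue(1, append_base=50)

# PROT coordinates read into an HPRO structure: shift by one for the NTER.
prot_into_hpro = target_residue(1, offset=1)

# The reverse direction uses an offset of -1.
hpro_into_prot = target_residue(2, offset=-1)
```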

RES gives the residue name of the atom. RES is checked against the residue name in the PSF for consistency. TYPE gives the IUPAC name of the atom. The coordinates of an atom within a residue need not be specified in any particular order. A search is made within each residue in the PSF for an atom whose IUPAC name is given in the coordinate file.

The MAXERR option controls how many error messages are printed. Its default value is 10. Normally, the coordinate reader will scan the entire file, and it will list errors as it encounters them, up to the MAXERR limit. At the end of reading, it will terminate execution if any fatal errors were encountered.
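The reporting policy just described (print at most MAXERR messages, keep scanning, stop only at the end if something fatal was seen) might look roughly like the following Python sketch. The `ErrorLog` class is purely illustrative and is not part of CONGEN.

```python
class ErrorLog:
    """Collect reader errors, printing at most `maxerr` of them."""

    def __init__(self, maxerr: int = 10):
        self.maxerr = maxerr
        self.printed = []   # messages that were actually shown
        self.total = 0      # every error seen, shown or not
        self.fatal = 0      # errors that must stop the run at the end

    def report(self, message: str, fatal: bool = False) -> None:
        """Record one error; reading continues regardless."""
        self.total += 1
        if fatal:
            self.fatal += 1
        if len(self.printed) < self.maxerr:
            self.printed.append(message)
            print(message)

    def finish(self) -> None:
        """Call once the whole file has been scanned."""
        if self.fatal:
            raise SystemExit(
                f"{self.fatal} fatal error(s) while reading coordinates"
            )
```

Deferring termination to `finish` is what lets a single pass list every problem in the file rather than only the first one.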

The KONN option allows the reading of Konnert Hendrickson format files. The file consists of just atom records where each atom coordinate has the following format:

             Res Segid Resid Iupac X Y Z
          3X,A4,  A1,   A3,   A4,  3F10.5

The four alphabetic fields are left justified by the program so they can be placed anywhere within their columns. If the Segid is not specified, the program will attempt to place the atoms within a segment which is determined by the APPEnd option (above). If APPEND is not specified, then the first segment in the structure will be used. If APPEND is specified, then the first segment which has a residue with all undefined atoms will be used. Blank lines may be specified between coordinates.

Note that the Segid and Resid fields are too small to hold the maximum length values. Truncations will cause unavoidable problems. However, residue identifiers NTE and CTE are extended to NTER and CTER.
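As an illustration of the record layout (3X, A4, A1, A3, A4, 3F10.5), the Python sketch below splits one Konnert Hendrickson line into its fields. It is a reading of the format line, not CONGEN's code. `KonnAtom` and `parse_konn_record` are hypothetical names, and the NTE/CTE extension is applied to the Resid field on the assumption that this is what "residue identifiers" refers to.

```python
from typing import NamedTuple, Optional


class KonnAtom(NamedTuple):
    res: str               # residue name (A4)
    segid: Optional[str]   # one-character segment id, None when blank
    resid: str             # residue identifier (A3, possibly extended)
    iupac: str             # IUPAC atom name (A4)
    x: float
    y: float
    z: float


# Three-character identifiers that are extended on input.
_EXTENDED = {"NTE": "NTER", "CTE": "CTER"}


def parse_konn_record(line: str) -> KonnAtom:
    """Split one atom record using the 3X,A4,A1,A3,A4,3F10.5 layout."""
    resid = line[8:11].strip()
    return KonnAtom(
        res=line[3:7].strip(),            # columns 0-2 are the 3X pad
        segid=line[7:8].strip() or None,  # blank means "not specified"
        resid=_EXTENDED.get(resid, resid),
        iupac=line[11:15].strip(),
        x=float(line[15:25]),
        y=float(line[25:35]),
        z=float(line[35:45]),
    )
```

Stripping each alphabetic field mirrors the left justification the program performs, so a name may sit anywhere within its columns.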

The BROOKHAVEN option (or its synonyms, TAPE or BRKHVN) specifies that the coordinate file is in the Brookhaven Data Bank format. CONGEN can read the ATOM records for coordinates. However, because the Brookhaven format uses slightly different naming conventions, there are a number of inconsistencies you should be aware of when using this option:

  1. Chain identifiers in Brookhaven are only one character long. In CONGEN, the corresponding segment identifiers are currently four characters. Thus, if you read a Brookhaven file with chain identifiers, you must generate your segments with one character identifiers (see Generate Command). If no chain identifiers are present in the Brookhaven file, then CONGEN will search the coordinates for the first residue which has all undefined atoms. Then, it will add the value you specify for OFFSET to this number, and it will read coordinates into the segment which contains the resulting residue. Be careful in the case where the terminal atom is undefined because in the protein and DNA cases, that atom is in a residue all by itself.
  2. The sequence number in the Brookhaven data bank is an amalgam of a residue number and an insertion code, whereas in CONGEN, it is a four letter identifier which is usually just the text representation of the sequence number (except for the terminating residues). There are two ways that you can handle this number in CONGEN. The SEQUENCE option, which is the default, causes CONGEN to assume that the atoms are provided in sequence order, and that every change of sequence number or insertion code in the file implies that the next residue is being specified. When insertions and deletions are present, the IDREad option is used to read both the residue number and the insertion code. Note that this option must be used in conjunction with reading the sequence with the IDREad option. If NOSEQUENCE is specified, then the residue number is used, but the insertion code is ignored.
  3. Hydrogen atoms have different specifications. In some cases the final digit of the hydrogen name is placed before the `H'. In others it is placed after. CONGEN will move any digit found before the `H' to the end of the name, after the other characters.
  4. Sometimes, the hydrogen atom attached to the peptide nitrogen is labeled `HN'. If so, it is renamed to `H'.
  5. Atoms at the termini, such as `NT' and `OT1', are renamed.
  6. Atoms can be defined at different positions in the Brookhaven Data Bank. CONGEN will use the last value found in the file.
  7. If you wish to select a particular alternate location identifier for a set of coordinates, use the ALTERNATE option along with the one-letter identifier desired.
  8. Multiple models may be handled by using the MODEL option. In these cases, it is necessary to specify the model number you wish to use.
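Items 3, 4 and 6 of the list above can be sketched in a few lines of Python. This is only an illustration of the renaming rules as stated, not CONGEN's implementation, and the function names are hypothetical. The terminal-atom renaming of item 5 is left out because the list does not say what the new names are.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def normalize_atom_name(name: str) -> str:
    """Apply the hydrogen naming rules from items 3 and 4."""
    name = name.strip()
    if name == "HN":
        return "H"  # peptide hydrogen labeled HN becomes H
    if len(name) > 1 and name[0].isdigit() and name[1] == "H":
        return name[1:] + name[0]  # move the leading digit to the end
    return name


def collect_coordinates(
    records: Iterable[Tuple[str, str, Coord]]
) -> Dict[Tuple[str, str], Coord]:
    """Keep one position per (residue, atom); the last one in the file wins.

    Each record is (residue_key, atom_name, (x, y, z)) in file order.
    """
    coords: Dict[Tuple[str, str], Coord] = {}
    for residue_key, atom_name, xyz in records:
        coords[(residue_key, normalize_atom_name(atom_name))] = xyz
    return coords
```

Because names are normalized before they are used as keys, an atom written as `1HB` in one record and `HB1` in a later one is treated as the same atom, and the later position replaces the earlier one.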

Reading the Brookhaven file format is not straightforward, so check the coordinates after they are read to see if they are correct. Energy evaluations (see Energy Manipulations) followed by analysis of the geometric terms (see Analysis) are a useful way to do this. Also, the brkchm command (see Brkchm) is an alternate way of converting Brookhaven files into a form that can be edited.

The IGNORE option allows one to read in a card coordinate file while bypassing the normal tests of the residue name, number, and atom name. When IGNORE is specified in place of CARD, the identifying information is ignored completely. Starting from the first selected atom, the coordinates are copied sequentially from the file.
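The sequential copy performed with IGNORE (and, per the earlier discussion, with binary files) can be pictured with the Python sketch below. It is an illustration under the assumption that unselected atoms are simply skipped. `copy_sequentially` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def copy_sequentially(
    coords: List[Optional[Coord]],
    selected: Sequence[bool],
    values: Sequence[Coord],
) -> int:
    """Copy `values`, in file order, into the selected atom slots.

    coords   -- current coordinates, one entry per atom (None = undefined)
    selected -- atom selection flags, same length as `coords`
    values   -- coordinates as they appear in the file
    Returns the number of atoms that received a value. No names are checked.
    """
    source = iter(values)
    copied = 0
    for index, is_selected in enumerate(selected):
        if not is_selected:
            continue  # only selected atoms may be modified
        try:
            coords[index] = next(source)
        except StopIteration:
            break  # the file ran out of coordinates
        copied += 1
    return copied
```

Since nothing ties a value to a particular atom, a selection that differs from the one used when the file was written will place coordinates on the wrong atoms without any error.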

Normally, the coordinates are not reinitialized before new values are read. If this is desired, the INITIALIZE keyword will cause the coordinate values for all selected atoms to be initialized. Note that only atoms that have been selected will be initialized. The COOR INIT command provides a more general way to initialize coordinates.

The EXPAnd option should be specified if the following conditions apply:

  1. An explicit hydrogen topology file is being used, and the coordinates we are reading do not have hydrogens in them.
  2. The coordinates were read using the IGNORE option or were read from a binary file.

In this case, the coordinates will be shuffled in order to leave room for the hydrogens. The hydrogen bond generation routine, HBUILD Command, or the builder routines, Internal Coordinates, must be called to construct the positions of these hydrogens.
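One way to picture the shuffle is the Python sketch below, which spreads a list of heavy-atom coordinates over the full atom list and leaves every hydrogen slot undefined. This is an interpretation of the description above, not CONGEN's algorithm. The function name and the `is_hydrogen` flags are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def expand_for_hydrogens(
    is_hydrogen: Sequence[bool],
    heavy_coords: Sequence[Coord],
) -> List[Optional[Coord]]:
    """Place heavy-atom coordinates in PSF order, skipping hydrogen slots.

    is_hydrogen  -- one flag per atom in the explicit hydrogen structure
    heavy_coords -- coordinates read sequentially, hydrogens absent
    Hydrogen positions (and any heavy atoms beyond the end of the input)
    are returned as None, to be built afterwards.
    """
    source = iter(heavy_coords)
    expanded: List[Optional[Coord]] = []
    for hydrogen in is_hydrogen:
        expanded.append(None if hydrogen else next(source, None))
    return expanded
```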

It is also possible to read coordinates into the comparison (or reference) set using the COMP keyword. The DIFF keyword will read coordinates into the coordinate differences (also referred to as the normal mode arrays). It is expected that these “coordinates” are really displacements that will be processed by the vibrational analysis command, see Vibrational Analysis.

Currently, CONGEN will perform a limited set of name translations on any formatted coordinate reading operation. The isoleucine translations are not needed for the AMBER 94 topology file, see AMBER94RTF. The translations listed below represent common differences in nomenclature:

              Residue   Input => Final
              ILE       CD1   => CD
              ILE       HD11  => HD1
              ILE       HD12  => HD2
              ILE       HD13  => HD3
              SER       HG1   => HG
              OH2       O     => OH2
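For readers who need to apply the same renaming outside CONGEN, the table translates directly into a lookup keyed on residue and input name. The Python sketch below contains exactly the entries listed above and nothing else. `NAME_TRANSLATIONS` and `translate_atom_name` are hypothetical names.

```python
# (residue name, input atom name) -> final atom name, copied from the table.
NAME_TRANSLATIONS = {
    ("ILE", "CD1"): "CD",
    ("ILE", "HD11"): "HD1",
    ("ILE", "HD12"): "HD2",
    ("ILE", "HD13"): "HD3",
    ("SER", "HG1"): "HG",
    ("OH2", "O"): "OH2",
}


def translate_atom_name(res: str, name: str) -> str:
    """Return the translated atom name, or the input name if none applies."""
    return NAME_TRANSLATIONS.get((res, name), name)
```

Keying on the residue as well as the atom name keeps the translation from touching, for example, a `CD1` atom in a residue other than ILE.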

The ABBREV option allows the specification of residue names using one letter abbreviations. When the AA keyword is specified, one letter amino acid codes can be used. For RNA and DNA, one letter nucleotide names will be translated into the appropriate two letter AMBER94 residue names.

Finally, the reading of coordinates is always a tricky business. Although standards exist for naming conventions, there are enough minor variations to make the situation difficult. Always check the structure after reading coordinates to ensure that the geometries and energies are reasonable.