Coordinates are read with the READ COOR command, which has several options (these may change in future versions).
There are four possible file formats that can be used to read in coordinates. They are coordinate binary files, dynamics coordinate trajectories, coordinate card images, and Brookhaven Protein Data Bank files.
For all formats, a subset of the atoms in the PSF may be selected using the standard atom selection syntax. For binary files, this is a risky maneuver, and warning messages are given when this is attempted. Only coordinates of selected atoms may be modified. When reading binary files, or using the IGNORE keyword, coordinate values are mapped into the selected atoms sequentially (NO checking is done!).
The reading of the first two file formats is specified with the
FILE option. The program reads the file header to tell which format it
is dealing with. The coordinate binary files have a file header of
COOR and contain only one set of coordinates. These are created with a
WRIT COOR FILE command. The dynamics coordinate trajectories have a file
header of CORD and have multiple coordinate sets. These files are
created by the dynamics function of the program. To specify which
coordinate set in the trajectory is to be read, the IFILE option is
provided. One specifies the position of the coordinate set within the
file. The default value for this option will cause the first coordinate
set to be read.
For binary files, the APPEnd command will 'deselect' all atoms up to the highest one with a known position. This is done in addition to the normal atom selection. This is useful for structures with several distinct segments where it is desirable to keep separate coordinate modules.
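The deselection rule for APPEnd can be illustrated with a small Python sketch. This is not CONGEN code: the function name `apply_append` and the two boolean lists are hypothetical stand-ins for the program's selection array and its record of which atoms already have positions.

```python
def apply_append(selected, known):
    """Return the atom selection after APPEnd-style deselection.

    selected: list of bools from the normal atom selection.
    known:    list of bools, True where the atom already has a position.
    (Both names are illustrative assumptions, not CONGEN data structures.)
    """
    # Index of the highest atom with a known position (-1 if none are known).
    highest_known = max((i for i, k in enumerate(known) if k), default=-1)
    # Keep only atoms that were selected AND lie above that index.
    return [s and i > highest_known for i, s in enumerate(selected)]


# A six-atom structure whose first segment (three atoms) is already placed.
selected = [True] * 6
known = [True, True, True, False, False, False]
print(apply_append(selected, known))
```

With this rule, a second binary file read with APPEnd can only fill atoms beyond the segment that is already in place.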
The CARD file format is the standard means in CONGEN for providing a human readable and writable coordinate file. The format is as follows:
     title
     NATOM (I5)
     ATOMNO RESNO   RES  TYPE   X     Y     Z     (repeated NATOM times)
       I5    I5  1X A4 1X A4  F10.5 F10.5 F10.5
title is a title for the coordinates, see Syntactic Glossary. Next comes the number of coordinates. If this number is zero
or too large, the entire file will be read. Finally, there is one line
for each coordinate. The coordinate lines, but not the initial lines,
may contain blank lines for readability.
ATOMNO gives the number of the atom in the file. It is ignored.
RESNO gives the residue number of the atom. It must be
specified relative to the first residue in the PSF. The OFFSet option
should be specified if one wishes to read coordinates into other positions.
The APPEnd option adds an additional offset which points to
the residue just beyond the highest one with known positions. This option
also 'deselects' all atoms below this residue (inclusive).
For example, if one is reading in coordinates for the second segment of a
two chain protein using two card files, and the APPEnd option is used,
RESNO must start at 1 in both files for the file reading to work.
It should also be remembered that for card images, residues are identified by residue number. This will change someday. This implies that if one wishes to read coordinates from an extended atom (PROT) RTF into a structure using an explicit hydrogen (HPRO) RTF, the OFFSet keyword MUST be used to shift the residue numbers by one (to make room for the NTER) so that the residues will line up. If the reverse process is required, an OFFSet value of -1 is called for.
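The residue arithmetic described for OFFSet and APPEnd can be sketched in Python as follows. This is an illustration only: the function `target_residue` and its parameters are hypothetical, and it assumes that the two offsets simply add, as the description of APPEnd as "an additional offset" suggests.

```python
def target_residue(resno, offset=0, append=False, highest_known_residue=0):
    """Map a RESNO from a card file to a residue number in the PSF.

    resno:   residue number from the file, relative to the first residue.
    offset:  value given with the OFFSet option.
    append:  True if APPEnd was specified.
    highest_known_residue: highest residue that already has positions.
    (All names are illustrative; the additive rule is an assumption.)
    """
    # APPEnd shifts RESNO 1 to the residue just beyond the highest known one.
    append_shift = highest_known_residue if append else 0
    return resno + offset + append_shift


# Second chain read with APPEnd after a 120-residue first chain:
print(target_residue(1, append=True, highest_known_residue=120))
# PROT coordinates read into an HPRO structure (room for the NTER residue):
print(target_residue(1, offset=1))
```

The first call shows why RESNO must start at 1 in the second file when APPEnd is used; the second shows the shift by one that makes room for NTER.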
RES gives the residue name of the atom.
RES is checked against
the residue name in the PSF for consistency.
TYPE gives the IUPAC name
of the atom. The coordinates of an atom within a residue need not be
specified in any particular order. A search is made within each residue
in the PSF for an atom whose IUPAC name is given in the coordinate file.
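As an illustration of the card layout, the following Python sketch splits one atom record according to the column widths given above (I5, I5, 1X, A4, 1X, A4, 3F10.5) and applies the NATOM rule. It is not the CONGEN reader: the names `CardAtom`, `parse_card_record`, and `parse_card_records` are hypothetical, the title lines are assumed to have been removed already, and every numeric field is assumed to be non-blank.

```python
from typing import NamedTuple


class CardAtom(NamedTuple):
    atomno: int
    resno: int
    res: str
    atom_type: str
    x: float
    y: float
    z: float


def parse_card_record(line):
    """Split one atom line using the fixed column widths of the CARD format."""
    line = line.rstrip("\n").ljust(50)  # pad short lines to the full width
    return CardAtom(
        atomno=int(line[0:5]),           # I5 (ignored by the real reader)
        resno=int(line[5:10]),           # I5
        res=line[11:15].strip(),         # 1X, A4
        atom_type=line[16:20].strip(),   # 1X, A4
        x=float(line[20:30]),            # F10.5
        y=float(line[30:40]),            # F10.5
        z=float(line[40:50]),            # F10.5
    )


def parse_card_records(lines, natom):
    """Parse the atom lines, skipping blank lines between records.

    If natom is zero or larger than the number of records present,
    every record is returned (the whole file is read).
    """
    records = [parse_card_record(l) for l in lines if l.strip()]
    if 0 < natom <= len(records):
        records = records[:natom]
    return records


line = "%5d%5d %-4s %-4s%10.5f%10.5f%10.5f" % (1, 1, "ALA", "N", 1.5, -2.25, 0.0)
print(parse_card_record(line))
```

Matching each parsed record to an atom in the PSF by residue number, residue name, and IUPAC name is a separate step that this sketch does not attempt.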
The MAXERR option controls how many error messages are printed. Its default value is 10. Normally, the coordinate reader will scan the entire file and list errors as it encounters them, up to the MAXERR limit. At the end of reading, it will terminate execution if any fatal errors were encountered.
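The reporting behavior can be pictured with a short Python sketch. It is only a model of the description above: `report_errors` and the `(message, is_fatal)` pairs are hypothetical, and the real reader prints messages and stops the program rather than returning values.

```python
def report_errors(problems, max_err=10):
    """Model of MAXERR: scan everything, print a limited number of messages.

    problems: iterable of (message, is_fatal) pairs found while reading.
    Returns the messages that would be printed and whether execution
    should terminate at the end of the read.
    """
    printed = []
    any_fatal = False
    for message, is_fatal in problems:
        # The whole file is still scanned after the print limit is reached.
        if len(printed) < max_err:
            printed.append(message)
        any_fatal = any_fatal or is_fatal
    return printed, any_fatal


problems = [("bad residue name at line %d" % n, n == 14) for n in range(1, 16)]
shown, must_stop = report_errors(problems, max_err=10)
print(len(shown), must_stop)
```

The point of the sketch is that a fatal error found after the print limit still causes termination, even though its message is not listed.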
The KONN option allows the reading of Konnert Hendrickson format files. The file consists of just atom records where each atom coordinate has the following format:
         Res  Segid  Resid  Iupac    X Y Z
     3X, A4,   A1,    A3,    A4,   3F10.5
The four alphabetic fields are left justified by the program so
they can be placed anywhere within their columns. If the
Segid is not
specified, the program will attempt to place the atoms within a segment
which is determined by the APPEnd option (above). If APPEND is not
specified, then the first segment in the structure will be used. If APPEND
is specified, then the first segment which has a residue with all
undefined atoms will be used. Blank lines may be specified between coordinates.
Note that the
Resid fields are too small to hold the
maximum length values. Truncations will cause unavoidable problems.
However, residue identifiers NTE and CTE are extended to NTER and CTER.
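A record in this layout could be split as in the Python sketch below, which follows the stated format (3X, A4, A1, A3, A4, 3F10.5). It is an illustration under assumptions: `parse_konn_record` is a hypothetical name, the NTE/CTE extension is applied to the Resid field on the reading that "residue identifiers" refers to that field, and segment assignment for a blank Segid is not modeled.

```python
def parse_konn_record(line):
    """Split one Konnert-Hendrickson atom record into its fields."""
    line = line.rstrip("\n").ljust(45)  # pad short lines to the full width
    # Alphabetic fields may sit anywhere in their columns, so strip them.
    resid = line[8:11].strip()
    # Truncated terminal identifiers are restored to their full names.
    resid = {"NTE": "NTER", "CTE": "CTER"}.get(resid, resid)
    return {
        "res": line[3:7].strip(),      # 3X, A4
        "segid": line[7:8].strip(),    # A1 (may be blank)
        "resid": resid,                # A3
        "iupac": line[11:15].strip(),  # A4
        "x": float(line[15:25]),       # F10.5
        "y": float(line[25:35]),       # F10.5
        "z": float(line[35:45]),       # F10.5
    }


line = "   %-4s%-1s%-3s%-4s%10.5f%10.5f%10.5f" % ("ALA", "A", "12", "CA", 1.0, 2.0, 3.0)
print(parse_konn_record(line))
```

Because the Resid column holds only three characters, any longer identifier other than NTER or CTER cannot be recovered by a reader of this kind.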
The BROOKHAVEN option (or its synonyms, TAPE or BRKHVN) specifies that the coordinate file is in the Brookhaven Protein Data Bank format. CONGEN can read the ATOM records for coordinates. However, because the Brookhaven format uses slightly different naming conventions, there are a number of inconsistencies you should be aware of when using this option:
Reading the Brookhaven file format is not straightforward, so check the coordinates after they are read to see if they are correct. Energy evaluations (see Energy Manipulations) followed by analysis of the geometric terms (see Analysis) are a useful way to do this. Also, the brkchm command (see Brkchm) is an alternate way of converting Brookhaven files into a form that can be edited.
The IGNORE option allows one to read in a card coordinate file while bypassing the normal tests of the residue name, number, and atom name. When IGNORE is specified in place of CARD, the identifying information is ignored completely. Starting from the first selected atom, the coordinates are copied sequentially from the file.
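The sequential copy can be pictured with the following Python sketch. The function `copy_sequentially` and its arguments are hypothetical; the sketch only mirrors the rule that the n-th coordinate in the file goes to the n-th selected atom, with no checking of names or numbers.

```python
def copy_sequentially(coords, selected_indices, file_coords):
    """Copy file coordinates into the selected atoms, in order.

    coords:           list of (x, y, z) tuples for every atom in the structure.
    selected_indices: indices of the selected atoms, in ascending order.
    file_coords:      (x, y, z) tuples in the order they appear in the file.
    No identity checks are made, mirroring the IGNORE behavior.
    """
    # zip stops at the shorter sequence, so extra atoms or records are skipped.
    for index, xyz in zip(selected_indices, file_coords):
        coords[index] = xyz
    return coords


coords = [(0.0, 0.0, 0.0)] * 4
print(copy_sequentially(coords, [1, 3], [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]))
```

Since nothing is verified, a file whose atom order differs from the selection will be read without complaint but into the wrong atoms.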
Normally, the coordinates are not reinitialized before new values are read. If this is desired, the INITIALIZE keyword will cause the coordinate values for all selected atoms to be initialized. Note that only atoms that have been selected will be initialized. The COOR INIT command provides a more general way to initialize coordinates.
The EXPAnd option should be specified if the following conditions apply:
In this case, the coordinates will be shuffled in order to leave room for the hydrogens. The hydrogen bond generation routine, HBUILD Command, or the builder routines, Internal Coordinates, must be called to construct the positions of these hydrogens.
It is also possible to read coordinates into the comparison (or reference) set using the COMP keyword. The DIFF keyword will read coordinates into the coordinate differences (also referred to as the normal mode arrays). It is expected that these "coordinates" are really displacements that will be processed by the vibrational analysis command, see Vibrational Analysis.
Currently, CONGEN will perform a limited set of name translations on any formatted coordinate reading operation. The isoleucine translations are not needed for the AMBER 94 topology file, see AMBER94RTF. The translations represent common differences in nomenclature:
The ABBREV option allows the specification of residue names using one letter abbreviations. When the AA keyword is specified, one letter amino acid codes can be used. For RNA and DNA, one letter nucleotide names will be translated into the appropriate two letter AMBER94 residue names.
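For the amino acid case, the translation amounts to a table lookup, as in the Python sketch below. The table holds the standard one-letter codes for the twenty common amino acids; it is not taken from CONGEN, whose own table may differ, and the nucleotide translation is left out because the AMBER94 residue names are not listed here. The name `expand_residue_name` is hypothetical.

```python
# Standard one-letter to three-letter amino acid codes (not CONGEN's table).
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expand_residue_name(name):
    """Expand a one-letter amino acid code; leave longer names unchanged."""
    name = name.strip().upper()
    if len(name) == 1:
        # Unknown single letters are passed through untouched.
        return ONE_TO_THREE.get(name, name)
    return name


print([expand_residue_name(n) for n in ["G", "a", "TRP", "X"]])
```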
Finally, the reading of coordinates is always a tricky business. Although standards exist for naming conventions, there are enough minor variations to make the situation difficult. Always check the structure after reading coordinates to ensure that the geometries and energies are reasonable.