Read Sequence - Congen V2.2.3327

3.1.2 Specifying a Sequence of Residues for a Segment

The specification of SEQUence or RESIdue causes the program to accept a sequence of residue names to be used to generate the next segment in the molecule. There are four sources of sequence information. The first source is a CONGEN format sequence file which has the following syntax:

     title
     number-of-residues repeat(residue-names)

The form of the title is defined in the syntactic glossary (see Syntactic Glossary). The number of residues is given in free field format on the line following the title. If the number is less than zero, CONGEN reads residues until it encounters a blank line or end of file. If the number is greater than zero, CONGEN also stops once it has read at least that many residues. If the number is zero, CONGEN issues a warning, because a common error is to omit the number entirely; in that case, the first residue name is consumed as the number and converted to zero.

The residue names are specified as separate words, each no longer than 4 characters, on as many lines as are required for all the residues. This sequence may be placed immediately following the READ command if the unit number is 5 or may be placed in a separate file.
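The counting rules above can be sketched in Python. This is only an illustration of the rules as described, not CONGEN's own reader. The function name parse_sequence_body is hypothetical, the title is assumed to have been stripped already, and three behaviors are assumptions the text does not settle: names may share the line that holds the count, nothing is read when the count is zero, and names beyond a positive count are discarded.

```python
import warnings


def parse_sequence_body(lines):
    """Parse the lines that follow the title of a sequence file (sketch)."""
    rows = iter(lines)
    header = next(rows, "").split()
    if not header:
        raise ValueError("missing number-of-residues line")
    try:
        count = int(header[0])
    except ValueError:
        # A forgotten count: the first residue name is consumed as the
        # number and converted to zero.
        count = 0
    if count == 0:
        warnings.warn("number of residues is zero; was the count omitted?")
        return []  # assumption: nothing is read in this case
    residues = header[1:]  # assumption: names may share the count's line
    for row in rows:
        if count > 0 and len(residues) >= count:
            break  # positive count reached
        words = row.split()
        if not words:
            break  # a blank line ends the sequence
        for word in words:
            if len(word) > 4:
                raise ValueError(f"residue name too long: {word!r}")
        residues.extend(words)
    # assumption: extra names on the last line read are dropped
    return residues[:count] if count > 0 else residues
```

For example, parse_sequence_body(["-1", "ALA GLY", "SER", "", "VAL"]) stops at the blank line and returns the three names before it.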

The second source of sequences is a CONGEN coordinate file in CARD format. Currently, the BYATom option reads all residues within the file for inclusion in the sequence.

The third source of sequence information is a Brookhaven Protein Data Bank file. The BROOKHAVEN, BRKHVN, and TAPE options allow the sequence to be read from the SEQRES records in a Brookhaven Protein Data Bank coordinate file. (TAPE is used because the Brookhaven Protein Data Bank used to come on a tape.) If the CHAIN option is specified, then only the sequence of the chain with the specified segid is read. Otherwise, the sequences of all the chains are read together. Note that the Brookhaven format only allows single-letter chain names, so your segid should have only one character.
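As a rough illustration of reading SEQRES records with and without a chain filter, the following Python sketch uses the fixed columns of the PDB format (chain identifier in column 12, residue names in columns 20 to 70). The function name read_seqres and its chain argument are hypothetical; this is not CONGEN's implementation.

```python
def read_seqres(lines, chain=None):
    """Collect residue names from SEQRES records (sketch).

    If chain is given, only records for that one-letter chain are used;
    otherwise all chains are concatenated in file order.
    """
    sequence = []
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11:12]  # column 12 of the record
        if chain is not None and chain_id != chain:
            continue
        # Residue names occupy columns 20-70, up to 13 per record.
        sequence.extend(line[19:70].split())
    return sequence
```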

Alternatively, the sequence may be read directly from the ATOM records by using the BYATOM option. Under the BYATOM option, if there are insertions or deletions within the list of residue identifiers, the IDREAD option will read the residue identifiers, including insertion codes, directly from the Brookhaven file, rather than automatically generating a residue number based on sequential order. Note that the IDREAD option currently conflicts with the DISULPHIDE command, since that command assumes that the residue identifiers are those generated automatically. The MODEL option may also be used in conjunction with BYATOM to read the sequence from a particular model number in the file. If it is not specified, the first model in the Brookhaven file is used.
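The difference between automatically generated residue numbers and identifiers taken from the file can be seen in a small Python sketch. It relies on the standard ATOM record columns (residue name in 18 to 20, chain in 22, residue number in 23 to 26, insertion code in 27). The function read_by_atom and its model and idread arguments are hypothetical stand-ins for the options described above, not CONGEN's code.

```python
def read_by_atom(lines, model=None, idread=False):
    """Return (identifier, residue name) pairs from ATOM records (sketch)."""
    residues = []
    last_key = None
    current_model = None  # stays None if the file has no MODEL records
    wanted = model
    for line in lines:
        if line.startswith("MODEL"):
            current_model = int(line.split()[1])
            if wanted is None:
                wanted = current_model  # default: first model in the file
            continue
        if line.startswith("ENDMDL"):
            if current_model == wanted:
                break  # the requested model has been read completely
            continue
        if not line.startswith("ATOM"):
            continue
        if current_model is not None and current_model != wanted:
            continue
        # chain, residue number, insertion code identify one residue
        key = (line[21:22], line[22:26], line[26:27])
        if key == last_key:
            continue  # another atom of the residue already recorded
        last_key = key
        if idread:
            ident = line[22:27].strip()  # number plus insertion code
        else:
            ident = str(len(residues) + 1)  # sequential numbering
        residues.append((ident, line[17:20].strip()))
    return residues
```

With residues numbered 10, 10A, and 12 in the file, sequential numbering yields 1, 2, 3, whereas the idread path keeps 10, 10A, 12.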

The final source of sequences is the pair of water options. The WATEr option allows a sequence of water molecules to be specified. The integer which follows the keyword gives the number of waters. Likewise, the ST2 option allows ST2 waters to be specified. No residue names need to be given on separate lines for either option. For CONGEN topology files, a residue named OH2 (or ST2) must be present. For an AMBER94 topology file, a residue named HOH must be present. If these residues are missing, the GENErate command called afterwards will fail.
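The water rules can be summarized in a short Python sketch. The function water_sequence, its arguments, and the "CONGEN"/"AMBER94" labels are hypothetical; the sketch checks for the required residue immediately, whereas the text says the failure only appears at the later GENErate step. The combination of ST2 with an AMBER94 topology is not covered by the text, so the sketch rejects it.

```python
# Residue name required in the topology file for each case described above.
WATER_RESIDUE = {
    ("WATER", "CONGEN"): "OH2",
    ("ST2", "CONGEN"): "ST2",
    ("WATER", "AMBER94"): "HOH",
}


def water_sequence(option, count, topology_kind, topology_residues):
    """Return a sequence of `count` water residues (sketch)."""
    try:
        name = WATER_RESIDUE[(option, topology_kind)]
    except KeyError:
        raise ValueError(f"{option} with {topology_kind} is not described") from None
    if name not in topology_residues:
        # In CONGEN this problem would surface when GENErate is called.
        raise ValueError(f"topology file has no residue named {name}")
    return [name] * count
```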

When reading is complete, CONGEN will list all the residues it has read.

The options PROT, HPRO, ALLH, and DNA specify what type of CHARMM potential file is being read. They are very important because they determine which patching operations take place on the segment once it is generated. The patching operations correct the linkage of prolines and the charges and chemical types at the ends of the segment. PROT signifies that an extended atom residue topology file is the source of residues. HPRO signifies an explicit hydrogen topology file. ALLH signifies an all hydrogen topology file. DNA specifies the DNA topology file.

These options may also cause additional residues to be added to the sequence. These additional residues serve to terminate the segment. However, if the segment is generated cyclically (see Generate Command), then no termini will be added. In particular, PROT will add a CTER residue that has the C-terminal oxygen. HPRO and ALLH will add a CTER residue along with an NTER residue that holds two additional hydrogens for the N-terminus. DNA will add a 5TER residue to the beginning and a 3TER residue to the end of the segment.
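The terminus rules can be written out as a short Python sketch. The function add_termini and its cyclic flag are hypothetical. Placing NTER at the start and CTER at the end is an assumption based on the residue names, since only the DNA case states the positions explicitly.

```python
def add_termini(sequence, option, cyclic=False):
    """Return the sequence with terminating residues added (sketch)."""
    if cyclic:
        return list(sequence)  # cyclic segments receive no termini
    if option == "PROT":
        return [*sequence, "CTER"]
    if option in ("HPRO", "ALLH"):
        # assumption: NTER goes first and CTER goes last
        return ["NTER", *sequence, "CTER"]
    if option == "DNA":
        return ["5TER", *sequence, "3TER"]
    raise ValueError(f"unknown option: {option}")
```

For instance, add_termini(["ALA", "GLY"], "HPRO") gives NTER, ALA, GLY, CTER, while the same call with cyclic=True returns the two residues unchanged.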

If an AMBER94 topology file is being used, then one of the keywords A94P or A94N should be specified to indicate whether a protein or nucleic acid sequence is being read. Use of these keywords will then result in the correct terminal residues being used at the ends of the segment. See Generate Command for more information about this process.

The ABBREV option allows the specification of residues using one letter abbreviations. When the AA keyword is specified, one letter amino acid codes can be used. For RNA and DNA, one letter nucleotide names will be translated into the appropriate two letter AMBER94 residue names.
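The kind of translation that ABBREV performs can be illustrated with a Python sketch. The amino acid table uses the standard three-letter names; the residue names in a given CONGEN topology file may differ (histidine variants, for example). The two-letter nucleotide names follow the usual AMBER94 convention (DA, DC, DG, DT for DNA and RA, RC, RG, RU for RNA), which is an assumption here because the text does not list them. The function expand_abbreviations and the "DNA" and "RNA" values of its kind argument are hypothetical; only AA is a keyword named in the text.

```python
AMINO_ACIDS = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
# Assumed AMBER94-style two-letter nucleotide residue names.
DNA = {"A": "DA", "C": "DC", "G": "DG", "T": "DT"}
RNA = {"A": "RA", "C": "RC", "G": "RG", "U": "RU"}
TABLES = {"AA": AMINO_ACIDS, "DNA": DNA, "RNA": RNA}


def expand_abbreviations(letters, kind="AA"):
    """Translate one-letter codes into residue names (sketch)."""
    table = TABLES[kind]
    names = []
    for letter in letters.upper():
        if letter.isspace():
            continue  # allow the codes to be grouped with blanks
        if letter not in table:
            raise ValueError(f"unknown {kind} code: {letter!r}")
        names.append(table[letter])
    return names
```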