Volume 2 Chapter 10 Search Menu *PDBSEQ
*PEPTIDE is used to test for amino-acids and peptide sequences using 3-letter code symbols.
Graphics QUEST3D Procedure
The equivalent test construction in Graphics QUEST3D is discussed under PEPTIDE-SEQ.
CSD Contents
Entries having basic class 48 now contain, in the compound name synonym field, information describing the sequence of α-amino-acids. The general form of this information is:
PEPSEQ A=n Sequence (acyclic)
PEPSEQ C=n Sequence (cyclic)
n is the number of amino-acid residues.
Sequence indicates which amino-acid residues are involved and how they are linked.
A few examples will illustrate the principal features of the coding scheme:
PEPSEQ A=1 ALA
There is only one residue, viz. alanine
D-alanine, L-alanine, DL-alanine would all be represented by this notation.
Note that all simple amino-acids are designated A=1.
PEPSEQ A=1 ALA*
An * immediately following the 3-letter code symbol indicates that the amino-acid has been modified in some way.
Thus ALA* would apply to, eg.
PEPSEQ A=3 PRO*-LEU-GLY
This indicates an acyclic peptide containing the 3 residues PRO* and LEU and GLY joined together by normal peptide links.
The symbol - indicates the normal peptide link.
PEPSEQ A=2 CYS,HCY
This indicates an acyclic peptide containing the two amino-acids cysteine and homocysteine joined together by a non-peptide link. In this example the two residues are linked through their sulfur atoms to form a disulfide bridge.
The symbol , indicates a non-peptide link.
PEPSEQ C=5 GLY-PRO-SER-ALA-VAL-
This indicates a cyclic peptide containing 5 amino-acid residues joined by normal peptide links.
The final - indicates that VAL is joined to GLY by a normal peptide link.
PEPSEQ C=3 PHE-PRO,UND,
This indicates a cyclic peptide containing 3 residues.
The final , indicates that UND is joined to PHE by a non-peptide link.
The 3-letter code symbol UND is used to indicate an undefined residue. Such residues cannot be described as modified α-amino-acids; examples are lactyl, α-hydroxyisovaleryl, β-alanyl, etc.
Note, however, that UND is included in the count of residues when assigning the value of n.
PEPSEQ A=1 ARG A=1 GLU
This example illustrates the situation where two amino-acids, arginine and glutamic acid, co-crystallise.
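The coding scheme illustrated above can be read mechanically. The following Python fragment is a minimal sketch, not part of QUEST or the CSD software: the names `PeptideSequence` and `parse_pepseq` are hypothetical, and the sketch assumes that every residue symbol is three upper-case letters optionally followed by `*`.

```python
import re
from dataclasses import dataclass


@dataclass
class PeptideSequence:
    cyclic: bool     # True for C=n, False for A=n
    count: int       # the declared residue count n
    residues: list   # 3-letter symbols, e.g. ["PRO*", "LEU", "GLY"]
    links: list      # symbol after each residue: "-", "," or "" (none)


def parse_pepseq(record):
    """Split a PEPSEQ record into one PeptideSequence per A=n / C=n part."""
    if not record.startswith("PEPSEQ"):
        raise ValueError("not a PEPSEQ record")
    sequences = []
    # A record may hold several parts, e.g. two co-crystallised amino-acids.
    for kind, n, body in re.findall(r"\b([AC])=(\d+)\s+(\S+)", record):
        tokens = re.findall(r"([A-Z]{3}\*?)([-,]?)", body)
        if "".join(code + link for code, link in tokens) != body:
            raise ValueError(f"unrecognised sequence text: {body!r}")
        residues = [code for code, _ in tokens]
        links = [link for _, link in tokens]
        # UND residues are counted in n, so a plain length check is enough.
        if len(residues) != int(n):
            raise ValueError(f"expected {n} residues, found {len(residues)}")
        sequences.append(PeptideSequence(kind == "C", int(n), residues, links))
    return sequences
```

For a cyclic entry the last element of `links` is the ring-closing link; for an acyclic entry it is empty. For instance, `parse_pepseq("PEPSEQ C=3 PHE-PRO,UND,")` gives one sequence whose `links` list ends with `","`.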
The 3-letter code symbols for the 20 common α-amino-acids are:
Alanine         ALA    Glycine         GLY    Proline       PRO
Arginine        ARG    Histidine       HIS    Serine        SER
Asparagine      ASN    Isoleucine      ILE    Threonine     THR
Aspartic acid   ASP    Leucine         LEU    Tryptophan    TRP
Cysteine        CYS    Lysine          LYS    Tyrosine      TYR
Glutamine       GLN    Methionine      MET    Valine        VAL
Glutamic acid   GLU    Phenylalanine   PHE
A further 9 α-amino-acids are assigned 3-letter codes in the CSD:
α-Aminoisobutyric acid   AIB    Norvaline           NVA
Homocysteine             HCY    Ornithine           ORN
Homoserine               HSE    Pyroglutamic acid   GLP
Isovaline                IVA    Sarcosine           SAR
Norleucine               NLE
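When processing exported synonym text outside QUEST, the two code tables can be held as a simple lookup. The sketch below is illustrative only; the dictionary names and the `describe` helper are hypothetical, and the treatment of a trailing `*` follows the modification marker described earlier.

```python
# The 20 common amino-acid codes listed in this section.
STANDARD_CODES = {
    "ALA": "Alanine", "ARG": "Arginine", "ASN": "Asparagine",
    "ASP": "Aspartic acid", "CYS": "Cysteine", "GLN": "Glutamine",
    "GLU": "Glutamic acid", "GLY": "Glycine", "HIS": "Histidine",
    "ILE": "Isoleucine", "LEU": "Leucine", "LYS": "Lysine",
    "MET": "Methionine", "PHE": "Phenylalanine", "PRO": "Proline",
    "SER": "Serine", "THR": "Threonine", "TRP": "Tryptophan",
    "TYR": "Tyrosine", "VAL": "Valine",
}

# The further 9 codes assigned in the CSD.
CSD_EXTRA_CODES = {
    "AIB": "α-Aminoisobutyric acid", "GLP": "Pyroglutamic acid",
    "HCY": "Homocysteine", "HSE": "Homoserine", "IVA": "Isovaline",
    "NLE": "Norleucine", "NVA": "Norvaline", "ORN": "Ornithine",
    "SAR": "Sarcosine",
}

ALL_CODES = {**STANDARD_CODES, **CSD_EXTRA_CODES}


def describe(symbol):
    """Return (name, modified) for a symbol such as 'ALA' or 'ALA*'."""
    modified = symbol.endswith("*")
    code = symbol.rstrip("*")
    if code == "UND":
        return ("undefined residue", modified)
    # An unknown code raises KeyError, which flags a malformed record.
    return (ALL_CODES[code], modified)
```

For example, `describe("ALA*")` returns the name for ALA together with a flag showing that the residue is modified.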
Other less common α-amino-acids are described as modifications of those listed above.
Examples are:
Basic QUEST Procedure
Searching for simple amino-acids and peptides can be accomplished as text searches using *SYNONYM but the search keyword *PEPTIDE offers more flexibility.
The form of *PEPTIDE search queries is illustrated by the following 2 examples:
(i)
T1 *PEPT
PSEQ -PRO-AIB-
QUES T1
This will register hits for:
PEPSEQ A=5 AIB*-PRO-AIB-ALA-AIB*
PEPSEQ C=4 CYS*-PRO-AIB-CYS*-
etc.
The PSEQ record is used to define the search sequence.
(ii)
T2 *PEPT
PDEF ABC= ILE LEU
PSEQ -GLY-ABC-GLY-
QUES T2
This will register hits for:
PEPSEQ C=8 ASN-PRO*-THR*-TRP*-GLY-ILE-GLY-CYS*-
PEPSEQ C=6 GLY-LEU-GLY-GLY-LEU-GLY-
etc.
In this example the PSEQ record is preceded by a PDEF record which allows the user to define combinations of the 3-letter code symbols.
Here ABC is defined to be either ILE or LEU.
PDEF can be compared to ELDEF in a connectivity search test packet.
PSEQ
The general form of this record can be exemplified thus:
PSEQ %-ALA-ARG*-,CYS'- A
It is composed of three components:
Note also that the symbol ANY can be used to indicate any of the twenty-nine 3-letter code symbols.
ARG* is joined to CYS' by either a peptide link or a non-peptide link.
When alternative links are allowed the order of -,% is immaterial.
PDEF
The general form of this record can be exemplified thus:
PDEF ABC= ILE+LEU
PDEF XYZ= ANY-ABC
More than one PDEF record can be present in a *PEPTIDE search question and their effect is cumulative.
In the above example,
The 3-letter code UND cannot be used in a PDEF record.
The new symbol for the defined residue must start with an alphabetic character and can have a maximum of 3 alphanumeric characters.
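The cumulative effect of PDEF records can be pictured as set arithmetic on the 3-letter codes. The sketch below is an interpretation, not the QUEST implementation: `apply_pdef` is a hypothetical helper, `+` is assumed to mean "either of", `-` is assumed to mean "excluding" (as Ex.8 suggests for `ANY-ABC`), and codes separated only by a space are assumed to be alternatives.

```python
import re

# The twenty-nine 3-letter code symbols listed in this section.
CODES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR "
    "TRP TYR VAL AIB GLP HCY HSE IVA NLE NVA ORN SAR".split()
)


def apply_pdef(record, defined):
    """Add the symbol defined by one PDEF record to the dict `defined`."""
    m = re.fullmatch(r"PDEF\s+(\S+?)=\s*(.+)", record.strip())
    if m is None:
        raise ValueError("not a PDEF record")
    name, expr = m.groups()
    # Rule from the text: alphabetic first character, at most 3 alphanumerics.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]{0,2}", name):
        raise ValueError(f"invalid symbol name {name!r}")
    result, op = set(), "+"
    for tok in re.findall(r"[+-]|[A-Za-z0-9]+", expr):
        if tok in ("+", "-"):
            op = tok
            continue
        if tok == "UND":
            raise ValueError("UND cannot be used in a PDEF record")
        if tok == "ANY":
            operand = set(CODES)
        elif tok in defined:          # symbol from an earlier PDEF record
            operand = set(defined[tok])
        elif tok in CODES:
            operand = {tok}
        else:
            raise ValueError(f"unknown symbol {tok!r}")
        result = result | operand if op == "+" else result - operand
        op = "+"                      # assumed default between operands
    defined[name] = frozenset(result)
    return defined
```

Applying `PDEF ABC= ILE+LEU` and then `PDEF XYZ= ANY-ABC` to an empty dictionary leaves `XYZ` holding every listed code except ILE and LEU.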
Examples
Ex.1
T1 *PEPT
PSEQ -PRO-AIB-
QUES T1
This registers hits for:
PEPSEQ A=5 AIB*-PRO-AIB-ALA-AIB*
PEPSEQ A=4 AIB*-PRO-AIB-PRO*
PEPSEQ C=4 CYS*-PRO-AIB-CYS*-
etc.
If we had coded PSEQ -PRO-AIB- A then hits would be registered only for acyclic sequences.
Ex.2
T2 *PEPT
PSEQ -PRO-GLY
QUES T2
This registers a hit for:
PEPSEQ A=3 PRO*-PRO-GLY
but not for:
PEPSEQ A=4 VAL*-PRO-GLY-VAL
In PSEQ the GLY residue does not have any bond indication to its right.
The software then interprets this as a terminal GLY and the sequence to be acyclic.
Ex.3
T3 *PEPT
PSEQ LEU-GLY-
QUES T3
This registers a hit for:
PEPSEQ A=4 LEU-GLY-GLY-GLY
but not for:
PEPSEQ A=3 PRO*-LEU-GLY
In PSEQ the LEU residue does not have any bond indication to its left.
The software then interprets this as a terminal LEU and the sequence to be acyclic.
Also, in PSEQ the GLY residue is not terminal whereas in the A=3 entry the GLY is terminal.
Ex.4
T4 *PEPT
PSEQ -,CYS*-PRO-AIB- C
QUES T4
This registers a hit for:
PEPSEQ C=4 CYS*-PRO-AIB-CYS*,
and would also register a hit for:
PEPSEQ C=4 CYS*-PRO-AIB-ALA-
Ex.5
T5 *PEPT
PSEQ -,%CYS*%,-
QUES T5
This registers hits for:
PEPSEQ A=1 CYS*
PEPSEQ A=2 PRO*-CYS*
PEPSEQ C=2 CYS*-PRO-
PEPSEQ A=2 CYS,CYS*
PEPSEQ C=8 CYS*,UND,THR*,UND,CYS*,UND,THR*,UND,
etc.
Ex.6
T6 *PEPT
PSEQ SER'
QUES T6
This registers hits for:
PEPSEQ A=1 SER
PEPSEQ A=1 SER* +
Ex.7
T7 *PEPT
PSEQ ANY
QUES T7
This registers hits for all unmodified simple amino-acids.
Ex.8
T8 *PEPT
PDEF ABC= ILE LEU
PDEF X2= ANY-ABC
PSEQ -GLY-X2-GLY-
QUES T8
This registers a hit for:
PEPSEQ C=6 PRO-GLY-PRO-GLY-PRO-GLY-
but not for:
PEPSEQ C=6 GLY-LEU-GLY-GLY-LEU-GLY-
Ex.9
T9 *PEPT
PSEQ -PHE-PHE-PRO-
T5 *PEPT
PSEQ -PHE-VAL-PRO-
QUES T9.AND.T5
This registers a hit for:
PEPSEQ C=10 ALA-PHE-PHE-PRO-PRO-PHE-PHE-VAL-PRO-PRO-
Ex.10
T3 *PEPT
PSEQ -GLY,-ANY,-GLY'-
QUES T3
This registers a hit for:
PEPSEQ C=8 PRO-GLY-PRO-GLY-PRO-GLY-PRO-GLY-
but not for:
PEPSEQ C=7 PRO*-GLY,UND,GLY*-GLY,PRO*-GLY,
This example illustrates that UND is ignored in determining whether or not an entry is a hit.
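The behaviour shown in Ex.1 to Ex.10 is consistent with a small set of matching rules, sketched below in Python. This is an interpretation inferred from the examples, not the QUEST algorithm, and `parse_pseq`, `residue_matches` and `pseq_hits` are hypothetical names. The assumptions are: `%` stands for "no link" (a terminal position); a residue with no bond indication on its outer side must be terminal; an unmarked residue matches only the unmodified form, `*` only the modified form and `'` either; ANY never matches UND; and the entry holds a single A=n or C=n part.

```python
import re

LINKS = "-,%"  # "-" peptide link, "," non-peptide link, "%" no link (assumed)


def parse_pseq(record):
    """Split 'PSEQ -PRO-GLY A' into (residues, link_sets, flag)."""
    parts = record.split()
    if parts and parts[0] == "PSEQ":
        parts = parts[1:]
    body = parts[0]
    flag = parts[1] if len(parts) > 1 else None
    tokens = re.findall(r"[-,%]+|[A-Za-z][A-Za-z0-9]{0,2}[*']?", body)
    if "".join(tokens) != body:
        raise ValueError(f"cannot read {body!r}")
    # A residue with no outer bond indication is treated as terminal.
    if tokens[0][0] not in LINKS:
        tokens.insert(0, "%")
    if tokens[-1][0] not in LINKS:
        tokens.append("%")
    residues, link_tokens = tokens[1::2], tokens[0::2]
    if (not residues or any(r[0] in LINKS for r in residues)
            or any(t[0] not in LINKS for t in link_tokens)):
        raise ValueError(f"residues and links must alternate in {body!r}")
    return residues, [set(t) for t in link_tokens], flag


def residue_matches(query, found, defined):
    """Compare one query symbol (e.g. CYS', ANY, X2) with one entry residue."""
    name = query.rstrip("*'")
    suffix = query[len(name):]
    code, modified = found.rstrip("*"), found.endswith("*")
    if name == "ANY":
        ok = code != "UND"
    elif name in defined:
        ok = code in defined[name]
    else:
        ok = code == name
    if not ok:
        return False
    return suffix == "'" or modified == (suffix == "*")


def pseq_hits(record, entry, defined=None):
    """True if the PSEQ record matches a single-part PEPSEQ entry."""
    residues, link_sets, flag = parse_pseq(record)
    kind, body = re.fullmatch(
        r"(?:PEPSEQ\s+)?([AC])=\d+\s+(\S+)", entry).groups()
    cyclic = kind == "C"
    if flag in ("A", "C") and flag != kind:
        return False
    tokens = re.findall(r"([A-Z]{3}\*?)([-,]?)", body)
    found = [code for code, _ in tokens]
    after = [link or "%" for _, link in tokens]  # link following each residue
    n, m = len(found), len(residues)
    if m > n:
        return False
    # A cyclic entry can be entered at any residue and wraps round.
    for start in (range(n) if cyclic else range(n - m + 1)):
        idx = [(start + k) % n for k in range(m)]
        if not all(residue_matches(q, found[j], defined or {})
                   for q, j in zip(residues, idx)):
            continue
        before = after[idx[0] - 1] if (cyclic or idx[0] > 0) else "%"
        seen = [before] + [after[j] for j in idx]
        if all(s in allowed for s, allowed in zip(seen, link_sets)):
            return True
    return False
```

Under these assumptions `pseq_hits("PSEQ -PRO-GLY", "PEPSEQ A=3 PRO*-PRO-GLY")` is true and `pseq_hits("PSEQ -PRO-GLY", "PEPSEQ A=4 VAL*-PRO-GLY-VAL")` is false, in line with Ex.2. Symbols created by PDEF records can be supplied through the `defined` argument as a mapping from symbol to a set of codes.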
Related Bit Screens
The database bit screen is 67.
Ex.
SCRE 67
This query screen registers hits for all entries having a peptide sequence record.
Related Keywords
*SYNONYM
Ex.A
T1 *SYNO PEPSEQ A=5
QUES T1
This registers a hit for all acyclic sequences having 5 residues.
Ex.B
T2 *SYNO PEPSEQ C=2
QUES T2
This registers a hit for all cyclic sequences having 2 residues.
Ex.C
T3 *SYNO PEPSEQ A=1
T4 *SYNO A=1
QUES T3.AND.T4
This registers a hit for all entries involving co-crystallisation of two simple amino-acids, unmodified or modified, for example:
PEPSEQ A=1 ARG A=1 GLU
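The same classification can be made on synonym text that has been exported from the database. The sketch below is an independent illustration and does not reproduce how QUEST evaluates *SYNONYM tests; `pepseq_components` and `is_simple_cocrystal` are hypothetical helpers that simply count the A=n and C=n parts of a PEPSEQ record.

```python
import re


def pepseq_components(synonym):
    """Return (kind, n) pairs, e.g. [('A', 1), ('A', 1)], for a PEPSEQ synonym."""
    if "PEPSEQ" not in synonym:
        return []
    return [(kind, int(n))
            for kind, n in re.findall(r"\b([AC])=(\d+)\b", synonym)]


def is_simple_cocrystal(synonym):
    """True when the record holds at least two parts, each a single amino-acid."""
    parts = pepseq_components(synonym)
    return len(parts) >= 2 and all(part == ("A", 1) for part in parts)
```

With this sketch `is_simple_cocrystal("PEPSEQ A=1 ARG A=1 GLU")` is true, while a record holding a single `A=1` part is rejected.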