Volume 2 Chapter 10 Search Menu *PDBSEQ

*PEPTIDE

Function

*PEPTIDE is used to test for amino-acids and peptide sequences using 3-letter code symbols.

Graphics QUEST3D Procedure

The equivalent test construction in Graphics QUEST3D is discussed under PEPTIDE-SEQ.

CSD Contents

Entries having basic class 48 now contain, in the compound name synonym field, information describing the sequence of α-amino-acids. The general form of this information is:

PEPSEQ A=n sequence
or
PEPSEQ C=n sequence

A and C indicate acyclic and cyclic respectively.

n is the number of amino-acid residues.

Sequence indicates which amino-acid residues are involved and how they are linked.

A few examples will illustrate the principal features of the coding scheme:

(a)
PEPSEQ A=1 ALA

There is only one residue, viz. alanine.

D-alanine, L-alanine, DL-alanine would all be represented by this notation.

Note that all simple amino-acids are designated A=1.

(b)
PEPSEQ A=1 ALA*

An * immediately following the 3-letter code symbol indicates that the amino-acid has been modified in some way.

Thus ALA* would apply to, e.g., derivatives of alanine (structure diagrams not reproduced).

Note that the zwitterionic form of an amino-acid is not considered to be a modification of the parent amino-acid.

(c)
PEPSEQ A=3 PRO*-LEU-GLY

This indicates an acyclic peptide containing the 3 residues PRO*, LEU and GLY joined together by normal peptide links.

The symbol - indicates the normal peptide link.

(d)
PEPSEQ A=2 CYS,HCY

This indicates an acyclic peptide containing the two amino-acids cysteine and homocysteine joined together by a non-peptide link. In this example the two residues are linked through their sulfur atoms to form a disulfide bridge.

The symbol , indicates a non-peptide link.

(e)
PEPSEQ C=5 GLY-PRO-SER-ALA-VAL-

This indicates a cyclic peptide containing 5 amino-acid residues joined by normal peptide links.

The final - indicates that VAL is joined to GLY by a normal peptide link.

(f)
PEPSEQ C=3 PHE-PRO,UND,

This indicates a cyclic peptide containing 3 residues.

The final , indicates that UND is joined to PHE by a non-peptide link.

The 3-letter code symbol UND is used to indicate an undefined residue. Such residues cannot be described as modified α-amino-acids and examples are lactyl, α-hydroxyisovaleryl, β-alanyl, etc.

Note, however, that UND is included in the count of residues when assigning the value of n.

(g)
PEPSEQ A=1 ARG A=1 GLU

This example illustrates the situation where two amino-acids, arginine and glutamic acid, co-crystallise.
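
The coding scheme in examples (a) to (g) can be read mechanically. The following is a minimal Python sketch, not part of QUEST, that splits a PEPSEQ record into its components. The names `Component` and `parse_pepseq` are hypothetical, and the sketch assumes that the fields of a record are separated by blanks and that every residue symbol is three upper-case letters, optionally followed by *.

```python
import re
from dataclasses import dataclass


@dataclass
class Component:
    cyclic: bool      # True for C=n, False for A=n
    n: int            # declared number of residues
    residues: list    # e.g. ["PRO*", "LEU", "GLY"]
    links: list       # link written after each residue: "-", "," or ""


def parse_pepseq(record):
    """Split a PEPSEQ record into its A=n / C=n components (hypothetical helper)."""
    fields = record.split()
    # Expect PEPSEQ followed by one or more (header, sequence) pairs.
    if len(fields) < 3 or fields[0] != "PEPSEQ" or len(fields) % 2 == 0:
        raise ValueError(f"not a PEPSEQ record: {record!r}")
    components = []
    for header, sequence in zip(fields[1::2], fields[2::2]):
        head = re.fullmatch(r"([AC])=(\d+)", header)
        if head is None:
            raise ValueError(f"bad header: {header!r}")
        # Each residue is a 3-letter code, an optional *, and an optional link.
        parts = re.findall(r"([A-Z]{3}\*?)([-,]?)", sequence)
        if "".join(res + link for res, link in parts) != sequence:
            raise ValueError(f"bad sequence: {sequence!r}")
        n = int(head.group(2))
        if len(parts) != n:          # UND residues are counted in n
            raise ValueError(f"{header} does not match {sequence!r}")
        components.append(Component(
            cyclic=head.group(1) == "C",
            n=n,
            residues=[res for res, _ in parts],
            links=[link for _, link in parts],
        ))
    return components
```

For a cyclic component the last entry of `links` is the ring-closing link; for an acyclic component it is an empty string. A co-crystal such as example (g) yields two components.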

The 3-letter code symbols for the 20 common α-amino-acids are:

Alanine       ALA  Glycine       GLY  Proline    PRO
Arginine      ARG  Histidine     HIS  Serine     SER
Asparagine    ASN  Isoleucine    ILE  Threonine  THR
Aspartic acid ASP  Leucine       LEU  Tryptophan TRP
Cysteine      CYS  Lysine        LYS  Tyrosine   TYR
Glutamine     GLN  Methionine    MET  Valine     VAL
Glutamic acid GLU  Phenylalanine PHE

A further 9 α-amino-acids are assigned 3-letter codes in the CSD:

α-Aminoisobutyric acid AIB  Norvaline         NVA
Homocysteine           HCY  Ornithine         ORN
Homoserine             HSE  Pyroglutamic acid GLP
Isovaline              IVA  Sarcosine         SAR
Norleucine             NLE

Other less common α-amino-acids are described as modifications of those listed above.

Examples are: (structure diagrams not reproduced)

The 3-letter code UND is assigned to any undefined residue which cannot be described as a modified α-amino-acid, for example, lactyl, α-hydroxyisovaleryl, β-alanyl, etc.
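
For work outside QUEST, the two code tables above can be held as simple lookup tables. The Python sketch below is illustrative only: the names `COMMON`, `ADDITIONAL` and `describe` are hypothetical, and the classification labels ("common", "additional", "undefined") are chosen here for convenience rather than taken from the CSD.

```python
# The 20 common alpha-amino-acid codes listed above.
COMMON = {
    "ALA": "Alanine", "ARG": "Arginine", "ASN": "Asparagine",
    "ASP": "Aspartic acid", "CYS": "Cysteine", "GLN": "Glutamine",
    "GLU": "Glutamic acid", "GLY": "Glycine", "HIS": "Histidine",
    "ILE": "Isoleucine", "LEU": "Leucine", "LYS": "Lysine",
    "MET": "Methionine", "PHE": "Phenylalanine", "PRO": "Proline",
    "SER": "Serine", "THR": "Threonine", "TRP": "Tryptophan",
    "TYR": "Tyrosine", "VAL": "Valine",
}

# The 9 further codes assigned in the CSD.
ADDITIONAL = {
    "AIB": "alpha-Aminoisobutyric acid", "HCY": "Homocysteine",
    "HSE": "Homoserine", "IVA": "Isovaline", "NLE": "Norleucine",
    "NVA": "Norvaline", "ORN": "Ornithine", "GLP": "Pyroglutamic acid",
    "SAR": "Sarcosine",
}


def describe(token):
    """Return (code, modified, kind) for a residue token such as 'ALA*'."""
    modified = token.endswith("*")   # * marks a modified amino-acid
    code = token.rstrip("*")
    if code in COMMON:
        kind = "common"
    elif code in ADDITIONAL:
        kind = "additional"
    elif code == "UND":
        kind = "undefined"
    else:
        raise ValueError(f"unknown residue code: {code!r}")
    return code, modified, kind
```

The two tables together hold the twenty-nine code symbols; UND is deliberately kept out of both.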

Basic QUEST Procedure

Searching for simple amino-acids and peptides can be accomplished with text searches using *SYNONYM, but the search keyword *PEPTIDE offers more flexibility.

The form of *PEPTIDE search queries is illustrated by the following 2 examples:

(i)

T1  *PEPT
PSEQ  -PRO-AIB-
QUES  T1

This will register hits for:

PEPSEQ A=5 AIB*-PRO-AIB-ALA-AIB*
PEPSEQ C=4 CYS*-PRO-AIB-CYS*- 
etc.

The PSEQ record is used to define the search sequence.

(ii)

T2  *PEPT
PDEF  ABC= ILE  LEU
PSEQ  -GLY-ABC-GLY-
QUES  T2

This will register hits for:

PEPSEQ C=8 ASN-PRO*-THR*-TRP*-GLY-ILE-GLY-CYS*-
PEPSEQ C=6 GLY-LEU-GLY-GLY-LEU-GLY- 
etc.

In this example the PSEQ record is preceded by a PDEF record which allows the user to define combinations of the 3-letter code symbols.

Here ABC is defined to be either ILE or LEU.

PDEF can be compared to ELDEF in a connectivity search test packet.

PSEQ

The general form of this record can be exemplified thus:

PSEQ  %-ALA-ARG*-,CYS'-  A

It is composed of three components:

(i)
The amino-acid residues. In this example the amino-acids are:

ALA   unmodified alanine
ARG*  modified arginine
CYS'  cysteine, either unmodified or modified

The symbol ' is used to allow for unmodified or modified amino-acids.

Note also that the symbol ANY can be used to indicate any of the twenty-nine 3-letter code symbols.

(ii)
The linkage of residues is denoted by:

-  normal peptide link
,  non-peptide link
%  no link (terminal residue)

Thus, in the above example, the ALA is either terminal or joined to its left by a normal peptide link.

ARG* is joined to CYS' by either a peptide link or a non-peptide link.

When alternative links are allowed the order of -,% is immaterial.

(iii)
The cyclicity of the sequence is indicated by:

A  acyclic
C  cyclic

If neither A nor C is specified then hits can be registered for both acyclic and cyclic sequences.
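
The three components of a PSEQ record can be separated with a short tokeniser. The Python sketch below is an illustration under stated assumptions, not the QUEST implementation: the function name `parse_pseq` is hypothetical, the cyclicity flag is assumed to be separated from the sequence by blanks, and % is taken to mean a terminal position, as the worked examples in this section suggest.

```python
import re

# Optional link alternatives, then a residue symbol, then an optional * or '.
TOKEN = re.compile(r"([-,%]*)([A-Z][A-Z0-9]{0,2})([*']?)")


def parse_pseq(record):
    """Return (residues, link_sets, cyclicity) for a PSEQ record (hypothetical helper)."""
    fields = record.split()
    if len(fields) not in (2, 3) or fields[0] != "PSEQ":
        raise ValueError(f"not a PSEQ record: {record!r}")
    sequence = fields[1]
    cyclicity = fields[2] if len(fields) == 3 else None
    if cyclicity not in (None, "A", "C"):
        raise ValueError(f"bad cyclicity flag: {cyclicity!r}")

    residues, link_sets, pos = [], [], 0
    while True:
        m = TOKEN.match(sequence, pos)
        if m is None:
            break
        link_sets.append(set(m.group(1)))         # links allowed to the left
        residues.append((m.group(2), m.group(3)))  # (symbol, modifier mark)
        pos = m.end()

    trailing = sequence[pos:]
    if not residues or set(trailing) - set("-,%"):
        raise ValueError(f"bad sequence: {sequence!r}")
    link_sets.append(set(trailing))                # links allowed to the right
    return residues, link_sets, cyclicity
```

For the record `PSEQ  %-ALA-ARG*-,CYS'-  A` this returns three residues and four link sets, one for each position before, between and after the residues. Holding the alternatives as sets reflects the rule that the order of -,% is immaterial.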

PDEF

The general form of this record can be exemplified thus:

PDEF  ABC=  ILE+LEU
PDEF  XYZ=  ANY-ABC

More than one PDEF record can be present in a *PEPTIDE search question and their effect is cumulative.

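In the above example, ABC is defined to be either ILE or LEU, and XYZ is defined to be any of the twenty-nine code symbols except ILE and LEU.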

In a PDEF record + can be replaced by a blank space.

The 3-letter code UND cannot be used in a PDEF record.

The new symbol for the defined residue must start with an alphabetic character and can have a maximum of 3 alphanumeric characters.
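
The cumulative effect of PDEF records can be pictured as building named sets of code symbols. The following Python sketch is a hedged illustration only: `apply_pdef` and `CODES` are hypothetical names, and the treatment of an expression as a union of terms, each of which may subtract further symbols with -, is an assumption generalised from the two forms shown above (ILE+LEU and ANY-ABC).

```python
import re

# The twenty-nine 3-letter code symbols (UND is not among them).
CODES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR "
    "TRP TYR VAL AIB GLP HCY HSE IVA NLE NVA ORN SAR".split()
)


def apply_pdef(record, definitions):
    """Add one PDEF record to `definitions`, a dict of symbol -> set of codes."""
    # New symbol: starts with a letter, at most 3 alphanumeric characters.
    match = re.fullmatch(r"PDEF\s+([A-Z][A-Z0-9]{0,2})=\s*(.+)", record.strip())
    if match is None:
        raise ValueError(f"malformed PDEF record: {record!r}")
    name, expression = match.groups()

    def lookup(symbol):
        if symbol == "UND":
            raise ValueError("UND cannot be used in a PDEF record")
        if symbol == "ANY":
            return set(CODES)
        if symbol in definitions:       # earlier PDEF records are cumulative
            return set(definitions[symbol])
        if symbol in CODES:
            return {symbol}
        raise ValueError(f"unknown symbol: {symbol!r}")

    members = set()
    # + and blank are interchangeable separators between terms.
    for term in re.split(r"[+\s]+", expression):
        first, *removed = term.split("-")
        selected = lookup(first)
        for symbol in removed:
            selected -= lookup(symbol)
        members |= selected
    definitions[name] = members
    return definitions
```

A possible usage, mirroring the two records above:

```python
defs = {}
apply_pdef("PDEF  ABC=  ILE+LEU", defs)
apply_pdef("PDEF  XYZ=  ANY-ABC", defs)
print(sorted(defs["ABC"]))   # the two codes ILE and LEU
print(len(defs["XYZ"]))      # twenty-nine codes minus the two in ABC
```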

Examples

Ex.1

T1  *PEPT
PSEQ  -PRO-AIB-
QUES  T1

This registers hits for:

PEPSEQ A=5 AIB*-PRO-AIB-ALA-AIB*
PEPSEQ A=4 AIB*-PRO-AIB-PRO*
PEPSEQ C=4 CYS*-PRO-AIB-CYS*-   
etc.

If we had coded PSEQ -PRO-AIB- A then hits would be registered only for acyclic sequences.

Ex.2

T2  *PEPT
PSEQ  -PRO-GLY
QUES  T2

This registers a hit for:

PEPSEQ A=3 PRO*-PRO-GLY
but not for:
PEPSEQ A=4 VAL*-PRO-GLY-VAL

In PSEQ the GLY residue does not have any bond indication to its right.

The software then interprets this as a terminal GLY and the sequence to be acyclic.

Ex.3

T3  *PEPT
PSEQ  LEU-GLY-
QUES  T3

This registers a hit for:

PEPSEQ A=4 LEU-GLY-GLY-GLY
but not for:
PEPSEQ A=3 PRO*-LEU-GLY

In PSEQ the LEU residue does not have any bond indication to its left.

The software then interprets this as a terminal LEU and the sequence to be acyclic.

Also, in PSEQ the GLY residue is not terminal whereas in the A=3 entry the GLY is terminal.

Ex.4

T4  *PEPT
PSEQ  -,CYS*-PRO-AIB- C
QUES  T4

This registers a hit for:

PEPSEQ C=4 CYS*-PRO-AIB-CYS*,
and would also register a hit for:
PEPSEQ C=4 CYS*-PRO-AIB-ALA-

Ex.5

T5  *PEPT
PSEQ  -,%CYS*%,-
QUES  T5

This registers hits for:

PEPSEQ A=1 CYS*
PEPSEQ A=2 PRO*-CYS*
PEPSEQ C=2 CYS*-PRO-
PEPSEQ A=2 CYS,CYS*
PEPSEQ C=8 CYS*,UND,THR*,UND,CYS*,UND,THR*,UND,   
etc.

Ex.6

T6  *PEPT
PSEQ  SER'
QUES  T6

This registers hits for:

PEPSEQ A=1 SER
PEPSEQ A=1 SER*

Ex.7

T7  *PEPT
PSEQ  ANY
QUES  T7

This registers hits for all unmodified simple amino-acids.

Ex.8

T8  *PEPT
PDEF  ABC=  ILE  LEU
PDEF  X2=  ANY-ABC
PSEQ  -GLY-X2-GLY-
QUES  T8

This registers a hit for:

PEPSEQ C=6 PRO-GLY-PRO-GLY-PRO-GLY-
but not for:
PEPSEQ C=6 GLY-LEU-GLY-GLY-LEU-GLY-

Ex.9

T9  *PEPT
PSEQ  -PHE-PHE-PRO-
T5  *PEPT
PSEQ  -PHE-VAL-PRO-
QUES  T9.AND.T5

This registers a hit for:

PEPSEQ C=10 ALA-PHE-PHE-PRO-PRO-PHE-PHE-VAL-PRO-PRO-

Ex.10

T3  *PEPT
PSEQ  -GLY,-ANY,-GLY'-
QUES  T3

This registers a hit for:

PEPSEQ C=8 PRO-GLY-PRO-GLY-PRO-GLY-PRO-GLY-
but not for:
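PEPSEQ C=7 PRO*-GLY,UND,GLY*-GLY,PRO*-GLY,

This example illustrates that UND is ignored in determining whether or not an entry is a hit: ANY stands only for the twenty-nine 3-letter code symbols and therefore does not match an undefined residue.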
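
The behaviour shown in Ex.1 to Ex.10 can be approximated by a small matcher. The Python sketch below is a reconstruction from those examples and is not the QUEST algorithm. All names (`matches`, `parse_entry`, `parse_query`, `residue_ok`, `CODES`) are hypothetical. It assumes that a missing link indication means a terminal residue, that % stands for a terminal position, that a query may wrap around the ring closure of a cyclic entry, and that PDEF definitions are supplied as a ready-made dict of symbol to set of codes.

```python
import re

CODES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR "
    "TRP TYR VAL AIB GLP HCY HSE IVA NLE NVA ORN SAR".split()
)
ENTRY_TOKEN = re.compile(r"([A-Z]{3})(\*?)([-,]?)")
QUERY_TOKEN = re.compile(r"([-,%]*)([A-Z][A-Z0-9]{0,2})([*']?)")


def parse_entry(header, sequence):
    """Return (cyclic, residues, links) for one 'A=n sequence' or 'C=n sequence'."""
    parts = ENTRY_TOKEN.findall(sequence)
    residues = [(code, star == "*") for code, star, _ in parts]
    links = [link or "%" for _, _, link in parts]   # "%" marks a free terminus
    return header.startswith("C"), residues, links


def parse_query(text):
    """Return (residues, link_sets, flag) for the text that follows PSEQ."""
    fields = text.split()
    sequence = fields[0]
    flag = fields[1] if len(fields) > 1 else None
    residues, link_sets, pos = [], [], 0
    while (m := QUERY_TOKEN.match(sequence, pos)):
        link_sets.append(set(m.group(1)) or {"%"})   # no indication = terminal
        residues.append((m.group(2), m.group(3)))
        pos = m.end()
    link_sets.append(set(sequence[pos:]) or {"%"})
    return residues, link_sets, flag


def residue_ok(entry_res, query_res, pdef):
    code, modified = entry_res
    symbol, mark = query_res
    allowed = CODES if symbol == "ANY" else pdef.get(symbol, {symbol})
    if code not in allowed:                          # UND is never in CODES
        return False
    return mark == "'" or (mark == "*") == modified


def matches(entry, query, pdef=None):
    """True if the PSEQ text `query` would hit the PEPSEQ record `entry`."""
    pdef = pdef or {}
    q_res, q_links, flag = parse_query(query)
    tokens = entry.split()[1:]                       # drop the leading PEPSEQ
    for header, sequence in zip(tokens[0::2], tokens[1::2]):
        cyclic, res, links = parse_entry(header, sequence)
        n, m = len(res), len(q_res)
        if m > n or (flag == "A" and cyclic) or (flag == "C" and not cyclic):
            continue
        for start in range(n):
            if not cyclic and start + m > n:
                break
            idx = [(start + j) % n for j in range(m)]
            # Link to the left of the first matched residue.
            left = links[start - 1] if (cyclic or start > 0) else "%"
            if left not in q_links[0]:
                continue
            if all(residue_ok(res[i], q, pdef) for i, q in zip(idx, q_res)) and \
               all(links[i] in q_links[j + 1] for j, i in enumerate(idx)):
                return True
    return False
```

A possible usage, taking Ex.2 and Ex.8 as test cases:

```python
matches("PEPSEQ A=3 PRO*-PRO-GLY", "-PRO-GLY")          # hit, as in Ex.2
matches("PEPSEQ A=4 VAL*-PRO-GLY-VAL", "-PRO-GLY")      # no hit, as in Ex.2
x2 = CODES - {"ILE", "LEU"}                             # X2 = ANY-ABC
matches("PEPSEQ C=6 PRO-GLY-PRO-GLY-PRO-GLY-", "-GLY-X2-GLY-", {"X2": x2})
```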

Related Bit Screens

The database bit screen is 67.

Ex.

SCRE  67

This query screen registers hits for all entries having a peptide sequence record.

Related Keywords

*SYNONYM

Ex.A

T1  *SYNO  PEPSEQ A=5
QUES  T1

This registers a hit for all acyclic sequences having 5 residues.

Ex.B

T2  *SYNO  PEPSEQ C=2
QUES  T2

This registers a hit for all cyclic sequences having 2 residues.

Ex.C

T3  *SYNO  PEPSEQ A=1
T4  *SYNO  A=1
QUES  T3.AND.T4

This registers a hit for all entries involving co-crystallisation of two simple amino-acids, unmodified or modified, for example:

PEPSEQ A=1 ARG A=1 GLU
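
The same condition can be checked on an exported synonym string outside QUEST. The Python sketch below is illustrative only: `component_headers` and `is_simple_cocrystal` are hypothetical helper names, and the sketch assumes the synonym text has exactly the PEPSEQ layout described under CSD Contents.

```python
import re


def component_headers(synonym):
    """Return the A=n / C=n headers found in a PEPSEQ synonym string."""
    return re.findall(r"\b[AC]=\d+\b", synonym)


def is_simple_cocrystal(synonym):
    """True if the record describes two or more simple amino-acids (all A=1)."""
    headers = component_headers(synonym)
    return (
        synonym.startswith("PEPSEQ")
        and len(headers) >= 2
        and all(h == "A=1" for h in headers)
    )
```

Counting the headers, rather than searching for the text A=1 twice, avoids treating a single A=1 record as a co-crystal.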
