3.7 Searching PDB Information

From the October 1993 release onwards, Protein Data Bank (PDB) entries have been included in the CSD. At present only bibliographic and sequence information is available; 3D coordinates will also be stored in future. All PDB entries have a refcode of the form 0ABC0n, where 0ABC is the PDB ID code.

For further information, refer to the October 1993 release notes, pp.10-25a.

Searching PDB Database

In the VMS and UNIX packages, the PDB entries are stored in a separate database (in MIP, all searches act on both databases). For VMS and UNIX, therefore, none of the searches carried out so far in this manual has searched the PDB database.

To start QUEST3D searching the PDB database only:

	UNIX:		quest  probname -db $CSDHOME/csd/pdb

To start QUEST3D searching both the PDB and the master CSD databases:

	UNIX:		quest  probname -db $CSDHOME/csd/pdb Master
Then enter the terminal type and menu in the normal way (see Section 3.1).

Information Displayed

Only the 1D display menu is of interest at present, as no coordinates are yet stored. The bibliographic information displayed is:

	COMP	Name of the macromolecular structure and its source
	QUAL	Up to 7 information types: PDB ID code, date of deposition,
		classification of macromolecule, experimental technique (if recorded
		in the PDB), date at which this entry supersedes an existing entry,
		name(s) of data contributors, resolution of study
	AUTH	Name(s) of authors for principal publication
	VOLU	Journal coden, volume, page and year of principal publication
	PDBS	For each chain: chain identifier (if recorded in the PDB), number of
		residues in the chain, and the residue sequence.
All of this information except the PDBS field (see below) can be searched using the keyword tests:

*COMPOUND, *QUALIFIER, *AUTHOR, *SURNAME, *CODEN, *VOLUME, *PAGE, *YEAR in the TEXT and NUMERIC sub-menus, (see Section 3.2 Exs.6,7).
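As a rough illustration of which fields the keyword tests act on, the sketch below models the 1D display fields of a PDB entry as a Python dataclass and applies a simple text test. The class `PdbEntry`, the `TEXT_KEYWORDS` mapping and the `text_test` function are hypothetical; only *COMPOUND and *QUALIFIER are modelled, and the case-insensitive substring match is an assumed simplification, not the QUEST3D matching rule.

```python
from dataclasses import dataclass, field


@dataclass
class PdbEntry:
    """Hypothetical in-memory form of the 1D display fields."""
    refcode: str
    comp: str        # COMP: name of the structure and its source
    qual: str        # QUAL: PDB ID code, dates, classification, ...
    auth: list[str]  # AUTH: authors of the principal publication
    volu: str        # VOLU: coden, volume, page and year
    # PDBS: chain identifier -> residue sequence
    pdbs: dict[str, list[str]] = field(default_factory=dict)


# Keyword tests on free-text fields (assumed subset). PDBS is deliberately
# absent, because sequence information is searched with a separate test.
TEXT_KEYWORDS = {"*COMPOUND": "comp", "*QUALIFIER": "qual"}


def text_test(entry: PdbEntry, keyword: str, text: str) -> bool:
    """Case-insensitive substring match on the field named by keyword."""
    attr = TEXT_KEYWORDS[keyword.upper()]  # KeyError for unsupported keywords
    return text.upper() in getattr(entry, attr).upper()
```

Asking for `*PDBS` raises a `KeyError` here, mirroring the point that the sequence field is not reachable through the TEXT and NUMERIC keyword tests.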


Searching the PDBS sequence information is similar to the PEPTIDE-SEQUENCE command (see Section 3.2, Ex.10). At present there is only a command-line interface, but a menu interface similar to the PEPTIDE-SEQ sub-menu will be provided in the near future. For the command-line interface, the test must take the format of:

The main keywords available are:

	PSEQ	Defines a sequence of residues and is very similar to PSEQ in the PEPTIDE-SEQ
		sub-menu (see Ex.10). Each residue can be specified as terminal (%) or
		non-terminal (-), and residues can be either amino acids (as defined in the
		PEPTIDE-SEQ sub-menu) or modified amino acids, nucleotides, carbohydrates
		and their groups (full list in Vol.3, Appendix 16). A test may contain more
		than one PSEQ.
	PDEF	Defines a new set of residues (equivalent to PDEF in the PEPTIDE-SEQ sub-menu).
	NRES	Specifies the number of residues in a chain: NRES .LO. n or NRES n-m
		(where LO = logical operator, n<m).
	NCHAin	Specifies the number of chains in an entry: NCHA .LO. n or NCHA n-m
		(where LO = logical operator, n<m).
	SAME	Used when a test contains more than one search sequence; specifies that
		the sequences must lie in the same chain.
	EXHA	Exhaustive search for sequence in an entry.
	FULL	Gives a full display of the sequence information for hit entries. The
		command takes the form FULL CHAIN or FULL ALL (FULL CHAIN displays complete
		sequences only for the chains in which the search sequence is located).
The same keywords will be available in the future menu interface.
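The two count forms accepted by NRES and NCHA (`.LO. n` and `n-m`) can be sketched as a small evaluator. The function `count_test` is hypothetical, the operator mnemonics in `LOGICAL_OPS` (EQ, NE, LT, LE, GT, GE) are assumptions for illustration, and the `n-m` range is assumed to be inclusive.

```python
import operator

# Assumed mnemonics for the .LO. logical operators (hypothetical mapping).
LOGICAL_OPS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
}


def count_test(spec: str, value: int) -> bool:
    """Evaluate an NRES/NCHA-style count test against value.

    Accepts '.LO. n' (e.g. '.GT. 100') or 'n-m' (e.g. '50-200');
    the range form is assumed inclusive and must satisfy n < m.
    """
    spec = spec.strip()
    if spec.startswith("."):
        # '.GT. 100'.split('.', 2) -> ['', 'GT', ' 100']
        _, op_name, number = spec.split(".", 2)
        return LOGICAL_OPS[op_name.upper()](value, int(number))
    low, high = (int(part) for part in spec.split("-"))
    if not low < high:
        raise ValueError("range must satisfy n < m")
    return low <= value <= high
```

For example, `count_test(".GT. 100", 129)` is true for a 129-residue chain, while `count_test("130-200", 129)` is false. The same evaluator serves NRES (applied per chain) and NCHA (applied to the number of chains in an entry).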


Detailed notes covering all the possible options are given in Vol.2, pp.10-25a to 10-25l.

  1. Find PDB entries containing the amino-acid sequence -GLU-GLU-SER-, where neither GLU nor SER is terminal.

    One hit entry is 01BI02; the Glu-Glu-Ser sequence is highlighted in its PDBS record.
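The terminal/non-terminal rule used in this example can be sketched in Python. The function `find_sequence` is hypothetical and models only the matching idea, not the QUEST3D implementation. It assumes that a leading or trailing `-` requires a neighbouring residue on that side, that `%` requires the residue to be at the chain terminus, and that `ANY` matches any residue.

```python
def find_sequence(chain: list[str], pattern: str) -> list[int]:
    """Return the start indices at which pattern occurs in chain.

    Assumed pattern syntax: residue codes joined by '-', e.g. '-GLU-GLU-SER-'.
    A leading/trailing '-' means that end must NOT be a chain terminus;
    a leading/trailing '%' means it MUST be; 'ANY' matches any residue.
    """
    left = pattern[0] if pattern[0] in "-%" else None
    right = pattern[-1] if pattern[-1] in "-%" else None
    residues = pattern.strip("-%").split("-")
    k, n = len(residues), len(chain)
    hits = []
    for start in range(n - k + 1):
        end = start + k  # one past the last matched residue
        if not all(p in ("ANY", r) for p, r in zip(residues, chain[start:end])):
            continue
        # Terminal checks on the first matched residue.
        if (left == "-" and start == 0) or (left == "%" and start != 0):
            continue
        # Terminal checks on the last matched residue.
        if (right == "-" and end == n) or (right == "%" and end != n):
            continue
        hits.append(start)
    return hits
```

With a chain MET-GLU-GLU-SER-LYS, `find_sequence(chain, "-GLU-GLU-SER-")` returns `[1]`; for GLU-GLU-SER-LYS it returns `[]`, because the first GLU would be terminal.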

  2. Find PDB entries containing the residue sequences -ALA-GLY-GLU- and PRO-ANY-PRO- in the same chain.
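The effect of SAME can be sketched by checking the patterns chain by chain. The helpers `occurs` and `search_entry` are hypothetical. They assume that a leading or trailing `-` marks a non-terminal end, that `ANY` matches any residue, and that without SAME each pattern may be found in a different chain of the entry.

```python
def occurs(chain: list[str], pattern: str) -> bool:
    """True if pattern occurs in chain.

    Assumed: a leading/trailing '-' requires that end to be non-terminal,
    and 'ANY' matches any residue. '%' (terminal) is not handled here.
    """
    residues = pattern.strip("-").split("-")
    k, n = len(residues), len(chain)
    first = 1 if pattern.startswith("-") else 0  # skip the N-terminal position
    last = n - k - (1 if pattern.endswith("-") else 0)  # last allowed start
    return any(
        all(p in ("ANY", r) for p, r in zip(residues, chain[s:s + k]))
        for s in range(first, last + 1)
    )


def search_entry(chains: dict[str, list[str]], patterns: list[str],
                 same: bool = False) -> bool:
    """Decide whether an entry (chain id -> residues) is a hit."""
    if same:
        # SAME: a single chain must contain every pattern.
        return any(all(occurs(c, p) for p in patterns) for c in chains.values())
    # Without SAME: each pattern may be found in any chain.
    return all(any(occurs(c, p) for c in chains.values()) for p in patterns)
```

For an entry whose chain A contains -ALA-GLY-GLU- and whose chain B contains PRO-GLY-PRO, `search_entry` reports a hit without SAME but not with `same=True`.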

