CSSS Seminars
John Tate
European Bioinformatics Institute
jtate@ebi.ac.uk
http://www.ebi.ac.uk/msd
The Macromolecular Structure Database
Since 1971 the Protein Data Bank (PDB) has been the central repository of macromolecular structure data, but its present flat-file archive cannot support the complex tools required for drug discovery, molecular medicine and bioinformatics. To fully exploit the volume of structural data that will soon become available, new technologies must be employed. The Macromolecular Structure Database (MSD) group has developed a relational database for storing, validating, searching and retrieving the complex structural information in the PDB. A comprehensive cleaning procedure is under way to ensure data uniformity across the whole archive, and an extensive set of derived properties and goodness-of-fit indicators will be added. The MSD includes links to many other bioinformatics databases, including InterPro, GO, SwissProt, SCOP, CATH, PFAM and PROSITE.
We have developed a flexible search system that exposes the power of the relational database without requiring the user to understand the complexity of the underlying schema. This search system provides a single access point for the MSD and associated databases, allowing searches on a wide range of biomolecular properties, such as sequence, structural similarity and active-site conformation. The database, and several network-based services built on top of it, will be available by the end of April 2003. This talk will describe the basic design of the database, outline some of the improvements in data quality that it provides, and present the services and search systems that are currently available or planned for the near future.