Rutgers, UC San Diego, and NIST Win Grant to Manage Protein Data Bank

Published 10/01/1998

NEW BRUNSWICK/PISCATAWAY, N.J. - The Research Collaboratory for Structural Bioinformatics (RCSB), a consortium composed of Rutgers, the State University of New Jersey; the University of California at San Diego; and the National Institute of Standards and Technology (NIST), has received a $10 million, five-year award from the National Science Foundation (NSF), the Department of Energy and two units of the National Institutes of Health: the National Institute of General Medical Sciences and the National Library of Medicine. The award will enable the RCSB to operate and significantly extend the capabilities of the Protein Data Bank (PDB), a critical tool for unlocking the secrets of biological systems in pharmaceutical and medical research.

The RCSB's Protein Data Bank, in addition to being a repository of data, will now provide mechanisms for researchers to understand biological function through investigation of sequence and molecular structure. The PDB was previously maintained by Brookhaven National Laboratory; its move to the RCSB will be transparent and seamless and will add new capabilities for searching and for improving the consistency and content of existing and future depositions.

"The RCSB proposal was evaluated using a standard merit review procedure that included a site visit and advisory panel," said Gerald Selzer, program director in the NSF Division of Biological Infrastructure. "Experts in x-ray crystallography, other areas of structural and computational biology, computer science, and database technology were asked to participate in the evaluation of the proposal.

"The funding decision was reached on the basis of the comments of these experts," Selzer continued. "Reviewers and agency staff alike were impressed with the technical merit of the plans for operating the database, with the detailed scheme for management across the three participating sites, and the depth of technical and managerial expertise that the RCSB will bring to this important task."

The transfer of the Protein Data Bank from Brookhaven to the RCSB will result in several improvements, including higher and faster throughput; a greater number of query capabilities, including more complex and more accurate queries; a uniform archive; a dynamic cross-link to other databases; and the availability of structure validation as well as structure and sequence neighboring. The RCSB Protein Data Bank will be scalable and will provide rapid and reliable data processing. Validation reports will be available to depositors. RCSB has already developed tools, demonstrated on the group's Web site, that allow biologists to perform queries that search several databases at the same time.

The PDB data will be stored and mirrored at all three RCSB sites. The three institutions have divided their responsibilities according to their expertise in data deposition and processing, database query and integration, and database uniformity. The PDB will also be mirrored at key sites worldwide, notably in Europe and the Pacific Rim.

The experience of the RCSB members in structure data processing and analysis covers data validation, data modeling, database development, query languages and visualization tool development. The group has developed and currently maintains 11 publicly available structural biology databases. This combined expertise lays the foundation for the PDB's future development.

Principal investigator Helen Berman, a Rutgers professor of chemistry, was part of the original team that developed the Protein Data Bank at Brookhaven in 1971. She is responsible for the development of the Nucleic Acid Database (NDB) at Rutgers, which assembles and distributes structural information about nucleic acids and contains an atlas, an archive and a sophisticated search engine to access the data. She has drawn upon her experience in developing that database to help design the new capabilities of the PDB.

"Our vision is that the PDB will enable scientists worldwide to gain a greater understanding of structure-function relationships in biological systems," Berman said. "We are capable of doing this because of the unique infrastructure the RCSB offers in terms of personnel, hardware, software and network infrastructure."

At the San Diego Supercomputer Center (SDSC) at UCSD, principal scientist Phil Bourne leads a group of scientists in a Biological Data Representation and Query initiative. The group has locally developed a number of databases containing derived data on protein structures and maintains a mirror site of Berman's NDB. Its recent work produced a database of structure comparisons for the more than 8,000 structures in the PDB.

The NIST effort will be led by Gary L. Gilliland, chief of the Biotechnology Division in NIST's Chemical Science and Technology Laboratory. He has maintained an active research program in protein crystallography for more than 20 years. He also played a key role in establishing the Center for Advanced Research in Biotechnology (CARB), a joint effort of NIST and the University of Maryland Biotechnology Institute, where he served as associate director until 1996. NIST will make the data for old and new structures uniform in order to improve the accessibility and reliability of queries.

The PDB effort will also take advantage of computational infrastructure developed by the National Partnership for Advanced Computational Infrastructure (NPACI), which is led by UCSD and SDSC and in which all three RCSB sites participate.

"This award exemplifies the success of NPACI at building the computational infrastructure to accelerate multi-institutional collaboration," said Sid Karin, NPACI and SDSC director. "Such successes attest to the critical national importance of the PACI program and its ability to touch all areas of NSF's portfolio. Within NPACI, Molecular Science is one of our key applications areas, serving to drive technology development as well as benefit from it. The recognition of our contributions to computational infrastructure by the peer review process leading to the PDB award is a clear endorsement of our plans."

The three-dimensional structures of proteins and other biological macromolecules are helping to unlock the secrets of how biological systems work. They hold significant promise for the pharmaceutical and biotechnology industries in the search for effective new drugs with few or no side effects and the effort to understand the mystery of human disease.

Medical researchers also envision gaining new insights into the causes, effects and treatment of many diseases by understanding the structure and function of biological macromolecules. Very precise and accurate information on the atomic structure of complex biological macromolecules is needed to unlock their disease-fighting potential.

The RCSB's Protein Data Bank will give researchers single-source access to more information about biological structures than ever before. Via the World Wide Web, database users in academia, government and industry will be able to access archival services and formulate complex queries that will provide reliable answers to further their research efforts.

