SDSC to Lead $8.4 Million Collaboration to Establish Distributed Data Archives

Published 06/26/1996

San Diego, CA--A vision of data outposts across the nation's information frontier--archives for governmental, scientific, and corporate data--is the driving force behind an SDSC-led collaboration that has been awarded an $8.4 million contract by the Defense Advanced Research Projects Agency (DARPA).

The Distributed Object Computation Testbed (DOCT) will create a testbed system for handling complex documents on geographically distributed data archives and computing platforms. In particular, the DOCT project will focus on the needs of the U.S. Patent and Trademark Office (USPTO) for handling the complex documents created in filing and updating patent applications.

"Electronic patent filing encompasses nearly all the complexities we could possibly face in dealing with distributed documents," said Reagan Moore, SDSC associate director for enabling technologies and DOCT's principal investigator. "This project will integrate archival storage, databases, object computation systems, and data handling systems in a heterogeneous environment."

The technologies should also apply to the information needs of other agencies, such as the National Science Foundation, the National Institutes of Health, the Nuclear Regulatory Commission, the Environmental Protection Agency (EPA), the Department of Energy, and the Department of Defense. For example, the project will couple environmental regulations with technical data sets for San Diego Bay and Chesapeake Bay.

To support data-intensive applications on widely distributed data sources, DOCT will need replicated archives, redundant communication paths, and database technology to access the data. The USPTO further requires that all of this operate in a fault-tolerant, secure environment.

Electronically filed patents might include Virtual Reality Modeling Language (VRML) descriptions of manufacturing processes, digital images, text, chemical formulas in Chemical eXchange Format (CXF), mathematical formulas, and other elements. Furthermore, paper and electronic documents must coexist as applications are updated and amended. Critical to the implementation, USPTO's existing Messenger and CSIR legacy databases must be migrated to the more advanced DOCT system.

Co-principal investigator for the project is Richard L. Klobuchar, vice president for corporate development at Science Applications International Corporation (SAIC). SAIC, a DOCT team member, will operate the Washington-area Metacomputing Site. SAIC's draft standard for USPTO electronic filing will be prototyped in the DOCT efforts. SAIC's optical power server will also be enhanced for scanning technical documents.

Other DOCT team members are the California Institute of Technology's Center for Advanced Computing, which will serve as a geographically remote mirror site for the data archives; the National Center for Supercomputing Applications (NCSA), which will provide additional supercomputing resources; the University of Virginia, whose Legion system will provide object-oriented support; Old Dominion University, whose Center for Coastal Physical Oceanography will provide extensive environmental data on Chesapeake Bay; the University of California, San Diego, which will collaborate on fault tolerance and distributed computing; and Open Text Corporation, which will provide full-text indexing and retrieval software.

The testbed will be built from existing hardware systems: supercomputers at SDSC and NCSA; archival storage systems at SDSC and Caltech; and the vBNS and AAI national networks through NSF, ARPA, and the Naval Command, Control and Ocean Surveillance Center in San Diego. For software, DOCT team members will develop object-relational database interfaces to the High Performance Storage System, text retrieval software, and support for complex, distributed documents.

SDSC, a national laboratory for computational science and engineering, is sponsored by NSF, other federal agencies, the State and University of California, and private organizations; is affiliated with the University of California, San Diego; and is administered by General Atomics. For more information, see SDSC's Web site or contact Ann Redelfs, 619-534-5032.