IBM and San Diego Supercomputer Center to Collaborate on Massive Data Analysis Testbed

Published 12/05/1995

IBM and the San Diego Supercomputer Center (SDSC) have announced they will collaborate on the development of a high performance solution for researchers that will provide the compute power, storage capacity and software necessary to quickly analyze immense amounts of scientific data. This Massive Data Analysis Testbed (MDAT) will support scientific data mining on terabyte (TB) data sets accessed from a petabyte archive at gigabyte (GB) speeds. A petabyte is 1,000 trillion bytes, equal to the contents of one billion books of average size (400 pages).
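The book comparison above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the variable names are made up for this example, and the per-book and per-page figures are derived here from the stated numbers rather than quoted from the announcement.

```python
# Back-of-the-envelope check of the petabyte comparison.
# Inputs are the figures stated in the text; the derived values are not.
PETABYTE_BYTES = 1_000 * 10**12   # 1,000 trillion bytes
BOOKS = 10**9                     # one billion books
PAGES_PER_BOOK = 400              # "average size" book

bytes_per_book = PETABYTE_BYTES // BOOKS
bytes_per_page = bytes_per_book // PAGES_PER_BOOK

print(bytes_per_book)   # bytes implied per book
print(bytes_per_page)   # bytes implied per page
```

Under these figures each book works out to about one million bytes, or roughly 2,500 bytes per page, which is plausible for a page of plain text.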

"The academic research community is confronted by the need to analyze very large data sets, for example, information on sound waves traveling through the world oceans or the structures of molecules," said Reagan Moore, Associate Director for Enabling Technologies at the San Diego Supercomputer Center. "Rapid access to enormous data sets, such as we are designing here, will turn scientific databases into rich troves of information."

The key hardware components of the testbed are: a 17-processor IBM RISC System/6000* Scalable POWERparallel Systems (RS/6000 SP)* high performance parallel processing computer; a 60-TB IBM 3494 Tape Library Dataserver using six high-capacity 3590 Magstar tape drives; and a 500-GB IBM Serial Storage Architecture (SSA) Disk Subsystem. The strategic enabling software technologies are IBM's DB2 Parallel Edition relational database and the High Performance Storage System (HPSS). HPSS technology is being developed by four Department of Energy laboratories (Lawrence Livermore, Los Alamos, Oak Ridge and Sandia), Cornell University, NASA Langley Research Center and IBM.

"This major step forward in data-intensive computing will enable the unconstrained flow of huge volumes of archived data, which will provide the tools researchers need to advance the frontiers of science," said Irving Wladawsky-Berger, general manager of the IBM RISC System/6000 Division. "The advances we make here will also transfer nicely to the commercial sector, as similar needs exist for high-speed data mining at credit card companies, hospitals and insurance agencies, to name a few."

IBM and the San Diego Supercomputer Center will collaborate on an effort to integrate DB2 Parallel Edition relational technology with the advanced storage management functions of HPSS. The project will focus on developing a robust set of DB2 Parallel Edition user-defined functions (operations defined by the computer operator) that will communicate with the established application programming interfaces of HPSS. Integrating these components creates a scalable system in which processing cycles, data access rates and storage capacity can be increased proportionally as the size of the archive grows. In support of this collaborative effort, IBM has awarded SDSC a Shared University Research (SUR) grant.

"IBM has long benefited from the rigorous testing its research partners, such as SDSC, conduct on its systems. Such collaborations as this one help IBM maintain its leadership in the high performance computing arena and raise the bar for performance and implementation," said Bob Steen, Director of IBM Scientific and Technical Systems and Solutions. "This collaboration will meet researchers' data processing needs through the development of a complete solution that seamlessly integrates parallel processing and relational database and hierarchical storage technologies."

HPSS

The High Performance Storage System (HPSS) is a scalable parallel storage system for highly parallel computers, traditional supercomputers and workstation clusters. Targeted for the high end of storage system and data management requirements, the HPSS software provides such functions as striping and parallel delivery of data into parallel compute systems, partial file caching, dynamic data hierarchies and support for network attached I/O devices. The advanced functions of HPSS are designed to accommodate storage capacities measured in petabytes, and to transfer data at rates of 100 megabytes per second and beyond. DB2 Parallel Edition relational database software provides client/server capability on the RS/6000 SP platform, static and dynamic SQL, and query segmentation capability for parallel execution.
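To put the transfer rates quoted for HPSS in perspective, the sketch below estimates how long it would take to move data sets of the sizes mentioned in this announcement. It is a rough illustration, not a performance model: it assumes decimal units, a constant sustained rate with no overhead, and reads "gigabyte speeds" as 1 GB per second. The function name `transfer_seconds` is hypothetical.

```python
# Illustrative transfer-time estimate. Decimal units and a sustained,
# overhead-free rate are assumptions, not figures from the announcement.
MB = 10**6
GB = 10**9
TB = 10**12


def transfer_seconds(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Return the time in seconds to move size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


one_tb_at_100mb = transfer_seconds(1 * TB, 100 * MB)     # 1 TB data set
one_tb_at_1gb = transfer_seconds(1 * TB, 1 * GB)         # assumed 1 GB/s
library_at_100mb = transfer_seconds(60 * TB, 100 * MB)   # full tape library

print(f"1 TB at 100 MB/s: {one_tb_at_100mb / 3600:.1f} hours")
print(f"1 TB at 1 GB/s: {one_tb_at_1gb / 60:.1f} minutes")
print(f"60 TB at 100 MB/s: {library_at_100mb / 86400:.1f} days")
```

Under these assumptions a single terabyte takes close to three hours at 100 megabytes per second and under 20 minutes at 1 GB per second, which suggests why the testbed targets gigabyte speeds for terabyte-scale data mining.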

SDSC

The San Diego Supercomputer Center, a national laboratory for computational science and engineering, is sponsored by the National Science Foundation, other federal agencies, the State and University of California, and private organizations; is affiliated with the University of California, San Diego; and is administered by General Atomics. For additional information, refer to SDSC's World Wide Web server at http://www.sdsc.edu/, or contact Ann Redelfs, SDSC, 619-534-5032, redelfs@sdsc.edu; Reagan Moore, SDSC, 619-534-5073, moore@sdsc.edu.

IBM RISC System/6000 Division

The RS/6000 platform, IBM's high-performance line of technical and commercial workstations and servers, offers customers the most extensive line of UNIX-based solutions available in the marketplace. From powerful, cost-efficient PowerPC-based workstations with industry-leading graphics capability and servers that can run businesses both small and large, to symmetric multi-processor and scalable, parallel computing systems for the most demanding of client/server or commercial on-line transaction processing applications, the RS/6000 offers unequaled performance and price-performance to customers.

IBM also produces world-class scalable parallel information and computing systems for commercial and scientific/technical customers. The Scalable POWERparallel Systems SP is the high end of the RS/6000 line.

All of these systems run AIX*, IBM's industry-leading UNIX** operating system that offers binary compatibility for the more than 10,000 applications available on the RS/6000. The RS/6000 is marketed worldwide through the IBM sales force and IBM Business Partners. IBM's RISC System/6000 Division is headquartered in Somers, NY.

IBM STSS

The IBM STSS organization consists of applied scientists and engineers, and delivers advanced computer solutions to customers involved with research and development. STSS focuses on scientific and technical end-users: scientists, engineers, academicians, and mathematicians employed in government research labs, academic institutions, and industry.


Contact:
Ann Redelfs
San Diego Supercomputer Center
619-534-5032
619-534-5077 (FAX)
redelfs@sdsc.edu

Jim Keller
IBM
914-766-3250
jkeller@vnet.ibm.com