SDSC and Partners Successfully Process One Billion Files

Published 11/14/2007

For Immediate Release

Media Contacts:
Warren R. Froelich
SDSC Communications
858 822-3622

Jan Zverina
SDSC Communications
858 534-5111

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, in partnership with IBM, DataDirect Networks Inc., and Brocade, has successfully processed one billion files at speeds never before seen in the industry.

The Billion File Demonstration was formally announced this week in Reno, NV, at SC07, a leading international conference on high-performance computing, networking, storage and analysis.

The demonstration was completed using a single instance of IBM's General Parallel File System (GPFS) and DataDirect Networks' S2A9550 Storage System. In preparation for the demonstration, IBM assembled a GPFS cluster at SDSC using 17 eight-way cluster members.

Clustering is the practice of connecting multiple processors or servers to handle complex workloads as a single "virtual" computing resource. It is increasingly used as a faster and more cost-effective way to provide high-performance, high-availability computing for a wide variety of applications.

The Billion File Demonstration showed that the GPFS cluster could efficiently process metadata (vital descriptive information about the files), allowing more than a billion files to be scanned. The test also showed that candidates for migration could be identified and moved to tape in SDSC's HPSS (High Performance Storage System) multiple times daily.
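As a rough illustration of what scanning metadata to find migration candidates means, the Python sketch below walks an ordinary directory tree and flags files that have not been accessed recently. It does not use GPFS, HPSS, or their policy tools. The function name, the 30-day idle threshold, and the choice of last-access time as the criterion are all assumptions made for this example, not details of the demonstration.

```python
import os
import time
from typing import Iterator, Optional


def find_migration_candidates(
    root: str,
    max_idle_days: float = 30.0,   # hypothetical threshold, not from the release
    now: Optional[float] = None,
) -> Iterator[str]:
    """Yield paths under `root` whose last access is older than `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)  # reads metadata only, never file contents
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_atime < cutoff:
                yield path
```

Only file metadata is read here, which is why a scan of this kind can cover very large numbers of files without touching their contents.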

In total, the test used 100 tapes, each with a capacity of 700 gigabytes. In addition, the interface between GPFS and HPSS at SDSC successfully used the ILM (Information Lifecycle Management) policy manager to transfer data from GPFS to HPSS.
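For a sense of scale, the tape figures above can be worked through in a few lines of Python. The calculation assumes decimal units (1 terabyte = 1,000 gigabytes). The release does not say how full the tapes were, so the per-file figure is only an upper bound on the average size of a file that could have been written to tape, not a reported result.

```python
TAPES = 100
GB_PER_TAPE = 700
FILES = 1_000_000_000  # "one billion files"

# Total raw tape capacity used in the test.
total_gb = TAPES * GB_PER_TAPE
total_tb = total_gb / 1000  # decimal units assumed

# Upper bound on average file size if every file went to these tapes
# and the tapes were completely full (an assumption, not a reported fact).
max_avg_kb_per_file = total_gb * 1_000_000 / FILES

print(f"Total tape capacity: {total_tb:.0f} TB")
print(f"Upper bound on average file size: {max_avg_kb_per_file:.0f} KB")
```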

"SDSC has long been a leader in data-intensive computing and storage, and the center's experience with both GPFS and HPSS made it the natural location for this groundbreaking achievement," said Patricia Kovatch, manager of the Allocated Systems Group with SDSC.

SDSC has a GPFS installation with about 750 terabytes of disk, giving researchers easier access to data no matter where it is located. GPFS provides high-speed concurrent access to multiple disk drives and storage devices. This fulfills a key requirement of powerful business intelligence and scientific computing applications that analyze vast quantities of often unstructured information, which may include video, audio, books, transactions, reports and presentations.

In total, SDSC has 25 petabytes of archival tape capacity, some 1,000 times the digital text equivalent of the printed collection of the Library of Congress. HPSS is a flexible, performance-oriented mass storage system that was developed to address the high-performance computing (HPC) hierarchical storage needs of multiple U.S. Department of Energy programs and to make this technology available to the HPC community.

HPSS, which is funded by IBM and the HPC community, is the result of a 15-year collaboration among IBM, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, Sandia National Laboratories and Oak Ridge National Laboratory. HPSS is licensed to other users and supported by IBM under an agreement between IBM and the U.S. Department of Energy.

DataDirect Networks, headquartered in Chatsworth, CA, is a leading provider of scalable storage systems for performance- and capacity-driven applications. Brocade, based in San Jose, CA, is a leading provider of data center networking solutions that help organizations connect, share, and manage their information.