Rex Graham, San Diego Supercomputer Center
"These growing capabilities are enabling new science to be done in new ways," said Reagan Moore, co-director of SDSC's Data and Knowledge Systems (DAKS) program and lead of the Data-Intensive Computing Environments (DICE) group thrust in the National Partnership for Advanced Computational Infrastructure (NPACI). "Working closely with the computational science community, we have incorporated user requests over the last two years into version 2.0 so that the SRB addresses a growing array of scientific needs."
Research groups in a variety of fields are using or planning to use the SDSC SRB software to integrate, manage, and access explosively growing data collections. Developed by Moore, Arcot Rajasekar, Michael Wan, and colleagues in SDSC's DAKS program, the SRB is being used in projects ranging from helping astronomers integrate multi-terabyte image collections in the National Science Foundation's National Virtual Observatory to enabling NIH-funded neuroscientists to share brain data across the country in the Biomedical Informatics Research Network. The National Archives and Records Administration is using the software to develop persistent archives, NASA is using it to merge massive sets of satellite data, and other groups are employing the SRB to bring together diverse types of environmental data.
In general, the SRB offers many advantages over traditional file systems. What appears to the user as a single collection is actually a virtual collection of digital entities scattered across distributed, heterogeneous storage resources, including file systems, archives, and databases. The SRB makes these differences in location and storage type transparent to users. It negotiates all protocols and access permissions across the multiple sites so that users can access data based on familiar, user-defined names of data attributes. This frees them from having to keep track of such complexities as file names, physical locations, protocols, and security arrangements. The SRB not only supports more efficient science at the researcher level but also enables rapid collaborations that were not previously possible.
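The idea of a virtual collection can be illustrated with a small sketch. It is not the SRB's API: the class names, site names, paths, and attributes below are all hypothetical, chosen only to show how a catalog might let a user work with logical names and attributes while the physical copies live on different kinds of storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalCopy:
    """Where one copy of a digital entity lives (all values are hypothetical)."""
    site: str
    resource_type: str  # e.g. "file system", "archive", "database"
    path: str


@dataclass
class VirtualCollection:
    """Toy catalog: logical names and attributes layered over scattered storage."""
    _entries: dict = field(default_factory=dict)

    def register(self, logical_name, copy, **attributes):
        # One logical name may map to several physical copies.
        entry = self._entries.setdefault(logical_name, {"copies": [], "attrs": {}})
        entry["copies"].append(copy)
        entry["attrs"].update(attributes)

    def locate(self, logical_name):
        # The catalog, not the user, tracks sites and paths.
        return list(self._entries[logical_name]["copies"])

    def find(self, **attributes):
        # Attribute-based discovery: return logical names whose metadata matches.
        return sorted(
            name
            for name, entry in self._entries.items()
            if all(entry["attrs"].get(k) == v for k, v in attributes.items())
        )


catalog = VirtualCollection()
catalog.register(
    "sky/mosaic-001",
    PhysicalCopy("site-a", "archive", "/tape/0042/img.dat"),
    band="J",
)
catalog.register(
    "sky/mosaic-001",
    PhysicalCopy("site-b", "file system", "/data/cache/img.dat"),
)
catalog.register(
    "brain/scan-17",
    PhysicalCopy("site-c", "database", "scans:17"),
    modality="MRI",
)

print(catalog.find(band="J"))
print([c.site for c in catalog.locate("sky/mosaic-001")])
```

In this toy version the user asks only for `band="J"` or for `sky/mosaic-001`. The catalog supplies the sites, storage types, and paths.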
SRB collections are highly scalable, both in size and in distribution across remote sites. For example, SRB collections at SDSC support more than 6.5 million files and 40 terabytes of data. There are currently more than 200 registered users of the SDSC SRB at more than 50 sites.
The principal new features in version 2.0 of the SRB (the previous version was 1.1.8) include:
Server-initiated, multi-threaded parallel data transfers, which give the new version faster and more robust transfers of very large data sets.
A revamped SRB Administration GUI, now an easy-to-use Java-based client-side tool that assists in managing the SRB.
MCAT ports for Sybase and Postgres databases.
Improved MCAT metadata catalog functions for tasks such as creating and deleting users and resources, plus parallel bulk loading of metadata into the MCAT at more than 400 files per second, a 50-fold speedup for collections that contain large numbers of small files.
A built-in Mass Storage System (MSS), which uses a new type of "compound resource" to manage connectivity to tape silos and tape devices. The SRB itself provides caching and other functionality, so no proprietary tape management system is required. The MSS enables users to economically build their own mass storage system in which data migrate automatically between cache and tape.
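To put the bulk-loading figure in the feature list above into perspective, here is a back-of-the-envelope calculation. The two rates come from the article (400 files per second is a lower bound, and the earlier rate of about 8 files per second is inferred from the stated factor of 50). The collection size of one million files is a hypothetical chosen only for illustration.

```python
NEW_RATE_FILES_PER_SEC = 400  # "more than 400 files per second" (from the article)
SPEEDUP = 50                  # "a factor of 50 faster" (from the article)
OLD_RATE_FILES_PER_SEC = NEW_RATE_FILES_PER_SEC / SPEEDUP  # inferred, not stated


def load_time_seconds(n_files, files_per_sec):
    """Time to register n_files of metadata at a constant rate."""
    return n_files / files_per_sec


N_FILES = 1_000_000  # hypothetical collection of small files

new_seconds = load_time_seconds(N_FILES, NEW_RATE_FILES_PER_SEC)
old_seconds = load_time_seconds(N_FILES, OLD_RATE_FILES_PER_SEC)

print(f"at 400 files/s: {new_seconds / 60:.1f} minutes")
print(f"at   8 files/s: {old_seconds / 3600:.1f} hours")
```

Under these assumptions, loading a million small files takes roughly 42 minutes instead of about a day and a half.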
"One of the most important new features is the server-driven parallel data transfers," said Rajasekar, leader of the DAKS SRB development team. "By incorporating automatic parallel data transfers with up to five threads in a way that is transparent to users, the software optimizes and matches the transfer to the network and server export rates, resulting in transfers that are more robust and two or three times faster." Early tests have already shown transfer rates at 85 percent of network capability.
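The general technique behind multi-threaded parallel transfer can be sketched in a few lines. This is only an illustration under stated assumptions, not the SRB implementation: it copies an in-memory byte string instead of moving data over a network, the threads are started by the caller, not by a server, and the function names are hypothetical. The only detail taken from the article is the cap of five threads.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 5  # the article cites "up to five threads"


def split_ranges(total_size, parts):
    """Divide [0, total_size) into at most `parts` contiguous (start, end) ranges."""
    parts = max(1, min(parts, total_size))
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_copy(source, threads=MAX_THREADS):
    """Copy `source` by moving disjoint byte ranges concurrently."""
    ranges = split_ranges(len(source), threads)
    destination = bytearray(len(source))

    def copy_range(bounds):
        start, end = bounds
        # Stand-in for reading a range from one server and writing it to another.
        destination[start:end] = source[start:end]
        return end - start

    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        moved = sum(pool.map(copy_range, ranges))

    if moved != len(source):
        raise RuntimeError("incomplete transfer")
    return bytes(destination)


payload = bytes(range(256)) * 4096  # 1 MiB of sample data
assert parallel_copy(payload) == payload
```

Because each thread owns a disjoint range, no locking is needed to reassemble the data. A real transfer would also have to choose the thread count to match network and server rates, which this sketch does not attempt.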
DAKS researchers on the SDSC SRB project, led by Rajasekar, include Wan, Sheau-Yen Chen, Charles Cowart, Lucas Gilbert, Arun Jagatheesan, George Kremenek, Roman Olschanowsky, Vicky Rowley, Wayne Schroeder, and Bing Zhu.
PROJECTS USING THE SDSC SRB
Ecology and Environmental Sciences