11.07.03
SDSC Releases Zone SRB, Version 3.0 of Popular Data Management Middleware

Helps Larger Teams of Scientists Efficiently Share Growing Data Collections

Collaborating scientists spread around the globe will now have an easier time efficiently sharing the rapidly growing data collections that are essential to today's science.

The San Diego Supercomputer Center (SDSC) at UC San Diego has released the beta version of Zone SRB, version 3.0 of the popular SDSC Storage Resource Broker (SRB) middleware package, which will give SRB users increased performance and capabilities to conduct large-scale collaborations. New features allow users to access descriptive metadata and share data between different SRB installations, or "zones." SDSC SRB version 3.0, the user manual, and release notes are available to the research community for download as a source distribution at http://www.npaci.edu/DICE/SRB/.

"SDSC SRB Version 3.0 represents a major advance in data handling capabilities and performance," said Reagan Moore, co-director of the Data and Knowledge Systems program at SDSC. "Being able to federate individual SRB installations so that they exchange data and metadata allows SRB data collections to scale up to much larger sizes, with flexible collaborations between users spread around the globe."

The SDSC SRB is proven production software that meets scientists' urgent needs to integrate, manage, and access explosively growing data collections. Currently, there are more than 3,600 registered users of the SDSC SRB. Developed by Reagan Moore, Arcot Rajasekar, Michael Wan, Wayne Schroeder, and the SRB team in SDSC's Data and Knowledge Systems (DAKS) program, the SDSC SRB is being used in projects as diverse as helping astronomers integrate multi-terabyte image collections in the NSF's National Virtual Observatory, enabling NIH-funded neuroscientists to share brain data across the country in the Biomedical Informatics Research Network, and developing persistent archives for long-term preservation of government records for the National Archives and Records Administration. Still other SRB applications include NASA, which is using the SRB to manage massive collections of satellite data; the Science Environment for Ecological Knowledge, a large NSF Information Technology Research project, which is using the SRB to integrate ecological data collections; and UCSD's ROADNet, an NSF ITR project employing the SRB in conjunction with object ring buffers to bring together in real time diverse types of environmental data from networked sensors.

Before Version 3.0, each SRB installation used a single Metadata Catalog (MCAT) database containing all of the information, or metadata, describing the entire data collection. No matter how large the collection grew and how many users it had, it was managed by a single MCAT, located in one place. The advantage of the MCAT architecture is that it implements a global name space, giving the SRB the power to build unified "virtual collections" in which users can transparently access data from any location and any type of storage resource, all from within this single name space. However, having only a single MCAT for each collection becomes limiting when data collections and collaborations grow large, with many users spread across the nation and world, some of them far from the single MCAT. Federation of multiple independent MCATs makes it possible for each site to manage its own data using a local MCAT database for greater speed and autonomy, while still being able to access remote data under the control of other SRB zones and MCATs.

The widespread adoption of the SRB, with an increasing number of users working in big collaborations, has driven the need for a federated SRB, where multiple MCAT systems can share collection information. The capability to connect MCATs extends the power of SRB collaboration support with faster access to global MCAT collection catalogs, and at the same time maintains local autonomy.

Zone SRB can be looked at as a new layer that connects any previously stand-alone SRB installations that choose to federate (provided they upgrade to version 3.0). That is, each individual SRB installation can now be seen as an SRB zone. The idea of version 3.0 is to enable the zones to interact for speed and scalability, sharing as much or little of their separate collection information as they wish in broad-area collaborations. In version 3.0, the SRB researchers address the challenges of identifying and creating the mechanisms needed to maintain the consistency of metadata and manage access controls across the interconnected zones.
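As a rough illustration of the federation idea, the following Python sketch models zones that each keep a local catalog and forward lookups for other zones' data to a federated peer. It is a toy model, not the SRB API: the class name `Zone`, its methods, and the convention that a logical name begins with the owning zone's name are all assumptions made for this example.

```python
class Zone:
    """Toy stand-in for one SRB zone with its own local catalog (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.catalog = {}  # logical name -> description of where the data is stored
        self.peers = {}    # zone name -> Zone this zone has chosen to federate with

    def federate(self, other):
        # Federation is opt-in and, in this sketch, symmetric.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def register(self, path, location):
        # Assumption: logical names are prefixed with the owning zone's name.
        self.catalog[f"/{self.name}/{path.lstrip('/')}"] = location

    def resolve(self, logical_name):
        """Return (zone that answered, storage location) for a logical name."""
        zone_name = logical_name.strip("/").split("/", 1)[0]
        if zone_name == self.name:
            # Local data: answered from the local catalog, no remote round trip.
            return self.name, self.catalog[logical_name]
        peer = self.peers.get(zone_name)
        if peer is None:
            raise LookupError(
                f"zone {zone_name!r} is not federated with {self.name!r}"
            )
        # Remote data: the owning zone's catalog stays authoritative.
        return peer.resolve(logical_name)


zone_a, zone_b = Zone("zoneA"), Zone("zoneB")
zone_a.federate(zone_b)
zone_a.register("home/alice/run1.dat", "disk-1:/vol/run1.dat")
zone_b.register("home/bob/scan7.img", "tape-2:/arch/scan7.img")

print(zone_a.resolve("/zoneA/home/alice/run1.dat"))
print(zone_a.resolve("/zoneB/home/bob/scan7.img"))
```

In this sketch, local lookups never leave the zone, which mirrors the speed and autonomy argument, while each zone remains the only authority over its own catalog entries.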

"Because there are now so many experienced SRB users, they give us valuable guidance on which new capabilities will help them do real science," said Arcot Rajasekar, director of SDSC's DAKS Data Grids Technologies group. "Zone SRB answers their requests for a scalable, 'fail-safe,' and synchronizable SRB system, and opens the door to a rich array of new usage models, from Occasional Interchange Zones, Replicated Catalog and/or Data Zones, and Master-Slave Zones with identical or derived data products replicated down the chain, to Nomadic Zones and 'SRB in a Box,' useful in such applications as oceanographic voyages, and more." Among the projects that have driven the development of Zone SRB are the National Archives and Records Administration (NARA), the NIH Biomedical Informatics Research Network (BIRN), the DOE Particle Physics Data Grid and the high energy physics experiment, BaBar, the UK e-Science project, the NSF SIOExplorer, and the NSF National Partnership for Advanced Computational Infrastructure.

SDSC SRB Version 3.0 is supported on a wide variety of systems. The MCAT Metadata Catalog runs on Oracle, IBM DB2, Sybase, and Postgres. The SRB Server runs on Microsoft Windows NT, 2000, and XP, as well as most UNIX platforms including Linux and MacOS X, and supports data in file systems, tape stores, and databases. Once a collection is created, users can transparently replicate, manage, and control the collection across geographically distributed locations through any of several interfaces: a command-line interface; inQ (short for inQuisitor), a Windows-Explorer-like graphical user interface; and MySRB, a Web interface that enables user access to collections without a client. The beta release of Zone SRB supports the command-line interface; inQ and MySRB will be added soon.

A Quick Tour of the SDSC SRB

The comprehensive SDSC SRB system provides users with the capabilities of a distributed file system, a data grid collaboration environment, a digital library publication environment, and a persistent archive preservation environment. As a distributed system, each individual SRB zone consists of three major components: the Metadata Catalog (MCAT) database; one or more SRB Servers for data "brokered" or administered through the MCAT database; and SRB clients that provide an interface to the data collections, as well as a Web interface.

The SRB organizes descriptive metadata about the files in a Metadata Catalog, or MCAT, to help researchers assemble, search, access, and manage collections of data. Because it creates a single virtual representation that encompasses many data storage resources, the SRB allows users to treat multiple storage devices, which can be widely distributed, as if they were a single storage resource. By providing a global name space that spans all the separate resources, SRB middleware makes differences such as location, protocols, and authentication transparent to users.
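The idea of a catalog that maps logical names to physical storage can be sketched in a few lines of Python. This is a minimal illustration only: `ToyCatalog`, `Replica`, the resource names, and the metadata fields are hypothetical and do not reflect the actual MCAT schema or any SRB interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class ToyCatalog:
    """Illustrative stand-in for a metadata catalog; not the MCAT schema."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica, **metadata):
        # One logical name may have several replicas on different resources.
        entry = self.entries.setdefault(
            logical_name, {"replicas": [], "metadata": {}}
        )
        entry["replicas"].append(replica)
        entry["metadata"].update(metadata)

    def locate(self, logical_name):
        # Users ask by logical name; the catalog knows the physical locations.
        return list(self.entries[logical_name]["replicas"])

    def query(self, **criteria):
        # Discovery by descriptive metadata instead of by path.
        return sorted(
            name
            for name, entry in self.entries.items()
            if all(entry["metadata"].get(k) == v for k, v in criteria.items())
        )


catalog = ToyCatalog()
catalog.register("/demo/sky/image001", Replica("unix-disk-a", "/data/vol3/0001"), band="J")
catalog.register("/demo/sky/image001", Replica("tape-archive-b", "/arch/9/0001"))
catalog.register("/demo/sky/image002", Replica("database-c", "blob-42"), band="K")

print(catalog.locate("/demo/sky/image001"))
print(catalog.query(band="J"))
```

The caller only ever handles the logical name; which disk, tape store, or database holds the data is a detail of the catalog entry.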

Because the MCAT is implemented with relational database technology, it has been extended to include capabilities beyond those of traditional file systems and provides the sophisticated access control necessary for today's collaborations, proxy operations for such things as delivering subsets of a collection, and knowledge discovery based on system- and application-level metadata, including extensive user-defined metadata. SRB's sophisticated MCAT lets users take advantage of integrated metadata management to discover new knowledge by browsing and querying collections, as well as to "repurpose" or restructure collections, providing new "views" for novel uses.

SRB collections are highly scalable, both in size and in distribution across remote sites. Growing SRB collections at SDSC currently support more than 81 terabytes of data in 15 million files.

In addition to Moore, Wan, and Schroeder, DAKS researchers on the SDSC SRB team, led by Arcot Rajasekar, include Sheau-Yen Chen, Charles Cowart, Lucas Gilbert, Arun Jagatheesan, George Kremenek, Roman Olschanowsky, Vicky Rowley, Antoine de Torcy, Tim Warnock, and Bing Zhu. - Paul Tooby

Related Links

SDSC SRB version 3.0 - http://www.npaci.edu/DICE/SRB/

SDSC Data and Knowledge Systems (DAKS) program - http://www.sdsc.edu/daks/

User Guide for SRB - http://www.npaci.edu/SRB/

Information on inQ, Windows SRB interface - http://www.npaci.edu/dice/srb/inQ/inQ.html

Information on MySRB, Web SRB interface - http://www.npaci.edu/dice/srb/mySRB/mySRB.html

