FOR IMMEDIATE RELEASE
Immense collections of digital data are now ushering in a new era in science and engineering, with dramatic results.
Such digital data collections can exceed 100 terabytes in size (one terabyte is 1,000 gigabytes), many times larger than the digital text size of all the books in the Library of Congress. But managing and using this explosion of data is easier said than done.
Long a leader in data technologies, the San Diego Supercomputer Center (SDSC) at UC San Diego has released version 0.5 of iRODS, the open-source Integrated Rule-Oriented Data System, which represents a new approach to distributed data management. The iRODS data grid system incorporates and goes beyond the experience gained during nearly 10 years of applying the SDSC Storage Resource Broker (SRB) technology in data grids, digital libraries, persistent archives, and real-time data systems. More information about iRODS, including software downloads, documentation, and support, is available at http://irods.sdsc.edu.
"Data management is not just one task, it's a large number of complex, interrelated tasks," said Arcot Rajasekar, director of SDSC's Data Grids Technologies group. "And iRODS is a powerful tool that gives users a full, end-to-end solution."
With iRODS, researchers will be able to ingest digital data from real-time sensor networks like those used in oceanography, or capture massive data output from large-scale simulations of turbulence. They can extract descriptive metadata, manage their data, move it efficiently, share it securely with collaborators, publish it in digital libraries and, finally, archive it for long-term preservation.
Current distributed data management systems apply crucial management policies and internal checks of consistency directly within the software. If these constraints change over time, however, the software itself must be rewritten. With more than 100 variables to describe the state of a data collection, making these changes requires great care, seriously hampering the ability of a system to flexibly scale up to manage the hundreds of millions of files, petabytes of data, and dynamic collaborations encountered in today's science and engineering research.
iRODS introduces a new approach to data management based on automatic application of rules, or "constraint virtualization." At the core of iRODS, a Rule Engine interprets the rules that decide how the system will respond to various requests and conditions. With iRODS, users can make changes such as granting higher authorization for a particular collection, using stricter authorization, specifying how many copies will be kept and where they will be stored, or treating a particular data resource differently, all through user-defined iRODS controls rather than by hard-coding changes in the software.
"Each community has different management policies or sets of assertions they make about their collections," said Reagan Moore, director of SDSC's Data-Intensive Computing Environments (DICE) group. "iRODS characterizes these policies in terms of rules and information that describe the state of the collection, and then these rules are applied automatically."
By using the rules, users are shielded from having to deal directly with the intricate technologies of the system, making it far easier for administrators to configure their sites in customized ways, and making possible a data management system that can flexibly scale up to handle the multi-site, petabyte collections of today.
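The rule-based approach described above can be illustrated with a minimal Python sketch. This is not the iRODS rule language or API; every name in it (Rule, RuleEngine, set_replicas, the "ingest" event, and the site and collection names) is hypothetical. It only shows the general idea that policies such as how many copies to keep live in rules the engine interprets, so changing a policy means editing a rule rather than rewriting the engine.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # hypothetical: attributes describing one data object


@dataclass
class Rule:
    """A policy: on `event`, if `condition` holds, apply `action`."""
    event: str
    condition: Callable[[State], bool]
    action: Callable[[State], None]


@dataclass
class RuleEngine:
    """Interprets rules in order; later matching rules override earlier ones."""
    rules: List[Rule] = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def handle(self, event: str, state: State) -> State:
        for rule in self.rules:
            if rule.event == event and rule.condition(state):
                rule.action(state)
        return state


def set_replicas(count: int, sites: List[str]) -> Callable[[State], None]:
    """Build an action that records how many copies to keep and where."""
    def action(state: State) -> None:
        state["replicas"] = count
        state["sites"] = list(sites)
    return action


engine = RuleEngine()
# Default policy: one copy at a single (hypothetical) site.
engine.add(Rule("ingest", lambda s: True, set_replicas(1, ["site-a"])))
# Stricter policy for one collection, added without touching the engine.
engine.add(Rule(
    "ingest",
    lambda s: s["collection"] == "ocean-sensors",
    set_replicas(3, ["site-a", "site-b", "site-c"]),
))

sensor_file = engine.handle("ingest", {"collection": "ocean-sensors"})
other_file = engine.handle("ingest", {"collection": "misc"})
print(sensor_file["replicas"], other_file["replicas"])
```

In this sketch, a site administrator who wanted a different replication policy would add or replace a Rule; the RuleEngine class itself stays unchanged.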
The initial version supports Solaris, Linux, and Mac OS X. iRODS is being released as open-source software under a BSD license. In an open-source effort, users collaborate in developing, maintaining, and peer-reviewing a common software tool. The source code is made available to the public, and the user community cooperates on its development, identifying and fixing bugs and adding new features that speed the development of the software and benefit the whole community. Groups in the US, the UK, and other countries are already expressing interest in developing new capabilities for iRODS.
SDSC will continue to support the widely used SDSC SRB system into the future, and will also provide tools for existing users to migrate from SDSC's SRB to iRODS if they wish.
For more than two decades, the San Diego Supercomputer Center (SDSC) has enabled breakthrough data-driven and computational science and engineering discoveries through the innovation and provision of information infrastructure, technologies, and interdisciplinary expertise. A key resource to academia and industry, SDSC is an international leader in Data Cyberinfrastructure and computational science, and serves as a national data repository for nearly 100 public and private data collections. SDSC is an Organized Research Unit and integral part of the University of California, San Diego, and one of the founding sites of NSF's TeraGrid. For more information, see www.sdsc.edu.