Usage Scenarios for Information Sharing in a Data Grid


    The 7th Annual International Conference on Digital Government Research,
    San Diego, May 21-24, 2006.

Tutorial Presenters

    Reagan W. Moore, Richard Marciano, Arcot Rajasekar
    San Diego Supercomputer Center
    University of California at San Diego
    {moore, marciano, sekar}@sdsc.edu

Short Description

Data Grids support shared collections that may be distributed across multiple institutions. Data Grids decouple the management of the shared collections from the storage systems, making it possible to logically organize existing data into a new collection. Data Grids support seamless access to the information but, unlike the web, provide certificate-based authentication, authorization through access controls and tickets, and audit trails to keep track of usage. Discovery of relevant files is facilitated through associated metadata. Data Grids provide rich formats for organizing metadata, including attribute-value pairs, relational schemas, and semi-structured XML-based metadata. In addition, one can store system-level metadata and provenance metadata to keep track of the evolution and by-products of the collections that one is sharing.
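
The ideas in this description can be illustrated with a minimal sketch. It is not the Storage Resource Broker API; every class, method, path, and user name below is hypothetical and chosen only for illustration. The sketch assumes an in-memory catalog in which each logical name maps to one or more physical replicas, discovery uses attribute-value metadata, access is checked against a per-entry reader list, and every request is recorded in an audit trail. Certificate-based authentication and tickets are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalFile:
    """One logical entry in a shared collection (hypothetical model)."""
    logical_name: str
    replicas: list   # physical locations, e.g. "site-a:/vol1/c1.tif"
    metadata: dict   # attribute-value pairs used for discovery
    readers: set     # users allowed to see and read this entry


@dataclass
class SharedCollection:
    """Logical namespace kept separate from the storage systems (a sketch)."""
    files: dict = field(default_factory=dict)
    audit_trail: list = field(default_factory=list)

    def register(self, logical_name, physical_location, metadata, readers):
        # Existing data is registered in place; nothing is copied or moved.
        entry = self.files.get(logical_name)
        if entry is None:
            entry = LogicalFile(logical_name, [], {}, set())
            self.files[logical_name] = entry
        entry.replicas.append(physical_location)
        entry.metadata.update(metadata)
        entry.readers.update(readers)
        return entry

    def find(self, user, **attributes):
        # Discovery by metadata, restricted to entries the user may read.
        self.audit_trail.append((user, "find", str(sorted(attributes.items())), True))
        return sorted(
            name
            for name, entry in self.files.items()
            if user in entry.readers
            and all(entry.metadata.get(k) == v for k, v in attributes.items())
        )

    def locate(self, user, logical_name):
        # Resolve a logical name to its physical replicas, with an access check.
        entry = self.files[logical_name]
        allowed = user in entry.readers
        self.audit_trail.append((user, "locate", logical_name, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        return list(entry.replicas)


collection = SharedCollection()
collection.register("/shared/maps/county1.tif", "site-a:/vol1/c1.tif",
                    {"type": "map"}, {"alice"})
collection.register("/shared/maps/county1.tif", "site-b:/tape/0042", {}, set())
print(collection.find("alice", type="map"))
print(collection.locate("alice", "/shared/maps/county1.tif"))
```

In this sketch the same logical name resolves to replicas at two sites, which is the sense in which collection management is decoupled from the storage systems.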

Target and Goals

Any group that manages distributed data will profit from the tutorial, which will provide the basic information needed to design and implement a shared collection that spans multiple storage systems and administrative domains. The shared collections can be used to implement data grids for sharing data, digital libraries for publishing data, and persistent archives for preserving data.

In our tutorial, we will detail real-life usage scenarios for information sharing. The usage scenarios are based on applications of the Storage Resource Broker, premier data grid software developed at the San Diego Supercomputer Center. We will discuss information sharing for the National Archives (NARA), distributed state archives (Persistent Archive Testbed project with NHPRC), the Worldwide Universities Network data grid (shared collections spanning three continents), biomedical networks (NIH Biomedical Informatics Research Network), astronomy collections (NSF National Virtual Observatory), seismic digital libraries (NSF Southern California Earthquake Center), and real-time sensor data (NSF Real-time Observatories, Applications and Data management Network), among others.

Each of these use cases provides unique insights into problems and solutions for data sharing. For each case, we will outline the aims, the problems encountered, the solutions adopted, and the user experiences. Solutions to common problems that are encountered in the formation of shared collections will be presented.

The tutorial will cover the following topics, at about 30 minutes per topic:

Material describing the Storage Resource Broker is available at the URL http://www.sdsc.edu/srb/

Relevant papers include:

  1. Moore, R., M. Wan, A. Rajasekar, "Storage Resource Broker: Generic Software Infrastructure for Managing Globally Distributed Data", Proceedings of IEEE Conference on Globally Distributed Data, Sardinia, Italy, June 28, 2005.
  2. Moore, R., R. Marciano, "Technologies for Preservation", book chapter in "Managing Electronic Records", edited by Julie McLeod and Catherine Hare, Facet Publishing, UK, October 2005.
  3. Rajasekar, A., M. Wan, R. Moore, W. Schroeder, "Data Grid Federation", PDPTA 2004 - Special Session on New Trends in Distributed Data Access, June 2004.

Bios of Presenters

Reagan Moore: Dr. Reagan W. Moore is Director for Data Intensive Computing at the San Diego Supercomputer Center. He coordinates research efforts in development of massive data analysis systems, data grids, digital libraries, and persistent archives. Moore is the principal investigator for the development of the Storage Resource Broker data grid technology, which is used to support international shared collections. Moore leads SDSC involvement in projects with NARA on persistent archives, NSF on NSDL persistent archives, NASA on the Information Power Grid, NIH on the Biomedical Informatics Research Network, and the Library of Congress on data management. His email address is moore@sdsc.edu.

Richard Marciano: Dr. Richard Marciano heads the Sustainable Archives and Libraries Technology Group at the San Diego Supercomputer Center. He is the principal investigator on preservation projects including the NHPRC Persistent Archives Testbed and the NHPRC California Geospatial Records Preservation Grant, and he collaborates on the NARA research prototype persistent archive. His research interests include mapping of collections to Geographic Information Systems, development of preservation environments, and analysis of preservation environment consistency properties. His email address is marciano@sdsc.edu.

Arcot Rajasekar: Dr. Arcot K. Rajasekar heads the Data Grid Technologies Group at the San Diego Supercomputer Center (SDSC). His major research interests include research and development of technologies for data grids, digital library systems, and persistent archives. His current research activities at SDSC include development of the Storage Resource Broker for integrating distributed data repositories and digital library systems, and development of a metadata catalog system for handling system-level and domain-specific metadata. His email address is sekar@sdsc.edu.