GRID COMPUTING
Building an Information Power Grid
Fran Berman, Andrew Chien
UC San Diego
University of Chicago, Argonne National Laboratory
University of Virginia
NASA Ames Research Center, Lawrence Berkeley National Laboratory
University of Southern California
University of Tennessee
Some NASA scientists use satellite data to learn about the Earth, while others study the cosmos from space. Both types of projects are data-intensive, with data coming from satellites, telescopes, and other instruments. However, data are often stored in one location, such as the Goddard Space Flight Center in Maryland, but processed at another, such as the Ames Research Center in California. Consequently, NASA requires an intelligent data-handling system to identify and store data based on discipline-specific attributes and access and supply the correct data to applications that need it. NPACI researchers, in partnership with scientists from the National Computational Science Alliance, have lent their experience to help build the NASA Information Power Grid (IPG), a system to acquire, move, and process data from and on far-flung resources.
"NASA scientists are developing next-generation technologies predicated on the services and resources available in the IPG," said William Johnston, manager of the project within NASA Ames. "The hardier the infrastructure, the more innovative and useful the NASA applications can be."
Underlying the IPG is a computational grid anchored by Globus, led by Carl Kesselman of the University of Southern California and Ian Foster of the University of Chicago and Argonne National Laboratory. To promote interoperability between grid environments and minimize costs, the Legion project headed by Andrew Grimshaw of the University of Virginia is incorporating services from the Globus environment--for example, authentication support through the Grid Security Infrastructure (GSI). The grid infrastructure also allows an application to use both NPACI and Alliance resources, regardless of their physical location.
Cluster computing is an emerging technology for supporting scientific applications. In a collaboration headed by Andrew Chien of UC San Diego, the grid technologies are being adapted to interface with an NT cluster at UC San Diego and a Linux cluster at the University of Michigan.
To provide user-level middleware for the IPG, application-level schedulers (AppLeS) are being developed for parameter sweep applications, a common application class for NASA. The AppLeS parameter sweep template (APST), being developed by Fran Berman and AppLeS Project Scientist Henri Casanova at UC San Diego, will help NASA researchers conduct large-scale simulations on the IPG.
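A parameter sweep runs the same program over many independent combinations of input parameters, which is what makes it a natural fit for scheduling across grid resources. The following is a minimal sketch of that idea in Python, not the APST implementation: the function names, the relative host speeds, and the greedy "earliest predicted finish" rule are all assumptions chosen for illustration.

```python
import itertools

def sweep_tasks(param_grid):
    """Expand a dict of parameter lists into one dict per combination."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(param_grid[n] for n in names))]

def assign_tasks(tasks, host_speeds):
    """Greedily give each task to the host predicted to finish it earliest.

    host_speeds maps a (hypothetical) host name to its relative speed in
    tasks per unit time; every task is assumed to cost the same work.
    """
    finish = {host: 0.0 for host in host_speeds}   # predicted busy-until time
    plan = {host: [] for host in host_speeds}
    for task in tasks:
        host = min(finish, key=lambda h: finish[h] + 1.0 / host_speeds[h])
        finish[host] += 1.0 / host_speeds[host]
        plan[host].append(task)
    return plan
```

In a real scheduler the speed estimates would presumably come from measurements or forecasts of current load rather than fixed numbers.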
Also on the IPG, the Network Weather Service (NWS), developed by Rich Wolski at the University of Tennessee, takes the pulse of the shared, heterogeneous network links. The NWS monitors and predicts network load and availability and can be used by any IPG user. It can serve as an additional Globus service, and its forecasts give AppLeS schedulers a useful predictor of system performance.
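To make "monitors and predicts" concrete, here is a simplified Python sketch of one way to forecast the next measurement from a history of readings: run several simple forecasters over the past data and use whichever has had the lowest cumulative error. This is an illustration only, not NWS code; the three forecasters and all names are assumptions.

```python
from statistics import mean, median

# Hypothetical forecasters: each maps a non-empty history to a prediction.
FORECASTERS = {
    "last_value": lambda history: history[-1],
    "mean": lambda history: mean(history),
    "median": lambda history: median(history),
}

def forecast(history):
    """Return (name, prediction) from the forecaster with the lowest
    cumulative absolute error on the measurements seen so far."""
    errors = {name: 0.0 for name in FORECASTERS}
    for i in range(1, len(history)):
        for name, predict in FORECASTERS.items():
            # Score each forecaster on how well it would have predicted
            # measurement i from the measurements before it.
            errors[name] += abs(predict(history[:i]) - history[i])
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](history)
```

A steadily rising series favors the last-value forecaster, while a noisy series that oscillates around a level favors the mean.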
Of course, applications require data. To manage data movement between stored data and applications, SDSC researchers participating in the Data-Intensive Computing Environments thrust area, led by SDSC associate director Reagan Moore, are developing components of the grid's data-handling system. The SDSC Storage Resource Broker (SRB), which has been deployed at NPACI sites and NASA Ames, supports collection-based access to data across distributed storage systems.
When an application requires data, the SDSC SRB queries its Metadata Catalog to discover the correct data set, and then supports UNIX-style read and write operations on the data, wherever it is located. Once a computation has been completed, the SDSC SRB can be used to publish the new data into a collection, for use by other researchers. The SDSC SRB has also been integrated with the GSI for authentication of users.
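The discover-then-read pattern described above can be sketched with a toy, in-memory broker. This is not the SDSC SRB API: the class, its method names, and the record layout are hypothetical, and the example only illustrates how a metadata query can hide the physical location of a data set from the application.

```python
import io

class ToyBroker:
    """In-memory stand-in for a collection-based data broker (hypothetical)."""

    def __init__(self):
        self.catalog = []   # metadata records describing each data set
        self.storage = {}   # (site, path) -> bytes, standing in for remote stores

    def publish(self, collection, name, site, path, data, **attributes):
        """Store the bytes and register a metadata record in a collection."""
        self.storage[(site, path)] = bytes(data)
        self.catalog.append({"collection": collection, "name": name,
                             "site": site, "path": path,
                             "attributes": attributes})

    def query(self, collection, **wanted):
        """Return the records in a collection whose attributes all match."""
        return [record for record in self.catalog
                if record["collection"] == collection
                and all(record["attributes"].get(key) == value
                        for key, value in wanted.items())]

    def open(self, record):
        """Return a file-like object for reading, wherever the data lives."""
        return io.BytesIO(self.storage[(record["site"], record["path"])])
```

An application would call query with discipline-specific attributes, open one of the returned records, and read from it as it would from a local file; publish covers the step of adding newly computed data to a collection.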
The NASA IPG project recently completed a set of demonstrations that met some NASA level one milestones. Global hydrology data for one demonstration came from a satellite that completes 14 Earth orbits a day, acquiring in each orbit six data sets that must be stored and processed. A subset of this data was distributed across four sites--Caltech, SDSC, Washington University, and NASA Ames. The IPG then supported data mining on hundreds of data sets from this collection through the SDSC SRB.
Another IPG demonstration was staged in the NPACI booth at SC99 in Portland, Oregon. The SDSC SRB moved gigabyte-sized data sets between SDSC and Portland in 25 seconds. The transfer across the vBNS reached a top rate of 39.7 MB per second by striping the data over four I/O channels. The bandwidth was limited by the 40-MBps backplane speed of the server used to read the remote data.
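The figures are consistent with one another: assuming a data set of about 1,000 MB, a rate of 39.7 MB per second gives a transfer time of roughly 25 seconds. The Python sketch below checks that arithmetic and illustrates striping in general terms, splitting a byte range into contiguous stripes that are moved concurrently and then reassembled. It is not the SRB transfer code; the function names and the use of threads as stand-ins for I/O channels are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def stripe_ranges(total_bytes, channels):
    """Split [0, total_bytes) into `channels` contiguous byte ranges."""
    base, extra = divmod(total_bytes, channels)
    ranges, start = [], 0
    for i in range(channels):
        length = base + (1 if i < extra else 0)   # spread the remainder
        ranges.append((start, start + length))
        start += length
    return ranges

def striped_copy(source, channels=4):
    """Copy `source` (bytes) by fetching each stripe on its own worker."""
    ranges = stripe_ranges(len(source), channels)
    with ThreadPoolExecutor(max_workers=channels) as pool:
        # pool.map preserves input order, so the stripes rejoin correctly.
        parts = list(pool.map(lambda r: source[r[0]:r[1]], ranges))
    return b"".join(parts)

def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s
```

Striping helps only until some shared component saturates, which matches the article's observation that the server backplane, not the network, set the ceiling.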
Throughout 2000, work will continue on three key elements of the IPG project: enhancing the Globus infrastructure; enhancing the data-intensive and security infrastructure; and conducting research to increase the usability and performance of the IPG for NASA applications.
"Since our work with IPG builds on the work we're undertaking for NPACI," Moore said, "the whole project is good for both NASA scientists and NPACI researchers." --AF