Building the NPACI Grid: Integrating the Human Infrastructure
 
John McGee, left, and Carl Kesselman of the University of Southern California's Information Sciences Institute in Marina del Rey.
 

by Mike Gannis

 

Over the past two years, NPACI partners have been coordinating their resources and software to create a next-generation cyberinfrastructure for science and engineering applications: the NPACI Grid. It consists of hardware, network, and data resources at four sites, the San Diego Supercomputer Center (SDSC), the Texas Advanced Computing Center (TACC), the University of Michigan, and soon Caltech's Center for Advanced Computing Research, all running an integrated set of grid middleware and advanced NPACI scientific and engineering applications. But one of the most important aspects of the NPACI Grid won't be found on any system architecture diagram: the people who set it up, keep it running, and use it for new science.

"We had a group of people who belong to various administrative domains who came together to solve common problems and provide this infrastructure," said John McGee of the University of Southern California's Information Sciences Institute (ISI). He was responsible for coordinating the development and deployment of the NPACI Grid. "The NPACI Grid is an implementation of middleware, application software, and processes and procedures to mitigate the challenges a scientist faces when trying to use heterogeneous resources from multiple organizations."

"The NPACI Grid is clever technology, but the real secret to making it work was the dedication by maintainers to build an administrative and technical bridge to users and system administrators across administrative domains," said Rich Wolski, associate professor of computer science at UC Santa Barbara and Grid Computing thrust lead for NPACI. "This wasn't what some of us expected to be the most important activity-it wasn't in our job descriptions."

The resources comprising the NPACI Grid belong to four different administrative domains, each with its own "culture": rules, regulations, accounting procedures, personalities in charge, and an ingrained way of doing things. This is unavoidable, since each site comprises unique resources that are still needed in their original operational roles, serving the users of their own institutions.

"We set out to build a grid from the bottom up," Wolski said. "The process not only gave the NPACI team vital experience in using cyberinfrastructure technologies in the real world, but we also learned that there's a new human component to cyberinfrastructure as a result of the multiple administrative domains and the multiple existing user communities we dealt with. The real job of the middleware and packaging tools is to facilitate coordination between these communities. The NPACI Grid is the infrastructure behind this coordination."

SAFEGUARDING USERS

"The NPACI Grid is a complex platform and it is just not possible to tell application users-biologists, physicist-to start doing things radically differently from one minute to the next," said Henri Casanova, director of the Grid Research and Innovation Laboratory (GRAIL) at SDSC. "The way to build and sustain a large user community on a platform like the NPACI Grid is to provide a way for users to smoothly transition from what they're doing now, such as computing on a single locally administrated cluster, to the grid. This can be achieved by providing a software layer on top of the base NPACI Grid middleware infrastructure. NPACI partners have developed, packaged, and deployed such a layer, NPACKage, which hides the many details and complexities of the grid, while providing users with a convenient yet powerful way to launch, track, and exploit the results of a wide variety of larger and larger-scale applications."

"The trick is to build cyberinfrastructure on top of local administrative policies, not in contradiction to them," Wolski agreed.

One key to earning the cooperation of users was to show them the benefits of the new way of doing things. A single grid interface to the resources of multiple sites offers several advantages in convenience and simplicity. Security procedures become easier for users to manage. When users submit jobs using the Resource Specification Language (RSL) from the Globus Toolkit, they are no longer forced to use different submission formats for different systems. At runtime, the HotPage facilitates bandwidth monitoring, resource discovery, and status monitoring through its interfaces to the Globus monitoring service and the Network Weather Service. DataCutter and the Storage Resource Broker give users new distributed data analysis and management capabilities.
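
As a rough illustration, a Globus RSL submission of that sort might look like the sketch below; the resource name and job manager are hypothetical placeholders, and the exact attributes accepted depend on the local Globus Toolkit installation.

    # Submit a four-process job described in RSL to a (hypothetical)
    # PBS-managed NPACI resource, streaming its output back with -o.
    globusrun -o -r gridnode.example.edu/jobmanager-pbs \
        '&(executable=/bin/hostname)(count=4)'

Because the RSL string itself is the same regardless of which site's job manager receives it, users are spared from learning each resource's native batch-submission syntax.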

"The grid doesn't allow you to subvert the regular NRAC [National Resource Allocations Committee] allocation process," McGee said. "Let's say I'm a geoscientist in Colorado, and I have allocations on the various NPACI resources. One important feature that the grid layer provides is single sign-on to all components of the NPACI grid that I have allocations for. Each of the four NPACI sites has its own security procedures and mechanism, but using grid technology, you can issue a single command, grid-proxy-init, to access any of the systems. It's all about GSI, GridSecurity Infrastructure, which is embedded and enabled by many of these applications-Globus, SRB, GSI-OpenSSH, DataCutter, MyProxy. For the most part, GSI cuts across all these applications."

"All these tools and security advantages make the grid accessible to the scientific and engineering community," said Shannon Whitmore, a grid user support specialist at SDSC. "What's really exciting is the potential provided by the additional compute, memory, and network resources. The NPACI Grid enables scientists to run larger jobs than could be run at a single site. Research teams can access more powerful computers for large jobs while running smaller jobs on their local clusters. Loosely coupled jobs can be distributed across grid resources concurrently, to produce results much faster than on a single resource."

"We managed to accomplish a smooth transition with the APST [AppLeS parameter sweep template] project, for two applications: MCell and the Encyclopedia of Life [EOL]," Casanova said. "APST, which is a component of NPACKage, provides a simple way to deploy and run large-scale parameter sweep applications, which arise in virtually every field of science and engineering. MCell users first were able to use APST on their local clusters, slowly getting used to it without making a huge leap to grid computing. Once they had learned how to use it'and provided us with useful feedback-it became easy to, little by little, aggregate more and more resources to scale up to a grid-wide deployment spanning multiple institutions. This is exactly the kind of process that will lead many users to adopt the NPACI Grid. We're planning a very large MCell run on the entire NPACI Grid this fall. Similarly, EOL has adopted APST as the intermediate layer between the application and the grid infrastructure."

Many projects are trying to achieve this goal of easily interfacing users and applications to the grid infrastructure. Such tools will be fundamental for the NPACI Grid, and several of them, including APST, the SRB, and Grid Portals, are being developed at SDSC. The NPACI Grid is ideally suited for successful interaction with user communities. "We have tight links to these communities, and we've been developing grid application-level tools for a while," Casanova said.

"Grid developers need to understand users' needs and resolve issues," Wolski said. "Since grids are constantly changing, this activity is ongoing and must be factored into any production setting."

"Now that the infrastructure is in place, the true potential of the grid is just beginning to open up," said Whitmore. "With an initial investment, scientists can create customized access points and share enormous resources with an entire research community from a single location. The SVT "the Scalable Visualization Toolkit" has just such an interface. Its grid portal allows users to submit large volume datasets and render different frames concurrently across the NPACI Grid. Users of the SVT portal place their datasets into the Storage Resource Broker (SRB) repository and then use the portal to select the parameters for rendering their jobs. The portal hides the details of how this job is executed across the NPACI Grid and returns the visualization results to the user via the SRB."

A FOUNDATION FOR CYBERINFRASTRUCTURE

"One of the challenges of building a new cyberinfrastructure is this," Casanova said. "How can we simultaneously use the NPACI Grid as an experimental platform for fundamental cyberinfrastructure research while satisfying the needs of scientific users for a working, reliable cyberinfrastructure? "

"As we deploy platforms like the NPACI Grid, it is important that we understand their fundamental characteristics," Casanova explained. "Grids are still very young. When the Internet was at a comparable stage of development, the challenge was to make it work and to extend its capabilities. With its maturity has arisen an entire science of monitoring, characterizing, and understanding the Internet-for instance groups such as CAIDA [the Cooperative Association for Internet Data Analysis] at SDSC have generated Internet maps that have had a large impact on this field. The same type of activities will eventually be needed for cyberinfrastructure: to understand usage patterns, develop new middleware, decide on appropriate administration policies. In this context, the NPACI Grid represents an opportunity to both cater to user communities, and be a fantastic experiment to gather initial and fundamental understanding of cyberinfrastructure characteristics at a very early stage. The NPACI Grid has the opportunity to provide the cyberinfrastructure research community with logs, measurements, and characterizations that will be critical in developing cyberinfrastructure as a science rather than pure engineering."

THE FUTURE

The NPACI Grid complements and builds on the National Science Foundation Middleware Initiative (NMI) and the TeraGrid/Extensible Terascale Facility (ETF) project, helping to provide key building blocks for NSF's cyberinfrastructure vision. NPACI partners are working to ensure that NPACI Grid applications and NPACKage software interoperate with NMI software and complement TeraGrid/ETF environments so that these and other national grid efforts can be integrated.

"Because we are interoperable with other grid infrastructure, we are enabling collaborations with researchers associated with SDSC, or USC, or the Pittsburgh Supercomputing Center to collaborate more effectively," said Carl Kesselman of ISI, chief software architect of NPACI, and co-principal investigator on the Globus Project. "And that is something we wouldn't be able to do otherwise."

"This doesn't happen by itself," Wolski said. "You have to have experienced, dedicated people managing the system. It is the human infrastructure that is key to making cyberinfrastructure work successfully."

 

A Conversation with Carl Kesselman, NPACI's Chief Software Architect

Q:  Why build an NPACI Grid?

A: The grid is about sharing and building virtual organizations. NPACI is a partnership, a virtual organization to promote access to resources and sharing of resources. And to that extent, the NPACI Grid is the instantiation of the concept of a partnership.

Back in the proposal stage of NPACI, we had a vision of a distributed grid infrastructure that would support a broad range of resources. It was also one of Fran Berman's priorities, and through her good efforts, a talented team, and contributions from the partnership, these ideas have come to fruition, not as some experimental thing off to the side, but as a production system that our resource partners can really use.

Q: Why should scientists care about grid computing in general, and about the NPACI Grid in particular? What's in it for them?

A: An integrated environment for accessing and sharing resources creates the potential for new types of applications that transcend the traditional supercomputer center. A case in point is the online tomography work that Mark Ellisman’s Telescience Alpha project is doing on the NPACI Grid. He can couple live data-acquisition with the data reconstruction processing to interpret the data, which then facilitates interactive steering.

Another example is the work we are doing with the Southern California Earthquake Center, federating simulation models, and doing shared data analysis across this research collaboration. Because we have these grid-enabled NPACI resources, we can more effectively integrate into the workflows of earthquake researchers across the state. Recently, those of us involved in the NEES-Grid (Network for Earthquake Engineering Simulation), which is oriented toward doing earthquake structural engineering, used grid technology to demonstrate a new type of engineering experiment: we combined simulation with physical experiments at distributed sites.

Q: What's the relationship of the NPACI Grid to the TeraGrid?

A: What we are doing is complementary to the investments in NMI [the NSF Middleware Initiative], the TeraGrid, and other grid activities. TeraGrid was designed from scratch, top-down, for specified hardware linked through a dedicated network.

By contrast, the NPACI Grid was designed bottom-up. It is more heterogeneous and diverse than TeraGrid and uses existing research networks. While there is great commonality in the software stacks of the two grids, the NPACI Grid is focused on integrating additional hardened software components with non-TeraGrid machines like Blue Horizon and the TACC [Texas Advanced Computing Center] and Michigan clusters.

 
 


 

Project Leaders
John McGee
(Project Manager)
Carl Kesselman,
Mats Rynge
Information Sciences Institute, University of Southern California

Rich Wolski,
Lawrence Miller
UC Santa Barbara

Nancy Wilkins-Diehr,
Phil Papadopoulos
San Diego Supercomputer Center

Major Participants

Shannon Whitmore,
Diana Diehl,
Bill Link,
Larry Diegel,
Richard Moore
San Diego Supercomputer Center

Other Participants
Patricia Kovatch,
Ben Tolo,
Martin Margo
San Diego Supercomputer Center

Marty Humphrey
University of Virginia

David Carver,
Shyamal Mitra
TACC

Tom Hacker,
Ken MacInnis,
Randy Crawford
University of Michigan