Over the past two years, NPACI partners have been coordinating
their resources and software to create a next-generation
cyberinfrastructure for science and engineering applications:
the NPACI Grid. It consists of hardware, network, and
data resources at four sites: the San Diego Supercomputer
Center (SDSC), the Texas Advanced Computing Center (TACC),
the University of Michigan, and soon Caltech's Center
for Advanced Computing Research. These sites run an integrated
set of grid middleware and advanced NPACI scientific
and engineering applications. But one of the most important
aspects of the NPACI Grid won't be found on any system
architecture diagram: the people who set it up, keep
it running, and use it for new science.
"We had a group of people who belong to various
administrative domains who came together to solve common
problems and provide this infrastructure," said
John McGee of the University of Southern California's
Information Sciences Institute (ISI). He was responsible
for coordinating the development and deployment of the
NPACI Grid. "The NPACI Grid is an implementation
of middleware, application software, and processes and
procedures to mitigate the challenges a scientist faces
when trying to use heterogeneous resources from multiple
organizations."
"The NPACI Grid is clever technology, but the
real secret to making it work was the dedication by
maintainers to build an administrative and technical
bridge to users and system administrators across administrative
domains," said Rich Wolski, associate professor
of computer science at UC Santa Barbara and Grid Computing
thrust lead for NPACI. "This wasn't what some of
us expected to be the most important activity; it wasn't
in our job descriptions."
The resources comprising the NPACI Grid belong to
four different administrative domains, each with its
own "culture"-rules, regulations, accounting
procedures, personalities in charge, and ingrained way
of doing things. This is unavoidable, since each site
comprises unique resources that are needed for their
original operational roles in serving the users of their
institutions.
"We set out to build a grid from the bottom up,"
Wolski said. "The process not only gave the NPACI
team vital experience in using cyberinfrastructure technologies
in the real world, but we also learned that there's
a new human component to cyberinfrastructure as a result
of the multiple administrative domains and the multiple
existing user communities we dealt with. The real job
of the middleware and packaging tools is to facilitate
coordination between these communities. The NPACI Grid
is the infrastructure behind this coordination."
SAFEGUARDING USERS
"The NPACI Grid is a complex platform and it
is just not possible to tell application users (biologists,
physicists) to start doing things radically differently
from one minute to the next," said Henri Casanova,
director of the Grid Research and Innovation Laboratory
(GRAIL) at SDSC. "The way to build and sustain
a large user community on a platform like the NPACI
Grid is to provide a way for users to smoothly transition
from what they're doing now, such as computing on a
single locally administrated cluster, to the grid. This
can be achieved by providing a software layer on top
of the base NPACI Grid middleware infrastructure. NPACI
partners have developed, packaged, and deployed such
a layer, NPACKage, which hides the many details and
complexities of the grid, while providing users with
a convenient yet powerful way to launch, track, and
exploit the results of a wide variety of increasingly
large-scale applications."
"The trick is to build cyberinfrastructure on
top of local administrative policies, not in contradiction
to them," Wolski agreed.
One key to earning the cooperation of users was to
show them the benefits of the new way of doing things.
A single grid interface to the resources of multiple
sites offers several advantages in convenience and simplicity.
Security procedures become easier for users to manage.
When users submit jobs using the Resource Specification
Language from the Globus Toolkit, they are no longer
forced to use different submission formats for different
systems. At runtime, the HotPage facilitates bandwidth
monitoring, resource discovery, and status monitoring
with its interfaces to the Globus monitoring service
and the Network Weather Service. DataCutter and the
Storage Resource Broker give users new distributed data
analysis and management capabilities.
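To make the idea concrete, the sketch below (a Python illustration,
not NPACI-supplied code) shows how one RSL job description could be
submitted unchanged to more than one Globus gatekeeper. It assumes the
classic Globus Toolkit command-line client is installed and a valid
grid proxy already exists; the gatekeeper host names are hypothetical
placeholders, not real NPACI Grid contact strings.

```python
# A hedged sketch, not NPACI-supplied tooling: the same RSL job
# description submitted to two Globus gatekeepers.  Assumes the classic
# Globus Toolkit client (globusrun) is on the PATH and a grid proxy
# already exists.  Host names below are hypothetical placeholders.
import subprocess

# One RSL string describes the job identically for every site.
rsl = "&(executable=/bin/hostname)(count=4)(maxWallTime=10)"

gatekeepers = [
    "gatekeeper.sdsc.example.org",   # placeholder for an SDSC resource
    "gatekeeper.tacc.example.org",   # placeholder for a TACC resource
]

for contact in gatekeepers:
    # -o streams the job's standard output back; -r names the resource.
    subprocess.run(["globusrun", "-o", "-r", contact, rsl], check=True)
```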
"The grid doesn't allow you to subvert the regular
NRAC [National Resource Allocations Committee] allocation
process," McGee said. "Let's say I'm a geoscientist
in Colorado, and I have allocations on the various NPACI
resources. One important feature that the grid layer
provides is single sign-on to all components of the
NPACI Grid that I have allocations for. Each of the
four NPACI sites has its own security procedures and
mechanisms, but using grid technology, you can issue
a single command, grid-proxy-init, to access any of
the systems. It's all about GSI, the Grid Security Infrastructure,
which is embedded in and enabled by many of these applications: Globus,
SRB, GSI-OpenSSH, DataCutter, MyProxy. For the most
part, GSI cuts across all these applications."
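A minimal sketch of the single sign-on pattern McGee describes is shown
below, again in Python and not as any official NPACI script. It assumes
the GSI command-line tools and the GSI-OpenSSH client are installed;
the login hosts are hypothetical placeholders.

```python
# A hedged sketch of the single sign-on workflow, not an NPACI script.
# Assumes the GSI tools (grid-proxy-init) and the GSI-OpenSSH client
# (gsissh) are installed; the login hosts are hypothetical placeholders.
import subprocess

# One pass-phrase prompt creates a short-lived proxy credential (12 hours).
subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)

# Afterwards, every GSI-enabled service at every site accepts the proxy,
# with no further passwords and no per-site login procedure.
for host in ["login.sdsc.example.org", "login.umich.example.org"]:
    subprocess.run(["gsissh", host, "uname -a"], check=True)
```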
"All these tools and security advantages make
the grid accessible to the scientific and engineering
community," said Shannon Whitmore, a grid user
support specialist at SDSC. "What's really exciting
is the potential provided by the additional compute,
memory, and network resources. The NPACI Grid enables
scientists to run larger jobs than could be run at a
single site. Research teams can access more powerful
computers for large jobs while running smaller jobs
on their local clusters. Loosely coupled jobs can be
distributed across grid resources concurrently, to produce
results much faster than on a single resource."
"We managed to accomplish a smooth transition
with the APST [AppLeS parameter sweep template] project,
for two applications: MCell and the Encyclopedia of
Life [EOL]," Casanova said. "APST, which is
a component of NPACKage, provides a simple way to deploy
and run large-scale parameter sweep applications, which
arise in virtually every field of science and engineering.
MCell users first were able to use APST on their local
clusters, slowly getting used to it without making a
huge leap to grid computing. Once they had learned how
to use it (and provided us with useful feedback), it became
easy to aggregate, little by little, more and more resources
to scale up to a grid-wide deployment spanning multiple
institutions. This is exactly the kind of process that
will lead many users to adopt the NPACI Grid. We're
planning a very large MCell run on the entire NPACI
Grid this fall. Similarly, EOL has adopted APST as the
intermediate layer between the application and the grid
infrastructure."
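For readers new to the term, a parameter sweep is simply the same
program run over many points in a parameter space, so the individual
tasks are independent of one another. The sketch below illustrates the
shape of such a workload; it is not APST's input format or API, and the
MCell-style parameter names and values are hypothetical.

```python
# An illustrative sketch of a parameter-sweep workload; this is NOT
# APST's input format or API, just the shape of the problem APST is
# designed to schedule.  Parameter names and values are hypothetical.
from itertools import product

def sweep_tasks(executable, diffusion_rates, seeds):
    """Yield one independent command line per parameter/seed combination."""
    for rate, seed in product(diffusion_rates, seeds):
        yield [executable, f"--diffusion={rate}", f"--seed={seed}"]

# 20 parameter values x 50 random seeds = 1,000 independent tasks, which
# a scheduler such as APST can place on a local cluster or across the
# grid without any change to the application itself.
tasks = list(sweep_tasks("./mcell_run",
                         [0.05 * i for i in range(1, 21)],
                         range(50)))
print(f"{len(tasks)} independent tasks")
```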
There are many projects trying to achieve these goals
of easily interfacing users and applications to the
grid infrastructure. Such tools will be fundamental
for the NPACI Grid, and several of them are being developed
by SDSC (including APST, SRB, Grid Portals). The NPACI
Grid is ideally suited for successful interaction with
user communities. "We have tight links to these
communities, and we've been developing grid application-level
tools for a while," Casanova said.
"Grid developers need to understand users' needs
and resolve issues," Wolski said. "Since grids
are constantly changing, this activity is ongoing and
must be factored into any production setting."
"Now that the infrastructure is in place, the
true potential of the grid is just beginning to open
up," said Whitmore. "With an initial investment,
scientists can create customized access points and share
enormous resources with an entire research community
from a single location. The SVT (Scalable Visualization
Toolkit) has just such an interface. Its grid portal
allows users to submit large volume datasets and render
different frames concurrently across the NPACI Grid.
Users of the SVT portal place their datasets into the
Storage Resource Broker (SRB) repository and then use
the portal to select the parameters for rendering their
jobs. The portal hides the details of how this job is
executed across the NPACI Grid and returns the visualization
results to the user via the SRB."
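The sketch below gives a rough, hypothetical picture of that workflow,
rather than the SVT portal's actual code: a dataset is staged into SRB,
and the animation is split into frame ranges that could render
concurrently at different sites. It assumes the SRB client Scommands
are installed and a session has been initialized with Sinit; the
collection path is made up for illustration.

```python
# A hedged sketch of the workflow described above, not the SVT portal's
# actual implementation.  Assumes the SRB client Scommands are installed
# and an SRB session has been initialized (Sinit); the collection path
# is hypothetical.
import subprocess

dataset = "volume.raw"
collection = "/home/user.sdsc/svt-input"   # hypothetical SRB collection

# Stage the volume dataset into SRB so that every grid site can read it.
subprocess.run(["Sput", dataset, collection], check=True)

def frame_ranges(total_frames, n_sites):
    """Divide the animation into contiguous chunks, one per rendering site."""
    chunk = -(-total_frames // n_sites)  # ceiling division
    return [(start, min(start + chunk, total_frames))
            for start in range(0, total_frames, chunk)]

# 600 frames across 4 sites -> 4 rendering jobs that can run concurrently.
print(frame_ranges(600, 4))
```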
A FOUNDATION FOR CYBERINFRASTRUCTURE
"One of the challenges of building a new cyberinfrastructure
is this," Casanova said. "How can we simultaneously
use the NPACI Grid as an experimental platform for fundamental
cyberinfrastructure research while satisfying the needs
of scientific users for a working, reliable cyberinfrastructure?"
"As we deploy platforms like the NPACI Grid,
it is important that we understand their fundamental
characteristics," Casanova explained. "Grids
are still very young. When the Internet was at a comparable
stage of development, the challenge was to make it work
and to extend its capabilities. With its maturity has
arisen an entire science of monitoring, characterizing,
and understanding the Internet. For instance, groups such
as CAIDA [the Cooperative Association for Internet Data
Analysis] at SDSC have generated Internet maps that
have had a large impact on this field. The same types
of activities will eventually be needed for cyberinfrastructure:
to understand usage patterns, develop new middleware,
and decide on appropriate administration policies. In this
context, the NPACI Grid represents an opportunity both
to cater to user communities and to serve as a fantastic
experiment for gathering an initial, fundamental understanding
of cyberinfrastructure characteristics at a very early
stage. The NPACI Grid
has the opportunity to provide the cyberinfrastructure
research community with logs, measurements, and characterizations
that will be critical in developing cyberinfrastructure
as a science rather than pure engineering."
THE FUTURE
The NPACI Grid complements and builds on the National
Science Foundation Middleware Initiative (NMI) and the
TeraGrid/Extensible Terascale Facility (ETF) project,
helping to provide key building blocks for NSF's cyberinfrastructure
vision. NPACI partners are working to ensure that NPACI
Grid applications and NPACKage software interoperate
with NMI software and complement TeraGrid/ETF environments
so that these and other national grid efforts can be
integrated.
"Because we are interoperable with other grid
infrastructure, we are enabling collaborations with
researchers associated with SDSC, or USC, or the Pittsburgh
Supercomputing Center to collaborate more effectively,"
said Carl Kesselman of ISI, chief software architect
of NPACI, and co-principal investigator on the Globus
Project. "And that is something we wouldn't be
able to do otherwise."
"This doesn't happen by itself," Wolski
said. "You have to have experienced, dedicated
people managing the system. It is the human infrastructure
that is key to making cyberinfrastructure work successfully."
A Conversation with Carl Kesselman, NPACI's Chief Software Architect
Q:
Why build an NPACI Grid?
A:
The grid is about sharing and building virtual
organizations. NPACI is a partnership, a virtual
organization to promote access to resources and
sharing of resources. And to that extent, the
NPACI Grid is the instantiation of the concept
of a partnership.
Back in the proposal stage
of NPACI, we had a vision of a distributed grid
infrastructure that would support a broad range
of resources. It was also one of Fran Berman’s
priorities, and through her good efforts and a
talented team and contributions from the partnership,
these ideas have come to fruition, not as some experimental
thing off to the side, but as a production system that
our resource partners can really use.
Q:
Why should scientists care
about grid computing in general, and about the
NPACI Grid in particular? What's in it for them?
A:
An integrated environment
for accessing and sharing resources creates the
potential for new types of applications that transcend
the traditional supercomputer center. A case in
point is the online tomography work that Mark
Ellisman’s Telescience Alpha project is
doing on the NPACI Grid. He can couple live data acquisition
with the data-reconstruction processing needed to interpret
the data, which then facilitates interactive steering.
Another example is the
work we are doing with the Southern California
Earthquake Center, federating simulation models,
and doing shared data analysis across this research
collaboration. Because we have these grid-enabled
NPACI resources, we can more effectively integrate
into the workflows of earthquake researchers across
the state. Recently, those of us involved in the
NEES-Grid (Network for Earthquake Engineering
Simulation), which is oriented toward doing earthquake
structural engineering, used grid technology to
demonstrate a new type of engineering experiment:
we combined simulation with physical experiments
at distributed sites.
Q:
What's the relationship of the NPACI Grid to the
TeraGrid?
A:
What we are doing is complementary to the investments
in NMI [the NSF Middleware Initiative], the TeraGrid,
and other grid activities. TeraGrid was designed
from scratch, top-down, for specified hardware
linked through a dedicated network.
By contrast, the NPACI
Grid was designed bottom-up. It is more heterogeneous
and diverse than TeraGrid and uses existing research
networks. While there is a great deal of commonality in
the software stacks of the two grids, the NPACI Grid
is focused on integrating additional hardened
software components with non-TeraGrid
machines such as Blue Horizon and the TACC [Texas
Advanced Computing Center] and Michigan clusters.
Project Leaders
John McGee
(Project Manager)
Carl Kesselman,
Mats Rynge
Information Sciences Institute, University of
Southern California
Rich Wolski,
Lawrence Miller
UC Santa Barbara
Nancy Wilkins-Diehr,
Phil Papadopoulos
San Diego Supercomputer Center
Major Participants
Shannon Whitmore,
Diana Diehl,
Bill Link,
Larry Diegel,
Richard Moore
San Diego Supercomputer Center
Other Participants
Patricia Kovatch,
Ben Tolo,
Martin Margo
San Diego Supercomputer Center
Marty Humphrey
University of Virginia
David Carver,
Shyamal Mitra
TACC
Tom Hacker,
Ken MacInnis,
Randy Crawford
University of Michigan