Researchers have made significant scientific advances using Globus as a gateway
to computing resources, with major projects underway in fields
from high-energy physics to atmospheric modeling. By providing
consistent and uniform access to Grid computing, much like the
power grid provides electricity, Globus has altered the way scientists
and engineers use computers. In addition, SDSC researchers have
developed Web-based tools that simplify Grid access by interacting
transparently with Globus, allowing scientists to concentrate
on their research rather than the details of the supercomputer.
Figure 1. Gravity Waves
This visualization of gravitational waves, computed in accordance
with Einstein's theory of General Relativity, was generated
using Globus with MPICH-G2 and Cactus software. Visualization
created by Werner Benger, NCSA/AEI Potsdam/Wash U/ZIB visualization
team, and provided courtesy of the Max Planck Institute for
Gravitational Physics and the Konrad Zuse Center for Scientific
Computing.
Globus has become an
essential piece of infrastructure in distributed high-performance
computing by providing the software and tools needed to integrate
geographically distributed scientific instruments, databases,
and tools for scientific visualization. Over the past few years,
Globus has been deployed at more than 100 sites worldwide, including
the PACI centers.

"We've built
a foundation that's become the de facto standard," said
Ian Foster of Argonne National Laboratory and the University of
Chicago. "At this point, there seems to be a broad acceptance
of Globus technology for high-end systems." Foster and Carl
Kesselman of the Information Sciences Institute (ISI) at the University
of Southern California have been collaborating on the project
since 1995. Globus is a joint project of ISI and Argonne National
Laboratory and is a focus of development at both NPACI and the
National Computational Science Alliance.

Petascale Challenge

One scientific development
that will make the most of Globus's features is the Grid Physics
Network (GriPhyN), led by Foster and Paul Avery of the University
of Florida. The GriPhyN collaboration, which consists of a team
of experimental physicists and information technology researchers,
aims to implement the first petabyte-scale computational environment,
the Petascale Virtual Data Grid (PVDG). The PVDG will provide
computing resources for researchers working on four experiments
examining the fundamental forces of nature and the structure of
the universe. The Compact Muon Solenoid and A Toroidal LHC ApparatuS
(ATLAS) experiments at the Large Hadron Collider at CERN will
search for the origins of mass and probe matter at the smallest
length scales; the Laser Interferometer Gravitational-wave Observatory
will detect the gravitational waves of pulsars, supernovae and
in-spiraling binary stars; and the Sloan Digital Sky Survey will
carry out an automated sky survey enabling systematic studies
of stars, galaxies, nebulae, and large-scale structure.

The data sets generated
by the experiments are expected to grow from 100 terabytes to
100 petabytes over the next decade. In addition, GriPhyN is
expected to perform computations that require more than 120 trillion
floating-point operations per second. Such a massive computing
effort will require linking thousands of computers worldwide,
said Miron Livny of the University of Wisconsin. Globus, in tandem
with other automatic resource brokering software, will provide
a set of low-level protocols for resource access and connectivity
for those computers.

"Globus is one
part of a virtual toolkit of different technologies that will
be used for the project," said Livny. Other NPACI technologies
forming part of the GriPhyN environment are NPACI Rocks for managing
Linux clusters and the SDSC Storage Resource Broker for managing
the large-scale data sets.

While Livny said the
project is still in a relatively early planning phase, GriPhyN
researchers recently did a successful trial run on Alliance resources
using data from CERN's Compact Muon Solenoid experiment.
The group used Globus, in conjunction with Condor, to move data
between computing resources at Caltech, the University of Wisconsin,
and the National Center for Supercomputing Applications at the University
of Illinois at Urbana-Champaign.
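
As a rough check on the data growth quoted above for GriPhyN (100 terabytes to 100 petabytes over a decade), the following Python sketch computes the implied growth rate. It assumes decimal units (1 petabyte = 1,000 terabytes), a ten-year span, and smooth exponential growth. None of these assumptions comes from the GriPhyN project itself, and the function name is illustrative only.

```python
import math


def implied_growth(start_tb, end_tb, years):
    """Return (annual growth factor, doubling time in years).

    Assumes smooth exponential growth between the two endpoints,
    which is a simplification made for this illustration.
    """
    total_factor = end_tb / start_tb
    annual_factor = total_factor ** (1.0 / years)
    doubling_years = math.log(2) / math.log(annual_factor)
    return annual_factor, doubling_years


# 100 terabytes growing to 100 petabytes (taken here as 100,000 terabytes)
# over an assumed ten-year span.
annual, doubling = implied_growth(100.0, 100_000.0, 10.0)
print(f"annual growth factor: {annual:.2f}")
print(f"doubling time: {doubling:.2f} years")
```

Under these assumptions, a thousandfold increase in ten years means the stored data would roughly double every year.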
Figure 2. How Does GridPort Work?
GridPort is based on advanced Web, security
and metacomputing technologies such as PKI and Globus to provide
secure, interactive services. The Web pages and data are built
from server-side Perl CGI scripts and simple HTML plus JavaScript
on the client side, so they can be easily viewed from any
browser. GridPort provides portal developers an interface
to an infrastructure that integrates grid technologies such
as Globus, MyProxy, the SDSC Storage Resource Broker, and
the Network Weather Service.
Enabling Extreme Computing

Another data-intensive
project that tests Globus's ability to handle extreme computing
demands has been the Cactus project. Cactus, a modular, open-source
problem-solving environment for scientific and engineering problems,
was originally designed to tackle some of the most complex problems
in astrophysics, including Einstein's equations for colliding
neutron stars, black holes, and the formation of singularities.
Researchers from the
Cactus group recently collaborated on a distributed run using
Globus with MPICH-G2 and Cactus software to prove that massive
computing problems could be spread among multiple supercomputers.
The simulation of gravitational waves, formally known as Teukolsky
waves, lasted between four and eight hours and used 512 processors
on three SGI Origin systems at NCSA and 1,024 processors on NPACI's
Blue Horizon at SDSC (Figure 1).

"We ran a large
test case as a single Globus job and it ran like a champ. Best
of all, even though the code had been scaled up to run on more
than 1,500 processors, it executed at better than 70 percent efficiency,"
said John Towns, Division Director of Scientific Computing at
NCSA.

Thomas Dramlitsch of
the Max Planck Institute for Gravitational Physics said, "This
run demonstrated that this kind of resource coalescing is possible.
Furthermore, we achieved something more: we could show an adequate
scaling, up to 70 percent. Older experiments had only two
supercomputers involved and the scaling was below 50 percent."

Grid Computing Simplified

However, working with
Globus still requires technical expertise and a budget that may
be out of reach for some research groups. To address those issues,
NPACI has created services such as the NPACI HotPage and GridPort,
which allow Web access to the PACI Grid resources, yet hide complex
interactions with Grid software. Globus toolkits such as Resource
Management, Globus I/O, and Grid Security Infrastructure are
used extensively by GridPort.

The original version
of GridPort broke ground by providing a software development toolkit
of standard, portable technologies on top of Grid software such
as Globus that developers could use to create scientific application
portals. These portals could then be used to securely access high-end
computing resources from any Web-connected device, including wireless
handhelds. Last November, GridPort's "HPC anywhere"
capabilities were demonstrated at SC2000. The GridPort development
team, led by Mary Thomas at SDSC, has since taken the idea of
portal development one step further. The group recently released
a beta version of the GridPort Client Toolkit that allows scientists
who know how to build Web pages to create customized application
portals that access the PACI Grid. Researchers need only incorporate
a few extra lines of HTML into a Web page to set up a Web site
running on any server, anywhere in the world, that can communicate,
via Globus and other software, with the PACI Grid (Figure 2).

"This capability
has been a long-term design goal of the NPACI portal effort,"
said Thomas, manager of the SDSC Computational Science Portals
group. "It will allow individual scientists to build simple
portals that take advantage of existing infrastructure without
having to invest large amounts of time or funding. These portals
allow scientists to concentrate on research rather than the details
of the supercomputer."

Current features allow
users to log in, upload files,
and submit jobs, and more extensions are planned for
the future. "Now, essentially all it takes to create a customized
application portal to GridPort is to incorporate a few extra lines
of HTML into a Web page," said Maytal Dahan, a portal developer.
The GridPort Client
Toolkit was demonstrated in a tutorial at the NPACI All-Hands
Meeting 2001. Students were given NPACI portal accounts; they
then downloaded an example set of HTML pages; finally, they either installed
the pages on a Web server or ran them locally. With very
few modifications to the downloaded Web pages, the students had
an instance of a simple, working portal.

Research scientist
Don Sutton successfully demonstrated the ease of use of these
tools, which he recently incorporated into the existing Basins,
Bays, and Estuaries Project (BBE), part of NPACI's Earth
Systems Science thrust area. Sutton migrated an atmospheric modeling
code, which had been running on local SDSC workstations, to HPC
resources and connected the code to the BBE portal by adding security
and job submission capabilities via the GridPort Client Toolkit.

"Grid technologies,
such as Globus, have made it possible for application developers
to redefine how users interface with NPACI resources," said
Jay Boisseau, former SDSC associate director for Scientific Computing
and now director of the Texas Advanced Computing Center at the
University of Texas, Austin. "Globus, in particular, allows
GridPort developers to extend functionality via the Web, which
is a tremendous tool for doing computational science." CF
Project Leaders
Ian Foster
Argonne National Laboratory and the University of Chicago
Carl Kesselman
Information Sciences Institute, University of Southern California
www.globus.org
www.griphyn.org
www.cactuscode.org
gridport.npaci.edu
Reference
I. Foster, C. Kesselman, and S. Tuecke (2001), "The Anatomy of
the Grid: Enabling Scalable Virtual Organizations," to be published
in Intl. J. Supercomputer Applications.