
EnVision



Cluster-Management Software Developers
Preparing for the TeraGrid


The TeraGrid to be deployed by SDSC and its partners in California and Illinois involves more than the planned 13.6-teraflops, 650-terabyte supercomputing network linked by a 40-gigabit-per-second fiber-optic backbone. Developers of the TeraGrid (see stories, p. 1 and 2) agree that for it to fulfill its scientific potential across many fields of research, the computing infrastructure must have state-of-the-art, open-source cluster-management software such as NPACI Rocks. The toolkit's user base is growing steadily, with known cluster installations at 15 academic, government, and industrial sites. High-performance Linux clusters like these have become indispensable to many scientists since the first such cluster was built in 1994.

Meteor Cluster, which runs Rocks at SDSC

Millennium Cluster at UC Berkeley

Simply creating commodity clusters is easy, as universities and industrial research operations have found. However, John R. Boisseau, director of the Texas Advanced Computing Center (TACC) at the University of Texas, Austin, said that operating such clusters presents unforeseen challenges. "A UT research group was having trouble with its cluster, so we brought the hardware to our center and installed NPACI Rocks," said Boisseau. "The installation was easy, the cluster is very robust, and the users are now productive." TACC is planning to install NPACI Rocks on two new production clusters: one using Pentium III processors and another using Itanium processors. "These cluster installations will serve as models for other UT departments and other Texas universities," said Boisseau.


The Grid Physics Network (GriPhyN), a collaboration of experimental physicists and information-technology researchers, also is using NPACI Rocks as part of its implementation of the first petabyte-scale computational environment. Scientists around the globe will use GriPhyN to coordinate research at four huge physics experiments that are exploring the fundamental forces of nature and the structure of the universe. "We’re using NPACI Rocks because it gives us the ability to manage cluster configurations in a straightforward and flexible manner," said Paul Avery, lead scientist for the National Science Foundation-funded GriPhyN project and a University of Florida physicist.

The GriPhyN project will help high-energy physicists analyze data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research, or CERN. The LHC, which is expected to begin operations by 2005, will smash protons together at the highest energies ever attained in a particle accelerator. It will allow scientists to probe the structure of matter and recreate the conditions in the universe 10⁻¹² seconds after the Big Bang, when the temperature was roughly 10¹⁶ °C.


Scientists at SDSC are combining NPACI Rocks with a complementary open-source toolkit called Millennium. A UC Berkeley team led by computer scientist David Culler developed Millennium to manage clusters at the university. The combined cluster-management software will be called Millennium-Rocks Cluster. It will unite the cluster-configuration and installation strengths of Rocks with the system-monitoring and job-launching strengths of Millennium. "Fusing the two is part of our strategy to provide rapidly updatable management tools that may become part of the TeraGrid," said Philip Papadopoulos, group leader for Distributed Computing at SDSC.

SDSC also is teaming with Compaq Computer Corporation to provide a high-performance computing platform based on NPACI Rocks and Compaq’s ProLiant line of servers. This alliance is aimed at satisfying the increasing demands of financial, multimedia, and data-serving markets.


NPACI Rocks is essentially Red Hat Linux (currently version 7.1) extended with cluster-management tools and additional features, so users of Rocks clusters get a familiar, compatible Linux environment. The result is a reliable, integrated, turnkey solution for high-performance computing with increased performance, streamlined administration, and simplified scalability. Rocks is designed to take bare hardware (that is, a cluster of machines with no software installed) to a working cluster quickly and simply. Administrators customize their initial setup through a Web page and capture the information in a small text file. This file and a bootable Rocks CD are all that are needed to deploy a cluster.
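The idea of capturing a cluster's initial setup in one small text file can be illustrated with a short sketch. Everything here is an assumption for illustration: the `key=value` format, the key names (`cluster_name`, `frontend_hostname`, `private_network`, `compute_nodes`), and the helpers `render` and `parse` are hypothetical and do not reproduce the actual file that the Rocks Web page produces. The sketch shows only the round trip: Web-form answers are written to a compact file, and that file is read back and checked before any node is installed.

```python
# Hypothetical sketch: the key=value format and key names below are
# illustrative assumptions, NOT the actual NPACI Rocks configuration file.
import ipaddress
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SiteConfig:
    """Answers an administrator might enter on a setup Web page (hypothetical)."""
    cluster_name: str
    frontend_hostname: str
    private_network: str  # e.g. "10.1.0.0/16"
    compute_nodes: int


def render(cfg: SiteConfig) -> str:
    """Capture the Web-form answers as a small key=value text file."""
    return "".join(f"{f.name}={getattr(cfg, f.name)}\n" for f in fields(cfg))


def parse(text: str) -> SiteConfig:
    """Read the text file back, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        values[key.strip()] = value.strip()

    missing = [f.name for f in fields(SiteConfig) if f.name not in values]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")

    # Reject a malformed network address early, before any node is installed.
    ipaddress.ip_network(values["private_network"])

    return SiteConfig(
        cluster_name=values["cluster_name"],
        frontend_hostname=values["frontend_hostname"],
        private_network=values["private_network"],
        compute_nodes=int(values["compute_nodes"]),
    )


if __name__ == "__main__":
    cfg = SiteConfig("meteor", "meteor.example.org", "10.1.0.0/16", 32)
    text = render(cfg)
    print(text, end="")
    assert parse(text) == cfg  # the file alone is enough to recreate the setup
```

Keeping the whole setup in one small, human-readable file is what makes the "file plus CD" deployment described above plausible: the file can be versioned, copied, and reused to rebuild the same cluster.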

"Strong adherence to highly regarded software tools allows Rocks, and now the Millennium-Rocks Cluster, to move with the rapid pace of Linux development," said Papadopoulos. "We’ve made several important upgrades, simplified installation, integrated new tools, and made remote use of clusters easier."

NPACI Rocks reinstalls the operating system (OS) on every node whenever security or software updates become available. "It may seem wrong to reinstall the OS only to change a configuration parameter on a subset of software packages," said Papadopoulos. But reinstallation is fast: "It takes less than 15 minutes to completely reinstall a 32-node cluster and about 30 minutes for a 100-node cluster." Rocks is also hardware neutral, a key feature in environments where heterogeneity is the norm. –RG
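The "reinstall rather than patch" policy can be sketched as a simple rule: any node that was not built from the current distribution is rebuilt from scratch, even if only one package or configuration parameter changed. This is a hypothetical illustration, not Rocks code. The `Node` record, the distribution identifiers (`dist-A`, `dist-B`), the helper names `stale_nodes` and `reinstall_batches`, and the idea of grouping reinstalls into concurrent batches under an assumed limit are all assumptions made for the example.

```python
# Hypothetical sketch of a "reinstall instead of patch" update policy.
# Node names, distribution IDs, helper names, and the batch size are
# illustrative assumptions, not NPACI Rocks commands or data structures.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    installed_dist: str  # ID of the distribution the node was last built from


def stale_nodes(nodes: list[Node], current_dist: str) -> list[str]:
    """Return every node not built from the current distribution.

    No per-package diff is computed: a stale node is reinstalled wholesale,
    so every node ends up in the same known state.
    """
    return sorted(n.name for n in nodes if n.installed_dist != current_dist)


def reinstall_batches(names: list[str], batch_size: int) -> list[list[str]]:
    """Split the reinstall list into groups rebuilt at the same time.

    batch_size is an assumed limit (for example, on how many nodes the
    installation server can feed at once).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


if __name__ == "__main__":
    nodes = [Node(f"compute-0-{i}", "dist-A") for i in range(4)]
    nodes.append(Node("compute-0-4", "dist-B"))
    todo = stale_nodes(nodes, current_dist="dist-B")
    # Four stale nodes, rebuilt two at a time.
    print(reinstall_batches(todo, batch_size=2))
```

The design trade-off is the one the quote describes: reinstalling whole nodes does more work than patching a few packages, but it avoids tracking per-node drift. That trade-off pays off only because a full reinstall takes minutes rather than hours.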


NPACI Rocks Team
Philip Papadopoulos,
Mason J. Katz,
Greg Bruno

Millennium Team
David Culler,
Eric Fraser,
Matt Massie,
Albert Goto,
UC Berkeley

Grid Physics Network
Myricom Inc.
Northwestern University
Pacific Northwest Laboratory
Texas A&M University
University of Florida
University of Hong Kong
University of Houston
University of North Texas
University of Texas, Austin