FROM THE DIRECTOR

TeraGrid: Computing in the ‘Data Decade’

Last January, the National Science Foundation (NSF) requested proposals to build a Distributed Terascale Facility (DTF). The solicitation recognized the critical role that grid computing plays and will continue to play in key advances in science and engineering. Computational science today is a multiresource enterprise, and the NSF request reflected the scientific community's awareness of the importance of building a hardware and software facility that can support a new generation of critical advances in science. NSF envisioned a facility to focus on the immense data and computing needs of science and society over the next ten years. This period, already becoming known as the Data Decade, will be marked by a need to link large data collections, computers, remote instruments, visualization sites, and other resources over fast networks so that they can be integrated to enable a new generation of applications.

NPACI joined forces with the Alliance to respond to the call. The joint proposal for the TeraGrid, approved on August 9 for $53 million, is a forward-looking, ambitious vision that will form the cornerstone of a national grid effort, embodying the goals and structure of NSF’s cyberinfrastructure. The initial four sites of the TeraGrid are SDSC, Caltech, NCSA, and Argonne National Laboratory. We will share a unified grid with more than half a petabyte of disk storage, a 40-gigabit-per-second national optical backbone (a carrying capacity equivalent to 40,000 cable modem connections), and 13 teraflops of aggregate compute power. Project investigators Dan Reed and I, and co-investigators Rick Stevens, Paul Messina, and Ian Foster, teamed with a list of distinguished corporate partners, including IBM, Intel Corporation, Qwest Communications, Sun Microsystems, Myricom, and Oracle Corporation.

The TeraGrid software will leverage the strong momentum of the Linux world, the worldwide deployment of Globus, and the scientific community’s movement towards commodity clusters. Data-oriented applications will be a particular focus of the TeraGrid, addressing the scientific community’s most pressing need during the Data Decade: to combine the analysis and synthesis of massive amounts of data with computation. TeraGrid nodes will be connected by an ultra-high-speed network, enabling distributed computing, data management, and knowledge synthesis at an unprecedented scale.

The vision of the TeraGrid is driven by the increasing requirements of applications for online access to massive amounts of data and will engender new and important computing models. To date, we have been forced (due to cost and other factors) to use off-line and near-line archival storage systems to manage massive scientific data sets. There have been no affordable terascale online storage systems available to us. Think of a near-line system as a big "data freezer" where only small chunks of data can be retrieved, defrosted, and used at any given time. In contrast, the data-oriented computational nodes envisioned for the TeraGrid will allow lavish amounts of online data to be continually available for instantaneous analysis, data mining, and knowledge synthesis.

Few fields are challenged by the size and complexity of data more than neuroscience. Data on brain function and structure range from the molecular level, where new drugs are designed to target disease, to the whole brain, where multi-terabyte data sets from the most advanced high-resolution imaging systems provide insight into how neural structures work together. The NPACI Neuroscience thrust, led by UCSD participants at SDSC and the National Center for Microscopy and Imaging Research, has been serving as the nerve center for a pioneering project to link geographically distributed multiscale brain data sets.

With the creation of the TeraGrid, these activities will be expanded to include more partner sites and bring greater volumes of brain data into a common environment. Such an environment will allow brain researchers to address important and fundamental questions about the function of healthy brains and those affected by disease. What we learn about the brain will also benefit the hardware engineers who will develop the next generations of computers.

The development of the TeraGrid will involve one of the largest, most coordinated, and most distributed efforts in computing history. It will leverage the skills and talents of PACI partners, application researchers, and technical and operations staff at SDSC, NCSA, Caltech, and Argonne.

The initial four sites of this revolutionary venture will form a core in which to prototype policies, services, software, and applications. Once TeraGrid is operational, it will serve as the core of a national grid effort and connect additional sites. We at NPACI, and indeed the entire PACI program, are proud to be part of this pioneering initiative. As we embark together upon the Data Decade, the TeraGrid addresses the critical needs of science and society and opens up new avenues of discovery.


 

By Fran Berman
NPACI and SDSC Director