In January, the National Science Foundation (NSF) requested proposals
to build a Distributed Terascale Facility (DTF). The solicitation
recognized the critical role that grid computing plays and will
continue to play in key advances in science and engineering. Computational
science today is a multiresource enterprise, and the NSF request
was emblematic of the awareness throughout the scientific community
of the importance of building a hardware and software facility
that can support a new generation of critical advances in science.
NSF envisioned a facility to focus on the immense data and computing
needs of science and society over the next ten years. This period, already
becoming known as the Data Decade, will be marked by a need
to link large data collections, computers, remote instruments,
visualization sites, and other resources over fast networks so
that they can be integrated to enable a new generation of applications.
NPACI joined forces
with the Alliance to respond to the call. The joint proposal for
the TeraGrid, approved on August 9 for $53 million, is a forward-looking,
ambitious vision that will form the cornerstone of a national
grid effort, embodying the goals and structure of NSF's cyberinfrastructure.
The initial four sites of the TeraGrid are SDSC, Caltech, NCSA,
and Argonne National Laboratory. We will share a unified grid
with more than half a petabyte of disk storage, a 40-gigabit-per-second
national optical backbone (a carrying capacity equivalent to 40,000
cable modem connections), and 13 teraflops of aggregate compute
power. Project investigators Dan Reed and I, and co-investigators
Rick Stevens, Paul Messina, and Ian Foster, teamed with a list
of distinguished corporate partners, including IBM, Intel Corporation,
Qwest Communications, Sun Microsystems, Myricom, and Oracle Corporation.
The TeraGrid software
will leverage the strong momentum of the Linux world, the worldwide
deployment of Globus, and the scientific community's movement
towards commodity clusters. Data-oriented applications will be
a particular focus of the TeraGrid to address the most pressing
needs of the scientific community to combine the analysis and
synthesis of massive amounts of data with computation during the
Data Decade. TeraGrid nodes will be connected by an ultra-high-speed
network, enabling distributed computing, data management, and
knowledge synthesis at an unprecedented scale.
The vision of the TeraGrid
is driven by the increasing requirements of applications for online
access to massive amounts of data and will engender new and important
computing models. To date, we have been forced (due to cost and
other factors) to use off-line and near-line archival storage
systems to manage massive scientific data sets. There have been
no affordable terascale online storage systems available to us.
Think of a near-line system as a big "data freezer"
where only small chunks of data can be retrieved, defrosted, and
used at any given time. In contrast, the data-oriented computational
nodes envisioned for the TeraGrid will allow lavish amounts of
online data to be continually available for instantaneous analysis,
data mining, and knowledge synthesis.
Few fields are challenged
by the size and complexity of data more than neuroscience. Data
on brain function and structure range from the molecular level,
where new drugs are designed to target disease, to the whole brain,
where multi-terabyte data sets from the most advanced high-resolution
imaging systems provide insight into how neural structures work
together. The NPACI Neuroscience thrust, led by UCSD participants
at SDSC and the National Center for Microscopy and Imaging Research,
has been serving as the nerve center for a pioneering project
to link geographically distributed multiscale brain data sets.
With the creation of
the TeraGrid, these activities will be expanded to include more
partner sites and bring greater volumes of brain data into a common
environment. Such an environment will allow brain researchers
to address important and fundamental questions about the function
of healthy brains and those affected by disease. What we learn
about the brain will also be a bonus to hardware engineers who
will develop the next generations of computers.
The development of
the TeraGrid will involve one of the largest, most coordinated,
and most distributed efforts in computing history. It will leverage
the skills and talents of PACI partners, application researchers,
and technical and operations staff at SDSC, NCSA, Caltech, and
Argonne National Laboratory.
The initial four sites
of this revolutionary venture will form a core in which to prototype
policies, services, software, and applications. Once TeraGrid
is operational, it will serve as the core of a national grid effort
and connect additional sites. We at NPACI, and indeed the entire
PACI program, are proud to be part of this pioneering initiative.
As we embark together upon the Data Decade, the TeraGrid addresses
the critical needs of science and society and opens up new avenues
and SDSC Director