The National Science Foundation has awarded $10 million to astronomers
at 17 research institutions to create a huge, expandable database
of astronomical images, catalogs, measurements, and scientific publications
called the National Virtual Observatory (NVO). The five-year Information
Technology Research grant will enable researchers to apply the latest
computer technologies and data storage and analysis techniques to
unite more than 100 terabytes of data collected from 50 ground- and
space-based telescopes and instruments. NVO will make images, catalogs,
observational measurements, and scientific papers available in an
easily accessible form, with a consistent interface, to professional
researchers, amateur astronomers, and students. Using powerful grids
of computers, scientists will be able to cross-correlate observations
among catalogs to make discoveries that would otherwise be impossible.
Figure 1. Multiwavelength Data Federation
A composite image of the supernova remnant E0102-72 in the
Small Magellanic Cloud, a satellite galaxy of the Milky
Way. The image was created from three data sources: x-ray
(blue), optical (green), and radio (red).
Figure 2. Raider of the Found Arcs
This Hubble Space Telescope image of the galaxy cluster A2218
was analyzed by Alexander Szalay's prototype of an
automated discovery tool to find gravitational lens arcs.
Similar automated procedures could process National Virtual
Observatory images to discover large numbers of new gravitational
lenses.
The co-principal investigators
for the NVO project team are Alexander Szalay, an astronomer at
Johns Hopkins University, and Paul Messina, a computational scientist
at Caltech. Szalay has pioneered the use of advanced computing techniques
on large astronomical databases, and Messina, Chief Architect of
NPACI, has led several multidisciplinary research projects.
Szalay believes the NVO represents a new, third approach to scientific
research. "First, you have science conducted through theoretical
models," he said. "Next, you have science tested through
experiments. The new approach, scientific exploration through computational
methods, is developing in response to the tremendous volumes of
data we're starting to gather in many of the sciences."
DATA FLOOD
Thanks to recent breakthroughs
in telescope, detector, and computer technology, astronomical
surveys at gamma-ray, x-ray, optical, infrared, and radio
wavelengths produce terabytes of images and catalog data per year,
more information annually than the entire body of astronomical data
collected before 1980. The trend is accelerating; for example, the planned
Large Synoptic Survey Telescope will produce over 10 petabytes
per year by 2008.
"We now have to deal with an ever-increasing flood of information,"
said David De Young, NVO project scientist. "Without the
tools and structure that will be provided through the NVO, finding
a particular piece of data would be like trying to find a contact
lens in a swimming pool."
Challenges lead to opportunities. Advances in information technology
offer solutions for data mining, for sophisticated pattern recognition,
and for large-scale statistical cross-correlations.
In 1999 the National Academy of Sciences recommended the establishment
of a National Virtual Observatory as a "Rosetta stone"
to link the major astronomical data archives and catalogs and
to support cross-correlation among these resources. The idea was
promoted by the Digital Sky project, an NSF-funded NPACI effort
led by Caltech researcher Tom Prince to federate data from four
different astronomical databases and make the information available
through a Web portal.
The NVO project will federate (link without consolidating) more
than 100 terabytes of astronomical data from more than 50 collections.
Organizers are planning to keep the NVO "virtual," not
located in any single facility.
A key challenge will be to develop ways of analyzing data from
several databases simultaneously. "Each of those databases
is organized differently, which makes it quite difficult to perform
analyses of data from several collections simultaneously,"
Messina explained. "But such investigations promise to yield
important scientific discoveries, so enhancing our ability to
do these analyses is an important part of the NVO effort."
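
To make the difficulty Messina describes concrete, the following is a minimal sketch of a positional cross-match between two catalogs that use different schemas. Everything in it is an illustrative assumption rather than NVO code: the catalog rows, the field names (`ra_deg`, `ra_hours`, `flux_mjy`, and so on), the helper functions, and the 2-arcsecond match radius are all hypothetical. Each catalog is first mapped to a common layout, and sources are then paired by angular separation on the sky.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

# Hypothetical catalogs: different field names, and different units for RA.
optical_catalog = [
    {"id": "opt-1", "ra_deg": 10.0000, "dec_deg": -5.0000, "g_mag": 18.2},
    {"id": "opt-2", "ra_deg": 45.5000, "dec_deg": 20.1000, "g_mag": 20.7},
]
radio_catalog = [
    {"name": "rad-A", "ra_hours": 10.0002 / 15.0, "dec_deg": -5.0001, "flux_mjy": 3.4},
    {"name": "rad-B", "ra_hours": 12.0, "dec_deg": 60.0, "flux_mjy": 1.1},
]

def normalize_optical(row):
    """Map an optical row to the common layout (RA and Dec in degrees)."""
    return {"source_id": row["id"], "ra": row["ra_deg"], "dec": row["dec_deg"]}

def normalize_radio(row):
    """Map a radio row to the common layout; RA is converted from hours to degrees."""
    return {"source_id": row["name"], "ra": row["ra_hours"] * 15.0, "dec": row["dec_deg"]}

def cross_match(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, return its nearest cat_b source within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for a in cat_a:
        best = None
        for b in cat_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (b, sep)
        if best is not None:
            matches.append((a["source_id"], best[0]["source_id"], best[1] * 3600.0))
    return matches

matches = cross_match(
    [normalize_optical(r) for r in optical_catalog],
    [normalize_radio(r) for r in radio_catalog],
    radius_arcsec=2.0,  # assumed match radius, chosen only for illustration
)
for optical_id, radio_id, sep_arcsec in matches:
    print(f"{optical_id} <-> {radio_id}: {sep_arcsec:.2f} arcsec")
```

The nested loop compares every pair of sources, which is fine for a toy example but would not scale to survey-sized catalogs; a real federation would rely on spatial indexing and on agreed metadata standards instead of hand-written normalizers.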
"We have a lot of work to do," said Caltech's Roy
Williams, the NVO system architect. "Astronomers must be
able to publish data resources, making their own archival data
available, and to discover resources through subject, keyword,
and other searches. When a data resource is found, it must be
usable: it must comply with interoperability standards so
data services can 'plug and play' without detailed technical
assistance from humans. The data must be understandable, with
data quality, provenance, and rights management information correctly
attached and presented. And when an astronomer has established
a processing pipeline, it should easily scale to very large quantities
of data and very large numbers of data services. Achieving these
goals is not just a technical challenge but also a social challenge."
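
The publish-and-discover cycle Williams outlines can be pictured with a small sketch. This is a toy in-memory registry written under stated assumptions, not the NVO design: the `DataResource` fields, the `Registry` class, its method names, and the `example.org` URLs are all hypothetical. It shows only the idea that a resource is accepted when its provenance, rights, and service information are attached, and can then be found by subject or keyword.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataResource:
    """Hypothetical metadata record describing one published data resource."""
    title: str
    subject: str
    keywords: frozenset
    provenance: str   # where the data came from
    rights: str       # who may use it and how
    service_url: str  # placeholder endpoint, not a real service

class Registry:
    """Toy in-memory registry supporting publish and discover."""

    def __init__(self):
        self._resources = []

    def publish(self, resource):
        # Reject resources whose descriptive metadata is incomplete.
        required = ("provenance", "rights", "service_url")
        missing = [name for name in required if not getattr(resource, name)]
        if missing:
            raise ValueError(f"missing metadata: {', '.join(missing)}")
        self._resources.append(resource)

    def discover(self, subject=None, keyword=None):
        """Return resources matching the given subject and/or keyword."""
        results = []
        for res in self._resources:
            if subject is not None and res.subject.lower() != subject.lower():
                continue
            if keyword is not None and keyword.lower() not in {k.lower() for k in res.keywords}:
                continue
            results.append(res)
        return results

registry = Registry()
registry.publish(DataResource(
    title="Deep x-ray source catalog",
    subject="supernova remnants",
    keywords=frozenset({"x-ray", "catalog"}),
    provenance="hypothetical archive A",
    rights="public",
    service_url="https://example.org/xray",
))
registry.publish(DataResource(
    title="Optical sky survey images",
    subject="galaxies",
    keywords=frozenset({"optical", "images"}),
    provenance="hypothetical archive B",
    rights="public",
    service_url="https://example.org/optical",
))

found = registry.discover(keyword="X-ray")
print([res.title for res in found])
```

Validating metadata at publish time, rather than at query time, is one possible design choice: it keeps incomplete records out of every later search.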
THE UNIVERSE ON THE GRID
NVO will use computational
techniques known as grid computing, in which processing power
and information holdings are distributed among computers on a
nationwide high-speed computer network. Grid computing enables
scientists to share data and access problem-solving resources,
and is a major technology development effort sponsored by NSF
in programs such as NPACI. The NVO framework must support computation
that is scalable from analyzing data sets on desktop workstations
to mining massive data collections on supercomputers. One key
to this is SDSC's Storage Resource Broker (SRB), which provides
transparent access to data stored in network-accessible data servers
and archives.
"The NVO team will promote widely accepted astronomical metadata
standards, and build on the data grids of the sort pioneered by
the NPACI Data Intensive Computing thrust," said Reagan Moore,
SDSC distinguished scientist and adjunct professor in UCSD's
Computer Science and Engineering Department. "NVO will focus
on the ability to support bulk manipulation of catalog records
or sky survey images, to enable statistical analyses across multiple
collections."
Research will be supported through the use of a
metadata catalog with standard access protocols.
Grid services for computational analysis will be based on the
Globus toolkit. This will ensure NVO's compatibility with
existing grid resources, notably those being deployed by the PACI
TeraGrid, the Grid Physics Network project, and NASA's Information
Power Grid.
An NVO Testbed will evaluate tools and standards and prototype
science applications with real astronomy data. The testbed will
use major NSF and NASA computing resources, including the PACI
TeraGrid processors and archives.
USER-FRIENDLY INTERFACE
"A major goal
for the NVO is to provide a window on the universe for students,
teachers, backyard astronomers, and the interested public,"
said Bob Hanisch, NVO project manager at the Space Telescope Science
Institute. "The NVO will enable the public to explore directly
the wealth of information from society's investment in our
national research facilities."
The NVO will
deliver content via the Internet to a wide range of educational
projects, from K-12 through university courses. Museums, science
centers, and planetariums will access it, in collaboration with
professional organizations such as the Association of Science-Technology
Centers, the Museum Computer Network, and the International Planetarium
Society. There will be a public/educator portal to the NVO, Encyclopedia
Galactica, and an intuitive user-friendly interface for novice
and advanced users. An online resource of NVO-derived astronomical
images will be created for the news media, to bring discoveries
to the attention of journalists and the public.
SCIENTIFIC IMPACT
Large multispectral
surveys tend to turn up exotic objects that don't fit normal
classifications. For example, recent surveys that identified unusually
red objects have led to the discovery of brown dwarfs and very
distant quasars. Pattern-matching software analyzing millions
or even billions of objects in survey images will enable astronomers
to find anomalies that can only be identified as statistically
unusual when catalogs at different wavelengths or observations
from different times are compared (Figure 1). Possible discoveries
include dark matter constituents of our galaxy, transits
of stars by their planets, and unknown objects (Figure 2).
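
As a rough illustration of how "statistically unusual" objects might be flagged once catalogs at two wavelengths have been matched, the sketch below computes a color index for each object and marks those far redder than the rest. The object list, the field names, the robust-statistics approach (median and median absolute deviation), and the 3.5 threshold are all assumptions made for this example, not a description of the software used by the surveys mentioned above.

```python
import statistics

def find_red_outliers(objects, threshold=3.5):
    """Return ids of objects whose optical-minus-infrared color is unusually red.

    Uses the modified z-score built on the median and the median absolute
    deviation (MAD), which the outliers themselves barely distort.
    """
    # A larger (optical - infrared) magnitude difference means a redder object.
    colors = [obj["optical_mag"] - obj["infrared_mag"] for obj in objects]
    median_color = statistics.median(colors)
    mad = statistics.median([abs(c - median_color) for c in colors])
    if mad == 0:
        return []  # no spread in the sample, so nothing can be called unusual
    flagged = []
    for obj, color in zip(objects, colors):
        score = 0.6745 * (color - median_color) / mad
        if score > threshold:  # only the red side of the distribution
            flagged.append(obj["id"])
    return flagged

# Hypothetical matched sample: magnitudes for the same objects in two bands.
survey = [
    {"id": "obj-1", "optical_mag": 19.0, "infrared_mag": 18.0},
    {"id": "obj-2", "optical_mag": 18.7, "infrared_mag": 17.5},
    {"id": "obj-3", "optical_mag": 20.1, "infrared_mag": 19.2},
    {"id": "obj-4", "optical_mag": 19.5, "infrared_mag": 18.4},
    {"id": "obj-5", "optical_mag": 18.0, "infrared_mag": 17.0},
    {"id": "obj-6", "optical_mag": 19.3, "infrared_mag": 18.0},
    {"id": "obj-7", "optical_mag": 20.0, "infrared_mag": 19.2},
    {"id": "obj-8", "optical_mag": 24.5, "infrared_mag": 18.0},  # very red
]

unusual = find_red_outliers(survey)
print(unusual)
```

The point of the example is that the flag depends on data from both bands at once: neither the optical nor the infrared magnitude of the last object is remarkable on its own.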
All of the major astronomy data archives in the United States
have agreed to participate in the NVO, and links are being created
to similar initiatives in nations around the world. The NVO will
include the deepest x-ray data ever obtained (by the Chandra X-ray
Observatory), the most extensive optical survey and redshift catalog
(the Sloan Digital Sky Survey), the highest-resolution optical
observations (the Hubble Space Telescope), the largest infrared
archives (the Two Micron All Sky Survey and eventually the Space
Infrared Telescope Facility surveys), high-resolution spectra
from the European Southern Observatory's Very Large Telescope,
the Gemini telescopes, and radio data from the Very Large Array.
The NVO will make possible truly significant interactions between
large data sets and equally large-scale theoretical simulations
of astrophysical systems. Detailed simulations already produce
enormous data sets, comparable in size with those of observational
surveys, that can be mined for valuable information.
The NVO team represents an unusually wide range of astronomical
and computer expertise. It includes astronomers and astrophysicists
from leading astronomical institutions who will ensure that the
NVO meets the needs of working astronomers. "This project
will reach across the astronomical community," Szalay said.
"The number of people interested has been growing exponentially,
and I think this is likely to change astronomy as we know it."
MG
PROJECT LEADERS
Alexander Szalay (Johns Hopkins University)
Paul Messina (Caltech)
PARTICIPANTS
Kirk Borne (Astronomical Data Center/Raytheon)
Thomas Prince, Roy Williams (Caltech)
Andrew Moore (Carnegie Mellon University)
Stephen Kent (Fermilab)
Alyssa Goodman (Harvard University)
Jim Gray (Microsoft Research)
Tom McGlynn, Nicholas White (NASA Goddard Space Flight Center)
George Helou, Carol Lonsdale (NASA IPAC at Caltech)
David De Young (NOAO)
Tim Cornwell (NRAO)
Reagan Moore (SDSC)
Giuseppina Fabbiano (Smithsonian Astrophysical Observatory)
Robert Hanisch, Ethan Schreier (STScI)
Jeff Pier (U.S. Naval Observatory, Flagstaff Station)
Ray Plante (University of Illinois, Urbana-Champaign)
Charles Alcock (University of Pennsylvania)
Carl Kesselman (University of Southern California)
Miron Livny (University of Wisconsin, Madison)
us-vo.org