Home > News Center > Publications > EnVision


Arriving Online: the National Virtual Observatory


The National Science Foundation has awarded $10 million to astronomers at 17 research institutions to create a huge, expandable database of astronomical images, catalogs, measurements, and scientific publications called the National Virtual Observatory (NVO). The five-year Information Technology Research grant will enable researchers to apply the latest computer technologies and data storage and analysis techniques to unite more than 100 terabytes of data collected from 50 ground- and space-based telescopes and instruments. NVO will provide images, catalogs, observational measurements, and scientific papers in an easily accessible form, with a consistent interface, to professional researchers, amateur astronomers, and students. Using powerful grids of computers, scientists will be able to cross-correlate observations among catalogs to make discoveries that would otherwise be impossible.

Figure 1. Multiwavelength Data Federation
A composite image of the supernova remnant E0102-72 in the Small Magellanic Cloud, a satellite galaxy of the Milky Way. The image was created from three data sources: x-ray (blue), optical (green), and radio (red).

Figure 2. Raider of the Found Arcs
This Hubble Space Telescope image of the galaxy cluster A2218 was analyzed by Alexander Szalay’s prototype of an automated discovery tool to find gravitational lens arcs. Similar automated procedures could process National Virtual Observatory images to discover large numbers of new gravitational lenses.

The co-principal investigators for the NVO project team are Alexander Szalay, an astronomer at Johns Hopkins University, and Paul Messina, a computational scientist at Caltech. Szalay has pioneered the use of advanced computing techniques on large astronomical databases, and Messina, Chief Architect of NPACI, has led several multidisciplinary research projects.

Szalay believes the NVO represents a new, third approach to scientific research. “First, you have science conducted through theoretical models,” he said. “Next, you have science tested through experiments. The new approach, scientific exploration through computational methods, is developing in response to the tremendous volumes of data we’re starting to gather in many of the sciences.”


Thanks to recent breakthroughs in telescope, detector, and computer technology, astronomical surveys at gamma-ray, x-ray, optical, infrared, and radio wavelengths produce terabytes of images and catalog data per year, more information annually than the entire body of astronomical data gathered before 1980. The trend is accelerating; for example, the planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008.

“We now have to deal with an ever-increasing flood of information,” said David De Young, NVO project scientist. “Without the tools and structure that will be provided through the NVO, finding a particular piece of data would be like trying to find a contact lens in a swimming pool.”

Challenges lead to opportunities. Advances in information technology offer solutions for data mining, for sophisticated pattern recognition, and for large-scale statistical cross-correlations.

In 1999 the National Academy of Sciences recommended establishment of a National Virtual Observatory as a “Rosetta stone” to link the major astronomical data archives and catalogs and to support cross-correlation among these resources. The idea was promoted by the Digital Sky project, an NSF-funded NPACI effort led by Caltech researcher Tom Prince to federate data from four different astronomical databases and make the information available through a Web portal.

The NVO project will federate (that is, link without consolidating) more than 100 terabytes of astronomical data from more than 50 collections. Organizers are planning to keep the NVO “virtual,” not located in any single facility.

A key challenge will be to develop ways of analyzing data from several databases simultaneously. “Each of those databases is organized differently, which makes it quite difficult to perform analyses of data from several collections simultaneously,” Messina explained. “But such investigations promise to yield important scientific discoveries, so enhancing our ability to do these analyses is an important part of the NVO effort.”
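To make the difficulty concrete, here is a minimal sketch of what analyzing two differently organized catalogs involves. It assumes two hypothetical catalogs, one storing right ascension in degrees and one in hours; the field names, the 2-arcsecond match radius, and every function name are illustrative assumptions, not part of any NVO software. Each catalog gets a small adapter that translates its rows into a common record, after which sources can be paired by position on the sky.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Common record that every catalog is translated into."""
    catalog: str
    name: str
    ra_deg: float   # right ascension, decimal degrees
    dec_deg: float  # declination, decimal degrees


def from_optical(row):
    # Hypothetical optical catalog: positions already in decimal degrees.
    return Source("optical", row["id"], row["ra"], row["dec"])


def from_xray(row):
    # Hypothetical x-ray catalog: right ascension stored in hours (1 h = 15 deg).
    return Source("xray", row["src"], row["ra_hours"] * 15.0, row["dec_deg"])


def separation_arcsec(a, b):
    """Angular separation between two sources, via the haversine formula."""
    ra1, dec1 = math.radians(a.ra_deg), math.radians(a.dec_deg)
    ra2, dec2 = math.radians(b.ra_deg), math.radians(b.dec_deg)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(left, right, radius_arcsec=2.0):
    """Pair each left source with its nearest right source within the radius."""
    matches = []
    for a in left:
        nearest = min(right, key=lambda b: separation_arcsec(a, b), default=None)
        if nearest is not None and separation_arcsec(a, nearest) <= radius_arcsec:
            matches.append((a.name, nearest.name))
    return matches


# Made-up rows, for illustration only.
optical = [from_optical(r) for r in (
    {"id": "opt-1", "ra": 15.0, "dec": -72.0},
    {"id": "opt-2", "ra": 200.0, "dec": 10.0},
)]
xray = [from_xray(r) for r in (
    {"src": "x-1", "ra_hours": 1.0, "dec_deg": -72.0002},
)]

matches = cross_match(optical, xray)
print(matches)
```

The brute-force comparison of every pair is used here only for clarity. At the scale of millions or billions of objects, a spatial index or a sky-partitioning scheme would be needed instead.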

“We have a lot of work to do,” said Caltech’s Roy Williams, the NVO system architect. “Astronomers must be able to publish data resources, making their own archival data available, and to discover resources through subject, keyword, and other searches. When a data resource is found, it must be usable—it must comply with interoperability standards so data services can ‘plug and play’ without detailed technical assistance from humans. The data must be understandable, with data quality, provenance, and rights management information correctly attached and presented. And when an astronomer has established a processing pipeline, it should easily scale to very large quantities of data and very large numbers of data services. Achieving these goals is not just a technical challenge but also a social challenge.”
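The publish-and-discover cycle Williams describes can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the `Resource` fields, the `Registry` class, and the example.org URLs are hypothetical and do not reflect any actual NVO design. The one design point it shows is that a resource without provenance, rights, or quality information is rejected at publication time, so that everything discoverable is also understandable.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Metadata record describing one published data resource."""
    title: str
    subjects: tuple
    keywords: tuple
    provenance: str   # where the data came from
    rights: str       # terms under which it may be used
    quality: str      # short statement of data quality
    service_url: str  # where the data service can be reached


class Registry:
    """Hypothetical registry: publish resources, then discover them by term."""

    def __init__(self):
        self._resources = []

    def publish(self, resource):
        # Refuse records whose descriptive metadata is incomplete.
        missing = [name for name in ("provenance", "rights", "quality")
                   if not getattr(resource, name)]
        if missing:
            raise ValueError("missing metadata: " + ", ".join(missing))
        self._resources.append(resource)

    def discover(self, term):
        # Case-insensitive match against title, subjects, and keywords.
        term = term.lower()
        return [r for r in self._resources
                if any(term in text.lower()
                       for text in (r.title, *r.subjects, *r.keywords))]


registry = Registry()
registry.publish(Resource(
    title="Example x-ray source catalog",
    subjects=("x-ray astronomy",),
    keywords=("supernova remnant", "point sources"),
    provenance="hypothetical observatory archive",
    rights="public",
    quality="positions good to 1 arcsecond",
    service_url="https://example.org/xray",
))
registry.publish(Resource(
    title="Example optical sky survey",
    subjects=("optical astronomy",),
    keywords=("galaxies", "redshift"),
    provenance="hypothetical survey telescope",
    rights="public",
    quality="photometry good to 0.02 magnitudes",
    service_url="https://example.org/optical",
))

found = [r.title for r in registry.discover("X-Ray")]
print(found)
```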


NVO will use computational techniques known as grid computing, in which processing power and information holdings are distributed among computers on a nationwide high-speed computer network. Grid computing enables scientists to share data and access problem-solving resources, and is a major technology development effort sponsored by NSF in programs such as NPACI. The NVO framework must support computation that is scalable from analyzing data sets on desktop workstations to mining massive data collections on supercomputers. One key to this is SDSC’s Storage Resource Broker (SRB), which provides transparent access to data stored in network-accessible data servers and archives.
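The idea of transparent access can be illustrated with a toy broker that maps a logical name to whichever storage system actually holds the data. This sketch is not the SRB and does not use its interface; the class names, method names, and paths are all assumptions made for illustration. The caller asks for a data set by logical name and never needs to know which server or archive it lives on.

```python
class InMemoryStore:
    """Stand-in for one storage system, such as a disk server or tape archive."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = data

    def get(self, path):
        return self._objects[path]


class DataBroker:
    """Hypothetical broker: resolves logical names to physical locations."""

    def __init__(self):
        self._locations = {}  # logical name -> (store, physical path)

    def register(self, logical_name, store, physical_path):
        self._locations[logical_name] = (store, physical_path)

    def locate(self, logical_name):
        store, _ = self._locations[logical_name]
        return store.name

    def read(self, logical_name):
        # The caller supplies only the logical name; the broker finds the data.
        store, path = self._locations[logical_name]
        return store.get(path)


# Two made-up storage systems holding made-up data.
disk = InMemoryStore("disk-server")
archive = InMemoryStore("tape-archive")
disk.put("/data/cat_0001.tbl", b"catalog rows")
archive.put("vol7/img_0042.fits", b"image pixels")

broker = DataBroker()
broker.register("survey/catalog", disk, "/data/cat_0001.tbl")
broker.register("survey/image-42", archive, "vol7/img_0042.fits")

print(broker.read("survey/image-42"), "from", broker.locate("survey/image-42"))
```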

“The NVO team will promote widely accepted astronomical metadata standards, and build on the data grids of the sort pioneered by the NPACI Data Intensive Computing thrust,” said Reagan Moore, SDSC distinguished scientist and adjunct professor in UCSD’s Computer Science and Engineering Department. “NVO will focus on the ability to support bulk manipulation of catalog records or sky survey images, to enable statistical analyses across multiple collections.”

Alongside the SRB, research will be supported through the use of a metadata catalog with standard access protocols.

Grid services for computational analysis will be based on the Globus toolkit. This will ensure NVO’s compatibility with existing grid resources, notably those being deployed by the PACI TeraGrid, the Grid Physics Network project, and NASA’s Information Power Grid.

An NVO Testbed will evaluate tools and standards and prototype science applications with real astronomy data. The testbed will use major NSF and NASA computing resources, including the PACI TeraGrid processors and archives.


“A major goal for the NVO is to provide a window on the universe for students, teachers, backyard astronomers, and the interested public,” said Bob Hanisch, NVO project manager at the Space Telescope Science Institute. “The NVO will enable the public to explore directly the wealth of information from society’s investment in our national research facilities.”

The NVO will deliver content via the Internet to a wide range of educational projects, from K-12 through university courses. Museums, science centers, and planetariums will access it, in collaboration with professional organizations such as the Association of Science-Technology Centers, the Museum Computer Network, and the International Planetarium Society. There will be a public/educator portal to the NVO, Encyclopedia Galactica, and an intuitive user-friendly interface for novice and advanced users. An online resource of NVO-derived astronomical images will be created for the news media, to bring discoveries to the attention of journalists and the public.


Large multispectral surveys tend to turn up exotic objects that don’t fit normal classifications. For example, recent surveys that identified unusually red objects have led to the discovery of brown dwarfs and very distant quasars. Pattern-matching software analyzing millions or even billions of objects in survey images will enable astronomers to find anomalies that can only be identified as statistically unusual when catalogs at different wavelengths or observations from different times are compared (Figure 1). Possible discoveries include “dark matter” constituents of our galaxy, transits of stars by their planets, and unknown objects (Figure 2).
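As a rough illustration of how "unusually red" objects might be picked out of a catalog automatically, the sketch below computes a color index for each object and flags those far from the typical value. It is a simplified stand-in, not the method used by any survey: the field names, the threshold of 5, and the sample magnitudes are assumptions, and the robust statistic used (median and median absolute deviation) is one common choice among many.

```python
import statistics


def flag_red_outliers(objects, threshold=5.0):
    """Return ids of objects whose color is far redder than typical.

    Color is the blue-band magnitude minus the red-band magnitude. Larger
    magnitudes mean fainter, so a large positive color means a red object.
    """
    colors = {o["id"]: o["mag_blue"] - o["mag_red"] for o in objects}
    median = statistics.median(colors.values())
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(c - median) for c in colors.values()])
    if mad == 0:
        return []  # no measurable spread, so nothing can be called unusual
    scale = 1.4826 * mad  # rescales MAD to match a standard deviation for normal data
    return [oid for oid, c in colors.items() if (c - median) / scale > threshold]


# Made-up magnitudes, for illustration only.
catalog = [
    {"id": "obj-1", "mag_blue": 20.50, "mag_red": 20.0},
    {"id": "obj-2", "mag_blue": 20.60, "mag_red": 20.0},
    {"id": "obj-3", "mag_blue": 20.40, "mag_red": 20.0},
    {"id": "obj-4", "mag_blue": 20.55, "mag_red": 20.0},
    {"id": "obj-5", "mag_blue": 20.45, "mag_red": 20.0},
    {"id": "obj-6", "mag_blue": 20.50, "mag_red": 20.0},
    {"id": "obj-7", "mag_blue": 23.00, "mag_red": 20.0},
]

flagged = flag_red_outliers(catalog)
print(flagged)
```

The same pattern extends to comparisons across catalogs: once sources at different wavelengths have been matched, any derived quantity can be screened for statistical outliers in this way.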

All of the major astronomy data archives in the United States have agreed to participate in the NVO, and links are being created to similar initiatives in nations around the world. The NVO will include the deepest x-ray data ever obtained (by the Chandra X-ray Observatory), the most extensive optical survey and redshift catalog (the Sloan Digital Sky Survey), the highest-resolution optical observations (the Hubble Space Telescope), the largest infrared archives (the Two Micron All Sky Survey and eventually the Space Infrared Telescope Facility surveys), high-resolution spectra from the European Southern Observatory’s Very Large Telescope, the Gemini telescopes, and radio data from the Very Large Array.

The NVO will make possible truly significant interactions between large data sets and equally large-scale theoretical simulations of astrophysical systems. Detailed simulations already produce enormous data sets, comparable in size with those of observational surveys, that can be mined for valuable information.

The NVO team represents an unusually wide range of astronomical and computer expertise. It includes astronomers and astrophysicists from leading astronomical institutions who will ensure that the NVO meets the needs of working astronomers. “This project will reach across the astronomical community,” Szalay said. “The number of people interested has been growing exponentially, and I think this is likely to change astronomy as we know it.” —MG

Alexander Szalay
Johns Hopkins University

Paul Messina
Caltech

Kirk Borne
Astronomical Data Center/Raytheon

Thomas Prince, Roy Williams
Caltech

Andrew Moore
Carnegie Mellon University

Stephen Kent

Alyssa Goodman
Harvard University

Jim Gray
Microsoft Research

Tom McGlynn, Nicholas White
NASA Goddard Space Flight Center

George Helou, Carol Lonsdale
NASA IPAC at Caltech

David De Young

Tim Cornwell

Reagan Moore
SDSC

Giuseppina Fabbiano
Smithsonian Astrophysical Observatory

Robert Hanisch, Ethan Schreier
Space Telescope Science Institute

Jeff Pier
U.S. Naval Observatory, Flagstaff Station

Ray Plante
University of Illinois, Urbana-Champaign

Charles Alcock
University of Pennsylvania

Carl Kesselman
University of Southern California

Miron Livny
University of Wisconsin, Madison