Astronomers
are receiving a flood of information about the universe from spacecraft
and ground-based observatories. The last decade's comprehensive
digital sky surveys, compiled using the latest generation of telescopes
and electronic detectors for x-rays, ultraviolet, visible light,
infrared, and radio-frequency radiation and amenable to computer
analysis, are an unprecedented new resource for researchers, far
more useful than the photographic atlases and printed catalogs
of previous generations. But are important discoveries being overlooked
because each survey, taken in a limited region of the electromagnetic
spectrum, gives only part of the picture? The Digital Sky project
is federating individual sky surveys into a comprehensive digital
archive of the entire sky. NPACI researchers are now
implementing new types of correlations between surveys, which
will not only enable astronomers to mine the data in search of
specific types of objects but also facilitate discoveries
of new and unexpected classes of astronomical phenomena.

Figure 1. One Galaxy, Three Wavelengths
Various catalogs differ in resolution, accuracy, and limiting magnitude.
These images, centered on the galaxy UGC 00480, were taken at
blue (POSS-II, top), near-infrared (2MASS, middle),
and radio (NVSS, bottom) wavelengths.
Among the surveys being affiliated in Digital
Sky are the visible light Digital Palomar Observatory Sky Survey
(DPOSS), the near-infrared 2-Micron All Sky Survey (2MASS), the
NRAO VLA Sky Survey (NVSS) and VLA FIRST radio surveys, and the
ROSAT faint x-ray source catalog. Digital Sky is working closely
with the Infrared Science Archive (IRSA), which has supplied archiving
services and expertise. Work is proceeding on several fronts. At Caltech's
Infrared Processing and Analysis Center (IPAC), John Good is leading
an effort to mosaic 2MASS images. Reagan Moore's DICE group at
SDSC is working with the 2MASS and IRSA projects to repackage
more than 10 terabytes of high-resolution 2MASS data into SDSC
Storage Resource Broker (SRB) "containers" to facilitate information
retrieval from SDSC's and Caltech's HPSS mass storage systems.
Roy Williams at Caltech's Center for Advanced Computing Research
is adapting these images, along with classical star maps, to the
Virtual Sky educational resource. And Caltech researcher Robert
Brunner is leading an effort to apply new cross-correlations to
the federated catalogs.

THE CROSS-IDENTIFICATION CHALLENGE

"Cross-identification of survey catalogs involves
billions of sources over large areas of the sky," said Brunner.
"The challenge is compounded by the fact that various surveys
have intrinsically different resolutions, coordinate systems,
and data representations that must be reconciled. We'll also need
to identify sources that change over time (transient events, intrinsically
variable sources, and moving objects), and this eventually will
increase the amount of data from terabytes to petabytes."

Until recently, sky surveys have been federated
only by spatial proximity: items in different catalogs were associated
simply because they occupied the same location in the sky (Figure
1). Digital Sky researchers, building upon earlier work done at
IRSA, are now developing a more powerful method that will use
all available data to associate catalogued objects, applying a Bayesian
approach to determine statistical probabilities of association.

"Partly because the individual catalogs differ
in spatial resolution, calibration accuracy, and limiting magnitude,"
Brunner explained, "to associate objects in the multi-wavelength
federated archive, we will use a priori astrophysical knowledge
and secondary parameters such as redshift, colors, or variability,
in addition to location on the sky. We plan to produce a data
federation toolkit that will enable end users to rapidly cross-identify
both image and catalog datasets using either pre-defined association
rules or their own custom-defined rules. We will publicly release
our databases and tools as soon as they are scientifically verified."

Large numbers of objects with similar characteristics (normal
stars such as the Sun, for example) form "data clouds" when their
parameters are plotted along multiple axes of a graph. Data points
in dense clouds most likely represent well-understood phenomena.
But outside these data clouds, sparse groups of points and even
isolated data points indicate rare or unusual objects, perhaps
of unknown types. "Opportunities for new discoveries come from
mining the data for these anomalous nuggets," Brunner said. "On
the other hand, anomalous points may indicate catalog errors or
equipment malfunctions, which we also need to know about."

"We plan to use our multi-wavelength statistical
approach to investigate several 'hot' astrophysical problems,"
Brunner said. "What is the relationship between quasars and large-scale
structure, and how does it evolve with redshift? How do active
galaxies form and evolve? What is the history of star formation
in galaxies?"
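
The contrast drawn in this section between matching by spatial proximity alone and a Bayesian association can be illustrated with a minimal Python sketch. It is not the Digital Sky toolkit. The function names, the two-hypothesis model (a true counterpart whose positional offset follows a two-dimensional Gaussian, versus an unrelated source drawn from a uniform background), the prior, and the `secondary_lr` factor standing in for evidence from colors, redshift, or variability are all assumptions made for illustration.

```python
import math

ARCSEC_PER_DEG = 3600.0


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula).

    All four inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * ARCSEC_PER_DEG


def match_probability(sep, sigma1, sigma2, density, prior=0.5, secondary_lr=1.0):
    """Posterior probability that two catalog entries are the same object.

    sep          -- separation in arcsec
    sigma1/2     -- positional uncertainty of each catalog in arcsec
    density      -- assumed surface density of unrelated sources (per arcsec^2)
    prior        -- assumed prior probability of a true association
    secondary_lr -- hypothetical likelihood ratio from non-positional data
                    (colors, redshift, variability); 1.0 means "no information"
    """
    var = sigma1 ** 2 + sigma2 ** 2
    # Hypothesis "same object": offset follows a 2-D Gaussian with combined variance.
    like_match = math.exp(-sep ** 2 / (2 * var)) / (2 * math.pi * var)
    # Hypothesis "chance alignment": uniform background of unrelated sources.
    like_chance = density
    weighted_match = prior * like_match * secondary_lr
    return weighted_match / (weighted_match + (1 - prior) * like_chance)


# Illustrative values only: two entries one arcsecond apart.
sep = angular_separation_arcsec(10.0, 20.0, 10.0, 20.0 + 1.0 / 3600.0)
p_position_only = match_probability(sep, 0.5, 1.0, density=0.001)
p_with_colors = match_probability(sep, 0.5, 1.0, density=0.001, secondary_lr=10.0)
print(f"separation: {sep:.3f} arcsec")
print(f"P(match), position only:       {p_position_only:.4f}")
print(f"P(match), with secondary data: {p_with_colors:.4f}")
```

In this simplified model the probability falls as the separation grows or as the background becomes more crowded, and supporting evidence from secondary parameters raises it. A pure proximity match, by comparison, amounts to accepting every pair closer than a fixed radius.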

TOWARD A VIRTUAL OBSERVATORY

"A long-term goal is to grow and evolve the
Digital Sky project into a virtual observatory, a comprehensive
resource for all astronomers," Brunner said. Such a virtual observatory
would be accessible to researchers on the grid, and not only would
encompass current sky surveys and catalogs, but also would be
able to federate new information. Users will need analysis and
information discovery tools to access the federated catalogs.
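
One simple form such an information discovery tool could take is a search for the sparse or isolated points that lie outside the dense "data clouds" described earlier. The sketch below is an illustration only and not part of the Digital Sky toolset. It flags points whose distance to their k-th nearest neighbor is much larger than is typical for the sample. The function names, the choice of k, the threshold factor, and the toy two-parameter data are all assumptions.

```python
import math
import statistics


def kth_neighbor_distances(points, k=3):
    """Distance from each point to its k-th nearest neighbor.

    Assumes the parameters have already been scaled to comparable units.
    Brute force, O(n^2): fine for a toy sample, not for billions of sources.
    """
    distances = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(others[k - 1])
    return distances


def find_sparse_points(points, k=3, factor=5.0):
    """Indices of points far more isolated than the typical point."""
    d = kth_neighbor_distances(points, k)
    threshold = factor * statistics.median(d)
    return [i for i, di in enumerate(d) if di > threshold]


# Toy data: a dense 5 x 5 "cloud" in a two-parameter space plus one
# isolated point far outside it (index 25).
cloud = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
points = cloud + [(10.0, 10.0)]
sparse = find_sparse_points(points)
print("isolated point indices:", sparse)
```

Points flagged this way are only candidates for follow-up. As Brunner notes, they may be rare objects, or they may be catalog errors and equipment malfunctions.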
"We expect that both the Digital Sky metadata
catalog and our data federation toolset will become cornerstones
of a future National Virtual Observatory," Brunner said. -MG
 www.cacr.caltech.edu/SDA/digital_sky.html

PROJECT LEADER
Tom Prince
Caltech

PARTICIPANTS
Bruce Berriman, Robert J. Brunner, S. George Djorgovski,
John Good, Tom Handley, Wen-Piao Lee, Carol Lonsdale, Jin Ma,
Barry Madore, Ashish Mahabal, Roy Williams
Caltech

Alexander S. Szalay
Johns Hopkins University

Sheau-Yen Chen, George Kremenek, Reagan Moore,
Arcot Rajasekar, Michael Wan
SDSC