A pair of automated telescopes has scanned the entire sky in infrared
wavelengths near 2 micrometers (µm), and millions of high-resolution
digital images are now available on the Web. The Two Micron All
Sky Survey (2MASS) penetrates the dust clouds of the Milky Way
Galaxy to reveal a vast number of objects that can't be seen
in visible light. The 2MASS Image Retrieval Service from Caltech's
Infrared Processing and Analysis Center (IPAC) and SDSC provides
access to about half of the 10-terabyte database, and research-quality
data products for the entire sky will be available by late 2002.
The Flame Nebula, NGC 2024, is about 1,500 light-years away
in the constellation Orion. Many of the stars embedded in
the nebula are surrounded by disks of dust and gas, which
may indicate newborn planetary systems. Near-infrared mosaic
by E. Kopan and R. Hurt (IPAC).
"The IRSA Web-based
data retrieval service was released publicly in April 2001,"
said Bruce Berriman, manager of the NPACI Digital Sky project
and of IPAC's Infrared Science Archive (IRSA). "It successfully
processes more than 2,300 data requests per week on average, and
returns approximately 5 gigabytes of data per week to astronomers."
"[Images] in a lossy, compressed format, together with catalog, spatial
cross-comparison, and statistical services, have been available
for several years through IRSA," said Thomas Handley, former
manager of the NPACI Digital Sky project at IPAC. "But lossy
images aren't useful if one wants to make quantitative measurements.
The new service provides fast, easy access to 2MASS's full,
uncompressed astronomical images."
With support from NPACI,
the Digital Sky project is developing tools to federate data in
large astronomical surveys: linking them, but maintaining
separate identities. The goal is to make the data available to
scientists and to correlate stars, galaxies, and other objects
among the surveys, enabling researchers to perform analyses and
comparisons never before possible. The new information service
represents a major step toward the National Virtual Observatory,
a National Science Foundation (NSF) initiative that ultimately
will federate all the available sky catalogs and data archives.
This 2MASS Atlas image mosaic by S. Van Dyk (IPAC) covers an
area slightly less than 16 arcminutes on a side. The Gum
29 nebula glows from ionized hydrogen and other gases, energized
by radiation from hot, young stars in Westerlund 2, the
cluster near the center of the region. Dust in the nebula
also reflects starlight.
The 2MASS survey is
one of the primary data sources for Digital Sky. Funded by NASA
and NSF, 2MASS is a collaboration between the University of Massachusetts
and IPAC at Caltech. The University of Massachusetts performs
overall project management of the automated observing systems
at Mt. Hopkins, Arizona, and Cerro Tololo, Chile. IPAC processes
the 25 terabytes of raw digital data, producing a digital atlas
of the entire sky containing more than 4 million images, plus
extracted catalogs of more than 350 million stars and 2 million
galaxies. IPAC is a NASA data center and is responsible for archiving
the image files and catalogs and distributing them to the astronomical
community as well as the general public.
"[We] have completed the observational phase of the survey," said
2MASS project scientist Roc Cutri, "and data products covering
47 percent of the sky have been released to the public. All of
the data are about to undergo a complete reprocessing. The final
catalogs and the complete Atlas will be released at the end of
In good atmospheric
conditions, a ground-based telescope can resolve stars separated
by 1 arcsecond. The entire sky covers roughly 40,000 square degrees,
so at 60 arcminutes per degree and 60 arcseconds per arcminute,
a composite image of the entire observable sky would require approximately
500 billion pixels. Since each point in the sky can be observed
at many different wavelengths of the electromagnetic spectrum (including
the optical, infrared, ultraviolet, x-ray, and radio regions), a
true multiwavelength catalog requires dozens of terabytes of data.
Even though supercomputers can store and process such vast amounts
of information, managing it efficiently is technically difficult.
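The pixel estimate in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the bytes-per-pixel and number-of-bands values passed to `catalog_terabytes` are assumptions chosen by the caller, not figures from the survey.

```python
import math

# 60 arcminutes per degree and 60 arcseconds per arcminute, as in the text.
ARCSEC_PER_DEGREE = 60 * 60


def full_sky_square_degrees() -> float:
    """Area of the whole celestial sphere (4*pi steradians) in square degrees."""
    return 4 * math.pi * (180 / math.pi) ** 2


def pixels_for_sky(square_degrees: float, pixel_arcsec: float = 1.0) -> float:
    """Number of square pixels of side `pixel_arcsec` needed to tile the given area."""
    pixels_per_square_degree = (ARCSEC_PER_DEGREE / pixel_arcsec) ** 2
    return square_degrees * pixels_per_square_degree


def catalog_terabytes(pixels: float, bytes_per_pixel: int, bands: int) -> float:
    """Rough storage estimate; bytes_per_pixel and bands are caller assumptions."""
    return pixels * bytes_per_pixel * bands / 1e12


if __name__ == "__main__":
    rounded = pixels_for_sky(40_000)                     # the article's round figure
    exact = pixels_for_sky(full_sky_square_degrees())    # slightly more than 41,000 sq. deg.
    print(f"rounded sky: {rounded:.3e} pixels")
    print(f"exact sky:   {exact:.3e} pixels")
    # Hypothetical example: 2 bytes per pixel, 10 wavelength bands.
    print(f"storage:     {catalog_terabytes(rounded, 2, 10):.1f} TB")
```

With 1-arcsecond pixels, both the rounded and the exact sky area give a count close to the article's figure of roughly 500 billion, and storing that many pixels in several bands quickly reaches the terabyte range.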
The Infrared Sky
This equal-area projection of the three-color composite JHKs
source count map of the entire sky shows 95,851,173 stars
of magnitude 13.5 or brighter in Ks (2.2 µm) infrared light.
The Milky Way Galaxy, laced with dark dust lanes and clouds,
and its bright core are at center. The Large and Small Magellanic
Clouds, nearby satellite galaxies of the Milky Way, look
like smudges below the galactic plane. The source generation
was performed by M.F. Skrutskie (UMass), the flux maps were
compiled by J.M. Carpenter (Caltech), and the color composite
was assembled by R. Hurt (IPAC/Caltech).
SDSC has brought the
2MASS image collection online. The images (Figures 1-3) have
been sorted into a spatial order based on contiguous regions of
the sky. When a user of the retrieval system requests an area
of sky, the system provides the pixel information as a single
file in a standard image format. "If you just want to view
large color images of the sky taken by 2MASS, you should take
a look at the 2MASS image gallery," Berriman said. "It
contains JPEG images for the general public, and most of them
The new, uncompressed
image retrieval service is for specialists: it produces very
large files in FITS format, the standard file format used by
astronomers, which some desktop computers may not be designed
to handle. However, these are the images that professional astronomers
can use in quantitative photometric analyses.
In addition, the image
retrieval service provides a QuickLook service that returns 20:1
compressed image files, which are useful for visual overviews
and finder charts. The image retrieval service offers three modes
of access: by the name or sky coordinates of a specific object,
by entering multiple objects or positions in a table, or by specifying
the date, hemisphere, and scan numbers of specific images as recorded
in the original survey. Images for the near-infrared bands J (1.2
µm), H (1.6 µm), and Ks (2.2 µm) are available. The total file
size for each service run is limited to 100 megabytes, corresponding
to 60 uncompressed 2MASS Atlas images or 600 compressed images.
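Because each service run is capped, a long target list has to be split across several runs. The following is a minimal sketch of that bookkeeping, not part of the retrieval service itself: the function names are hypothetical, and it assumes each target yields exactly one image, which will not hold for every request.

```python
from typing import Iterator, List, Sequence

RUN_LIMIT_MB = 100          # per-run cap stated in the article
UNCOMPRESSED_PER_RUN = 60   # uncompressed Atlas images per run, from the article
COMPRESSED_PER_RUN = 600    # compressed images per run, from the article


def implied_image_size_mb(images_per_run: int) -> float:
    """Average image size implied by the cap and the per-run image count."""
    return RUN_LIMIT_MB / images_per_run


def split_into_runs(targets: Sequence[str], images_per_run: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of targets, each small enough for one run.

    Assumes one image per target (an illustrative simplification).
    """
    if images_per_run <= 0:
        raise ValueError("images_per_run must be positive")
    for start in range(0, len(targets), images_per_run):
        yield list(targets[start:start + images_per_run])


if __name__ == "__main__":
    targets = [f"target-{i}" for i in range(150)]  # made-up target names
    for number, run in enumerate(split_into_runs(targets, UNCOMPRESSED_PER_RUN), 1):
        print(f"run {number}: {len(run)} images")
    print(f"about {implied_image_size_mb(UNCOMPRESSED_PER_RUN):.2f} MB per uncompressed image")
```

Chunking by image count rather than by measured file size keeps the sketch simple; a real client would need the actual size of each returned file to stay under the cap reliably.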
"The 2MASS image
collection is an excellent example of the use of NPACI technology
to create an important new data resource for a scientific community,"
said Reagan Moore, associate director for Data-Intensive Computing
at SDSC. "The collection combines tools to manipulate images
with digital library front ends to support queries, data handling
systems for accessing remote storage systems, and NPACI archives
for collection storage."
The system integrates
a digital library front end to the digital archives through use
of the SDSC Storage Resource Broker (SRB) data handler. In processing,
the images were read from DLT tape cartridges at IPAC and moved
over the high-speed CalREN2 network to SDSC for "ingestion"
into an archive. The primary archive is now stored on the High-Performance
Storage System (HPSS) at SDSC. An alternate copy is being created
on an HPSS at Caltech's Center for Advanced Computing Research.
"[The] images at Caltech will give us mirroring capability," Handley
said. "This will make the service available all the time
from one site or the other, even during unplanned outages and
scheduled downtime for maintenance. This must operate 24-by-7."
In addition to the
2MASS survey, the Digital Sky project also will utilize data from
other archives. The National Radio Astronomy Observatory (NRAO)
Very Large Array Sky Survey (NVSS) and Faint Images of the Radio
Sky at Twenty Centimeters (FIRST) are two surveys at radio wavelengths
conducted by the NRAO. The Digital Palomar Observatory Sky Survey
(DPOSS) is a Caltech project to digitize visible-light images
recorded on high-precision photographic plates during a recent re-creation
of the original Palomar survey.
The surveys are being
federated in the Digital Sky system rather than consolidated into
a single enormous database. The originating institution acts as
the data curator for the primary database. However, both DPOSS
and 2MASS would use the HPSS systems at SDSC and Caltech to maintain
their data. This would allow Digital Sky researchers to capitalize
on the NPACI infrastructure to move data rapidly between high-performance
storage and compute servers.
The Digital Sky project
also has developed software to correlate objects across several
different sky surveys, combining information from multiple regions
of the spectrum and giving insights into the nature of cosmic
phenomena that observations at a single wavelength cannot reveal.
In addition, correlated catalogs can be queried to identify objects
with similar characteristics, and can make analyses and data mining
much more feasible.
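The core of correlating objects across surveys is a positional cross-match: deciding which entries in two catalogs lie close enough on the sky to be the same object. The sketch below illustrates the idea and is not the Digital Sky software. The catalog layout, function names, and the 2-arcsecond default radius are assumptions made for this example, and the brute-force loop stands in for the spatial indexing a production system would need at survey scale.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical catalog row: (name, right ascension in degrees, declination in degrees)
Source = Tuple[str, float, float]


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalog_a: Sequence[Source],
                catalog_b: Sequence[Source],
                radius_arcsec: float = 2.0) -> List[Tuple[str, Optional[str]]]:
    """Pair each source in catalog_a with its nearest neighbor in catalog_b.

    A source with no neighbor within radius_arcsec is paired with None.
    Brute force: compares every source in A with every source in B.
    """
    matches: List[Tuple[str, Optional[str]]] = []
    for name_a, ra_a, dec_a in catalog_a:
        best_name: Optional[str] = None
        best_sep = radius_arcsec / 3600.0  # search radius in degrees
        for name_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_name, best_sep = name_b, sep
        matches.append((name_a, best_name))
    return matches


if __name__ == "__main__":
    # Made-up positions for illustration only.
    infrared = [("ir-1", 10.0, -5.0), ("ir-2", 200.0, 45.0)]
    radio = [("radio-1", 10.0002, -5.0001), ("radio-2", 50.0, 0.0)]
    for pair in cross_match(infrared, radio):
        print(pair)
```

Once sources are paired this way, measurements from different wavelength regions can be stored side by side, which is what makes queries for objects with similar characteristics possible.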
By making the data
in these surveys available to scientists and by providing the
means for correlation, analysis, and data mining, Digital Sky
and its tools are expected to revolutionize multiwavelength astronomical
studies. The Digital Sky project not only is giving scientists
"armchair access to the universe," but it also may reveal
many insights and discoveries through statistical correlations
that would escape studies based on individual objects. MG
Caltech, Digital Sky
Tom Jarrett J. Davy Kirkpatrick
Schuyler Van Dyk