

EARTH SCIENCE

The Heavens Revealed in 10 Terabytes


A matched pair of automated telescopes has scanned the entire sky in infrared wavelengths near 2 micrometers (µm), and millions of high-resolution digital images are now available on the Web. The Two Micron All Sky Survey (2MASS) penetrates the dust clouds of the Milky Way Galaxy to reveal a vast number of objects that can’t be seen in visible light. The 2MASS Image Retrieval Service from Caltech’s Infrared Processing and Analysis Center (IPAC) and SDSC provides access to about half of the 10-terabyte database, and research-quality data products for the entire sky will be available by late 2002.

Image of the Flame Nebula, NGC 2024

Figure 1. Emission Nebula

The Flame Nebula, NGC 2024, is about 1,500 light-years away in the constellation Orion. Many of the stars embedded in the nebula are surrounded by disks of dust and gas, which may indicate newborn planetary systems. Near-infrared mosaic by E. Kopan and R. Hurt (IPAC).

"The IRSA Web-based data retrieval service was released publicly in April 2001," said Bruce Berriman, manager of the NPACI Digital Sky project and of IPAC’s Infrared Science Archive (IRSA). "It successfully processes more than 2,300 data requests per week on average, and returns approximately 5 gigabytes of data per week to astronomers."

"2MASS images in a lossy, compressed format–together with catalog, spatial cross-comparison, and statistical services–have been available for several years through IRSA," said Thomas Handley, former manager of the NPACI Digital Sky project at IPAC. "But lossy images aren’t useful if one wants to make quantitative measurements. The new service provides fast, easy access to 2MASS’s full, uncompressed astronomical images."

With support from NPACI, the Digital Sky project is developing tools to federate data in large astronomical surveys–linking them, but maintaining separate identities. The goal is to make the data available to scientists and to correlate stars, galaxies, and other objects among the surveys, enabling researchers to perform analyses and comparisons never before possible. The new information service represents a major step toward the National Virtual Observatory, a National Science Foundation (NSF) initiative that ultimately will federate all the available sky catalogs and data archives.

Image of Gum 29 Nebula

Figure 2. Gum 29

This 2MASS Atlas image mosaic by S. Van Dyk (IPAC) covers an area slightly less than 16 arcminutes on a side. The Gum 29 nebula glows from ionized hydrogen and other gases, energized by radiation from hot, young stars in Westerlund 2, the cluster near the center of the region. Dust in the nebula also reflects starlight.

The 2MASS survey is one of the primary data sources for Digital Sky. Funded by NASA and NSF, 2MASS is a collaboration between the University of Massachusetts and IPAC at Caltech. The University of Massachusetts performs overall project management of the automated observing systems at Mt. Hopkins, Arizona, and Cerro Tololo, Chile. IPAC processes the 25 terabytes of raw digital data, producing a digital atlas of the entire sky containing more than 4 million images, plus extracted catalogs of more than 350 million stars and 2 million galaxies. IPAC is a NASA data center and is responsible for archiving the image files and catalogs and distributing them to the astronomical community as well as the general public.

"The telescopes have completed the observational phase of the survey," said 2MASS project scientist Roc Cutri, "and data products covering 47 percent of the sky have been released to the public. All of the data are about to undergo a complete reprocessing. The final catalogs and the complete Atlas will be released at the end of 2002."

In good atmospheric conditions, a ground-based telescope can resolve stars separated by 1 arcsecond. The entire sky covers roughly 40,000 square degrees, so at 60 arcminutes per degree and 60 arcseconds per arcminute, a composite image of the entire observable sky would require approximately 500 billion pixels. Since each point in the sky can be observed at many different wavelengths of the electromagnetic spectrum–including the optical, infrared, ultraviolet, x-ray, and radio regions–a true multiwavelength catalog requires dozens of terabytes of data. Even though supercomputers can store and process such vast amounts of information, efficiently managing it is a technically difficult problem.
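The pixel estimate above can be checked with a few lines of arithmetic. The sketch below reproduces the article's round figure of 40,000 square degrees and compares it with the geometric area of the full sphere. The storage figure at the end assumes 4 bytes per pixel, which is an illustrative choice and not a value taken from the survey.

```python
import math

# Area of the full celestial sphere in square degrees:
# 4*pi steradians times (180/pi)^2 square degrees per steradian.
full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2

# 60 arcminutes per degree and 60 arcseconds per arcminute.
ARCSEC_PER_DEG = 60 * 60


def pixels_for_sky(area_sq_deg, pixel_arcsec=1.0):
    """Number of square pixels of the given size needed to tile an area."""
    pixels_per_deg = ARCSEC_PER_DEG / pixel_arcsec
    return area_sq_deg * pixels_per_deg ** 2


rounded = pixels_for_sky(40_000)          # the article's round figure
exact = pixels_for_sky(full_sky_sq_deg)   # about 41,253 square degrees

# ASSUMPTION: 4 bytes per pixel, chosen only to illustrate the scale.
BYTES_PER_PIXEL = 4
terabytes_per_band = rounded * BYTES_PER_PIXEL / 1e12

print(f"rounded sky: {rounded:.3e} pixels")
print(f"full sphere: {exact:.3e} pixels")
print(f"one wavelength band: about {terabytes_per_band:.1f} TB")
```

Under that assumption a single band at 1-arcsecond sampling is about 2 terabytes, so a catalog spanning many wavelength bands plausibly reaches the dozens of terabytes mentioned above.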

Image of Milky Way Galaxy

Figure 3. The Infrared Sky

This equal-area projection of the three-color composite JHKs source count map of the entire sky shows 95,851,173 stars of magnitude 13.5 or brighter in Ks (2.2 µm) infrared light. The Milky Way Galaxy, laced with dark dust lanes and clouds, and its bright core are at center. The Large and Small Magellanic Clouds, nearby satellite galaxies of the Milky Way, look like smudges below the galactic plane. The source generation was performed by M.F. Skrutskie (UMass), the flux maps were compiled by J.M. Carpenter (Caltech), and the color composite was assembled by R. Hurt (IPAC/Caltech).


SDSC has brought the 2MASS image collection online. The images (Figures 1-3) have been sorted into a spatial order based on contiguous regions of the sky. When a user of the retrieval system requests an area of sky, the system provides the pixel information as a single file in a standard image format. "If you just want to view large color images of the sky taken by 2MASS, you should take a look at the 2MASS image gallery," Berriman said. "It contains JPEG images for the general public, and most of them are spectacular."
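The article does not say which spatial ordering SDSC used for the collection. As a rough illustration of the idea only, the sketch below sorts images by a hypothetical grid cell (declination band first, then right-ascension bin), so that images covering neighboring sky end up next to each other in storage. The function name `sky_cell`, the 1-degree cell size, and the sample image records are all assumptions made for this example.

```python
import math


def sky_cell(ra_deg, dec_deg, cell_deg=1.0):
    """Map a sky position to a (declination band, RA bin) grid cell.

    Hypothetical scheme for illustration; not the ordering used by SDSC.
    """
    dec_band = int(math.floor((dec_deg + 90.0) / cell_deg))
    ra_bin = int(math.floor((ra_deg % 360.0) / cell_deg))
    return (dec_band, ra_bin)


# Made-up image records: (name, right ascension, declination) in degrees.
images = [
    ("img-a", 83.8, -5.4),
    ("img-b", 10.2, 41.3),
    ("img-c", 84.1, -5.9),
    ("img-d", 10.7, 41.1),
]

# Sorting by cell groups images from the same part of the sky together.
ordered = sorted(images, key=lambda im: sky_cell(im[1], im[2]))
ordered_names = [name for name, _, _ in ordered]
print(ordered_names)
```

With an ordering of this kind, a request for a small area of sky touches a short, contiguous run of files instead of files scattered across the archive.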

The new, uncompressed image retrieval service is for specialists. It produces very large files in FITS format, the standard file format used by astronomers, which some desktop computers may not be equipped to handle. However, these are the images that professional astronomers can use in quantitative photometric analyses.

In addition, the image retrieval service provides a QuickLook service that returns 20:1 compressed image files, which are useful for visual overviews and finder charts. The image retrieval service offers three modes of access: by the name or sky coordinates of a specific object, by entering multiple objects or positions in a table, or by specifying the date, hemisphere, and scan numbers of specific images as recorded in the original survey. Images for the near-infrared bands J (1.2 µm), H (1.6 µm), and Ks (2.2 µm) are available. The total file size for each service run is limited to 100 megabytes, corresponding to 60 uncompressed 2MASS Atlas images or 600 compressed images.
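The per-run limits quoted above imply an average size per image, and they suggest how a larger request would need to be divided. The sketch below works through both using only the article's figures. The helper `split_into_runs` and the placeholder image names are hypothetical and are not part of the retrieval service.

```python
LIMIT_MB = 100           # per-run limit quoted in the article
MAX_UNCOMPRESSED = 60    # uncompressed Atlas images per run
MAX_COMPRESSED = 600     # compressed images per run

# Average size per image implied by those limits.
mb_per_uncompressed = LIMIT_MB / MAX_UNCOMPRESSED   # about 1.67 MB
mb_per_compressed = LIMIT_MB / MAX_COMPRESSED       # about 0.17 MB


def split_into_runs(items, max_per_run):
    """Split a list of requested images into runs that respect the limit.

    Hypothetical helper for illustration only.
    """
    return [items[i:i + max_per_run] for i in range(0, len(items), max_per_run)]


# Example: 150 uncompressed images need three separate runs.
requested = [f"image-{n}" for n in range(150)]  # placeholder names
runs = split_into_runs(requested, MAX_UNCOMPRESSED)
run_sizes = [len(run) for run in runs]
print(run_sizes)
```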

"The 2MASS image collection is an excellent example of the use of NPACI technology to create an important new data resource for a scientific community," said Reagan Moore, associate director for Data-Intensive Computing at SDSC. "The collection combines tools to manipulate images with digital library front ends to support queries, data handling systems for accessing remote storage systems, and NPACI archives for collection storage."


The system integrates a digital library front end to the digital archives through use of the SDSC Storage Resource Broker (SRB) data handler. In processing, the images were read from DLT tape cartridges at IPAC and moved over the high-speed CalREN2 network to SDSC for "ingestion" into an archive. The primary archive is now stored on the High-Performance Storage System (HPSS) at SDSC. An alternate copy is being created on an HPSS at Caltech’s Center for Advanced Computing Research (CACR).

"Duplicating the images at Caltech will give us mirroring capability," Handley said. "This will make the service available all the time from one site or the other, even during unplanned outages and scheduled downtime for maintenance. This must operate 24-by-7."

In addition to the 2MASS survey, the Digital Sky project also will use data from other archives. The National Radio Astronomy Observatory (NRAO) Very Large Array Sky Survey (NVSS) and Faint Images of the Radio Sky at Twenty Centimeters (FIRST) are two surveys at radio wavelengths conducted by the NRAO. The Digital Palomar Observatory Sky Survey (DPOSS) is a Caltech project to digitize the visible-light images recorded on high-precision photographic plates during a recent re-creation of the original Palomar survey.

The surveys are being federated in the Digital Sky system rather than consolidated into a single enormous database. The originating institution acts as the data curator for the primary database. However, both DPOSS and 2MASS would use the HPSS systems at SDSC and Caltech to maintain their data. This would allow Digital Sky researchers to capitalize on the NPACI infrastructure to rapidly migrate between high-performance storage and compute servers.

The Digital Sky project also has developed software to correlate objects across several different sky surveys, combining information from multiple regions of the spectrum and giving insights into the nature of cosmic phenomena that observations at a single wavelength cannot reveal. In addition, correlated catalogs can be queried to identify objects with similar characteristics, which makes analysis and data mining much more feasible.
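Correlating objects across surveys usually comes down to matching sources by position. The sketch below is a minimal brute-force positional cross-match, not the Digital Sky software. It pairs each source in one catalog with the nearest source in another that lies within a chosen radius. The 2-arcsecond radius, the catalog layout, and the sample coordinates are assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(catalog_a, catalog_b, radius_arcsec=2.0):
    """Pair each source in catalog_a with its nearest catalog_b source in range.

    Each catalog is a list of (name, ra_deg, dec_deg) tuples.
    Returns (name_a, name_b, separation_arcsec) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalog_a:
        best = None
        for name_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1] * 3600.0))
    return matches


# Made-up sources; these are not real catalog entries.
infrared = [("ir-1", 10.0, 20.0), ("ir-2", 50.0, -30.0)]
radio = [("rad-1", 10.0002, 20.0001), ("rad-2", 120.0, 5.0)]

matches = cross_match(infrared, radio)
for name_a, name_b, sep in matches:
    print(f"{name_a} <-> {name_b}: {sep:.2f} arcsec")
```

The nested loop compares every pair of sources, which is fine for a handful of objects but impractical for catalogs of hundreds of millions of stars. At that scale some form of spatial indexing would be needed so that each source is compared only with its neighbors.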

By making the data in these surveys available to scientists and by providing the means for correlation, analysis, and data mining, Digital Sky and its tools are expected to revolutionize multiwavelength astronomical studies. The Digital Sky project not only is giving scientists "armchair access to the universe," but it also may reveal many insights and discoveries through statistical correlations that would escape studies based on individual objects. –MG


Thomas Prince
Caltech, Digital Sky

Bruce Berriman
Caltech, IPAC

Ron Beck
Laurent Cambresy
Tom Chester
Nian-Ming Chin
Roc Cutri
Diane Engler
Tracey Evans
John Fowler
John Gizis
Thomas Handley
Robert Hurt
Helene Huynh
Tom Jarrett
J. Davy Kirkpatrick
Eugene Kopan
Wen-Piao Lee
Ken Marsh
Howard McCallon
Brant Nelson
Jeonghee Rho
Raymond Tam
Schuyler Van Dyk
William Wheaton
Sherry Wheelock
Cong Xu
Caltech, IPAC

Roy Williams
Caltech, CACR

George Kremenek
Reagan Moore
Arcot Rajasekar
Michael Wan