    DATA-INTENSIVE COMPUTING ENVIRONMENTS

    Heavens on Earth: The Digital Sky

    PROJECT LEADER
    Tom Prince, Robert Brunner, Thomas Handley, Caltech

    PARTICIPANTS
    George Djorgovski,
    John Good, Jin Ma,
    Robert Rutledge, Roy Williams
    Caltech

    A hundred years before the invention of the telescope, Copernicus observed the night sky with his naked eye and catalogued thousands of objects gleaming in the heavens. A century later, the telescope brought tens of thousands, and then millions, into focus. Today, astronomers can detect billions of objects in the cosmos with optical, infrared, and radio telescopes. Providing access to these data, along with tools to analyze them, is the Digital Sky project's objective. Soon, tens of terabytes of image and catalog data, and ultimately much more, will be available to astronomers. They'll use an online interface that will access remote databases, compute on high-performance resources, and package results in any of a large selection of formats--a process that researcher Roy Williams likens to custom gift wrapping.

    "Imagine electronic commerce--where you never see the warehouses, you just receive the items you've requested--and you've got an idea of how our concept of a 'virtual observatory' would work," Williams says.

    Figure 1. A Globular Cluster False-Color Image
    This image from the DPOSS archive shows a globular cluster--a gravitationally bound group of stars. Such clusters provide a minimum age for the universe, as well as an important constraint on theories of how galaxies and clusters of galaxies are formed. Image by Thomas Handley, Caltech.

     


    MANY SURVEYS, ONE SKY

    The Digital Sky project, led by Tom Prince of Caltech's Center for Advanced Computing Research, initially makes use of data from four sky surveys. "These contain high-resolution image data and the catalog data that are essential to computational analysis," says Robert Brunner, a Digital Sky researcher.

    Brunner is also involved with one of the initial surveys, the Digital Palomar Observatory Sky Survey (DPOSS)--an optical-wavelength project at Caltech. The others are the 2 Micron All Sky Survey (2MASS), a joint project of the University of Massachusetts and the NASA-funded Infrared Processing and Analysis Center at Caltech, and two radio-wavelength surveys of the National Radio Astronomy Observatory (NRAO): the NRAO VLA Sky Survey (NVSS) and Faint Images of the Radio Sky at 20 cm (FIRST). With the exception of NVSS, which is complete, each of these surveys is still actively collecting data, and Digital Sky will come to full fruition once collection is finished.

    Cumulatively, their data are expected to yield more than one billion objects in the sky that emit detectable energy or light. Because the information is dynamic, the surveys are federated in the Digital Sky framework rather than consolidated, and data curation is retained by the originating institution. 2MASS, for example, takes in 20 gigabytes of new data per night from telescopes in Arizona and Chile. Both DPOSS and 2MASS use the HPSS-managed mass storage systems at SDSC and Caltech to maintain their data, which allows Digital Sky researchers to capitalize on the NPACI infrastructure for rapid data migration between high-performance storage and compute servers.

    Current project efforts are focused on implementing standards across these disparate databases so that they can be made seamlessly accessible over the Web. "We're defining the different services that a Digital Sky should provide, and how these services should interact with the different archives," Brunner says. "We're also formulating use-case scenarios so that we can define common client interfaces." A major issue, for example, is making data from the surveys interoperable even though the archives use different commercial database management systems: relational, object-oriented, and hybrid. The envisioned client will interact with a distributed, Web-based fabric of data, computation, and catalog services.
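
    The idea of a common client interface over federated archives can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the project's actual design: the `Archive` interface, the `box_search` method, the `InMemoryArchive` stand-in, and the sample coordinates are all hypothetical. The point is only that each survey keeps its own storage while the client sends one query to every archive and merges the answers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Source:
    survey: str     # name of the archive the record came from
    ra_deg: float   # right ascension in degrees
    dec_deg: float  # declination in degrees


class Archive(Protocol):
    """Hypothetical common interface each survey would expose from its own server."""

    name: str

    def box_search(self, ra_min: float, ra_max: float,
                   dec_min: float, dec_max: float) -> list[Source]: ...


class InMemoryArchive:
    """Stand-in for a remote survey database; a real archive would wrap its own DBMS."""

    def __init__(self, name: str, positions: list[tuple[float, float]]):
        self.name = name
        self._rows = [Source(name, ra, dec) for ra, dec in positions]

    def box_search(self, ra_min, ra_max, dec_min, dec_max):
        # Simple rectangular cut; wrap-around at RA = 0/360 is ignored in this sketch.
        return [s for s in self._rows
                if ra_min <= s.ra_deg <= ra_max and dec_min <= s.dec_deg <= dec_max]


def federated_box_search(archives, ra_min, ra_max, dec_min, dec_max):
    """Send the same positional query to every archive and merge the results."""
    results = []
    for archive in archives:
        results.extend(archive.box_search(ra_min, ra_max, dec_min, dec_max))
    return results


if __name__ == "__main__":
    # Made-up positions, used only to exercise the interface.
    archives = [
        InMemoryArchive("DPOSS", [(10.0, 5.0), (200.0, -30.0)]),
        InMemoryArchive("2MASS", [(10.1, 5.1)]),
    ]
    for hit in federated_box_search(archives, 9.0, 11.0, 4.0, 6.0):
        print(hit)
```

    In this arrangement a new survey joins the federation by implementing the same method on its own server, and the client code does not change.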

    Once standards have been determined, Williams anticipates that other surveys will be able to implement them on their own servers to make their data accessible through the Digital Sky. "This is how the virtual observatory will be constructed," Williams says.


    MAXIMIZING SCIENTIFIC POTENTIAL

    The opportunity to conduct multi-wavelength studies across the Digital Sky datasets will significantly change the way astronomical research is conducted. "Astronomers tend to focus on a single class of objects," Brunner says. "Digital Sky will allow researchers to branch out into new areas or approach their current interests with a statistically more complete sample."
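
    A basic building block of any multi-wavelength study is matching the same object across catalogs by sky position. The Python sketch below shows one common way to do this, a nearest-neighbor match within a fixed angular radius. It is an illustration under stated assumptions rather than a description of Digital Sky's own software: the function names, the tuple layout `(id, ra_deg, dec_deg)`, and the sample coordinates are hypothetical, and the brute-force loop would need spatial indexing to scale to catalogs of the size discussed here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalog_a, catalog_b, radius_arcsec):
    """Pair each source in catalog_a with its nearest neighbor in catalog_b.

    Each catalog is a list of (id, ra_deg, dec_deg) tuples. Returns a list of
    (id_a, id_b, separation_arcsec) for pairs closer than radius_arcsec.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None  # (id_b, separation in degrees) of the closest candidate so far
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


if __name__ == "__main__":
    # Made-up positions standing in for an optical and an infrared catalog.
    optical = [("o1", 150.0, 2.0), ("o2", 151.0, 2.5)]
    infrared = [("i1", 150.0003, 2.0), ("i2", 10.0, -5.0)]
    print(cross_match(optical, infrared, radius_arcsec=2.0))
```

    The match radius is a tunable assumption: it should reflect the positional accuracy of the surveys being combined.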

    Williams offers identification of star and galaxy clusters as an example of what Digital Sky may support in the future. "It's very difficult to locate a cluster in multidimensional space," he says. "Database technology and statistical analysis help us classify stars and galaxies, improving our understanding of the known and making the unknown stand out more clearly."

    Initial explorations of digitized plates obtained from DPOSS (Figure 1) have already led to the discovery of a number of high-redshift quasars. "This is an example of the type of benefit that Digital Sky may bestow upon the astronomy community," Prince says. "Having complete wavelength datasets available together online, combined with the terabyte storage and teraflops compute capabilities of the NPACI facilities, will increase the scientific potential of these data by orders of magnitude." --AF *
