San Diego Supercomputer Center | EnVision | NPACI: A Leading Edge Site
A Geosciences Network for Understanding the Whole Earth

By Paul Tooby

From space, we can recognize that the Earth is a single, unified system. But geoscientists who study the planet are still able to glimpse only partial views through the windows of their separate disciplines. Now, researchers at the San Diego Supercomputer Center (SDSC) are collaborating with scientists who study the solid Earth to build a prototype national Geosciences Network–GEON. "GEON cyberinfrastructure will help weave the separate strands of the solid Earth sciences disciplines and data into a unified fabric," said Chaitan Baru, co-director of SDSC’s Data and Knowledge Systems (DAKS) program, who coordinates information technology in GEON. "This will give the geosciences an ‘IT head start’ for viewing the complex dynamics of the Earth system as an interrelated whole." (See Guest Commentary.)

GEON, a five-year, multi-institutional National Science Foundation (NSF) Large Information Technology Research (ITR) project, grew out of a workshop at the 1999 Geological Society of America meeting. "We kept rediscovering that we had data from different disciplines that we wanted to integrate, but we weren’t getting anywhere talking with other geologists," said professor Krishna Sinha of Virginia Tech, who coordinates the geosciences component of GEON. The researchers realized they needed to talk to IT experts, and subsequent NSF-sponsored workshops and contacts with SDSC matured into the GEON collaboration.

Project leaders
Chaitan Baru, SDSC
A. Krishna Sinha, Virginia Tech

Ramon Arrowsmith, Arizona State
Maria Crawford, Bryn Mawr
Mary Marlino, Rajul Pandya, Michael Wright, DLESE
Boyan Brodaric, Geological Survey of Canada
David Archbell, Michael Bailey, Karan Bhatia, John Helly, Jane Kwon, Kai Lin, Bertram Ludäscher, Ashraf Memon, Viswanath Nandigam, Phil Papadopoulos, Dogan Seber, Ilya Zaslavsky, SDSC
Eric Frost, San Diego State University
Mark Gahegan, Penn State
Karl Flessa, Allister Rees, University of Arizona
Geof Bowker, Yannis Papakonstantinou, David Ribes, UC San Diego
John Oldow, University of Idaho
Tony Gary, Paul Sikora, University of Utah
Mian Liu, University of Missouri-Columbia
Chuck Meertens, UNAVCO
Tom Gunther, Marc Levine, Mike McCreer, David Soller, Mary Lou Zoback, USGS
Ann Gates, Randy Keller, Vladik Kreinovich, UTEP

The IT component of GEON includes researchers from SDSC, Penn State University, the Geological Survey of Canada, and San Diego State University. The geoscience research component involves researchers from 10 institutions: Arizona State University, Bryn Mawr College, Cornell University, Rice University, University of Arizona, University of Idaho, University of Missouri-Columbia, University of Texas at El Paso, University of Utah, and UNAVCO. The Digital Library for Earth System Education (DLESE) coordinates GEON education and outreach.

Major partners in GEON include the U.S. Geological Survey (USGS), which maintains a wide range of Earth system databases. "We’re very interested in working with GEON, both to provide access to major USGS data sets, and to participate in the development and use of the cyberinfrastructure," said Tom Gunther, Enterprise Information Program Coordinator at the USGS. The California Institute for Telecommunications and Information Technology, Lawrence Livermore National Laboratory (LLNL), and the Geological Survey of Canada (GSC) are also partners. Corporate partners include ESRI, the leading provider of geographic information systems (GIS) software, and Georeference Online.

"New partners continue to join GEON, a sign that the time is right for this project," said Baru. GEON will also collaborate with other major geosciences initiatives such as EarthScope and IRIS, the Incorporated Research Institutions for Seismology.

GEON is being designed as a scientist-centered cyberinfrastructure, freeing researchers to think and be creative by relieving them of onerous data-management tasks. Through a scalable and interoperable network, the project will provide scientists with a growing array of tools they can use without having to be IT experts. These include data integration mechanisms, as well as computational resources and integrated software for analysis, modeling, and visualization. In this way, GEON will bridge traditional disciplines–an indispensable step in understanding the Earth as a unified system.

In addition to benefiting scientific research, GEON will enable policymakers to answer the call for more science-based decision making. Nonspecialists will be able to obtain answers to questions ranging from flood and landslide potential to groundwater problems, volcanic activity, and soil quality, all based on the best data available.

The integration of scientific disciplines is being driven by the rapid progress of information technologies, which are themselves merging into unified grid environments. These environments support the integration of data and grid services, or "virtualization," providing transparency across data location and name, replication, heterogeneous formats, access permissions, and more. Beyond discipline grids lies the larger-scale cyberinfrastructure of the NPACI Grid and the NSF Extended Terascale Facility (ETF), in which SDSC plays a core role.
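The "virtualization" idea can be illustrated with a small catalog that maps a location-independent logical name to physical copies held at different sites, checking permissions before handing back a usable copy. This is a minimal sketch under stated assumptions: `LogicalCatalog`, `Replica`, and the example sites, paths, and user are hypothetical, and the code is not the interface of the NPACI Grid, the ETF, or any particular data grid product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str           # hypothetical site identifier
    path: str           # physical location of this copy at that site
    fmt: str            # storage format of this copy (copies may differ)
    online: bool = True


@dataclass
class LogicalCatalog:
    """Hypothetical catalog: logical names -> physical replicas."""
    replicas: dict[str, list[Replica]] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica, allowed: set[str]) -> None:
        self.replicas.setdefault(logical_name, []).append(replica)
        self.readers.setdefault(logical_name, set()).update(allowed)

    def resolve(self, logical_name: str, user: str) -> Replica:
        # Permissions are attached to the logical name, not to each physical copy.
        if user not in self.readers.get(logical_name, set()):
            raise PermissionError(f"{user} may not read {logical_name}")
        # Return the first reachable copy; the caller never has to name a site.
        for replica in self.replicas.get(logical_name, []):
            if replica.online:
                return replica
        raise LookupError(f"no online replica of {logical_name}")


catalog = LogicalCatalog()
name = "geon/rockies/geologic_map"
catalog.register(name, Replica("site-a", "/data/map.shp", "shapefile", online=False), {"alice"})
catalog.register(name, Replica("site-b", "/archive/map.gml", "GML"), {"alice"})
print(catalog.resolve(name, "alice").site)  # site-b
```

Because the first copy is marked offline, the lookup falls through to the second one; the user asks only for the logical name and never sees which site, path, or format served the request.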

"It’s no accident that GEON is called a network," said Dogan Seber, GEON project manager and director of the new Geoinformatics Lab in SDSC’s DAKS program. "This indicates both the collaborative nature of the project and the architecture that connects separate disciplines and sites into a unified environment."

As scientific research grows larger in scale, projects such as GEON must simultaneously address multiple challenges. There are complex, intertwined IT and geosciences questions, which are being explored in testbeds to ensure the systematic growth of the national cyberinfrastructure. Additional challenges include integrating the databases of diverse disciplines spread across many sites, the cultural and social issues in building effective interdisciplinary collaborations, and designing and building the GEON Grid infrastructure.

Real-World Testbeds

Known terrane boundaries in the mid-Atlantic GEON testbed region are indicated as colored regions. Red lines are mapped faults in the region.

To demonstrate GEON cyberinfrastructure development on real scientific questions, the researchers have identified two testbed areas: the Mid-Atlantic region of the eastern U.S. and the Rocky Mountain region.

"The Rockies are in the middle of the North American plate, and we want to understand why there is so much intraplate deformation there, because deformation is usually focused at plate margins," said professor Randy Keller of the University of Texas at El Paso. "Our goals are to use multiple types of data to understand the underlying geological processes in order to find the origin of this tectonic activity."

Integrating diverse data sets will help geoscientists better understand the dynamics of crustal deformation in the Rockies. "We can derive a picture of the deformation from Global Positioning System (GPS) data, from earthquake data and the slip rate of recent faulting, and from geological records in deformed rock formations, each of which is obtained in different ways," said Mian Liu, professor of geological sciences at the University of Missouri-Columbia. "These results won’t always agree because they represent different time scales, and this is where you find new science–for example, we’d like to understand why some places are deforming at a high rate now that didn’t historically."

The mid-Atlantic region reflects a complete Wilson tectonic cycle (the opening and closing of the Atlantic Ocean basin as continental plates rift apart and later rejoin, accompanied by mountain formation). A fundamental question in plate tectonics, the major paradigm of the solid Earth sciences, is to understand the dynamics of how pieces of continents break off and rejoin in the Wilson cycle. To explore this, GEON scientists are focusing on the basic unit of continents–the terrane–a crustal unit that preserves a characteristic geological history, is distinct (in physical, geochemical, and other ways) from adjacent areas, and is usually bounded by faults.

In the mid-Atlantic testbed, the researchers are addressing the basic question of how to characterize and recognize terranes. To determine whether a given fault is a terrane boundary, geoscientists want to ask questions such as: What is the distribution and age of rock types across the fault? What are the geochemical signatures of these rocks? Are different types of ore deposits present across the faults?

GEON will allow researchers to ask these questions by querying databases from different disciplines for characteristics such as rock types and their ages, filtering results for certain properties, representing terranes in 3-D, performing analyses and other tests, and finally, visualizing the results–things they have never been able to do before in an integrated way. These capabilities will both broaden the scope of scientific questions and speed the process greatly, transforming the way science is done.
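The query-filter-analyze workflow described above can be sketched as a toy comparison of samples on either side of a fault: which rock types and geochemical signatures are shared, and how far apart the ages are. This is a hypothetical illustration only. The `Sample` records, the signature labels, and the 50-million-year threshold are invented for the example, and the decision rule is a deliberately simple stand-in, not GEON's method or an accepted criterion for recognizing terranes.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Sample:
    side: str        # "west" or "east" of the fault being tested
    rock_type: str
    age_ma: float    # age in millions of years
    signature: str   # simplified geochemical class (illustrative labels)


def compare_across_fault(samples, min_age_gap_ma=50.0):
    """Toy heuristic: contrast rock types, ages, and geochemistry across a fault."""
    west = [s for s in samples if s.side == "west"]
    east = [s for s in samples if s.side == "east"]
    if not west or not east:
        raise ValueError("need samples on both sides of the fault")

    shared_signatures = {s.signature for s in west} & {s.signature for s in east}
    age_gap = abs(median(s.age_ma for s in west) - median(s.age_ma for s in east))
    return {
        "shared_rock_types": {s.rock_type for s in west} & {s.rock_type for s in east},
        "shared_signatures": shared_signatures,
        "median_age_gap_ma": age_gap,
        # Flag the fault only if geochemistry differs AND median ages are far apart.
        "candidate_boundary": not shared_signatures and age_gap >= min_age_gap_ma,
    }


# Invented sample data for illustration.
samples = [
    Sample("west", "granite", 450, "arc"),
    Sample("west", "gneiss", 460, "arc"),
    Sample("east", "basalt", 200, "rift"),
    Sample("east", "gabbro", 210, "rift"),
]
result = compare_across_fault(samples)
print(result["median_age_gap_ma"], result["candidate_boundary"])  # 250.0 True
```

In practice each list of samples would come from a different discipline's database, which is exactly the integration step GEON aims to make routine.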

Databases the researchers plan to integrate through GEON activities range from geological and terrane maps to geochemical, ore deposit, stratigraphic, geophysical, and other data types, giving geoscientists additional "windows" into the Earth’s past and the hidden world beneath our feet.

Connecting the Dots

The power of GEON will come from linking many databases and research tools, but this involves a number of challenges. Differences in physical location as well as hardware and software protocols and permissions are being addressed by such methods as XML-based standards for data exchange and data integration technologies.
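As a rough illustration of XML-based data exchange, the sketch below parses one exchange record into a common dictionary and normalizes its age to millions of years, so records from different providers can be compared. The element and attribute names (`mapUnit`, `rockType`, `age unit=...`) and the unit table are assumptions made for the example, not a GEON or community schema; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical exchange record; element names are illustrative, not a GEON schema.
RECORD = """<mapUnit source="survey_a">
  <name>Example Unit A</name>
  <rockType>granite</rockType>
  <age unit="Ga">1.5</age>
</mapUnit>"""

# Conversion factors to a common unit (millions of years).
TO_MA = {"ka": 0.001, "Ma": 1.0, "Ga": 1000.0}


def parse_map_unit(xml_text):
    """Parse one exchange record into a common, unit-normalized dictionary."""
    root = ET.fromstring(xml_text)
    age = root.find("age")
    age_ma = None
    if age is not None and age.text:
        # Default to Ma when a provider omits the unit attribute.
        age_ma = float(age.text) * TO_MA[age.get("unit", "Ma")]
    return {
        "source": root.get("source"),
        "name": root.findtext("name"),
        "rock_type": root.findtext("rockType"),
        "age_ma": age_ma,
    }


print(parse_map_unit(RECORD))
```

Agreeing on structure and units this way removes the format barrier, but not the terminology barrier: the value `granite` still has to mean the same thing to every provider.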

Beyond these obvious differences lie subtler discrepancies in the terminology and concepts that give the data meaning. Even data from closely related disciplines can contain "hidden semantics," differing conventions and terminologies that complicate efforts to integrate the data into a global geosciences environment.

For example, depending on where they were trained and their discipline, different geoscientists may call the same rock type by different names, or two different rock types may have the same name. Both situations introduce serious ambiguities into database integration.

To overcome these problems, the GEON knowledge representation working group is using semantic mediation, which links diverse data sets by reconciling or "mapping across" the differences and idiosyncrasies. Being able to do this relies on concept spaces that describe relevant concepts and their relationships, and ontologies, which can be seen as formal machine-processable counterparts of those concept spaces. "To implement ontologies in GEON, we are harvesting the agreements on names and concepts among geoscientists and putting them in computer-usable form," said Bertram Ludäscher, director of the DAKS Knowledge-based Information Systems lab at SDSC. To make GEON more broadly useful, researchers are also taking care to develop knowledge representation techniques that are compatible with W3C standards for the Semantic Web.
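The two ambiguities described above, different names for the same rock and the same name for different rocks, can be shown with a minimal mediation table that resolves each local term in the context of its source. This is a dictionary-based stand-in for a formal ontology, under stated assumptions: the sources `survey_a` and `survey_b`, the mappings, and the concept names are made up for illustration and do not come from GEON's actual ontologies.

```python
# Illustrative, made-up mappings from (source, local term) to a shared concept.
# They stand in for agreements harvested from geoscientists; not real GEON data.
MAPPINGS = {
    ("survey_a", "granodiorite"): "granodiorite",
    ("survey_b", "grdr"): "granodiorite",           # synonym: different name, same rock
    ("survey_a", "greenstone"): "metabasalt",       # homonym: same name...
    ("survey_b", "greenstone"): "chlorite_schist",  # ...different rock in another source
}


def to_global(source, term):
    """Resolve a local term in the context of its source, so homonyms stay distinct."""
    key = (source, term.strip().lower())
    if key not in MAPPINGS:
        raise KeyError(f"no mapping for {term!r} from {source}")
    return MAPPINGS[key]


def query(records, concept):
    """Return ids of records, from any source, whose local term maps to `concept`."""
    return [r["id"] for r in records if to_global(r["source"], r["rock_type"]) == concept]


records = [
    {"source": "survey_a", "id": 1, "rock_type": "Granodiorite"},
    {"source": "survey_b", "id": 2, "rock_type": "grdr"},
    {"source": "survey_a", "id": 3, "rock_type": "greenstone"},
    {"source": "survey_b", "id": 4, "rock_type": "greenstone"},
]
print(query(records, "granodiorite"))  # [1, 2]
print(query(records, "metabasalt"))    # [3]
```

Keying every mapping on the source as well as the term is what keeps the two meanings of "greenstone" apart; a query by shared concept then collects synonyms from every source.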

Concept Spaces for Smarter Queries

To help geoscientists study the Rocky Mountains, Ludäscher, SDSC postdoctoral research fellow Kai Lin, and GSC researcher Boyan Brodaric have built an ontology-based geological map integration prototype that brings together the geologic maps from nine state geologic surveys in the Rocky Mountain region in a single, unified, queryable map interface. Previously, those data sets and maps could be accessed and queried only separately, which required far more time and labor.

By aligning the database formats, capturing the geological concepts underlying the databases into ontologies whose concepts can be interrelated, and applying semantic mediation techniques, the researchers have made the data uniformly accessible.

"Our integration ontology enables researchers to query the data by five major geological factors–age, composition, rock fabric, texture, and genesis," said Ludäscher. "And we want to provide views of the data based on accepted standards, so that geologists can rely on the answers they get." To do this, the researchers incorporated rock type classifications for the various geological factors, and with the collaboration of Brodaric turned them into ontologies on which the system can operate.
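A query over the five geological factors might look like the sketch below, where four factors (composition, fabric, texture, genesis) come from a classification of each rock type and age is applied as a numeric range on each map unit. Treating age as a range rather than a category is a design choice made for this example. The `RockClass` and `MapUnit` types, the classification values, and the map units are simplified, hypothetical placeholders, not the rock type standards the GEON researchers incorporated.

```python
from typing import NamedTuple


class RockClass(NamedTuple):
    composition: str
    fabric: str
    texture: str
    genesis: str


class MapUnit(NamedTuple):
    name: str
    rock_type: str
    age_ma: float  # age is recorded per map unit, in millions of years


# Simplified placeholder classification, not GEON's actual standard.
CLASSIFICATION = {
    "granite": RockClass("felsic", "massive", "phaneritic", "plutonic"),
    "rhyolite": RockClass("felsic", "massive", "aphanitic", "volcanic"),
    "basalt": RockClass("mafic", "massive", "aphanitic", "volcanic"),
    "gneiss": RockClass("variable", "foliated", "granoblastic", "metamorphic"),
}


def query_units(units, min_age_ma=None, max_age_ma=None, **factors):
    """Select map units by age range plus any of composition/fabric/texture/genesis."""
    unknown = set(factors) - set(RockClass._fields)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    hits = []
    for unit in units:
        if min_age_ma is not None and unit.age_ma < min_age_ma:
            continue
        if max_age_ma is not None and unit.age_ma > max_age_ma:
            continue
        cls = CLASSIFICATION[unit.rock_type]
        if all(getattr(cls, key) == value for key, value in factors.items()):
            hits.append(unit.name)
    return hits


units = [
    MapUnit("U1", "granite", 1400),
    MapUnit("U2", "rhyolite", 30),
    MapUnit("U3", "basalt", 10),
    MapUnit("U4", "gneiss", 1700),
]
print(query_units(units, composition="felsic"))              # ['U1', 'U2']
print(query_units(units, genesis="volcanic", max_age_ma=20))  # ['U3']
```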

In the architecture, the global ontology covering nine states contains the local ontologies for the rock type data of each state, and reconciles any ambiguities introduced through differences in names and other factors.
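The global/local arrangement can be sketched as a shared is-a hierarchy plus one mapping table per state that ties each state's own terms into it. A query for a broad concept then returns matching units from every state, including units labeled with more specific subconcepts. All concept names, state keys, and unit identifiers here are hypothetical, and a plain dictionary hierarchy stands in for a real ontology language.

```python
# Hypothetical global is-a hierarchy: concept -> parent concept (None at the root).
GLOBAL_PARENT = {
    "igneous": None,
    "plutonic": "igneous",
    "volcanic": "igneous",
    "granite": "plutonic",
    "gabbro": "plutonic",
    "basalt": "volcanic",
}

# Hypothetical local ontologies: each state's own terms mapped into the global one.
LOCAL_ONTOLOGIES = {
    "state_1": {"Granite": "granite", "Gb": "gabbro"},
    "state_2": {"granitic rocks": "granite", "basalt flows": "basalt"},
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = GLOBAL_PARENT[concept]
    return False


def query_by_concept(map_units, concept):
    """Find units in any state whose local term maps to `concept` or a subconcept."""
    hits = []
    for state, local_term, unit_id in map_units:
        global_concept = LOCAL_ONTOLOGIES[state].get(local_term)
        # Unmapped local terms are skipped here; a real system would report them.
        if global_concept is not None and is_a(global_concept, concept):
            hits.append((state, unit_id))
    return hits


map_units = [
    ("state_1", "Granite", "A1"),
    ("state_1", "Gb", "A2"),
    ("state_2", "granitic rocks", "B1"),
    ("state_2", "basalt flows", "B2"),
]
# Expected: [('state_1', 'A1'), ('state_1', 'A2'), ('state_2', 'B1')]
print(query_by_concept(map_units, "plutonic"))
```

Keeping the local tables separate from the global hierarchy means a state's vocabulary can be added or revised without touching the shared concepts.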

GEON researchers will also have to balance the stability of ontologies–which enables the interoperation of disparate data sets–with the need for flexibility to allow ontology evolution. "In building GEON, we are keeping in mind the need to refresh, update, and allow our knowledge and ontologies to evolve," said Mark Gahegan, a computer scientist and professor of geography at Penn State. A related issue is to develop methods to identify data and knowledge that may initially appear anomalous or incorrect, but may contain the seeds for new insights and paradigms.

Pioneering Collaboration

The GEON cyberinfrastructure is part of a broader information-technology-driven integration occurring across all of the sciences. There is a need for human-level integration, because cyberinfrastructure involves more than hardware and software. GEON includes such activities as face-to-face meetings, online collaboration, and overviews of both IT and geosciences components to assist participants from different disciplines in "speaking each other’s languages."

"Modern science is increasingly collaborative," said Alan Blatecky, executive director of SDSC. "Part of the value of emerging cyberinfrastructure is in making this possible, not only through the technical advances of hardware and software integration but just as importantly through helping people work together in large-scale projects across disciplines."

Researchers in the emerging discipline of social informatics, Geoffrey Bowker of UC San Diego’s Department of Communication and sociology graduate student David Ribes, are participating in GEON to study the impacts of revolutionary IT advances on scientific research, and the reciprocal impacts of the scientific problems on IT research.

"When information revolutions take place, the scientific questions being posed and the work done can change dramatically," said Bowker. "Beyond the technical aspects, much social, organizational, and cultural work has to be done."

For example, as scientific data sets are made increasingly available online, issues of the timing and manner of data sharing become important. "Hopefully, incentives in the sciences will evolve to match the growing need for data sharing, and to support work that, beyond individual researchers, benefits the whole community," said Virginia Tech’s Sinha. Increasingly, funding agencies require publication of data along with papers.

GEON has an ambitious educational component, which is being led by DLESE. "Education that teaches modern cyberinfrastructure is indispensable for the geosciences," said Sinha. "Introducing students early to geoinformatics speeds community adoption of these radically different technologies and working methods."

"And ‘big picture’ exploration isn’t just for students," said Baru. "Outreach resources can also be useful to help scientists from different disciplines understand each other."

GEON Grid Infrastructure

The GEON Grid will include the three core components of a national cyberinfrastructure–grid computing, data management, and visualization. GEON will also include a portal to provide Web access to the GEON environment. A constant challenge for any grid-enabled testbed is the distribution and updating of a large, evolving software collection. GEON will build on existing deployment efforts for Open Grid Services Architecture-compliant software, including NPACI Rocks for cluster management and the NSF Middleware Initiative for grid software. The initial GEON Grid will span some of the participating universities, as well as Lawrence Livermore National Laboratory, USGS locations, and DLESE, and represents a prototype of a future full-scale GEON.
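The software-distribution problem can be pictured as comparing what each site has installed against a central manifest of required package versions. The sketch below computes, per node, which packages are missing or out of date. The package names, versions, and nodes are hypothetical, and the sketch does not describe the mechanism NPACI Rocks or the NSF Middleware Initiative actually uses.

```python
def parse_version(version):
    """Turn a dotted version string such as "2.4.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pending_updates(manifest, installed_by_node):
    """Per node, list packages that are missing or older than the manifest requires."""
    plan = {}
    for node, installed in installed_by_node.items():
        todo = sorted(
            pkg
            for pkg, wanted in manifest.items()
            if pkg not in installed
            or parse_version(installed[pkg]) < parse_version(wanted)
        )
        if todo:
            plan[node] = todo
    return plan


# Hypothetical package names and versions for two sites.
manifest = {"grid-middleware": "2.4", "portal": "1.1", "viz-tools": "3.0"}
installed = {
    "node-a": {"grid-middleware": "2.4", "portal": "1.1", "viz-tools": "3.0"},
    "node-b": {"grid-middleware": "2.2", "portal": "1.1"},
}
print(pending_updates(manifest, installed))  # {'node-b': ['grid-middleware', 'viz-tools']}
```

The simple tuple comparison treats "2.4" as older than "2.4.0"; a production tool would need a real version-ordering policy, and would also have to act on the plan across many sites.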

A Model for Other Disciplines

GEON has been conceived as a prototype that is designed from the beginning to scale up. GEON is a pioneering project in "democratizing" grid technologies, and researchers plan to help it grow into a national network that will reach a wide range of users, from scientists and educators to government policymakers and engineers. This broader community participation will expand as the geosciences community sees that the technology is effective end-to-end, producing previously unattainable results in published papers. As for the future, Baru notes that "beyond the geosciences, GEON will also have value and portability as a demonstration of how a discipline cyberinfrastructure can work."

Paul Tooby is a senior science writer at the San Diego Supercomputer Center.