By Paul Tooby
From space, we can recognize that the Earth is a single,
unified system. But geoscientists who study the planet are
still able to glimpse only partial views through the windows
of their separate disciplines. Now, researchers at the San
Diego Supercomputer Center (SDSC)
are collaborating with scientists who study the solid Earth
to build a prototype national Geosciences Network, GEON.
"GEON cyberinfrastructure will help weave the separate
strands of the solid Earth sciences disciplines and data into
a unified fabric," said Chaitan Baru, co-director of
SDSC's Data and Knowledge Systems (DAKS)
program, who coordinates information technology in GEON. "This
will give the geosciences an IT head start for
viewing the complex dynamics of the Earth system as an interrelated
whole." (See Guest Commentary.)
A five-year multi-institutional National Science Foundation
(NSF) large Information
Technology Research project, GEON grew out of a workshop
at the 1999 Geological
Society of America meeting. "We kept rediscovering
that we had data from different disciplines that we wanted
to integrate, but we weren't getting anywhere talking
with other geologists," said professor Krishna Sinha
of Virginia Tech, who coordinates
the geosciences component of GEON. The researchers realized
they needed to talk to IT experts, and subsequent NSF-sponsored
workshops and contacts with SDSC matured into the GEON collaboration.
Project leaders
Chaitan Baru, SDSC
A. Krishna Sinha, Virginia Tech
Participants
Ramon Arrowsmith, Arizona State
Maria Crawford, Bryn Mawr
Mary Marlino, Rajul Pandya, Michael Wright, DLESE
Boyan Brodaric, Geological Survey of Canada
David Archbell, Michael Bailey, Karan Bhatia, John Helly, Jane Kwon, Kai Lin, Bertram Ludäscher, Ashraf Memon, Viswanath Nandigam, Phil Papadopoulos, Dogan Seber, Ilya Zaslavsky, SDSC
Eric Frost, San Diego State University
Mark Gahegan, Penn State
Karl Flessa, Allister Rees, University of Arizona
Geof Bowker, Yannis Papakonstantinou, David Ribes, UC San Diego
John Oldow, University of Idaho
Tony Gary, Paul Sikora, University of Utah
Mian Liu, University of Missouri-Columbia
Chuck Meertens, UNAVCO
Tom Gunther, Marc Levine, Mike McCreer, David Soller, Mary Lou Zoback, USGS
Ann Gates, Randy Keller, Vladik Kreinovich, UTEP
The IT component of GEON includes researchers from SDSC,
Penn State University, the
Geological
Survey of Canada, and San
Diego State University. The geoscience research component
involves researchers from 10 institutions: Arizona
State University, Bryn
Mawr College, Cornell
University, Rice University,
University of Arizona,
University of Idaho,
University of
Missouri-Columbia, University
of Texas at El Paso, University
of Utah, and UNAVCO,
with the Digital Library for Earth Sciences Education (DLESE)
coordinating GEON education and outreach.
Major partners in GEON include the U.S. Geological Survey
(USGS), which maintains
a wide range of Earth system databases. "We're very
interested in working with GEON, both to provide access to
major USGS data sets, and to participate in the development
and use of the cyberinfrastructure," said Tom Gunther,
Enterprise Information Program Coordinator at the USGS. The
California Institute for
Telecommunications and Information Technologies, Lawrence
Livermore National Lab (LLNL),
and the Geological Survey of Canada (GSC) are also partners.
ESRI, the major provider
of geographic information systems (GIS) software, and Georeference
Online are corporate partners.
"New partners continue to join GEON, a sign that the
time is right for this project," said Baru. GEON will
also collaborate with other major geosciences initiatives
such as Earthscope
and IRIS, the Incorporated
Research Institutions for Seismology.
GEON is being designed as a scientist-centered cyberinfrastructure,
freeing researchers to think and be creative by relieving
them of onerous data-management tasks. Through a scalable
and interoperable network, the project will provide scientists
with a growing array of tools they can use without having
to be IT experts. These include data integration mechanisms,
as well as computational resources and integrated software
for analysis, modeling, and visualization. In this way, GEON
will bridge traditional disciplines, an indispensable
step in understanding the Earth as a unified system.
In addition to benefiting scientific research, GEON will
enable policymakers to answer the call for more science-based
decision making. Nonspecialists will be able to obtain answers
to questions ranging from flood and landslide potential to
groundwater problems, volcanic activity, and soil quality,
all based on the best data available.
The integration of scientific disciplines is being driven
by the rapid progress of information technologies, which are
themselves merging into unified grid environments that support
integration of data and grid services, or "virtualization,"
with transparency extending across data location and name,
replication, heterogeneous formats, access permissions, and
more. Beyond discipline grids is the larger-scale cyberinfrastructure
of the NPACI Grid and
the NSF
Extensible Terascale Facility (ETF), in which SDSC plays a core
role.
"It's no accident that GEON is called a network,"
said Dogan Seber, GEON project manager and director of the
new Geoinformatics
Lab in SDSC's DAKS program. "This indicates
both the collaborative nature of the project and the architecture
that connects separate disciplines and sites into a unified
environment."
As scientific research grows larger in scale, projects such
as GEON must simultaneously address multiple challenges. There
are complex, intertwined IT and geosciences questions, which
are being explored in testbeds to ensure the systematic growth
of the national cyberinfrastructure. Additional challenges
include integrating the databases of diverse disciplines spread
across many sites, addressing the cultural and social issues in building
effective interdisciplinary collaborations, and designing
and building the GEON Grid infrastructure.
Real-World Testbeds
Known terrane boundaries in the mid-Atlantic GEON testbed
region are indicated as colored regions. Red lines are
mapped faults in the region.
To demonstrate GEON cyberinfrastructure development on real
scientific questions, the researchers have identified two
testbed areas: the mid-Atlantic region of the Eastern U.S.
and the Rocky Mountain region.
"The Rockies are in the middle of the North American
plate, and we want to understand why there is so much intraplate
deformation there, because deformation is usually focused
at plate margins," said professor Randy Keller of the
University of Texas at El Paso.
"Our goals are to use multiple types of data to understand
the underlying geological processes in order to find the origin
of this tectonic activity."
Integrating diverse data sets will help geoscientists better
understand the dynamics of crustal deformation in the Rockies.
"We can derive a picture of the deformation from Global
Positioning System (GPS) data, from earthquake data and the
slip rate of recent faulting, and from geological records
in deformed rock formations, each of which is obtained in
different ways," said Mian Liu, professor of geological
sciences at the University
of Missouri-Columbia. "These results won't always
agree because they represent different time scales, and this
is where you find new science; for example, we'd
like to understand why some places are deforming at a high
rate now that didn't historically."
The mid-Atlantic region reflects a complete Wilson tectonic
cycle (the opening and closing of the Atlantic Ocean basin
as continental plates rift apart and later rejoin, accompanied
by mountain formation). A fundamental question in plate tectonics,
the major paradigm of the solid Earth sciences, is
how pieces of continents break off and rejoin
in the Wilson cycle. To explore this, GEON scientists are
focusing on the basic unit of continents, the terrane: a
crustal unit that preserves a characteristic geological history,
is distinct (in physical, geochemical, and other ways) from
adjacent areas, and is usually bounded by faults.
In the mid-Atlantic testbed, the researchers are addressing
the basic question of how to characterize and recognize terranes.
To determine whether a given fault is a terrane boundary,
geoscientists want to ask questions such as: What is the distribution
and age of rock types across the fault? What are the geochemical
signatures of these rocks? Are different types of ore deposits
present across the faults?
GEON will allow researchers to ask these questions by querying
databases from different disciplines for characteristics such
as rock types and their ages, filtering results for certain
properties, representing terranes in 3-D, performing analyses
and other tests, and finally, visualizing the results. These are things
they have never been able to do before in an integrated way.
These capabilities will both broaden the scope of scientific
questions and speed the process greatly, transforming the
way science is done.
Databases the researchers plan to integrate through GEON
activities range from geological and terrane maps to geochemical,
ore deposit, stratigraphic, geophysical, and other data types,
giving geoscientists additional "windows" into the
Earth's past and the hidden world beneath our feet.
Connecting the Dots
The power of GEON will come from linking many databases and
research tools, but this involves a number of challenges.
Differences in physical location as well as hardware and software
protocols and permissions are being addressed by such methods
as XML-based standards for data exchange and data integration
technologies.
Beyond these obvious differences are more subtle discrepancies
in the underlying terminology and concepts scientists use
that give the data meaning. Even data from nearby disciplines
can contain "hidden semantics" with differing conventions
and terminologies that complicate efforts to integrate the
data into a global geosciences environment.
For example, depending on where they were trained and their
discipline, different geoscientists may call the same rock
type by different names, or two different rock types may have
the same name. Both situations introduce serious ambiguities
into database integration.
To overcome these problems, the GEON knowledge representation
working group is using semantic mediation, which links diverse
data sets by reconciling or "mapping across" the
differences and idiosyncrasies. Doing this relies
on concept spaces that describe relevant concepts and their
relationships, and ontologies, which can be seen as formal
machine-processable counterparts of those concept spaces.
"To implement ontologies in GEON, we are harvesting the
agreements on names and concepts among geoscientists and putting
them in computer-usable form," said Bertram Ludäscher,
director of the DAKS Knowledge-based Information Systems lab
at SDSC. To make GEON more broadly useful, researchers are
also taking care to develop knowledge representation techniques
that are compatible with W3C
standards for the Semantic Web.
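The naming problem described above can be pictured with a small sketch. It is not GEON's implementation: the survey names, rock terms, and shared concept names are invented for illustration, and a plain lookup table stands in for a real ontology. It assumes only that each source's local term can be mapped to one shared concept, which lets a mediator handle both cases from the text, different names for the same rock type and the same name for different rock types.

```python
# Hypothetical mapping table: (source, local term) -> shared concept.
# Every source, term, and concept name here is an illustrative assumption.
LOCAL_TO_SHARED = {
    ("survey_a", "granite"): "granitic_rock",
    ("survey_b", "granitoid"): "granitic_rock",      # different name, same concept
    ("survey_a", "greenstone"): "metabasalt",
    ("survey_b", "greenstone"): "chlorite_schist",   # same name, different concept
}


def mediate(source, local_term):
    """Translate a source-specific term into the shared concept, or None."""
    return LOCAL_TO_SHARED.get((source, local_term.lower()))


def same_concept(source_1, term_1, source_2, term_2):
    """True only if both terms resolve to the same shared concept."""
    concept_1 = mediate(source_1, term_1)
    concept_2 = mediate(source_2, term_2)
    return concept_1 is not None and concept_1 == concept_2
```

Because the lookup key includes the source, two databases can reuse a term without colliding, and an unmapped term returns None rather than a guess.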
Concept Spaces for Smarter Queries
To help geoscientists study the Rocky Mountains, Ludäscher,
SDSC postdoctoral research fellow Kai Lin, and GSC researcher
Boyan Brodaric have built an ontology-based geological map
integration prototype that brings together, in a single unified
queryable map interface, the geologic maps from nine state
geological surveys in the Rocky Mountain region. Previously,
those data sets and maps could only be accessed and queried
separately, requiring far more labor and time.
By aligning the database formats, capturing the geological
concepts underlying the databases into ontologies whose concepts
can be interrelated, and applying semantic mediation techniques,
the researchers have made the data uniformly accessible.
"Our integration ontology enables researchers to query
the data by five major geological factors: age, composition,
rock fabric, texture, and genesis," said Ludäscher.
"And we want to provide views of the data based on accepted
standards, so that geologists can rely on the answers they
get." To do this, the researchers incorporated rock type
classifications for the various geological factors, and with
the collaboration of Brodaric turned them into ontologies
on which the system can operate.
In the architecture, the global ontology covering nine states
contains the local ontologies for the rock type data of each
state, and reconciles any ambiguities introduced through differences
in names and other factors.
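A rough sketch of how one query can run across several state maps through a shared ontology follows. It is an illustration under stated assumptions, not the prototype's actual design: the concept hierarchy, state names, unit codes, and map units are all hypothetical, and only one of the five factors (composition) is modeled. Each state's local terms are mapped into the shared hierarchy, and a query for a broad concept also returns units tagged with more specific ones.

```python
# Hypothetical fragment of a shared ("global") composition ontology:
# each concept points to its parent concept (an is-a relation).
PARENT = {
    "igneous": None,
    "plutonic": "igneous",
    "volcanic": "igneous",
    "granite": "plutonic",
    "basalt": "volcanic",
    "sedimentary": None,
    "sandstone": "sedimentary",
}

# Hypothetical per-state vocabularies mapped into the shared ontology.
STATE_TERMS = {
    "state_1": {"Tgr": "granite", "Qb": "basalt"},
    "state_2": {"granitic intrusive": "granite", "ss": "sandstone"},
}

# Hypothetical map units as (state, unit id, local composition term).
MAP_UNITS = [
    ("state_1", "u1", "Tgr"),
    ("state_1", "u2", "Qb"),
    ("state_2", "u3", "granitic intrusive"),
    ("state_2", "u4", "ss"),
]


def is_a(concept, ancestor):
    """Walk up the parent links to test whether concept falls under ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def query(ancestor):
    """Return unit ids from every state whose composition falls under ancestor."""
    hits = []
    for state, unit_id, local_term in MAP_UNITS:
        concept = STATE_TERMS[state].get(local_term)
        if concept is not None and is_a(concept, ancestor):
            hits.append(unit_id)
    return hits
```

In this toy setup, a single call such as `query("plutonic")` reaches units from both states even though neither state's local vocabulary contains that word.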
GEON researchers will also have to balance the stability
of ontologies, which enables the interoperation of disparate
data sets, with the need for flexibility to allow ontology
evolution. "In building GEON, we are keeping in mind
the need to refresh, update, and allow our knowledge and ontologies
to evolve," said Mark Gahegan, a computer scientist and
professor of geography at Penn State. A related issue is to
develop methods to identify data and knowledge that may initially
appear anomalous or incorrect, but may contain the seeds for
new insights and paradigms.
Pioneering Collaboration
The GEON cyberinfrastructure is part of a broader information-technology-driven
integration occurring across all of the sciences. There is
a need for human-level integration, because cyberinfrastructure
involves more than hardware and software. GEON includes such
activities as face-to-face meetings, online collaboration,
and overviews of both IT and geosciences components to assist
participants from different disciplines in "speaking
each other's languages."
"Modern science is increasingly collaborative,"
said Alan Blatecky, executive director of SDSC. "Part
of the value of emerging cyberinfrastructure is in making
this possible, not only through the technical advances of
hardware and software integration but just as importantly
through helping people work together in large-scale projects
across disciplines."
Two researchers in the emergent discipline of social informatics,
Geoffrey Bowker of UC San Diego's Department
of Communication and sociology graduate student David
Ribes, are participating in GEON to study the impacts of revolutionary
IT advances on scientific research, and the reciprocal impacts
of the scientific problems on IT research.
"When information revolutions take place, the scientific
questions being posed and the work done can change dramatically,"
said Bowker. "Beyond the technical aspects, much social,
organizational, and cultural work has to be done."
For example, as scientific data sets are made increasingly
available online, issues of the timing and manner of data
sharing become important. "Hopefully, incentives in the
sciences will evolve to match the growing need for data sharing,
and to support work that, beyond individual researchers, benefits
the whole community," said Virginia Tech's Sinha.
Increasingly, funding agencies require publication of data
along with papers.
GEON has an ambitious educational component, which is being
led by DLESE. "Education
that teaches modern cyberinfrastructure is indispensable for
the geosciences," said Sinha. "Introducing students
early to geoinformatics speeds community adoption of these
radically different technologies and working methods."
"And big-picture exploration isn't
just for students," said Baru. "Outreach resources
can also be useful to help scientists from different disciplines
understand each other."
GEON Grid Infrastructure
The GEON Grid will include the three core components of a
national cyberinfrastructure: grid computing, data management,
and visualization. GEON will also include a portal to provide
Web access to the GEON environment. A constant challenge for
any grid-enabled testbed is the distribution and updating
of a large, evolving software collection. GEON will build
on existing deployment efforts for Open
Grid Services Architecture-compliant software, including
NPACI Rocks for
cluster management and the NSF
Middleware Initiative for grid software. The initial GEON
Grid will span some of the participating universities, as
well as Lawrence Livermore National Laboratory, USGS locations,
and DLESE, and represents a prototype of a future full-scale
GEON.
A Model for Other Disciplines
GEON has been conceived as a prototype that is designed from
the beginning to scale up. It is a pioneering project in "democratizing"
grid technologies, and researchers plan to help it grow into
a national network that will reach a wide range of users, from
scientists and educators to government policy makers and engineers.
This broader community participation will expand as the geosciences
community sees that the technology is effective end-to-end,
producing previously unattainable results in published papers.
As for the future, Baru notes that "beyond the geosciences,
GEON will also have value and portability as a demonstration
of how a discipline cyberinfrastructure can work."
Paul Tooby is a senior science writer
at the San Diego Supercomputer Center.