
Understanding Biodiversity through Environmental Informatics

SDSC RESEARCH
David Stockwell

Tony Fountain

A key feature of every living ecosystem is biodiversity, the variety of plants and animals living in a habitat. Biodiversity provides genetic resources for crops and drugs, promotes ecosystem stability and resilience, and is a major recreational and aesthetic resource. But today, a majority of biologists believe that a mass extinction of species is underway, posing a major threat to human society in the next century. SDSC researcher David Stockwell and colleagues are applying advanced computational technologies in the new field of environmental informatics, giving researchers more powerful tools for gathering and analyzing environmental information so that the Earth's biodiversity can be better managed.





Figure 1. Multiple Species Conservation
Web interface for the demonstration Multiple Species Conservation Program (MSCP) in southern California. The interface accesses XML-formatted management and monitoring information from multiple sources and presents an integrated view in the user's browser.


"We're in a sort of race," says David Stockwell, a research scientist at SDSC. "To our advantage, SDSC research is producing tools for gathering, analyzing, monitoring, and displaying environmental information that are more powerful than ever. But we urgently need to develop an infrastructure of environmental information and analysis to support scientifically sound management of biodiversity."

Despite centuries of study, scientists' knowledge of biodiversity is surprisingly poor. Many environments have not been comprehensively surveyed, and a significant fraction of smaller species such as insects and microorganisms has never been described. "However, through our work in developing environmental infrastructure components, we're able to make better use of existing biodiversity information, both in museum collections and related environmental disciplines," Stockwell says.

Stockwell, SDSC colleagues, and other NPACI partners are working in the promising new discipline of environmental informatics, with the goal of providing a common information infrastructure resource for both research and education. Through environmental informatics, scientists can build on the work of centuries of biologists as well as the latest environmental data, using new computational technologies for accessing, analyzing, and visualizing data.



Typical biodiversity data sets are drawn from isolated surveys by amateur and professional biologists at universities, museums, and government agencies. They usually cover small geographic areas over short time spans--an incompleteness that seriously limits the research questions that can be answered.
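Small, short surveys are limiting partly because the number of species a survey records depends on how much sampling it does. The sketch below is an illustration under stated assumptions, not an SDSC or BSW tool: it assumes individuals are drawn at random, with replacement, from a hypothetical community whose relative abundances are made up, and it uses the standard expectation that species i (relative abundance p_i) is missed in all n draws with probability (1 - p_i)^n. The same community therefore looks less diverse when it is sampled less.

```python
from typing import Sequence


def expected_observed_richness(abundances: Sequence[float], n_samples: int) -> float:
    """Expected number of distinct species recorded in n_samples random draws.

    Each draw picks species i with probability p_i (its relative abundance),
    with replacement, so species i is missed in every draw with probability
    (1 - p_i) ** n_samples.
    """
    total = sum(abundances)
    probs = [a / total for a in abundances]
    return sum(1.0 - (1.0 - p) ** n_samples for p in probs)


# Hypothetical community of 50 species: a few common ones and many rare ones.
community = [100.0 / (rank + 1) for rank in range(50)]

# Same true diversity, different sampling effort.
for effort in (10, 50, 250, 1000):
    print(effort, round(expected_observed_richness(community, effort), 1))
```

All 50 species are present at every effort level; only the number recorded changes, which is how an undersampled area can end up looking species-poor in a predictive model.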

"The relatively small amount of data in a given biodiversity data set means that you don't have a randomized sample," Stockwell says. "As a side effect, predictive models may indicate lower diversity in areas that have been undersampled." Large amounts of new data may be needed to develop a reliable prediction.

One of the most important contributions of environmental informatics tools is the ability to integrate these small, isolated data sets. "This is definitely a case where the whole is greater than the sum of the parts," Stockwell says. An integrated data set can fill in the gaps sufficiently to reflect the actual patterns of biodiversity, letting scientists see a more complete picture and enabling them to ask new research questions.
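As a toy picture of how integration can fill gaps, the sketch below merges three hypothetical surveys, each recording (species, grid cell) pairs, and compares how many cells each survey knows for each species with what the integrated set knows. The species names, cells, and survey labels are invented for illustration. Records reported by more than one survey are kept only once.

```python
from collections import defaultdict

# Hypothetical (species, grid_cell) occurrence records from three isolated surveys.
SURVEYS = {
    "museum": {("species_a", (0, 0)), ("species_a", (0, 1))},
    "agency": {("species_a", (1, 1)), ("species_b", (0, 1))},
    "university": {("species_b", (2, 1)), ("species_a", (0, 0))},
}


def integrate(*surveys):
    """Union of record sets; a record reported by several surveys is kept once."""
    merged = set()
    for records in surveys:
        merged |= records
    return merged


def known_cells(records):
    """Map each species to the set of grid cells where it has been recorded."""
    cells = defaultdict(set)
    for species, cell in records:
        cells[species].add(cell)
    return dict(cells)


# Cells known per species in each survey alone, then in the integrated set.
for name, records in SURVEYS.items():
    print(name, {sp: len(c) for sp, c in known_cells(records).items()})

integrated = known_cells(integrate(*SURVEYS.values()))
print("integrated", {sp: len(c) for sp, c in integrated.items()})
```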

One example involves museum collections data. New environmental informatics tools have dramatically increased researchers' access to the large amount of data already held in museum collections, forming a "virtual collection" of species information that spans the world and goes back hundreds of years. The Species Analyst, a tool developed by NPACI partner David Vieglais at the University of Kansas Natural History Museum, is putting more of these data at scientists' fingertips. In response to a query through a Web interface, the Species Analyst seamlessly integrates records from databases at many natural history museums across the country (and eventually around the world), even when those databases run on incompatible platforms and software and use incompatible data formats.
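The core idea of answering one query across databases with incompatible formats can be sketched as a set of per-source adapters that translate each source's records into a common schema before filtering. This is not the Species Analyst's implementation, which the article does not describe; the `Occurrence` schema, the adapter functions, the field names (`sciName`, `lat`, `lon`), and the coordinates are all hypothetical, and in-memory lists stand in for remote museum databases.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical common schema that every source is translated into."""
    species: str
    latitude: float
    longitude: float
    source: str


def from_source_a(row: dict) -> Occurrence:
    # Hypothetical source A: dicts with a full scientific name and string coordinates.
    return Occurrence(row["sciName"], float(row["lat"]), float(row["lon"]), "source_a")


def from_source_b(row: tuple) -> Occurrence:
    # Hypothetical source B: tuples of (genus, epithet, longitude, latitude).
    genus, epithet, lon, lat = row
    return Occurrence(f"{genus} {epithet}", float(lat), float(lon), "source_b")


def federated_query(
    species: str,
    sources: Iterable[tuple[Iterable[Any], Callable[[Any], Occurrence]]],
) -> list[Occurrence]:
    """Translate every source's records into the common schema and filter by species."""
    hits = []
    for records, adapter in sources:
        for raw in records:
            occurrence = adapter(raw)
            if occurrence.species == species:
                hits.append(occurrence)
    return hits


SOURCE_A = [
    {"sciName": "Genus alpha", "lat": "32.7", "lon": "-117.1"},
    {"sciName": "Genus beta", "lat": "33.0", "lon": "-116.9"},
]
SOURCE_B = [("Genus", "alpha", -116.5, 32.9)]

for hit in federated_query("Genus alpha", [(SOURCE_A, from_source_a), (SOURCE_B, from_source_b)]):
    print(hit)
```

Keeping the format knowledge inside small adapters means a new collection can join the "virtual collection" by supplying one translation function, without changing the query logic.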

"Underutilized museum data are becoming more available through tools like the Species Analyst, but many people don't yet know the power of these tools," Stockwell says. "It's now possible to save a month in the field collecting new specimens by spending an afternoon at a computer accessing existing collections."

A Web site that reports on the status of a nature conservation program illustrates how integrating information from a variety of sources can help the general public understand biodiversity. Using an Extensible Markup Language (XML)-based tool developed by Ilya Zaslavsky of SDSC's Data-Intensive Computing Environments (DICE) group, a demonstration project shows how management and monitoring information about the San Diego Multiple Species Conservation Program (MSCP) can be retrieved from different organizations, assembled, and displayed in the user's Web browser (Figure 1). Because the data are accessed at the separate locations where they are maintained, the information provided stays up to date, and no centralized database is needed.
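The assemble-on-request pattern can be sketched with Python's standard `xml.etree.ElementTree` parser: each organization's XML document is parsed separately and the records are merged into one table for display. The element and attribute names (`monitoring`, `preserve`, `name`, `status`), the organization labels, and the preserve names are hypothetical and do not reflect the actual MSCP schema or the DICE tool. In a live system each document would be fetched from its maintainer at request time; literal strings stand in for those fetches here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML documents, one per organization. Literal strings stand in for
# documents that a live system would fetch from each maintainer at request time.
SOURCES = {
    "county": """<monitoring>
  <preserve name="Preserve North"><status>acquired</status></preserve>
  <preserve name="Preserve South"><status>pending</status></preserve>
</monitoring>""",
    "city": """<monitoring>
  <preserve name="Preserve East"><status>acquired</status></preserve>
</monitoring>""",
}


def integrated_view(sources):
    """Parse every source document and merge its records into one sorted table."""
    rows = []
    for organization, document in sources.items():
        root = ET.fromstring(document)
        for preserve in root.iter("preserve"):
            rows.append((organization, preserve.get("name"), preserve.findtext("status")))
    return sorted(rows)


for row in integrated_view(SOURCES):
    print(row)
```

Because nothing is copied into a central store, an update made by one organization appears the next time the view is assembled.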

SDSC and NPACI are addressing demanding computational problems across many fields with tools for heterogeneous distributed computing and database systems. "The new tools of environmental informatics that integrate databases and analytical methods are increasing the usefulness of biased and isolated data sets and improving biodiversity predictions, which is new for a lot of environmental data," Stockwell says. "Now we can reduce the cost of surveys and get value out of data that we haven't considered before."





Environmental problems are among the most complex in the scientific world. By their very nature, environmental data are heterogeneous, multiscalar, and multidisciplinary, coming from biology and ecology, geochemistry and climate science, anthropology and prehistory. Data sets may also be very large and constantly changing, as with ecosystem modeling results, satellite imagery, and environmental sensor data, and they can span a vast range of space and time.

At the same time, both the natural and human worlds are changing rapidly, as are the computer and information technologies on which environmental informatics tools rely. Developing these tools is therefore beyond the scope of any single investigator and often beyond the mission, infrastructure, or expertise of any single institution.

To meet these challenges, SDSC is participating in innovative collaborations on environmental informatics databases that provide the integrated information essential to a comprehensive understanding of biodiversity. These collaborations include a project on the biodiversity of the oceans with postdoctoral researcher Karen Stocks of Rutgers University and the Scripps Institution of Oceanography, and another project with Earth-systems modelers at Colorado State University, developed in association with the NSF Long-Term Ecological Research (LTER) stations (Figure 2).

Figure 2. Insights into Earth Processes
Detailed environmental simulations that include the effects of clouds on photosynthesis can produce improved data sets for modeling biodiversity. Low clouds reduce the photosynthesis rate (ellipses), while high clouds do not (square in upper left). Photosynthesis rate increases from dark blue to light blue to green. (Joe Eastman, LTER. Graphics produced using the Environmental WorkBench (EWB), courtesy of SSESCO.)


Environmental informatics also draws on computationally intensive methods from artificial intelligence research, including data mining and machine learning, which have proven highly useful in analyzing data from diverse environmental sources. For example, the Biodiversity Species Workshop (BSW), developed by David Stockwell and hosted by SDSC for several years, uses a genetic algorithm to evolve sets of models that map species data and predict species distributions. "The experience of researchers using BSW has shown that when they're able to easily integrate their data with robust analysis techniques in a Web-based portal, they produce creative new research results," Stockwell says.
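To show how a genetic algorithm can evolve a species-distribution model, the sketch below evolves a single rectangular "climate envelope" rule (presence predicted when temperature and rainfall both fall within evolved bounds) against presence/absence data. It is a generic illustration, not the BSW's algorithm or rule representation, which the article does not describe. The training sites, the hidden "true" envelope, the balanced-accuracy fitness, and all parameters (population size, generations, mutation step sizes) are hypothetical choices. It uses truncation selection, crossover of whole variable bounds, Gaussian mutation, and elitism.

```python
import random

# Hypothetical training data: sites on a (temperature in °C, annual rainfall in mm) grid.
# The species is "present" inside an envelope that the search is not told about.
TRUE_ENVELOPE = (15.0, 25.0, 300.0, 600.0)  # (t_min, t_max, r_min, r_max)


def inside(env, temp, rain):
    """Envelope rule: predict presence when both variables fall within the bounds."""
    t_min, t_max, r_min, r_max = env
    return t_min <= temp <= t_max and r_min <= rain <= r_max


SITES = [(float(t), float(r)) for t in range(0, 41, 2) for r in range(0, 1001, 50)]
LABELS = [inside(TRUE_ENVELOPE, t, r) for t, r in SITES]


def fitness(env):
    """Balanced accuracy (mean of sensitivity and specificity) on the training sites."""
    tp = tn = pos = neg = 0
    for (t, r), present in zip(SITES, LABELS):
        predicted = inside(env, t, r)
        if present:
            pos += 1
            tp += int(predicted)
        else:
            neg += 1
            tn += int(not predicted)
    return 0.5 * (tp / pos + tn / neg)


def random_envelope(rng):
    t = sorted(rng.uniform(0, 40) for _ in range(2))
    r = sorted(rng.uniform(0, 1000) for _ in range(2))
    return (t[0], t[1], r[0], r[1])


def crossover(a, b, rng):
    """Child takes each variable's (min, max) pair from one parent or the other."""
    t_bounds = a[:2] if rng.random() < 0.5 else b[:2]
    r_bounds = a[2:] if rng.random() < 0.5 else b[2:]
    return t_bounds + r_bounds


def mutate(env, rng, t_step=2.0, r_step=50.0):
    """Nudge every bound with Gaussian noise, re-sorting so that min <= max."""
    t = sorted((env[0] + rng.gauss(0, t_step), env[1] + rng.gauss(0, t_step)))
    r = sorted((env[2] + rng.gauss(0, r_step), env[3] + rng.gauss(0, r_step)))
    return (t[0], t[1], r[0], r[1])


def evolve(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_envelope(rng) for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection: top half breeds
        next_gen = [ranked[0]]  # elitism: the best rule survives unchanged
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(a, b, rng), rng))
        population = next_gen
    return max(population, key=fitness), history


best, history = evolve()
print("best envelope:", [round(x, 1) for x in best])
print("balanced accuracy:", round(fitness(best), 3))
```

Balanced accuracy is used because presences are rare in this grid, so plain accuracy would reward a rule that predicts "absent" everywhere. Elitism guarantees that the best score never decreases from one generation to the next.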

And the BSW has already produced important new science. A. Townsend Peterson at the University of Kansas Natural History Museum used the BSW to gain new insights into how species form over evolutionary time. Looking at 37 pairs of closely related species of forest birds in Mexico separated by an arid barrier, he used BSW tools to establish that the geographic barrier, rather than factors such as climate or ecology, was the critical factor driving the formation of new species.

Other uses for this software have included tracking bird migration, designing endangered species preserves, and predicting and managing harmful invasive species. The BSW has recently been evaluated by Melissa Haltuch of the USGS for use in the Ohio Gap Analysis Program (GAP) to predict fish distributions in Ohio rivers and streams. GAP provides policymakers with advice on conservation priorities by identifying "gaps" in the current reserve system. Environmental informatics tools like the BSW are helping researchers begin to see what is needed to understand and predict ecosystem behavior, and these tools are also facilitating development of wiser policies for preserving biodiversity.
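The "gap" idea can be illustrated by overlaying predicted species ranges with existing reserves and flagging species whose ranges are poorly protected. This is a simplified sketch of the concept, not the GAP program's actual procedure: the predicted ranges, reserve cells, and the 10% protection threshold are all hypothetical.

```python
# Hypothetical predicted ranges (sets of grid cells), e.g. output of a distribution model.
PREDICTED_RANGES = {
    "species_a": {(0, 0), (0, 1), (1, 1)},
    "species_b": {(3, 3), (3, 4)},
    "species_c": {(1, 1), (2, 2)},
}
# Hypothetical grid cells already inside reserves.
RESERVE_CELLS = {(0, 1), (1, 1)}


def conservation_gaps(ranges, reserve_cells, min_fraction=0.1):
    """Return species whose protected share of predicted range is below min_fraction."""
    gaps = {}
    for species, cells in ranges.items():
        protected = len(cells & reserve_cells) / len(cells)
        if protected < min_fraction:
            gaps[species] = protected
    return gaps


print(conservation_gaps(PREDICTED_RANGES, RESERVE_CELLS))  # {'species_b': 0.0}
```

Raising `min_fraction` makes the test stricter; at 0.6, species_c (half its predicted range protected) would also be flagged.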

BK2, a tool for Bayesian network induction recently developed by student programmer Mark Diggory and environmental informatics team leaders Stockwell and Tony Fountain, applies another artificial intelligence technique. BK2 is being used to mine ecological data sets for theories about the processes and relationships that link climatic, physical, and chemical systems to biodiversity. Using an extensive computational search for probabilistic dependencies among the many variables, BK2 derives a network of relationships between environmental and biological variables from survey data collected at field sites. Finding the best network is computationally challenging because the number of possible networks grows faster than exponentially as the number of variables increases.
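The size of that search space can be made concrete. A Bayesian network's structure is a directed acyclic graph (DAG), and the number of DAGs on n labeled variables follows Robinson's recurrence, a(n) = Σ_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k) with a(0) = 1. The sketch below computes it and prints, for comparison, 2^(n(n-1)/2), the number of DAGs consistent with just one fixed ordering of the variables, which is already superexponential. It only counts candidate structures; it does not reproduce BK2's search strategy, which the article does not detail. It needs Python 3.8+ for `math.comb`.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 9):
    # 2 ** (n * (n - 1) // 2) counts only the DAGs consistent with one fixed node ordering.
    print(n, count_dags(n), 2 ** (n * (n - 1) // 2))
```

With only six variables there are already 3,781,503 possible network structures, which is why exhaustive comparison quickly becomes impractical and large parallel computers help.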

"The network structure represents how many biologists think about the environment, and large parallel computers are an important tool in pursuing the problem of discovering the underlying causal structure given the computational complexity of the problem," Stockwell said.

In the future, Stockwell looks forward to a broadening of environmental science. "Traditionally, when people study the environment and ecosystems, they study the ecosystem itself and the impact of human society on the environment. But that's only one direction of the dynamic. To close the circle, so to speak, we also need to include the effect of the environment--of changes in things like biodiversity and climate--on human society, which can impact human health and economic stability." --PT *
