
This report is the result of a September 1995 workshop
jointly sponsored by
the San Diego Supercomputer Center,
the National Center for Ecological Analysis,
and the National Center for Supercomputing Applications
with funding provided by
the National Science Foundation.

Computational ecology is a field devoted to the quantitative description and analysis of ecological systems using empirical data, mathematical models (including statistical models), and computational technology. While the components of this field are not new, there is a new emphasis on the integrated treatment of the area. To a large degree, this emphasis is precipitated by the expansion of our local, national, and international computational infrastructure, coupled with the heightened social awareness of ecological and environmental issues and its effects on research funding. In an attempt to consolidate what is known about the state-of-practice in computational ecology, a workshop was held gathering together ecologists and computer scientists for the purpose of identifying those technology issues which impede the progress of ecological research. This workshop was held at the San Diego Supercomputer Center in September 1995.
The results of the workshop are expressed in three areas: data management, visualization, and modeling. Common to these areas is the need to develop standards to facilitate the sharing of data, the comparison of results, and the integration of model components. Unique to the problem of data sharing are the proprietary nature of research data and the lack of institutional incentives for data sharing, along with the usual issues of intellectual property. The use of visualization continues to be inhibited by the highly specialized knowledge still required to use current visualization packages effectively, along with fundamental questions such as how statistical error should be represented in visual presentations. The modeling community is challenged by questions such as how multi-scale, multi-resolution models may be integrated across disciplines, as well as by the desire for collections of standard realizations of model components to minimize redundant software development and facilitate the comparison of modeling analyses.

Since before the introduction of the term ecosystem in 1935 by Tansley [10], ecologists have struggled with conceptual questions such as: do communities of species coalesce into functionally integrated higher-order units or do species respond more individually across space and time? Do population sizes exhibit fluctuations in response to apparently stochastic processes in the physical environment or does density-dependence regulate and stabilize population trajectories? If such regulation occurs, how does it work? Does it come from trophic interactions above, below, or within a community? So far it often seems that the answers vary across species, across places, and across time.
Increasingly, ecologists must make their knowledge useful to the practical necessities of conservation and the management of natural resources while continuing to address fundamental questions about how nature works. Lubchenco et al. [21] identified three preeminent focal areas of ecology for the coming decades: global climate change; biodiversity; and sustainability of ecosystem resources. All three issues share a need for mechanistic explanations and understanding. Ecologists continue to search for measurable properties of individual organisms affected by their local environment, which result in aggregated, large-scale phenomena. Increasingly, these studies are driven by the need to understand the effect of human impacts prospectively, requiring an improved predictive capability.
The answers to ecological questions have rarely, if ever, been general laws. Since ecologists deal with in situ, living systems, the study of ecology requires analytical methods and experiments which often cannot be controlled in the precise way physical experiments usually are. Consider what it would be like to conduct a physics or chemistry experiment if the scientist were only a few angstroms in size and lived for only a few nanoseconds. Could such a scientist control or decipher the macroscopic course of chemical reactions from the random collisions of molecules or the relationships between subatomic particles? The phenomena that ecologists study frequently operate on spatial and temporal scales larger and longer than any one individual can effectively study.
The problem of linking processes at the scale of the individual organism to processes at the community and regional scale raises issues regarding the sufficient characterization of the interactions between individuals, populations, and the physico-chemical environment. The obvious difficulty of manipulating and replicating experiments at the landscape scale raises issues about experimental design, regularity and longevity of sampling, and the appropriate integration of empirical data. Furthermore, the difficulty of translating theoretical models to real landscapes with potentially huge numbers of parameters to be fit from data collected at varying spatial and temporal scales poses both theoretical and practical problems.
With a shift to mechanistic and large-scale practical questions has come a need for ecologists to think deeply about issues that present bottlenecks to solving their problems. Increasingly large interdisciplinary groups of scientists must collaborate and share data to address questions that span large ranges of physical and temporal scale. This creates both cultural and technical difficulties.
Ecology arose and flourished in a tradition of independent investigators studying relatively small, well-defined systems with a hope that the results could be generalized to larger, more complex systems. Attempts to understand comprehensive processes in ecology are now seen to require increasingly larger groups of collaborators, forcing consideration of issues of priority, data ownership, communication, coordination of effort, and data consistency. For example, the Long Term Ecological Research (LTER) concept developed by the National Science Foundation (NSF) created a network of sites intended to measure a core set of ecological processes to develop a regional and global understanding of the existence and nature of global change. Burdened by its cultural legacy of independent, individual investigators, the program has tended toward the old mold, with autonomous research programs capitalizing on the unique characteristics of each particular site. Improving collaboration, to make it possible to answer larger-scale questions, requires extensive changes in the cultural milieu of academic incentives and the research grant system that rewards individual accomplishments preferentially to collaborative accomplishments [12]. This shortcoming has been recognized in a recent national review [33] and is being addressed by the new generation of ecologists inside and outside the LTER system.
Technological needs for interdisciplinary research teams must be met. For example, the Internet (and its progeny) is making an immense impact on the way ecologists conduct research by not only enhancing the speed and facility of communication in general, but also allowing the parallel posting and retrieval of data, creating dynamic, multi-authored data sets rather than the static data sets characteristic of a typical journal article. Increasingly faster computers, with increasingly lower costs, allow a level of realism in modeling efforts that could not even be imagined 20 years ago. Higher resolution satellite imagery and other forms of remotely sensed data can be processed and interpreted by new types of software to produce maps that for the first time allow us to visualize ecological variables across huge spatial scales.
Underlying all these issues is the need to link dynamic processes operating across differing spatial domains, and with different rates. How can we link the natural and man-made forces that influence demand for biological resources with the population dynamics of these resources? How can we link large-scale atmospheric models with the behavior of individual organisms that may even influence weather patterns? How much averaging and smoothing of very fine-scale biological data must be made in order to match the coarser scale of the geophysical data while preserving the essence of the phenomenon being modeled? The answers clearly depend on what we want to predict. It may be quite reasonable to predict the species composition of a forest, and even the statistical properties of the spacing patterns of individual trees, but it may be fruitless to ask whether the position of a particular individual at a particular time can be predicted.
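The question of how much averaging fine-scale data can tolerate may be made concrete with a small sketch. The Python fragment below is only an illustration, not a method proposed by the workshop: the function name block_average, the 4 x 4 grid, and its values are hypothetical, and simple block averaging is assumed as the aggregation rule.

```python
from statistics import mean


def block_average(grid, factor):
    """Aggregate a fine-scale 2D grid to a coarser grid by averaging
    non-overlapping factor x factor blocks (an assumed aggregation rule)."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect every fine-scale cell that falls inside this coarse cell.
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(mean(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical fine-scale measurements (e.g., biomass per plot).
fine = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 8.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]
coarse = block_average(fine, 2)
print(coarse)  # [[4.0, 2.0], [2.0, 4.0]]
```

In this made-up example the upper-right block holds a single high value (8.0) surrounded by zeros, yet its coarse value (2.0) is indistinguishable from that of the uniform lower-left block. Whether that loss matters depends, as noted above, on what is to be predicted.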

In the preface of her book, Mathematical Ecology [32], E. C. Pielou described the integrative nature of ecology in terms of three different approaches to the analysis of ecological questions. These are:
These categories of endeavor that Pielou envisioned as mathematical-plus-statistical ecology have been recast here into what we now call computational ecology. As such it is an interdisciplinary field devoted to the quantitative description and analysis of ecological systems using empirical data, mathematical models (including statistical models), and computational technology. The measure of success and progress in this field is the increasing ability to reliably explain and predict the behavior of the systems under study. The long-term scientific goal is the development of methods to predict the response of ecosystems to changes in their physical, biological and chemical components. The computational aspects arise from the need to express and manipulate the data and theories that we have about these complex, expansive and poorly understood systems. We endeavor to develop ways to discern patterns and principles from empirical data and mathematics through the machinery of computation. This leads to three general areas of research in computer science relevant to computational ecology: mathematical modeling, data management, and visualization.
Each has its own organizing principles and basic problems but they are related to each other by their relevance across a wide range of ecological research problems.
Ecological data management differs in many ways from other types of data management as a result of the irregular character and relative sparseness of the data confounded by widely ranging scales of measurement in time and space. In addition, the information describing the data and its proper use (the "metadata" or "codata") is as diverse as the data itself [4]. This poses research questions regarding how this diversity can be accommodated in the least number of representations possible, yet support the wide range of software and hardware combinations used by researchers around the world. There are also significant issues regarding intellectual property rights and ownership of data that are becoming increasingly pressing [28].
Mathematical models provide a means of expressing our understanding of the mechanisms governing the structure and function of natural populations, communities, and ecosystems in a testable manner through the use of computer simulations. The results of a computer simulation are usually compared with empirical data to evaluate the predictive power of a given mathematical model and the limits of our understanding of the thing being modeled. Because many ecological processes occur over long time scales, ecologists often use models of underlying processes to explain observed patterns. Inference from process to pattern has been limited to relatively simple models because of the high computational burden of relaxing assumptions and incorporating greater detail. While simple models lend some insight, they are typically very sensitive to often unrealistic assumptions. Until recently ecologists had to use these models, because there was insufficient computational power to run stochastic, individually-based, spatially explicit models. We now have sufficient power to investigate some of these models. One cannot be satisfied, however, simply to make models more complicated; it is essential to simplify, to explore, and to address the questions raised earlier about the transfer of information across scales.
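As an illustration of what a stochastic, individually-based, spatially explicit model can look like in its simplest form, consider the following Python sketch. It is not drawn from the workshop. The grid size, the birth and death probabilities, the wrap-around boundary, and the rule that a cell holds at most one individual are all assumptions chosen for brevity.

```python
import random

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def simulate(size=10, steps=20, birth=0.3, death=0.1, seed=0):
    """Run a toy individual-based model on a size x size grid.

    Each occupied cell holds one individual. In every time step an
    individual dies with probability `death`; a survivor places an
    offspring in a random neighbouring cell with probability `birth`.
    An offspring landing on a cell already claimed for the next step
    is lost, which gives a crude local density dependence.
    Returns the population size at each step, starting with step 0.
    """
    rng = random.Random(seed)            # seeded for repeatable runs
    occupied = {(size // 2, size // 2)}  # one founder in the centre
    history = [len(occupied)]
    for _ in range(steps):
        next_occupied = set()
        for x, y in sorted(occupied):    # sorted for a fixed visiting order
            if rng.random() < death:
                continue                 # this individual dies
            next_occupied.add((x, y))
            if rng.random() < birth:
                dx, dy = rng.choice(NEIGHBOURS)
                # Wrap around the edges (assumed toroidal landscape).
                next_occupied.add(((x + dx) % size, (y + dy) % size))
        occupied = next_occupied
        history.append(len(occupied))
    return history


history = simulate()
print(history)
```

Repeating the run with different seeds yields a distribution of trajectories rather than a single curve. Even this toy version shows where the computational burden comes from: the cost grows with the number of individuals, the number of time steps, and the number of replicate runs.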
Visual modeling uses techniques of scientific visualization to compose multi-dimensional, computer-generated scenes that can be used to express empirical data and modeling results in an intuitive and viscerally-appealing presentation. This can be done not only in the true spatio-temporal context but also in other parameterizations that reveal critical features of data or model sensitivities. Visual models can be presented with an emphasis on particular themes. For example, the spatial distribution of vegetation by altitude and season, or the variation in salinity around an ice floe and the spatial distribution of sea life can be depicted as snapshots in time or through animated sequences. Generalized capabilities now exist to manipulate visualizations to allow the observer to alter the point of view at will and to color-code parametric values to emphasize particular features in data.

Global change, biodiversity, and sustainability represent core research issues for the next century [21] that present numerous scientific, sociological, and technological challenges [11]. Wise management and decision-making related to sustainability and global change will entail integration and timely analysis of data from the physical, chemical, biological, and social sciences. The needs for seamless integration of data within and among disparate disciplines and rapid transformation of those data into the information and knowledge bases required by resource managers, decision-makers, and the scientific community, require that social and natural scientists revolutionize the ways that data are collected, managed, and analyzed [11], [26], [24], [5], [31].
Specific challenges related to managing the data and information required for addressing global change, biodiversity, and sustainability include: timely identification and acquisition of relevant data; determining data "fitness-for-use" for meeting specific objectives; development of powerful, flexible, and user-friendly data and database management systems; mechanisms and repositories for data archival; and new mechanisms for transforming data into usable information and knowledge. Identification and acquisition of data relevant to a specific problem is frequently hindered by both sociological and technological obstacles.
Despite the continual improvements in hardware and software, technological impediments still exist with respect to identifying data resources and communicating large volumes of data. This is typified by the arbitrary manner in which data are organized during acquisition. Despite the existence of quasi-standard data export capabilities such as comma-separated values (CSV), rich-text format (RTF), and their ilk, the essential definition of data content is arbitrary and subject to the experience and inclinations of the originator of the data. This is, not surprisingly, the result of the expedients encountered by the individual research group in balancing project-specific versus long-term data needs. The problem is exacerbated by the lack of guidelines for standardization and limited experience of the researchers.
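One way to picture the gap between an export format and a definition of content is to pair a CSV file with an explicit data dictionary and check the file against it. The Python sketch below is a hypothetical illustration only: the column names, units, and the validate function are invented for this example and do not correspond to any existing standard.

```python
import csv
import io

# Hypothetical data dictionary: column name -> (unit, type converter).
DATA_DICTIONARY = {
    "site_id": ("none", str),
    "date": ("ISO 8601 calendar date", str),
    "temp_c": ("degrees Celsius", float),
    "stem_count": ("stems per plot", int),
}


def validate(csv_text, dictionary):
    """Return a list of problems found when a CSV export is checked
    against the data dictionary (an empty list means no problems)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    for name in dictionary:
        if name not in header:
            problems.append(f"missing column: {name}")
    for name in header:
        if name not in dictionary:
            problems.append(f"undocumented column: {name}")
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        for name, (_unit, convert) in dictionary.items():
            value = row.get(name)
            if value is None:
                continue  # column absent; already reported above
            try:
                convert(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: {name}={value!r} is not {convert.__name__}"
                )
    return problems


good = "site_id,date,temp_c,stem_count\nA1,1995-09-12,18.5,42\n"
bad = "site_id,date,temp,stem_count\nA1,1995-09-12,18.5,many\n"

print(validate(good, DATA_DICTIONARY))  # []
for problem in validate(bad, DATA_DICTIONARY):
    print(problem)
```

Both strings are valid CSV, but only the first matches the agreed definition of content. The second renames a column and records a count as free text, which is the kind of originator-specific variation described above.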
Ecological data are typically collected to meet needs specific to a single project. Despite the fact that these data frequently are useful for meeting broader needs of the scientific community (e.g., scaling site-specific studies up to broader spatial and temporal scales, etc.), these benefits accrue only when scientists adhere to common standards for data models, quality assurance and quality control (QA/QC), archival, and so forth. Agencies that perform or support environmental research and management could benefit from identifying key gaps in available databases, and seeking to fill such gaps by creating new research initiatives, expanding existing research projects to encompass broader spatial and temporal scales or, in many cases, by simply providing the additional support that would be necessary to convert a project-specific database into a high quality, well-documented, and archived database that is easily accessible to the scientific community. Development of cost-share arrangements among agencies to produce and distribute common data sets could mitigate many of these problems.
Standardization efforts are generally treated with reluctance by the scientific community when they are initially proposed. Nonetheless, much of our progress in science can be directly related to the development and adoption of standards. Several ongoing activities in the ecological community provide relevant examples of progress that is being made in standardizing data collection, processing and management, and analysis. Significant progress in understanding biodiversity will require unprecedented collaboration, database development, and data sharing among the ecology and systematics communities. Addressing issues related to biodiversity will require access to voucher and museum specimens. Progress is being made in capturing specimen data in electronic form. For example, internationally 2 million specimen records were available free on the Internet in 1993 [27], and more than 3 million specimen records are currently available (Miller, unpublished data). However, much work remains, as only 5% of the 400 million museum specimens in the US are available in electronic form, and only 2% of the specimens are geocoded. Whereas it may cost only $5 to capture specimen data in electronic form, recapturing the "data" in the field may exceed $100 per specimen [1]. Transferring the wealth of museum specimen and associated data into electronic form (images and text) represents a significant challenge. Furthermore, despite numerous nascent efforts to develop databases of described species, the lack of a master database for the more than 1 million of them remains a major handicap to all who use species names as keys to information about organisms and ecosystems. Developments in library information science may serve as an evolutionary model for how standardization, data sharing, cataloging, and archival efforts can develop and, ultimately, benefit science and society.
One of the most difficult limitations to improved data sharing is the result of sociological constraints. Data providers are heavily concerned about piracy of their data, the potential for misuse or misinterpretation of their data and, in general, the larger legal issues surrounding intellectual property rights and liability. Liability issues frequently impede or prevent the free flow of data. Scientists and agencies may have real and justified concerns about releasing data associated with endangered or threatened species and communities. Data quality can easily become a legal issue requiring concerted attention to quality assurance and quality control that may exceed resource availability and result in lengthy time delays. Inaccurate mappings of locations of endangered species or flood zones represent just two examples of data quality problems that may result in economic harm to landowners and subsequent recourse through the legal system.
In some cases, individual scientists and agencies may equate data with power, thereby attempting to hold and protect data, as opposed to sharing data [34]. Although this notion may be naive or even counter to agency mandates, it is also true that frequently there are few incentives for data sharing. It is clear that the challenging research questions that are facing ecologists will demand unprecedented access to data that are collected by others. Significant attention has been focused on developing mechanisms for disseminating scientific results. Consequently, hundreds of journals now publish findings of ecological studies. Perhaps it is time for the scientific community to devote similar attention to developing and promoting the incentives necessary to facilitate data sharing. Electronic journals that specialize in publishing data and metadata, ascribing value and prestige to data submission to national databases, and other mechanisms may warrant consideration by agencies and scientific societies.