Environmental Statistics: Challenges for the Future

  • Author: Christopher K. Wikle
  • Date: 09 Jul 2013
  • Copyright: Image appears courtesy of iStock Photo

The “environment” is often defined as the physical and biological surroundings of an organism or physical process, together with the interactions between the organism or process and those surroundings. Indeed, the earth is a complex system of interacting physical and biological processes. For example, global satellite observations of the earth’s systems reveal many scales of variability in space and time, some of which are evident across multiple platforms. In the tropical Pacific Ocean, westward-propagating eddies are evident in infrared and visible cloud imagery, as well as in the associated sea surface temperature imagery. Perhaps more surprisingly, the same propagating features appear in ocean color observations, which are a proxy for ocean phytoplankton, and so are present in the ecosystem as well. Thus, there is ample evidence that even on very large scales, the atmosphere, ocean, and ecosystem are linked in their spatio-temporal dynamics across multiple scales of variability.

Environmental statistics covers a wide variety of scientific issues, including (but not limited to) air and water pollution, climate, weather, oceanography, chemistry, ecology, wildlife, fisheries, forestry, food security, soil conservation, and epidemiology. Not surprisingly, to address such a wide variety of disciplines, environmental statistics also draws on a wide variety of statistical methodologies. These include extreme value statistics, change points, sampling and design, time series, spatial statistics, spatio-temporal statistics, non-parametrics, Bayesian statistics, generalized linear mixed models, risk assessment, and decision theory, to name just a few. Clearly, environmental statistics, or “environmetrics,” is a huge topic, and interested readers can find excellent descriptions of the discipline in El-Shaarawi and Piegorsch (2012), Barnett (2004), and Manly (2001), as well as in many other specialized books that consider various subcomponents of the discipline (e.g., Cressie and Wikle, 2011; Mateu and Müller, 2013).

An informal examination of recent journal publications associated with environmental statistics, both methodology and applications, finds that the vast majority of the work being done in the field is associated with the following broad categories:

• Climate: downscaling (physical and statistical); paleo-climate analysis; climate model inter-comparison
• Ecological statistics: spatial capture-recapture; animal abundance estimation; occupancy models; animal movement; ecological community analysis
• Environmental health impacts and spatial epidemiology
• Use of deterministic models in statistical analyses; physical-statistical models; mechanistically motivated models; statistical surrogates/emulators
• Spatial and spatio-temporal design of monitoring networks
• Accommodating very large “big data” problems in spatial and spatio-temporal statistics
• Dynamical spatio-temporal models

These topics are, in many instances, related. For example, a common theme in recent years concerns “big data,” especially with regard to large-scale spatial and spatio-temporal processes. These processes are prevalent in most climate and ecological applications, and play a role in many of the other topics listed above, such as the statistical emulation of deterministic computer output.

Without going too far out on a limb in terms of forecasting the future, it is clear that the “big data” problem is only going to be more pronounced in environmetrics, given the ever-increasing availability of data from remote sensing, data storage tags, deterministic models, and sensor networks. In addition to classical data mining methodologies, specific methods must be developed to account for the unique spatial and temporal nature of environmental data. As an example, Yang et al. (2013) consider high-frequency depth and temperature time signals from data storage tags recovered from sturgeon tagged in the Missouri River. These signals contain a huge number of observations and can be quite non-stationary in time. They were therefore converted to time-frequency representations, which were then treated as “images” or “spatial” processes (because neighboring “image” pixels are strongly dependent locally in time and frequency). Yang et al. (2013) were then able to incorporate reduced-rank spatial methods and Bayesian stochastic search variable selection to use these signals as predictors for spawning success. Thus, a very high-resolution, high-dimensional process can be used to predict a process at lower time resolution.
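
The first two steps of this kind of workflow, turning each long signal into a time-frequency “image” and then compressing the images to a few scores, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of Yang et al. (2013): the signals are synthetic, the short-time Fourier transform and the plain SVD stand in for whatever time-frequency transform and reduced-rank spatial basis a real analysis would use, the Bayesian variable selection step is omitted entirely, and the function names `spectrogram` and `reduced_rank_scores` are hypothetical.

```python
import numpy as np

def spectrogram(signal, window=64, hop=32):
    """Short-time Fourier magnitudes: rows are frequencies, columns are time frames."""
    taper = np.hanning(window)
    starts = range(0, len(signal) - window + 1, hop)
    # Cut the signal into overlapping, tapered frames (one frame per column).
    frames = np.stack([signal[s:s + window] * taper for s in starts], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

def reduced_rank_scores(images, rank):
    """Project each flattened image onto the leading `rank` principal components."""
    flat = images.reshape(images.shape[0], -1)
    centered = flat - flat.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :rank] * s[:rank]

rng = np.random.default_rng(0)
n_tags, n_obs = 20, 1024          # assumed sizes, chosen only for illustration
t = np.arange(n_obs)

# Synthetic non-stationary signals: the oscillation rate drifts over time,
# and the amount of drift differs from tag to tag.
signals = np.stack([
    np.sin(2 * np.pi * (0.01 + 0.0001 * k * t / n_obs) * t)
    + 0.1 * rng.standard_normal(n_obs)
    for k in range(n_tags)
])

images = np.stack([spectrogram(x) for x in signals])   # one "image" per tag
scores = reduced_rank_scores(images, rank=3)           # low-dimensional predictors

print(images.shape, scores.shape)
```

Each row of `scores` summarizes one tag's full time-frequency image with three numbers, which could then serve as candidate predictors in a regression for a lower-resolution response.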

The future of environmental statistics will also rely strongly on the development of new computational methods, both to make use of parallel computing architectures such as graphics processing units (GPUs) and to approximate many of the fully Bayesian methodologies that have been developed in recent years. Many of these methodologies are quite powerful but difficult to scale up to larger problems. The use of approximate methods is starting to improve the flexibility of this modeling paradigm.

We should also expect to see many more network-based models, as well as agent-based and individual-based models for ecological and health-related processes. In addition, we are likely to see methodologies that seek to improve how we model the interaction of multiple biological processes across scales and with the physical environment. This will necessitate the development of novel multivariate spatio-temporal process and data models. In particular, there will be a need to develop new classes of nonlinear dynamic spatio-temporal models to account for realistic variability of complex interactions across time and space. The recent general quadratic nonlinear model (see Cressie and Wikle, 2011, for a review) is one such parametric example that accommodates a wide range of physical and biological process variability. Lastly, the methods of environmental statistics will increasingly incorporate formal decision theory components, in response to a growing need among regulators and managers to answer questions using analyses that contain appropriate uncertainty quantification.
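
To give a feel for what a quadratic nonlinear dynamic model looks like, the sketch below simulates a small system in which the state at each location evolves through a linear term plus pairwise products of the previous states. It assumes the commonly stated form Y_t(i) = Σ_j a_ij Y_{t-1}(j) + Σ_k Σ_l b_{i,kl} Y_{t-1}(k) g(Y_{t-1}(l)) + noise; readers should consult Cressie and Wikle (2011) for the authoritative definition. The coefficient values, the choice g = tanh, the noise scale, and the function name `gqn_step` are all assumptions made purely for illustration.

```python
import numpy as np

def gqn_step(y, A, B, g, noise):
    """One step of a quadratic nonlinear evolution (assumed form, see lead-in)."""
    linear = A @ y                                   # sum_j a_ij * y_j
    quadratic = np.einsum("ikl,k,l->i", B, y, g(y))  # sum_kl b_ikl * y_k * g(y_l)
    return linear + quadratic + noise

rng = np.random.default_rng(1)
n_locations, n_steps = 5, 50                 # assumed sizes for illustration

A = 0.5 * np.eye(n_locations)                # weak linear persistence
B = 0.01 * rng.standard_normal((n_locations,) * 3)  # small interaction weights

path = np.empty((n_steps + 1, n_locations))
path[0] = rng.standard_normal(n_locations)
for t in range(n_steps):
    eta = 0.1 * rng.standard_normal(n_locations)     # additive process noise
    path[t + 1] = gqn_step(path[t], A, B, np.tanh, eta)

print(path.shape)
```

Setting `B` to zero recovers an ordinary first-order linear (vector autoregressive) evolution, which shows how the quadratic interaction terms extend the linear case. In practice the number of interaction coefficients grows quickly with the number of locations, so some form of dimension reduction or scientifically motivated structure is usually needed.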

Environmental statistics is one of the most vibrant areas of statistics. It encompasses a wide variety of disciplines and methodologies, and requires innovative solutions when applied in practice. It has a storied past, and with the increase in concern over anthropogenic effects on the global and local environment, it is sure to play a significant role in the future. As new data platforms come online and data volume increases exponentially, environmetrics will be even more challenging, with myriad opportunities to develop new and innovative approaches to answer questions of critical societal importance.


References

Barnett, V. (2004) Environmental Statistics: Methods and Applications. John Wiley & Sons, Chichester.

Cressie, N. and Wikle, C.K. (2011) Statistics for Spatio-Temporal Data. John Wiley & Sons, Hoboken, NJ.

El-Shaarawi, A.H. and Piegorsch, W.W. (eds) (2012) Encyclopedia of Environmetrics, 2nd Edition, 6 Volume Set. John Wiley & Sons, Hoboken, NJ.

Mateu, J. and Müller, W.G. (eds) (2013) Spatio-Temporal Design: Advances in Efficient Data Acquisition. John Wiley & Sons, Chichester.

Yang, W.-H., Wikle, C.K., Holan, S.H., and Wildhaber, M.L. (2013) Ecological prediction with nonlinear multivariate time-frequency functional data models. Journal of Agricultural, Biological, and Environmental Statistics. DOI: 10.1007/s13253-013-0142-1.

Christopher K. Wikle is Professor of Statistics at the Department of Statistics, University of Missouri-Columbia, 146 Middlebush Hall, Columbia, MO 65211, USA.

This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on StatisticsViews.com are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS), to whom Wiley and StatisticsViews.com express their gratitude. The panel members are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.