# Approximate Inference for High-Dimensional Spatio-Temporal Systems Using the Ensemble Kalman Filter


**Author:** Christopher K. Wikle, Jonathan R. Stroud and Matthias Katzfuss
**Date:** 13 Nov 2015
**Copyright:** Image appears courtesy of Getty Images

With the expansion of data sources such as those from remote sensing platforms, GPS data tags, automated sensor networks, deterministic models, and the associated computational challenges, statisticians working with spatio-temporal processes have increasingly become interested in, and tolerant of, approximate inference and prediction methods. This is even more imperative given the potential non-Gaussianity and nonlinearity that often characterize such systems. Although hierarchical models in Bayesian implementations have proven very useful in these settings, they have traditionally been applied in more retrospective analyses (3).

Many spatio-temporal problems, such as those characterized by combining observations with mechanistic models (i.e., data assimilation), are often considered in a sequential updating environment, in which one’s prediction of the distribution associated with the underlying spatio-temporal state process of interest is updated as new observations become available. It has been known since the seminal work of Kalman (9) that this can be easily and efficiently accomplished in the case of Gaussian error distributions and linear measurement and state process evolution equations via filtering (i.e., the “Kalman filter”). It was similarly known, in principle, how to do this in the case of non-Gaussian and nonlinear data and processes, but implementation often required unrealistic approximations to the underlying models (3). In the presence of high-dimensional state spaces and large amounts of data, this non-Gaussian/nonlinear filtering problem still presents significant challenges.
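To make the linear-Gaussian case concrete, the following is a minimal sketch of one Kalman filter recursion for a scalar state; the function name and the toy parameter values are illustrative, not from the paper under discussion.

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One Kalman filter step for a linear Gaussian state-space model.

    State evolution: x_t = A * x_{t-1} + noise with variance Q.
    Measurement:     y_t = H * x_t     + noise with variance R.
    (m, P) are the previous filtering mean and variance (scalars here).
    """
    # Forecast (prior) moments
    m_f = A * m
    P_f = A * P * A + Q
    # Kalman gain and analysis (posterior) moments
    K = P_f * H / (H * P_f * H + R)
    m_a = m_f + K * (y - H * m_f)
    P_a = (1.0 - K * H) * P_f
    return m_a, P_a

# Example: a random-walk state observed with noise
m, P = 0.0, 1.0
for y in [0.9, 1.1, 1.0]:
    m, P = kalman_step(m, P, y, A=1.0, Q=0.1, H=1.0, R=0.5)
```

The analysis mean is the forecast mean plus a gain-weighted innovation, and the analysis variance is always smaller than the forecast variance; the EnKF mimics exactly this update, but with sample moments computed from an ensemble.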

Starting in the late 1990s, sequential Monte Carlo inference and prediction methods, such as particle filtering (4), proved successful when applied to the non-Gaussian/nonlinear filtering problem, but only in relatively low-dimensional settings. These methods work by sampling particles from the relevant prior (forecast) distribution and reweighting those particles given the latest data in order to obtain the posterior (analysis) distribution. Unfortunately, this reweighting or resampling eventually leads to particle degeneracy in high dimensions, so that one particle gets all the weight, which prohibits the adequate characterization of the posterior distribution. In contrast, the ensemble Kalman filter (EnKF), developed in the geophysics literature by Evensen (5), is an approximate Monte Carlo filtering method that, rather than reweighting ensemble members, updates the ensemble by a linear shift based on the assumption of a linear Gaussian state-space model. Thus, the key difference between the EnKF and other sequential Monte Carlo algorithms is the use of a linear updating rule that converts the prior ensemble to a posterior ensemble after each observation. To date, the EnKF is in some sense the only way to do filtering-based inference for many high-dimensional complex systems (1,11). Perhaps surprisingly, with relatively few exceptions, the EnKF methodology is still largely unknown in statistics. To address this, we have recently written a tutorial paper (10) to illustrate the method, and the relevant mathematical details presented from a statistics point of view can be found there. Here, we present a brief heuristic view of the EnKF.

In general, a state-space system for a spatio-temporal process includes a measurement model (distribution) that maps the observations to the underlying latent state process (i.e., the data are modeled conditional on the unknown true state of the system), and a process model (distribution) that describes the Markovian evolution of the state process. Consider interest in the state process at time t; then the prior distribution of the state process given data up to but not including time t is called the forecast distribution. We can easily draw samples (ensemble members) from this prior distribution. Then, in the presence of the observations from time t, we shift these ensemble members from the forecast distribution (linearly) in response to the new observations – this gives the new posterior distribution (often called the analysis distribution). There are two primary ways to accomplish this shift: stochastically and deterministically. The stochastic version relies on simulated or “perturbed” observations to shift the forecast distribution based on the proximity of these perturbed observations to the actual observations. Alternatively, the deterministic algorithms shift the forecast ensemble without the need for simulated observations. In particular, these algorithms adjust the forecast samples by scaling and shifting them towards the data, based on a decomposition of the prediction covariance matrix, thereby ensuring that they have a smaller variance after learning from the new observations. There are many variants of both the stochastic and deterministic EnKF algorithms (11). Figure 1 shows graphically an example of how these methods might work in one dimension.
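The stochastic (“perturbed observations”) analysis step described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by the authors: the function name is hypothetical, and the example assumes a linear observation operator and a small two-dimensional state.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(X, y, H, R):
    """Stochastic EnKF analysis step.

    X: (n, N) forecast ensemble (n state dimensions, N members).
    y: (p,) observation vector; H: (p, n) observation operator;
    R: (p, p) observation error covariance.
    Each member is shifted linearly toward its own perturbed observation.
    """
    n, N = X.shape
    # Sample (forecast) covariance estimated from the ensemble
    P_f = np.atleast_2d(np.cov(X))
    # Kalman gain computed from the sample covariance
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
    # Perturbed observations: one noisy copy of y per ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    # Linear shift of each member toward the data
    return X + K @ (Y - H @ X)

# Example: two-dimensional state, only the first component observed
N = 200
X = rng.standard_normal((2, N))      # forecast ensemble ~ N(0, I)
y = np.array([1.0])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
X_a = enkf_analysis(X, y, H, R)
```

After the update, the analysis ensemble for the observed component is pulled toward the observation and has reduced spread, while the unobserved component is adjusted only through its (here negligible) sample cross-covariance with the observed one.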

Figure 1: Illustration of various filtering procedures for a simple one-dimensional Gaussian state-space model. The top panel (a) shows the observed data (red “X”) for the first ten time periods along with the true states (black circles), five ensemble members from the EnKF (blue lines), and the filter distributions from the Kalman filter (gray curves). The bottom panel (b) provides a schematic of the updates for the first time period for a Kalman filter, a stochastic EnKF, a deterministic EnKF, and a sequential importance sampler (particle filter). The vertical black bars associated with the particles/ensemble members are proportional to the weights. The same five equally weighted prior samples were used for each method. Note that even in this simple one-dimensional Gaussian example, the importance sampler weights degenerate to one particle receiving all of the weight, whereas the EnKF procedures shift the particles and give a better representation of the filtering distribution.

In practice, one typically has relatively few ensemble members in an EnKF implementation compared to the number of state variables. This presents a problem when one is estimating the predictive covariance matrix and the linear shift matrix (called the gain matrix in the Kalman filter literature). These estimates are biased low and unstable (at best) without adjustments – namely, so-called covariance inflation and tapering. The latter is important because it provides a localization, so that the update at a particular location in state space depends only on nearby observations. In addition, for high-dimensional problems, one can apply a serial updating approach, so that the ensemble is updated based on each scalar observation in series, rather than based on all of the current observations at once.
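Tapering can be sketched as an elementwise (Schur) product of the sample covariance with a compactly supported correlation function. In practice the Gaspari–Cohn function is a common choice; the simple linear (triangular) taper below is an illustrative stand-in, and the function name and grid are hypothetical.

```python
import numpy as np

def taper_covariance(P_hat, coords, radius):
    """Elementwise (Schur) taper of a sample covariance matrix.

    Covariance entries between locations farther apart than `radius`
    are zeroed, with a linear decay inside the radius, so that distant
    observations cannot influence local updates through spurious
    small-ensemble sample correlations.
    """
    d = np.abs(coords[:, None] - coords[None, :])   # pairwise distances on a 1-D grid
    taper = np.clip(1.0 - d / radius, 0.0, 1.0)     # compactly supported weights
    return P_hat * taper                            # Schur (elementwise) product

# Example: noisy sample covariance on a 1-D transect of 10 locations,
# estimated from only 5 ensemble members
rng = np.random.default_rng(1)
coords = np.arange(10.0)
X = rng.standard_normal((10, 5))
P_hat = np.cov(X)
P_tapered = taper_covariance(P_hat, coords, radius=3.0)
```

The diagonal (the variances) is untouched, while long-range entries, which are dominated by sampling noise when the ensemble is small, are forced to zero.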

As an illustration of EnKF on real-world data, Figure 2 shows an example of filtering cloud data from a regional climate model over the central USA during March 1979. Specifically, the data represent cloud water content (g/kg) along a one-dimensional longitudinal transect with 60 equally spaced (52 km) spatial locations and 80 equally spaced time periods (6 hr intervals) (12). The EnKF was run with 50 ensemble members using various tuning methods (i.e., state augmentation, covariance inflation and localization).

Figure 2: Left panel - Cloud water content (g/kg) from a March 1979 regional climate model simulation in a one-dimensional longitudinal transect over the central USA for 80 time periods (6 hourly), where time increases from the bottom of the plot to the top; Center panel - EnKF filtered mean associated with the cloud data; Right panel – EnKF filter standard deviation associated with the cloud data.

One can also apply the EnKF methodology to the so-called smoothing problem, where one seeks distributions of the state process at a specific time given *all* of the observations, both before and after that time. In addition, there is currently interest in developing efficient approaches in the EnKF framework to do parameter estimation. The two main approaches are state augmentation, whereby the parameters are appended to the state process and updated sequentially in the same manner as the state process, and methods based on approximate likelihoods obtained from the EnKF output.
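State augmentation can be illustrated with a small sketch: an unknown parameter is appended to the state ensemble, and a single stochastic EnKF analysis then updates the parameter through its sample cross-covariance with the observed state. All names and settings here are hypothetical, and the example assumes the first state component is observed directly.

```python
import numpy as np

rng = np.random.default_rng(2)

def augmented_analysis(X, theta, y, R):
    """Stochastic EnKF analysis on a state augmented with a parameter.

    X: (n, N) state ensemble; theta: (N,) parameter ensemble.
    The parameter is appended to the state, so the ensemble
    cross-covariance between theta and the observed state component
    pulls theta toward values consistent with the data.
    """
    Z = np.vstack([X, theta])                     # augmented ensemble, (n+1, N)
    H = np.zeros((1, Z.shape[0])); H[0, 0] = 1.0  # observe the first state component
    P = np.cov(Z)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Perturbed observations, one per member
    Y = y + rng.normal(0.0, np.sqrt(R[0, 0]), size=Z.shape[1])
    Z_a = Z + K @ (Y[None, :] - H @ Z)
    return Z_a[:-1], Z_a[-1]                      # updated state and parameter ensembles

# Example: the state is (approximately) the unknown parameter itself,
# so observing the state should sharpen the parameter ensemble
N = 300
theta = 1.0 + 0.5 * rng.standard_normal(N)
X = theta[None, :] + 0.1 * rng.standard_normal((1, N))
X_a, theta_a = augmented_analysis(X, theta, y=2.0, R=np.array([[0.1]]))
```

Because the parameter is strongly correlated with the observed state in this toy example, its posterior ensemble shifts toward the observation and its spread shrinks, exactly as for the state itself.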

As stated previously, a major strength of the EnKF approach is that it has shown remarkable success when applied to nonlinear models, despite the fact that it typically focuses only on estimating and shifting the first two moments of the analysis (posterior) distribution. This is similar in spirit to so-called linear Bayesian estimation, which also relies only on the first two moments of the prior and posterior distributions (6,7). When such methods are not satisfactory, it has been shown that the use of normal mixtures can work quite well in practice (2). Recently, there has been increasing interest in hybrid methodologies that combine the EnKF and particle filters for non-Gaussian and nonlinear systems (8).

In summary, the EnKF has proven to be a useful and flexible approximation for many high-dimensional data assimilation problems, particularly in the geosciences. Despite this success, statisticians remain largely unaware of the method and its strengths and weaknesses. There are many avenues of statistical research to explore related to the EnKF, and given the increasing size of datasets and prediction domains, it is likely to be an important and critical area of inquiry in spatio-temporal statistics over the next few years.

**References:**

(1) Anderson, J. L. (2009). Ensemble Kalman filters for large geophysical applications. *IEEE Control Systems Magazine*, 29:66–82.

(2) Bengtsson, T., Snyder, C., and Nychka, D. (2003). Toward a nonlinear ensemble filter for high-dimensional systems. *Journal of Geophysical Research*, 108(D24):8775.

(3) Cressie, N., & Wikle, C. K. (2011). *Statistics for Spatio-Temporal Data*. John Wiley & Sons, Hoboken.

(4) Doucet, A., De Freitas, N., & Gordon, N. (2001). An introduction to sequential Monte Carlo methods. In *Sequential Monte Carlo Methods in Practice*. Springer, New York.

(5) Evensen, G. (1994). Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *Journal of Geophysical Research*, 99:10143–10162.

(6) Goldstein, M. and Wooff, D. (2007). *Bayes Linear Statistics, Theory & Methods*. John Wiley & Sons.

(7) Hartigan, J. (1969). Linear Bayesian methods. *Journal of the Royal Statistical Society. Series B*, 446–454.

(8) Hoteit, I., Pham, D. T., Triantafyllou, G., and Korres, G. (2008). A new approximate solution of the optimal nonlinear filter for data assimilation in meteorology and oceanography. *Monthly Weather Review*, 136:317–334.

(9) Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. *Journal of Basic Engineering*, 82(1):35–45.

(10) Katzfuss, M., Stroud, J.R., and Wikle, C.K. (2015) Understanding the ensemble Kalman filter. *The American Statistician* (in review).

(11) Nychka, D. and Anderson, J. L. (2010). Data assimilation. In Gelfand, A., Diggle, P., Guttorp, P., and Fuentes, M., editors, *Handbook of Spatial Statistics*. Chapman & Hall/CRC, New York, NY.

(12) Wikle, C. K. (2002). A kernel-based spectral model for non-Gaussian spatio-temporal processes. *Statistical Modelling*, 2:299-314.

**About the authors:**

Christopher Wikle - (to whom correspondence should be addressed) Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO 65203, wiklec@missouri.edu

Jonathan R. Stroud - McDonough School of Business, Georgetown University, Washington DC, 20057, jrs390@georgetown.edu

Matthias Katzfuss - Department of Statistics, Texas A&M University, College Station, TX 77843, katzfuss@gmail.com
