Ensemble Kalman Smoother (EnKS)

Tags: enkf, data-assimilation

Published: 18 November 2023
Modified: 15 December 2023

Consider a linear-Gaussian state space model with \(\mathbb{R}^{D_x}\)-valued dynamics \(X_{t+1} \sim F \, X_t + \mathcal{N}(0,Q)\) and \(\mathbb{R}^{D_y}\)-valued observations \(Y_t \sim H X_t + \mathcal{N}(0,R)\). Assuming a Gaussian initial distribution, the filtering distributions \(\mathop{\mathrm{\mathbb{P}}}(X_t \in dx \, | Y_{1:t}) \equiv \mathcal{N}(\mu_{t|t}, P_{t|t})\) are Gaussian and can be sequentially computed with the Kalman filter. Similarly, the predictive distributions \(\mathop{\mathrm{\mathbb{P}}}(X_{t+1} \in dx \, | Y_{1:t}) \equiv \mathcal{N}(\mu_{t+1|t}, P_{t+1|t})\) are straightforward to obtain from the filtering distributions: \(\mu_{t+1|t} = F \, \mu_{t|t}\) and \(P_{t+1|t} = F \, P_{t|t} \, F^\top + Q\). Given observations \(y_{1:T} \equiv (y_1, \ldots, y_T)\) and \(1 \leq t \leq T\), the smoothing distributions \(\mathop{\mathrm{\mathbb{P}}}(X_t \in dx \, | Y_{1:T}) \equiv \mathcal{N}(\mu_{t|T}, P_{t|T})\) can be computed by performing a “backward pass”. Since everything is linear and Gaussian, this is just an exercise in linear algebra and Gaussian conditioning, as described by the Rauch-Tung-Striebel (Rauch, Tung, and Striebel 1965) smoothing recursions. The backward recursion reads

\[ \left\{ \begin{aligned} \mu_{t|T} &= \mu_{t|t} + B_t \, {\left( \mu_{t+1|T} - \mu_{t+1|t} \right)} \\ P_{t|T} &= P_{t|t} + B_t {\left( P_{t+1|T} - P_{t+1|t} \right)} B^\top_{t} \end{aligned} \right. \tag{1}\]

and allows one to compute the smoothing means and covariance matrices \((\mu_{t|T}, P_{t|T})\) for \(1 \leq t \leq T\) starting from the knowledge of \((\mu_{T|T}, P_{T|T})\). In Equation 1, the smoothing gain matrix \(B_t\) is given by

\[ \begin{align} B_t &= \mathop{\mathrm{Cov}}(X_t, X_{t+1} \, | y_{1:t}) \, \mathop{\mathrm{Var}}(X_{t+1} \, | y_{1:t})^{-1} \\ &= P_{t|t} F^\top \, {\left( F \, P_{t|t} \, F^\top + Q \right)} ^{-1}. \end{align} \tag{2}\]
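For concreteness, here is a minimal NumPy sketch of the two passes: a forward Kalman filter that stores the filtering and predictive moments, followed by the backward recursion of Equations 1 and 2. The function names and bookkeeping are illustrative choices, not a reference implementation.

```python
import numpy as np

def kalman_filter(F, Q, H, R, mu, P, ys):
    """Forward pass: returns the filtering moments (mu_{t|t}, P_{t|t})
    and the one-step predictive moments (mu_{t|t-1}, P_{t|t-1})."""
    mus_f, Ps_f, mus_p, Ps_p = [], [], [], []
    for y in ys:
        # predict: mu_{t|t-1} = F mu_{t-1|t-1} and P_{t|t-1} = F P F^T + Q
        mu_p, P_p = F @ mu, F @ P @ F.T + Q
        # assimilate y_t with the Kalman gain K_t
        K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
        mu, P = mu_p + K @ (y - H @ mu_p), P_p - K @ H @ P_p
        mus_f.append(mu); Ps_f.append(P); mus_p.append(mu_p); Ps_p.append(P_p)
    return mus_f, Ps_f, mus_p, Ps_p

def rts_smoother(F, mus_f, Ps_f, mus_p, Ps_p):
    """Backward pass: Rauch-Tung-Striebel recursion of Equation 1."""
    mus_s, Ps_s = list(mus_f), list(Ps_f)   # time T is already smoothed
    for t in reversed(range(len(mus_f) - 1)):
        # Equation 2: B_t = P_{t|t} F^T P_{t+1|t}^{-1}
        B = Ps_f[t] @ F.T @ np.linalg.inv(Ps_p[t + 1])
        mus_s[t] = mus_f[t] + B @ (mus_s[t + 1] - mus_p[t + 1])
        Ps_s[t] = Ps_f[t] + B @ (Ps_s[t + 1] - Ps_p[t + 1]) @ B.T
    return mus_s, Ps_s
```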

The Ensemble Kalman Filter (EnKF) is a non-linear equivalent of the Kalman filter, and the purpose of these notes is to derive the corresponding “ensemble version” of the backward recursion Equation 1. To this end, it is important to understand slightly better the role of the smoothing gain matrix \(B_t\). Consider the pair of random variables \((X^f_t, X^p_{t+1})\) distributed according to the joint law of the filtering state at time \(t\) and the predictive state at time \(t+1\), in the sense that

\[ (X^f_t, X^p_{t+1}) \; \underbrace{=}_{\text{(law)}}\; (X_t, X_{t+1} \, \mid \, y_{1:t}). \]

This means that \(X^f_t \sim \mathcal{N}(\mu_{t|t}, P_{t|t})\), that \(X^p_{t+1} \sim \mathcal{N}(\mu_{t+1|t}, P_{t+1|t})\), and that \(X^p_{t+1} = F \, X^f_t + \mathcal{N}(0, Q)\). Furthermore, Equation 2 and the standard Gaussian conditioning formulas give the conditional mean and covariance:

\[ \left\{ \begin{align} \textrm{Mean} {\left( X^f_t | (X^p_{t+1}=x_{t+1}) \right)} \; &= \; \mu_{t|t} + B_t (x_{t+1} - \mu_{t+1|t}) \\ \textrm{Cov} {\left( X^f_t | (X^p_{t+1}=x_{t+1}) \right)} \; &= \; P_{t|t} - B_t \, P_{t+1|t} \, B_t^\top. \end{align} \right. \tag{3}\]
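These formulas are easy to sanity-check by simulation. In the sketch below (the model matrices are arbitrary illustrative choices), the residual of the conditional-mean formula is computed from a large sample of the joint pair \((X^f_t, X^p_{t+1})\); its empirical covariance should match \(P_{t|t} - B_t \, P_{t+1|t} \, B_t^\top\).

```python
import numpy as np

rng = np.random.default_rng(0)
Dx = 2
F = np.array([[1.0, 0.1], [0.0, 0.9]])      # arbitrary illustrative model
Q = 0.1 * np.eye(Dx)
mu_f, P_f = np.zeros(Dx), np.eye(Dx)        # filtering law N(mu_{t|t}, P_{t|t})
mu_p, P_p = F @ mu_f, F @ P_f @ F.T + Q     # predictive law N(mu_{t+1|t}, P_{t+1|t})
B = P_f @ F.T @ np.linalg.inv(P_p)          # smoothing gain, Equation 2

# sample the joint pair (X^f_t, X^p_{t+1}) with X^p_{t+1} = F X^f_t + N(0, Q)
N = 500_000
Xf = rng.multivariate_normal(mu_f, P_f, size=N)
Xp = Xf @ F.T + rng.multivariate_normal(np.zeros(Dx), Q, size=N)

# residual of the conditional-mean formula in Equation 3: its empirical
# covariance should be close to P_{t|t} - B_t P_{t+1|t} B_t^T
res = (Xf - mu_f) - (Xp - mu_p) @ B.T
print(np.round(np.cov(res.T), 3))
print(np.round(P_f - B @ P_p @ B.T, 3))
```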

The above expression for the conditional mean also shows that the matrix \(B_t\) is a minimizer of the loss

\[ M \; \mapsto \; \mathop{\mathrm{\mathbb{E}}} {\left( \left\| (X^f_t - \mu_{t|t}) - M \, (X^p_{t+1} - \mu_{t+1|t}) \right\|^2 \right)} \]

over all matrices \(M \in \mathbb{R}^{D_x \times D_x}\). Heuristically, this shows that the smoothing gain matrix \(B_t\) can easily be computed by regressing \(X^f_t\) against \(X^p_{t+1}\). We can use this remark to build an ensemble version of the backward recursion Equation 1. Recall that when running an EnKF to filter the observations \(y_{1:T}\), the final stage proceeds in two steps:

  1. Obtain an ensemble of particles \(X^{i,p}_{T} = F \, X^{i,f}_{T-1} + \mathcal{N}(0,Q)\) that approximate the predictive distribution \(\mathop{\mathrm{\mathbb{P}}}(X_T | y_{1:T-1})\).
  2. Assimilate the last observation \(y_T\) using the Kalman gain matrix \(K_T\) and the correction \(\Delta_T^i = K_T \, (\tilde{y}^i_T - H \, X^{i,p}_T)\), where \(\tilde{y}^i_T = y_T + \mathcal{N}(0,R)\) is a perturbed observation, by setting \[ X^{i,s}_T = X^{i,p}_T + \Delta_T^i. \tag{4}\] The particles \(X^{i,s}_T\) approximate the smoothing distribution \(\mathop{\mathrm{\mathbb{P}}}(X_T | y_{1:T})\); a minimal code sketch of this analysis step is given below.
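
The following sketch of the analysis step uses the ensemble covariance as a plug-in estimate of \(P_{T|T-1}\), as is standard for the stochastic EnKF; the function name and the row-per-particle layout are illustrative choices.

```python
import numpy as np

def enkf_analysis(Xp, y, H, R, rng):
    """Stochastic EnKF analysis step (Equation 4). Xp has shape (N, Dx),
    one predictive particle X^{i,p}_T per row. Returns the corrected
    particles X^{i,s}_T and the corrections Delta^i_T."""
    N, _ = Xp.shape
    # plug-in ensemble estimate of the predictive covariance P_{T|T-1}
    P_hat = np.cov(Xp.T)
    # Kalman gain K_T = P H^T (H P H^T + R)^{-1}
    K = P_hat @ H.T @ np.linalg.inv(H @ P_hat @ H.T + R)
    # perturbed observations y~^i_T = y_T + N(0, R) and corrections Delta^i_T
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    Delta = (y_pert - Xp @ H.T) @ K.T
    return Xp + Delta, Delta
```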

Following our discussion of the smoothing gain matrix \(B_{t}\) and Equation 4, it seems sensible to set

\[ \begin{align} X^{i,s}_{T-1} &= X^{i,f}_{T-1} + B_{T-1} \, \Delta^i_T\\ &= X^{i,f}_{T-1} + B_{T-1} \, (X^{i,s}_{T} - X^{i,p}_{T}) \end{align} \tag{5}\]

and hope that the ensemble of updated particles \(X^{i,s}_{T-1}\) approximates the smoothing distribution \(\mathop{\mathrm{\mathbb{P}}}(X_{T-1} | y_{1:T})\). In words, the particle \(X^{i,s}_{T-1}\) is obtained by “pulling” the correction term \(\Delta^i_{T} = X^{i,s}_{T} - X^{i,p}_{T}\) back to \(X^{i,f}_{T-1}\) through the “regression” smoothing gain matrix \(B_{T-1}\). To check that the particles \(X^{i,s}_{T-1}\) indeed approximate the smoothing distribution \(\mathop{\mathrm{\mathbb{P}}}(X_{T-1} \,|y_{1:T})\), it suffices to compute their mean and covariance and verify that they match those given by Equation 1. Recall that Equation 3 gives that the filtering/predictive distributions satisfy

\[ X^f_{T-1} = \mu_{T-1|T-1} + B_{T-1} \, (X^p_{T} - \mu_{T|T-1}) + \varepsilon_{T-1} \]

where \(\varepsilon_{T-1} \sim \mathcal{N}(0, P_{T-1|T-1} - B_{T-1} \, P_{T|T-1} \, B_{T-1}^\top)\) is independent of all other sources of randomness. Plugging this relation into Equation 5, with an independent copy \(\varepsilon^i_{T-1}\) for each particle, gives

\[ X^{i,s}_{T-1} = \mu_{T-1|T-1} + B_{T-1} \, (X^{i,s}_{T} - \mu_{T|T-1}) + \varepsilon^i_{T-1}. \]

Since the \(X^{i,s}_{T}\) are distributed according to the smoothing distribution, i.e. \(X^{i,s}_{T} \sim \mathcal{N}(\mu_{T|T}, P_{T|T})\), this immediately shows that \(X^{i,s}_{T-1}\) is Gaussian with

\[ \left\{ \begin{align} \textrm{Mean} &= \mu_{T-1|T} = \mu_{T-1|T-1} + B_{T-1} \, {\left( \mu_{T|T} - \mu_{T|T-1} \right)} \\ \textrm{Covariance} &= P_{T-1|T} = P_{T-1|T-1} + B_{T-1} {\left( P_{T|T} - P_{T|T-1} \right)} B^\top_{T-1}, \end{align} \right. \]

as it should. One can then iterate this construction to obtain particle approximations of the smoothing distributions \(\mathop{\mathrm{\mathbb{P}}}(X_t | y_{1:T})\) for \(1 \leq t \leq T\) by running a backward pass and recursively setting

\[ X^{i,s}_t \; = \; X^{i,f}_t + B_t \, {\left( X^{i,s}_{t+1} - X^{i,p}_{t+1} \right)} . \]

The ensemble of particles \(X^{i,s}_t\) approximates the smoothing distribution \(\mathop{\mathrm{\mathbb{P}}}(X_t | y_{1:T})\). In a nonlinear setting, it suffices to approximate the smoothing gain matrices by their ensemble estimates

\[ \widehat{B}_t = \widehat{\mathop{\mathrm{Cov}}} {\left( X^{i,f}_{t}, X^{i,p}_{t+1} \right)} \, \widehat{\mathop{\mathrm{Var}}} {\left( X^{i,p}_{t+1} \right)} ^{-1}, \]

where \(\widehat{\mathop{\mathrm{Cov}}}\) and \(\widehat{\mathop{\mathrm{Var}}}\) denote empirical covariances computed across the ensemble.
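Putting everything together, a backward pass over stored forward-pass ensembles might look as follows; the helper name and the list-based storage are mine, and the gains are computed by the ensemble regression described above, using centered anomalies.

```python
import numpy as np

def enks_backward(Xf, Xp, Xs_last):
    """Ensemble Kalman Smoother backward pass. Xf[t] holds the filtering
    particles X^{i,f}_t and Xp[t] the predictive particles X^{i,p}_t,
    each of shape (N, Dx); Xs_last holds the smoothing particles at the
    final time T (output of the last EnKF analysis step)."""
    T = len(Xf)
    Xs = [None] * T
    Xs[-1] = Xs_last
    for t in reversed(range(T - 1)):
        N = Xf[t].shape[0]
        # centered anomalies for the ensemble regression
        Af = Xf[t] - Xf[t].mean(axis=0)
        Ap = Xp[t + 1] - Xp[t + 1].mean(axis=0)
        # B_hat_t = Cov_hat(X^f_t, X^p_{t+1}) Var_hat(X^p_{t+1})^{-1}
        B_hat = (Af.T @ Ap / (N - 1)) @ np.linalg.inv(Ap.T @ Ap / (N - 1))
        # pull the correction back through the regression gain:
        # X^{i,s}_t = X^{i,f}_t + B_hat_t (X^{i,s}_{t+1} - X^{i,p}_{t+1})
        Xs[t] = Xf[t] + (Xs[t + 1] - Xp[t + 1]) @ B_hat.T
    return Xs
```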

[Experiments: TODO]

References

Rauch, Herbert E., F. Tung, and Charlotte T. Striebel. 1965. “Maximum Likelihood Estimates of Linear Dynamic Systems.” AIAA Journal 3 (8): 1445–50.