Ensemble Kalman Updates
Assume a Gaussian prior distribution \(\pi_0 \equiv \mathcal{N}(m_0,P_0)\) and a noisy observation \(y_\star \in \mathbb{R}^{D_y}\) with
\[ y_\star = H x + \xi \qquad \text{with} \qquad \xi \sim \mathcal{N}(0,R) \]
where \(x \in \mathbb{R}^{D_x}\) is an unknown quantity of interest. The posterior distribution \(\pi \equiv \mathcal{N}(m,P)\) is Gaussian and is given by
\[ \begin{align} \left\{ \begin{aligned} m &= m_0 + P_0 H^\top {\left( H P_0 H^\top + R \right)} ^{-1}(y_\star - H m_0)\\ P &= P_0 - P_0 H^\top {\left( H P_0 H^\top + R \right)} ^{-1} \, H P_0, \end{aligned} \right. \end{align} \]
as follows from standard Gaussian conditioning. This can also be written as
\[ \begin{align} \left\{ \begin{aligned} m &= m_0 + K \, (y_\star - H m_0)\\ P &= (I - K \, H) P_0, \end{aligned} \right. \end{align} \]
for the Kalman gain matrix \(K \in \mathbb{R}^{D_x,D_y}\) defined as
\[ K \; = \; P_0 H^\top {\left( H P_0 H^\top + R \right)} ^{-1}. \tag{1}\]
This gain can also be expressed as
\[ K \; = \; \mathop{\mathrm{Cov}}(X, HX) \, \mathop{\mathrm{Var}}(Y)^{-1} \]
for \(X \sim \pi_0\) and \(Y = HX + \mathcal{N}(0,R)\); this point of view can be useful for establishing generalizations to non-linear scenarios. The important remark is that the posterior covariance matrix \(P \in \mathbb{R}^{D_x,D_x}\) and the posterior mean \(m \in \mathbb{R}^{D_x}\) can also be expressed as
\[ \begin{align} \left\{ \begin{aligned} m &= \textcolor{red}{(I - K \, H)} \, m_0 + \textcolor{blue}{K} \, y_\star\\ P &= \; \textcolor{red}{(I - K \, H)} \, P_0 \, \textcolor{red}{(I - K \, H)^\top} + \textcolor{blue}{K} R \textcolor{blue}{K^\top}. \end{aligned} \right. \end{align} \tag{2}\]
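As a quick sanity check that Equation 2 agrees with the earlier expressions: the formula for \(m\) is simply the update \(m = m_0 + K(y_\star - H m_0)\) with the terms regrouped, and for the covariance, Equation 1 gives \(K \left( H P_0 H^\top + R \right) = P_0 H^\top\), hence
\[ (I - K \, H) \, P_0 H^\top \;=\; P_0 H^\top - K \left( H P_0 H^\top + R \right) + K R \;=\; K R, \]
so that
\[ (I - K \, H) \, P_0 \, (I - K \, H)^\top + K R K^\top \;=\; (I - K \, H) \, P_0 - K R \, K^\top + K R K^\top \;=\; (I - K \, H) \, P_0 \;=\; P. \]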
Equation 2 shows that \(P\) is indeed positive semi-definite. More importantly, it gives a mechanism for transforming samples from the prior distribution into samples from the posterior distribution. Indeed, consider \(N\) iid samples from the prior distribution, \(x_1, \ldots, x_N \sim \pi_0(dx)\), and set
\[ \widetilde{x}_i = \textcolor{red}{(I - K \, H)} \, x_i + \textcolor{blue}{K}(y_\star + \xi_i) \]
for iid noise terms \(\xi_i \sim \mathcal{N}(0,R)\). From Equation 2 it is clear that \(\widetilde{x}_1, \ldots, \widetilde{x}_N\) are iid samples from the Gaussian posterior distribution \(\pi = \mathcal{N}(m,P)\). It is more intuitive to write this as
\[ \widetilde{x}_i = x_i + K \, ( \textcolor{green}{ \widetilde{y}_{i,\star} } - H \, x_i) \]
where \(\widetilde{y}_{i,\star} \in \mathbb{R}^{D_y}\) are fake observations that are obtained by perturbing the actual observation \(y_\star\) with additive Gaussian noise terms with covariance \(R\),
\[ \textcolor{green}{\widetilde{y}_{i,\star} \, = \, y_\star + \xi_i}. \]
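As a concrete illustration, here is a minimal NumPy sketch of this transformation of prior samples into posterior samples; the dimensions and the matrices \(H\), \(P_0\), \(R\) below are arbitrary illustrative choices, not tied to any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)
Dx, Dy, N = 3, 2, 100_000

# illustrative prior, observation operator and noise covariance
m0 = np.zeros(Dx)
P0 = np.eye(Dx)
H = rng.normal(size=(Dy, Dx))
R = 0.5 * np.eye(Dy)
y_star = rng.normal(size=Dy)                 # the (fixed) observation

# Kalman gain K = P0 H^T (H P0 H^T + R)^{-1}  (Equation 1)
K = P0 @ H.T @ np.linalg.inv(H @ P0 @ H.T + R)

# closed-form posterior moments, for comparison
m = m0 + K @ (y_star - H @ m0)
P = (np.eye(Dx) - K @ H) @ P0

# prior samples and perturbed ("fake") observations y_star + xi_i
xs = rng.multivariate_normal(m0, P0, size=N)
ys = y_star + rng.multivariate_normal(np.zeros(Dy), R, size=N)

# update  x_i -> x_i + K (y_tilde_i - H x_i)
xs_post = xs + (ys - xs @ H.T) @ K.T

# empirical moments of the transformed samples should be close to (m, P)
print(np.abs(xs_post.mean(axis=0) - m).max())
print(np.abs(np.cov(xs_post.T) - P).max())
```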
Empirical version: non-linearity and non-Gaussianity
Suppose that we would like to estimate \(x \in \mathbb{R}^{D_x}\) from the noisy observation
\[ y_\star = \mathcal{H}(x) + \xi \qquad \text{with} \qquad \xi \sim \mathcal{N}(0,R) \]
and possibly-nonlinear observation operator \(\mathcal{H}: \mathbb{R}^{D_x} \to \mathbb{R}^{D_y}\). Assume that we also have \(N\) samples \(x_1, \ldots, x_N\) generated from some (unknown) prior distribution. For example, these samples could come from another numerical procedure. In order to obtain \(N\) approximate samples from the posterior distribution, one can set
\[ \widetilde{x}_i = x_i + \widehat{K} \, [ \textcolor{green}{ \widetilde{y}_{i,\star} } - \mathcal{H}(x_i)] \]
for fake observations \( \textcolor{green}{\widetilde{y}_{i,\star} } = y_\star + \xi_i\). The approximate Kalman gain matrix \(\widehat{K}\) is obtained by noting that, in the exact gain of Equation 1,
\[ K \; = \; P_0 H^\top {\left( H P_0 H^\top + R \right)} ^{-1} \]
we have \(P_0 H^\top = \mathop{\mathrm{Cov}}(X,HX)\) and \(H P_0 H^\top = \mathop{\mathrm{Var}}(HX)\) for \(X \sim \pi_0\). This means that an approximate Kalman gain matrix can be obtained using empirical estimates of these covariance matrices:
\[ \widehat{K} \; = \; \widehat{\mathop{\mathrm{Cov}}}([x_i]_i, [\mathcal{H}(x_i)]_i) \, {\left( \widehat{\mathop{\mathrm{Var}}}([\mathcal{H}(x_i)]_i) + R \right)} ^{-1}. \]
These updates form the basis of the Ensemble Kalman filter (EnKF), a very successful and scalable approach to data assimilation in high-dimensional dynamical systems. The method is operationally employed at weather forecasting centers across the globe.
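As an illustration, here is a minimal NumPy sketch of this empirical update; the routine name `enkf_update`, the toy observation operator and all dimensions below are illustrative assumptions rather than the interface of any particular library.

```python
import numpy as np

def enkf_update(xs, y_star, H_op, R, rng):
    """One ensemble Kalman update of the ensemble xs (shape (N, Dx))."""
    N, Dy = xs.shape[0], len(y_star)
    Hx = np.apply_along_axis(H_op, 1, xs)          # H(x_i), shape (N, Dy)
    # empirical Cov(x, H(x)) and Var(H(x))
    x_c, Hx_c = xs - xs.mean(axis=0), Hx - Hx.mean(axis=0)
    C_xH = x_c.T @ Hx_c / (N - 1)
    C_HH = Hx_c.T @ Hx_c / (N - 1)
    K_hat = C_xH @ np.linalg.inv(C_HH + R)         # approximate Kalman gain
    # fake observations y_star + xi_i and update x_i + K_hat (y_tilde_i - H(x_i))
    ys = y_star + rng.multivariate_normal(np.zeros(Dy), R, size=N)
    return xs + (ys - Hx) @ K_hat.T

# illustrative usage with a toy non-linear observation operator
rng = np.random.default_rng(1)
H_op = lambda x: np.array([np.exp(x[0]), x[1] + x[2]])
xs = rng.normal(size=(1_000, 3))                   # samples from some prior
R = 0.1 * np.eye(2)
y_star = np.array([2.0, 0.5])
xs_post = enkf_update(xs, y_star, H_op, R, rng)    # approximate posterior samples
```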
Derivative-free optimization
Interestingly enough, the remarks above can be used to design, in a relatively principled manner, a derivative-free optimizer (Huang et al. 2022). For example, assume one would like to minimize a functional of the type
\[ \Psi(x) \; = \; \|y_\star - \mathcal{H}(x)\|^2_{R^{-1}}. \]
One can start with a cloud of particles \(x_1, \ldots, x_N\) and keep updating them by repeatedly assimilating the noisy observation \(y_\star\), as if it were generated from the postulated observation process
\[ y_\star = \mathcal{H}(x) + \varepsilon^{-1} \, \xi \]
for \(\xi \sim \mathcal{N}(0,R)\) and \(\varepsilon\ll 1\) a “step-size”. Each assimilation step steers the cloud of points in the right direction. Careful choice of the step-size \(\varepsilon\) is often crucial, as in any optimization procedure. This approach is indeed related to Information-Geometric Optimization Algorithms (IGO): the article (Ollivier et al. 2017) is beautiful!
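Below is a rough, self-contained NumPy sketch of this idea: the cloud of particles is repeatedly updated with the empirical Kalman update above, using the inflated noise \(\varepsilon^{-1}\xi\) (effective covariance \(\varepsilon^{-2} R\)). The forward model, the step-size and the number of particles and iterations are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
H_op = lambda x: np.array([np.exp(x[0]), x[1] + x[2]])   # toy forward model H(x)
y_star = np.array([2.0, 0.5])
R = 0.1 * np.eye(2)
eps = 0.1                                                # "step-size"
R_eff = R / eps**2                                       # covariance of eps^{-1} xi

xs = rng.normal(size=(200, 3))                           # initial cloud of particles
for _ in range(500):
    # one empirical Kalman update with the inflated noise covariance R_eff
    Hx = np.apply_along_axis(H_op, 1, xs)
    x_c, Hx_c = xs - xs.mean(axis=0), Hx - Hx.mean(axis=0)
    K_hat = (x_c.T @ Hx_c) @ np.linalg.inv(Hx_c.T @ Hx_c + (len(xs) - 1) * R_eff)
    ys = y_star + rng.multivariate_normal(np.zeros(2), R_eff, size=len(xs))
    xs = xs + (ys - Hx) @ K_hat.T

# the particle mean should make Psi(x) = |y_star - H(x)|^2_{R^{-1}} small
x_bar = xs.mean(axis=0)
r = y_star - H_op(x_bar)
print(r @ np.linalg.solve(R, r))
```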