Consider a controlled SDE on \([0,T]\) with scalar noise schedule \(\sigma_t > 0\),
\[ dX_t = \sigma_t \, u(X_t, t) \, dt + \sigma_t \, dW_t, \quad X_0 \sim p_0. \tag{1}\]
The goal: find \(u\) such that \(X_T \sim \pi\), where \(\pi(x) = \rho(x)/\mathcal{Z}\) is a target density known up to normalization. Adjoint sampling solves this with a Dirac prior and memoryless condition; adding a corrector for arbitrary priors requires alternating optimization. The Bridge Matching Sampler (BMS) identifies a single coupling, the independent coupling, that makes the regression target fully tractable and removes the need for alternation.
Nelson’s relation
Let \(\mathbb{P}^u\) denote the path measure of Equation 1 with time marginals \(p_t\). Write the Euler discretization with step \(\delta\):
\[ X_{t+\delta} = X_t + \sigma_t \, u(X_t,t) \, \delta + \sigma_t \sqrt{\delta} \, \mathbf{n}, \qquad \mathbf{n}\sim \mathcal{N}(0,I). \tag{2}\]
The forward conditional mean is \(\mathbb{E}[X_{t+\delta} \mid X_t = x] = x + \sigma_t \, u(x,t) \, \delta\). Now compute the backward conditional mean \(\mathbb{E}[X_t \mid X_{t+\delta} = y]\) using Bayes’ rule, exactly as in the reverse diffusions note. For \(\delta \ll 1\):
\[ \mathbb{P}(X_t \in dx \mid X_{t+\delta} = y) \;\propto\; p_t(x) \, \exp {\left\{ -\frac{\|y - x - \sigma_t \, u(x,t) \, \delta\|^2}{2 \sigma_t^2 \, \delta} \right\}} . \]
Expanding \(p_t(x) \approx p_t(y) \exp(\left< \nabla \log p_t(y), x - y \right>)\) and completing the square, the conditional mean is
\[ \mathbb{E}[X_t \mid X_{t+\delta} = y] = y - \sigma_t \, u(y,t) \, \delta + \sigma_t^2 \, \nabla \log p_t(y) \, \delta + O(\delta^2). \tag{3}\]
(The second-order correction to \(\log p_t\) affects the conditional variance at \(O(\delta)\) but not the conditional mean, which is all we need.)
Completing the square:
Drop multiplicative constants independent of \(x\). The exponent in the posterior is
\[ \left< \nabla \log p_t(y), x - y \right> - \frac{\|y - x - \sigma_t u \delta\|^2}{2\sigma_t^2 \delta}. \]
Write \(z = x - y\). The quadratic piece is \(-\|z + \sigma_t u \delta\|^2/(2\sigma_t^2\delta)\), with mean at \(z = -\sigma_t u \delta\). The linear piece \(\left< \nabla \log p_t(y), z \right>\) shifts the Gaussian mean by \(\sigma_t^2 \delta \, \nabla \log p_t(y)\) (the standard “linear tilt of a Gaussian” identity: if \(f(z) \propto \exp(-\|z - m\|^2/(2s^2) + \left< a,z \right>)\), then the mean shifts from \(m\) to \(m + s^2 a\), with \(s^2 = \sigma_t^2\delta\) and \(a = \nabla \log p_t(y)\)). So \(\mathbb{E}[z] = -\sigma_t u \delta + \sigma_t^2 \nabla \log p_t(y)\delta + O(\delta^2)\), giving \(\mathbb{E}[X_t \mid X_{t+\delta} = y] = y - \sigma_t u \delta + \sigma_t^2 \nabla \log p_t(y)\delta\).

From the reverse diffusions note, the time-reversed process \(\overleftarrow{X}_s = X_{T-s}\) satisfies
\[ d\overleftarrow{X}_s = {\left[ -\sigma_{T-s} \, u(\overleftarrow{X}_s, T-s) + \sigma_{T-s}^2 \, \nabla \log p_{T-s}(\overleftarrow{X}_s) \right]} ds + \sigma_{T-s} \, d\overleftarrow{B}_s. \]
The reversed drift is \(-\sigma_t u + \sigma_t^2 \nabla \log p_t\). Define \(v\) by writing this reversed drift as \(\sigma_t v\), so that the reversed SDE takes the form \(d\overleftarrow{X}_s = \sigma_{T-s} v(\overleftarrow{X}_s, T-s) ds + \sigma_{T-s} d\overleftarrow{B}_s\). Then \(\sigma_t v = -\sigma_t u + \sigma_t^2 \nabla \log p_t\), i.e. \(v = -u + \sigma_t \nabla \log p_t\). Rearranging:
This is Nelson’s relation:
\[ \textcolor{blue}{u(x,t) + v(x,t) = \sigma_t \, \nabla \log p_t(x).} \tag{4}\]
As a sanity check, Equation 3 confirms this: the backward conditional mean \(y - \sigma_t u \delta + \sigma_t^2 \nabla \log p_t \delta\) identifies the reversed drift as \(-\sigma_t u + \sigma_t^2 \nabla \log p_t = \sigma_t v\), consistent with the time-reversal formula.
This holds for any Markov diffusion of the form Equation 1 with scalar noise schedule \(\sigma_t\) and marginals \(p_t\).
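The backward conditional mean of Equation 3 can be checked in closed form for a Gaussian example, since one Euler step from a Gaussian marginal stays jointly Gaussian. A minimal sketch, assuming \(p_t = \mathcal{N}(0,1)\), \(\sigma_t = 1\), and an arbitrary illustrative control \(u(x,t) = -\kappa x\):

```python
import numpy as np

# One Euler step of Eq. (2) from the Gaussian marginal p_t = N(0, 1), with
# sigma_t = 1 and the illustrative control u(x, t) = -kappa * x.  Everything
# stays jointly Gaussian, so E[X_t | X_{t+delta} = y] is available exactly
# and can be compared with the first-order expansion of Eq. (3).
kappa = 2.0

def backward_mean_coeffs(delta):
    a = 1.0 - kappa * delta            # X_{t+d} = a * X_t + sqrt(d) * n
    exact = a / (a**2 + delta)         # exact Gaussian regression coefficient
    # Eq. (3): y - sigma_t u(y) d + sigma_t^2 grad log p_t(y) d
    #        = y + kappa d y - y d     (since grad log p_t(y) = -y)
    approx = 1.0 + (kappa - 1.0) * delta
    return exact, approx

for delta in (1e-2, 1e-3, 1e-4):
    exact, approx = backward_mean_coeffs(delta)
    print(delta, abs(exact - approx))  # error shrinks like O(delta^2)
```

The discrepancy between the exact coefficient and Equation 3's prediction scales as \(O(\delta^2)\), consistent with the claim that only the conditional mean is needed to first order.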
Reciprocal class and Markovian projection
Let \(\mathbb{P}\) denote the reference (uncontrolled) process \(dX_t = \sigma_t \, dW_t\). Write \(\mathbb{P}_{|0,T}(\cdot | x_0, x_T)\) for the law of the reference process conditioned on \(X_0 = x_0, X_T = x_T\). A path measure \(\Pi\) belongs to the reciprocal class \(\mathcal{R}(\mathbb{P})\) if it has the form \(\Pi = \Pi_{0,T} \, \mathbb{P}_{|0,T}\), where \(\Pi_{0,T}\) is an endpoint coupling and \(\mathbb{P}_{|0,T}\) is the reference bridge (Brownian bridge for Brownian \(\mathbb{P}\)); see the Schrodinger bridges note.
A reciprocal measure is generally non-Markovian: the bridge drift depends on \(X_T\). The Markovian projection finds a Markovian drift \(u^\star\) whose time marginals match those of \(\Pi^\star\); existence and uniqueness require the path-dependent drift to satisfy a linear growth condition (Brunick and Shreve 2013). This is an \(L^2\) projection: if \(\xi(X,t)\) is the path-dependent drift of \(\Pi^\star\), then
\[ u^\star(x,t) = \mathbb{E}_{\Pi^\star} {\left[ \xi(X,t) \mid X_t = x \right]} . \tag{5}\]
Why? For any Markovian \(u\), expand \(\mathbb{E}_{\Pi^\star}[\|\xi - u(X_t,t)\|^2]\) and use the tower property:
\[ \mathbb{E}_{\Pi^\star} {\left[ \|\xi - u\|^2 \right]} = \mathbb{E}_{\Pi^\star} {\left[ \|\xi - u^\star\|^2 \right]} + \mathbb{E}_{\Pi^\star} {\left[ \|u^\star - u\|^2 \right]} . \]
The cross-term vanishes because \(\mathbb{E}_{\Pi^\star}[\xi - u^\star \mid X_t] = 0\) by definition of \(u^\star\). So \(u^\star\) minimizes the matching loss
\[ u^\star = \mathop{\mathrm{argmin}}_{u} \; \mathbb{E}_{\Pi^\star} {\left[ \int_0^T \frac{1}{2} \| \xi(X,t) - u(X_t,t) \|^2 \, dt \right]} . \tag{6}\]
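The Pythagorean decomposition behind Equation 6 is easy to see numerically. A toy sketch (my own setup, not from the paper): the "path-dependent drift" is \(\xi = X_t + \eta\) with \(\eta\) independent noise standing in for the bridge's dependence on the endpoints, so \(u^\star(x) = \mathbb{E}[\xi \mid X_t = x] = x\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markovian projection: xi = X_t + eta, with eta extra randomness
# (standing in for the bridge's dependence on the endpoints).
# Its conditional expectation given X_t is u_star(x) = x.
n = 200_000
x_t = rng.normal(size=n)
xi = x_t + rng.normal(size=n)

u_star = x_t                  # E[xi | X_t] = X_t for this toy
u_other = 0.5 * x_t           # any other Markovian candidate

lhs = np.mean((xi - u_other) ** 2)
rhs = np.mean((xi - u_star) ** 2) + np.mean((u_star - u_other) ** 2)
print(lhs, rhs)               # Pythagorean identity: the two agree
```

The loss of any candidate exceeds the loss of \(u^\star\) by exactly the squared distance to \(u^\star\), which is why the conditional expectation minimizes Equation 6.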
Fixed-point iteration
All three methods (adjoint sampling with Dirac prior, adjoint sampling with corrector, BMS) follow the same template. Starting from some control \(u_0\):
1. Simulate the current SDE \(\mathbb{P}^{u_i}\) to generate endpoint pairs \((X_0, X_T)\).
2. Reciprocal projection: form a coupling \(\Pi^i_{0,T}\) from the endpoints, define \(\Pi^i = \Pi^i_{0,T} \, \mathbb{P}_{|0,T}\).
3. Markovianize: update \(u_{i+1}\) by regressing onto the bridge drift via Equation 6.
If \(u_i = u^\star\), then \(\Pi^i = \Pi^\star\) and \(u_{i+1} = u^\star\), so \(u^\star\) is a fixed point. Convergence of this iteration is not guaranteed in general; the BMS paper treats it as empirically effective. What distinguishes the methods is the coupling in step 2, which determines the regression target \(\xi\).
Target score identity
To get a tractable \(\xi\), we need the score \(\nabla \log \Pi^\star_t(x)\). Define the cumulative variance \(\nu_t = \int_0^t \sigma_s^2 \, ds\) and \(\gamma_t = \nu_t/\nu_T\). The bridge \(\mathbb{P}_{|0,T}\) is Gaussian: for \(t \in (0,T)\),
\[ X_t \mid (X_0, X_T) \;\sim\; \mathcal{N} {\left( (1-\gamma_t) X_0 + \gamma_t X_T, \;\; \nu_T \gamma_t(1-\gamma_t) \, I \right)} . \tag{7}\]
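Equation 7 can be verified by Monte Carlo for a time-varying schedule. A sketch with the illustrative choice \(\sigma_t = 1 + t\) (any positive schedule works): simulate \((X_t, X_T)\) given \(X_0 = 0\) as two Gaussian increments, then read off the conditional law of \(X_t\) given the endpoints by linear regression:

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference process dX = sigma_t dW with the illustrative schedule
# sigma_t = 1 + t on [0, T].  Then nu_t = t + t^2 + t^3/3.
T, t = 1.0, 0.4
nu = lambda s: s + s**2 + s**3 / 3.0
nu_t, nu_T = nu(t), nu(T)
gamma = nu_t / nu_T

# Simulate (X_t, X_T) given X_0 = 0 by two independent Gaussian increments,
# then recover the conditional law X_t | X_T by linear regression.
n = 400_000
x_t = np.sqrt(nu_t) * rng.normal(size=n)
x_T = x_t + np.sqrt(nu_T - nu_t) * rng.normal(size=n)

slope = np.cov(x_t, x_T)[0, 1] / np.var(x_T)   # should be gamma_t
resid_var = np.var(x_t - slope * x_T)          # should be nu_T gamma (1 - gamma)
print(slope, gamma)
print(resid_var, nu_T * gamma * (1 - gamma))
```

The regression slope matches \(\gamma_t\) and the residual variance matches \(\nu_T \gamma_t(1-\gamma_t)\), exactly the mean and variance in Equation 7.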
The marginal density of \(\Pi^\star\) at time \(t\) is
\[ \Pi^\star_t(x) = \int \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T. \tag{8}\]
Differentiate \(\log \Pi^\star_t(x)\):
\[ \nabla_x \log \Pi^\star_t(x) = \frac{\int \nabla_x \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T}{\Pi^\star_t(x)}. \]
Since \(\mathbb{P}_{t|0,T}\) is Gaussian with mean \((1-\gamma_t)x_0 + \gamma_t \, x_T\),
\[ \nabla_x \log \mathbb{P}_{t|0,T}(x \mid x_0, x_T) = -\frac{x - (1-\gamma_t)x_0 - \gamma_t \, x_T}{\nu_T\gamma_t(1-\gamma_t)}. \]
Integration by parts to swap gradients:
The identity \(\nabla_x \mathbb{P}_{t|0,T} = -\frac{1}{1-\gamma_t} \nabla_{x_0} \mathbb{P}_{t|0,T}\) relies on the Gaussian bridge having a mean that is affine in \((x_0, x_T)\) with endpoint-independent variance. Shifting \(x\) by \(\varepsilon\) is equivalent to shifting \(x_0\) by \(-\varepsilon/(1-\gamma_t)\), which holds because the mean is \((1-\gamma_t)x_0 + \gamma_t x_T\). Beyond Gaussian references, the TSI still holds conceptually but the gradient swap takes a different form.
Integrate by parts in \(x_0\):
\[ \nabla_x \log \Pi^\star_t(x) = \frac{1}{1-\gamma_t} \frac{\int \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \nabla_{x_0} \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T}{\Pi^\star_t(x)}, \]
where we moved \(\nabla_{x_0}\) from \(\mathbb{P}_{t|0,T}\) onto \(\Pi^\star_{0,T}\) (boundary terms vanish by decay of \(\Pi^\star_{0,T} \cdot \mathbb{P}_{t|0,T}\) at infinity). (The minus from \(\nabla_x \mathbb{P}= -\frac{1}{1-\gamma_t} \nabla_{x_0} \mathbb{P}\) and the minus from integration by parts cancel, giving a positive coefficient \(+\frac{1}{1-\gamma_t}\).) Write \(\Pi^\star_{0,T|t}\) for the conditional distribution of \((X_0, X_T)\) given \(X_t = x\) under \(\Pi^\star\). Recognizing the conditional expectation:
\[ \nabla_x \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T}(X_0,X_T) \;\Big|\; X_t = x \right]} . \]
The same argument with integration by parts in \(x_T\) gives
\[ \nabla_x \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T}(X_0,X_T) \;\Big|\; X_t = x \right]} . \]
Since both expressions equal the same score, any convex combination, \((1-c(t))\) times the first plus \(c(t)\) times the second, remains valid. This gives the generalized target score identity (TSI): for any \(c(t) \in (0,1]\),
\[ \nabla \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1-c(t)}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T} + \frac{c(t)}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T} \;\Big|\; X_t = x \right]} . \tag{9}\]
(The constant choice \(c = 0\) divides by \(1-\gamma_t\), which vanishes at \(t = T\), and \(c = 1\) divides by \(\gamma_t\), which vanishes at \(t = 0\); time-dependent choices such as \(c(t) = \gamma_t\) keep both coefficients bounded over all of \([0,T]\).)
(The gradient-integral interchange and vanishing boundary terms require regularity of \(\Pi^\star_{0,T}\) and decay of the integrand at infinity; these hold for sub-Gaussian couplings.)
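For a 1-D Gaussian example the TSI can be checked in closed form, with no Monte Carlo. A sketch assuming the product coupling \(\Pi^\star_{0,T} = p_0 \otimes \pi\) with \(p_0 = \mathcal{N}(0, a)\), \(\pi = \mathcal{N}(m, b)\) (parameter values are illustrative):

```python
import numpy as np

# Closed-form check of the generalized TSI (Eq. 9) for a 1-D Gaussian case:
# p_0 = N(0, a), pi = N(m, b), product coupling, bridge as in Eq. (7).
a, b, m = 1.0, 0.25, 2.0
nu_T, gamma = 1.0, 0.3
s2 = nu_T * gamma * (1 - gamma)          # bridge variance at time t

# Marginal Pi*_t is Gaussian with these moments:
mu_t = gamma * m
var_t = (1 - gamma) ** 2 * a + gamma ** 2 * b + s2

def score(x):                            # grad log Pi*_t(x)
    return -(x - mu_t) / var_t

def tsi_rhs(x, c):
    # Gaussian conditional means of the endpoints given X_t = x:
    e_x0 = (1 - gamma) * a / var_t * (x - mu_t)        # E[X_0 | X_t = x]
    e_xT = m + gamma * b / var_t * (x - mu_t)          # E[X_T | X_t = x]
    # E[(1-c)/(1-g) grad log p_0(X_0) + c/g grad log pi(X_T) | X_t = x]
    return (1 - c) / (1 - gamma) * (-e_x0 / a) + c / gamma * (-(e_xT - m) / b)

x = 0.7
for c in (0.1, 0.5, gamma, 1.0):
    print(c, tsi_rhs(x, c), score(x))    # every c gives the same score
```

Every choice of \(c\) reproduces \(\nabla \log \Pi^\star_t(x)\) exactly, as Equation 9 asserts; the expectations are exact here because the endpoint scores are linear and the conditionals are Gaussian.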
General regression target
Now combine everything. For a reciprocal process \(\Pi^\star = \Pi^\star_{0,T} \mathbb{P}_{|0,T}\), the bridge drift decomposes into forward and backward pieces: \(\nabla_{X_t} \log \mathbb{P}_{t|0,T}(X_t | X_0, X_T) = \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t | X_0) + \nabla_{X_t} \log \mathbb{P}_{T|t}(X_T | X_t)\). The first piece points back toward \(X_0\); the second points forward toward \(X_T\). The backward Markovian drift \(v^\star\) extracts the backward piece and averages it over \(X_0 | X_t\):
\[ v^\star(x,t) = \mathbb{E}_{\Pi^\star_{0|t}} {\left[ \sigma_t \, \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid X_0) \;\Big|\; X_t = x \right]} . \tag{10}\]
Since \(\mathbb{P}_{t|0}\) is Gaussian \(\mathcal{N}(X_0, \nu_t I)\), this is \(-\sigma_t(X_t - X_0)/\nu_t\) averaged over \(X_0 \mid X_t\). This follows from the bridge drift decomposition above: the \(\nabla \log \mathbb{P}_{T|t}\) piece, when averaged over \(X_T | X_t\), gives the forward drift \(u^\star\) by the Doob h-transform. So \(v^\star\) is the remaining backward piece, averaged over \(X_0 | X_t\).
Nelson’s relation Equation 4 was derived for \(\mathbb{P}^u\) with marginals \(p_t\). Here we apply it to \(\mathbb{P}^{u^\star}\), which has marginals \(\Pi^\star_t\) (the Markovian projection preserves marginals). This gives the Markovian forward drift: \(u^\star = \sigma_t \nabla \log \Pi^\star_t - v^\star\). Both the score \(\nabla \log \Pi^\star_t\) and the backward drift \(v^\star\) are conditional expectations (from Equation 9 and Equation 10 respectively). To get a regression target for the matching loss Equation 6, we need a non-Markovian drift \(\xi(X,t)\) whose conditional expectation \(\mathbb{E}[\xi \mid X_t]\) equals \(u^\star\). Substituting the integrands (before conditioning) from Equation 9 and Equation 10 into Nelson gives such a \(\xi\):
\[ \sigma_t^{-1} \, \xi(X,t) = \frac{1-c(t)}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T}(X_0,X_T) + \frac{c(t)}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T}(X_0,X_T) - \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid X_0). \tag{11}\]
The first two terms are the TSI integrand (conditioning on \(X_t\) gives \(\sigma_t \nabla \log \Pi^\star_t\), by Equation 9); the third is the backward drift integrand (conditioning gives \(v^\star\), by Equation 10). The first depends on \((X_0, X_T)\) and the second on \((X_0, X_t)\), so both live in the same conditional probability space and can be combined before taking \(\mathbb{E}[\cdot \mid X_t]\). By linearity of conditional expectation, their combination is a valid \(\xi\) with \(\mathbb{E}_{\Pi^\star}[\xi \mid X_t] = u^\star\).
Here \((X_0, X_t, X_T)\) are all determined by the bridge path: \(X_0, X_T\) from the coupling, \(X_t\) from Equation 7. The Markovianization step fits \(u(X_t,t)\) to \(\mathbb{E}[\xi(X,t) \mid X_t]\) via Equation 6.
The tractability of Equation 11 depends entirely on the coupling scores \(\nabla \log \Pi^\star_{0,T}\).
Three couplings, three algorithms
Half-bridge / adjoint sampling. Set \(\Pi^\star_{0,T} = \delta_{x_0} \otimes \pi\) (Dirac prior, memoryless condition). With \(x_0 = 0\), Equation 11 reduces to Equation 12. The paper derives this in Prop 4 (Appendix C.2); the key steps are below.
Reduction from Equation 11 to Equation 12:
Start from Equation 11 with \(\Pi^\star_{0,T} = \delta_0 \otimes \pi\). Since \(X_0\) is deterministic, the \(X_0\) integration-by-parts is unavailable; use instead the pure \(X_T\) branch of the TSI (the \(c = 1\) version from above, valid at every \(t \in (0,T)\)):
\[ \nabla \log \Pi^\star_t(x) = \mathbb{E} {\left[ \frac{1}{\gamma_t} \nabla \log \pi(X_T) \;\Big|\; X_t = x \right]} . \]
Since \(\mathbb{P}_{t|0}(\cdot \mid 0) = \mathcal{N}(0, \nu_t I)\), the backward-drift term is \(-\nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid 0) = X_t/\nu_t\), and Equation 11 gives a valid but \(X_t\)-dependent target:
\[ \sigma_t^{-1} \xi = \frac{1}{\gamma_t} \nabla \log \pi(X_T) + \frac{X_t}{\nu_t}. \]
This can be exchanged for a target depending on \(X_T\) alone. Under the half-bridge coupling, \(X_t \mid X_T \sim \mathcal{N}(\gamma_t X_T, \nu_T\gamma_t(1-\gamma_t)I)\), so the score of the mixture marginal satisfies \(\nabla \log \Pi^\star_t(x) = \big(\gamma_t \, \mathbb{E}[X_T \mid X_t = x] - x\big)/\big(\nu_T\gamma_t(1-\gamma_t)\big)\), which rearranges to
\[ \frac{\mathbb{E}[X_T \mid X_t = x]}{\nu_T} = \frac{x}{\nu_t} + (1-\gamma_t) \, \nabla \log \Pi^\star_t(x). \]
Together with \(\mathbb{E}[\nabla \log \pi(X_T) \mid X_t = x] = \gamma_t \, \nabla \log \Pi^\star_t(x)\) (the \(c=1\) branch again), this yields
\[ \mathbb{E} {\left[ \nabla \log \pi(X_T) + \frac{X_T}{\nu_T} \;\Big|\; X_t = x \right]} = \nabla \log \Pi^\star_t(x) + \frac{x}{\nu_t} = \sigma_t^{-1} u^\star(x,t), \]
the same Markovian projection as before. Since \(\mathbb{P}_T = \mathcal{N}(0, \nu_T I)\) has score \(\nabla \log \mathbb{P}_T(x) = -x/\nu_T\), the non-Markovian target that yields \(u^\star\) after conditioning is simply \(\sigma_t \nabla_{X_T} \log [\pi/\mathbb{P}_T](X_T)\), which depends only on \(X_T\).
Alternatively: the paper shows directly that setting \(c(t) = \gamma_t\) in the general SHB formula (Lemma C.2) causes all \(X_0\)-dependent terms to cancel, leaving \(\sigma_t^{-1}\xi = \nabla_{X_T}\log[\pi/\mathbb{P}_T](X_T)\).

\[ \sigma_t^{-1} \, \xi(X,t) = \nabla_{X_T} \log \frac{\pi(X_T)}{\mathbb{P}_T(X_T)}, \tag{12}\]
where \(\mathbb{P}_T = \mathcal{N}(0, \nu_T I)\) (the terminal marginal of the reference). Simple, but requires Dirac prior and large \(\sigma_t\) for exploration.
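Equation 12's consistency with Nelson's relation can be checked in closed form for a Gaussian target. A sketch with the illustrative choice \(\pi = \mathcal{N}(m, b)\), Dirac prior at 0: both sides of \(\mathbb{E}[\nabla \log (\pi/\mathbb{P}_T)(X_T) \mid X_t = x] = \nabla \log \Pi^\star_t(x) + x/\nu_t\) are computable exactly.

```python
import numpy as np

# Closed-form check of the half-bridge target (Eq. 12) for a Gaussian
# example: Dirac prior at 0, target pi = N(m, b).  Under the coupling
# delta_0 (x) pi, the time-t marginal and all conditionals are Gaussian.
m, b = 2.0, 0.25
nu_T, gamma = 1.0, 0.3
nu_t = gamma * nu_T
s2 = nu_T * gamma * (1 - gamma)

mu_t = gamma * m
var_t = gamma ** 2 * b + s2            # Var of X_t = gamma X_T + bridge noise

x = 0.7
# Left side: sigma^{-1} u*(x,t) = grad log Pi*_t(x) - sigma^{-1} v*(x,t)
#          = grad log Pi*_t(x) + x / nu_t     (v*/sigma = -x/nu_t for X_0 = 0)
lhs = -(x - mu_t) / var_t + x / nu_t

# Right side: E[grad log (pi / P_T)(X_T) | X_t = x] with P_T = N(0, nu_T)
e_xT = m + gamma * b / var_t * (x - mu_t)    # E[X_T | X_t = x]
rhs = -(e_xT - m) / b + e_xT / nu_T
print(lhs, rhs)                              # identical
```

The two sides agree exactly, confirming that the \(X_T\)-only target of Equation 12 Markovianizes to the same \(u^\star\) as the \(X_t\)-dependent one.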
Full Schrodinger bridge / adjoint sampling with corrector. Set \(\Pi^\star_{0,T} = \hat\varphi_0(x_0) \, \mathbb{P}_{T|0}(x_T \mid x_0) \, \varphi_T(x_T)\), the Schrodinger bridge coupling. The drift becomes
\[ \sigma_t^{-1} \, \xi(X,t) = \nabla_{X_T} \log \frac{\pi(X_T)}{\hat\varphi_T(X_T)}, \tag{13}\]
where \(\hat\varphi_T\) is the backward Schrodinger potential. This allows arbitrary priors, but \(\hat\varphi_T\) is unknown and must be learned alongside \(u\), requiring alternating IPF-style updates.
Independent coupling / BMS. Set
\[ \Pi^\star_{0,T} = p_0 \otimes \pi. \tag{14}\]
Plug into Equation 11. The coupling scores factor trivially: \(\nabla_{X_0} \log \Pi^\star_{0,T} = \nabla \log p_0(X_0)\) and \(\nabla_{X_T} \log \Pi^\star_{0,T} = \nabla \log \pi(X_T)\). The regression target becomes
\[ \sigma_t^{-1} \, \xi(X,t) = \frac{1-c(t)}{1-\gamma_t} \nabla \log p_0(X_0) + \frac{c(t)}{\gamma_t} \nabla \log \pi(X_T) + \frac{X_t - X_0}{\nu_t}. \tag{15}\]
Every term on the right is known: \(\nabla \log p_0\) is the prior score (assumed known, e.g. Gaussian), \(\nabla \log \pi = \nabla \log \rho\) is the target score (computable from the unnormalized density), and \((X_t - X_0)/\nu_t = -\nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid X_0)\) is minus the Gaussian transition score, exactly as Equation 11 prescribes. No unknown potentials, no alternation.
The independent coupling: why it works
The independent coupling \(p_0 \otimes \pi\) satisfies the boundary constraints by construction: marginalizing over \(X_T\) gives \(p_0\), marginalizing over \(X_0\) gives \(\pi\). The terminal marginal of \(\Pi^\star = (p_0 \otimes \pi) \mathbb{P}_{|0,T}\) is \(\pi\): since \(\mathbb{P}_{T|0,T}(x \mid x_0, x_T) = \delta(x - x_T)\) (the bridge is pinned at its endpoint),
\[ \Pi^\star_T(x) = \int \mathbb{P}_{T|0,T}(x \mid x_0, x_T) \, p_0(x_0) \, \pi(x_T) \, dx_0 \, dx_T = \int \delta(x - x_T) \, \pi(x_T) \, dx_T = \pi(x). \]
The Markovian projection preserves time marginals, so \(\mathbb{P}^{u^\star}_T = \pi\): the controlled SDE with drift \(u^\star\) hits the target at time \(T\). Combined with \(\mathbb{P}^{u^\star}_0 = p_0\) (from the initial condition), \(u^\star\) is a fixed point.
The Schrodinger bridge coupling minimizes path-space KL (as shown in the SB notes). The independent coupling sacrifices this optimality for a fully tractable regression target.
At each iteration, the coupling is \(\Pi^i_{0,T} = \mathbb{P}^{u_i}_0 \otimes \mathbb{P}^{u_i}_T\): independently resample \(X_0\) and \(X_T\) from their marginals under the current SDE. In practice, simulate trajectories, then randomly pair the initial and terminal samples.
Sampling the bridge is cheap: given \((x_0, x_T)\), draw \(X_t\) from Equation 7 and evaluate Equation 15. No full trajectory simulation needed during regression.
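The whole pipeline can be sanity-checked end to end in 1-D. A sketch under toy assumptions (mine, not the paper's experiment): \(\sigma_t = 1\), \(T = 1\), Gaussian prior \(\mathcal{N}(0,1)\), a hypothetical Gaussian target \(\mathcal{N}(2, 0.25)\), and the oracle coupling \(p_0 \otimes \pi\), so a single Markovianization suffices and no fixed-point iteration is needed. The target uses \(c(t) = \gamma_t\) and the transition term with the \(+\) sign that Equation 11's \(-\nabla_{X_t}\log\mathbb{P}_{t|0}\) prescribes; per-time-bin linear regression is exact in class for Gaussian endpoints.

```python
import numpy as np

rng = np.random.default_rng(2)

# One-shot check of Eqs. (7) and (15): oracle coupling p_0 (x) pi, 1-D,
# sigma_t = 1, T = 1, p_0 = N(0,1), pi = N(2, 0.25), c(t) = gamma_t
# (so both TSI coefficients equal 1).
n, K = 100_000, 100
m, b = 2.0, 0.25
nu_T = 1.0
ts = (np.arange(K) + 0.5) / K                # time-bin midpoints

x0 = rng.normal(size=n)
xT = m + np.sqrt(b) * rng.normal(size=n)     # oracle target samples

coef = np.zeros((K, 2))                      # u(x, t_k) = coef[k,0] + coef[k,1] x
for k, t in enumerate(ts):
    nu_t = t * nu_T
    g = nu_t / nu_T
    # Sample the bridge point (Eq. 7):
    xt = (1 - g) * x0 + g * xT + np.sqrt(nu_T * g * (1 - g)) * rng.normal(size=n)
    # Regression target (Eq. 15 with c = gamma; transition term with + sign,
    # matching Eq. 11): grad log p_0(X_0) + grad log pi(X_T) + (X_t - X_0)/nu_t
    xi = -x0 - (xT - m) / b + (xt - x0) / nu_t
    A = np.stack([np.ones(n), xt], axis=1)
    coef[k] = np.linalg.lstsq(A, xi, rcond=None)[0]

# Simulate the controlled SDE (Eq. 1) with the fitted drift; X_T should be ~ pi.
dt = 1.0 / K
x = rng.normal(size=n)
for k in range(K):
    u = coef[k, 0] + coef[k, 1] * x
    x = x + u * dt + np.sqrt(dt) * rng.normal(size=n)

print(x.mean(), x.std())                     # should approach (2.0, 0.5)
```

Up to Euler-discretization and Monte Carlo error, the terminal samples match the target's mean and standard deviation, illustrating that the Markovian projection of Equation 15 transports \(p_0\) to \(\pi\).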
Damped iteration
The undamped iteration \(u_{i+1} = \Phi(u_i)\) can overshoot in high dimensions. The damped version uses step size \(\alpha \in (0,1]\):
\[ u_{i+1} = \alpha \, \Phi(u_i) + (1-\alpha) \, u_i. \tag{16}\]
Setting \(\eta = (1-\alpha)/\alpha\), this solves
\[ u_{i+1} = \mathop{\mathrm{argmin}}_u \; {\left\{ \mathbb{E}_{\Pi^i} {\left[ \int_0^T \frac{1}{2} \| \xi - u(X_t,t) \|^2 \, dt \right]} \;+\; \textcolor{blue}{\eta \, \mathbb{E}_{\Pi^i} {\left[ \int_0^T \frac{1}{2} \| u_i(X_t,t) - u(X_t,t) \|^2 \, dt \right]} } \right\}} . \tag{17}\]
Deriving Equation 16 from Equation 17:
Apply the bias-variance decomposition (Pythagorean identity from the Markovian projection) to the first term:
\[ \mathbb{E}_{\Pi^i} {\left[ \|\xi - u\|^2 \right]} = \mathbb{E}_{\Pi^i} {\left[ \|\xi - \Phi(u_i)\|^2 \right]} + \mathbb{E}_{\Pi^i} {\left[ \|\Phi(u_i) - u\|^2 \right]} , \]
where \(\Phi(u_i) = \mathbb{E}_{\Pi^i}[\xi \mid X_t]\). The first piece is independent of \(u\) (irreducible noise from the non-Markovian \(\xi\)). Dropping it, Equation 17 reduces to
\[ u_{i+1} = \mathop{\mathrm{argmin}}_u \; \mathbb{E}_{\Pi^i} {\left[ \tfrac{1}{2}\|\Phi(u_i) - u\|^2 + \tfrac{\eta}{2}\|u_i - u\|^2 \right]} . \]
Pointwise first-order condition: \(-(\Phi(u_i) - u) - \eta(u_i - u) = 0\), giving \((1+\eta)u = \Phi(u_i) + \eta \, u_i\). So \(u = \frac{1}{1+\eta}\Phi(u_i) + \frac{\eta}{1+\eta}u_i\). With \(\eta = (1-\alpha)/\alpha\): \(\frac{1}{1+\eta} = \alpha\) and \(\frac{\eta}{1+\eta} = 1-\alpha\), recovering Equation 16.

The \( \textcolor{blue}{\text{second term}}\) penalizes deviation from the previous iterate. Each step balances fitting new bridge data against staying close to \(u_i\), preventing mode collapse from aggressive updates.
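The stabilizing effect of Equation 16 shows up already for a scalar toy map (my own illustration, not from the paper): a projection \(\Phi\) that overshoots its fixed point diverges undamped but converges once damped.

```python
import numpy as np

# Scalar illustration of Eq. (16): a toy map Phi that overshoots the fixed
# point u_star (error multiplied by rho with |rho| > 1, so the undamped
# iteration diverges), and the damped iterate that still converges.
u_star, rho = 2.0, -1.5
phi = lambda u: u_star + rho * (u - u_star)      # Phi(u_star) = u_star

def iterate(alpha, steps=40, u=0.0):
    for _ in range(steps):
        u = alpha * phi(u) + (1 - alpha) * u     # Eq. (16)
    return u

print(iterate(alpha=1.0, steps=10))   # undamped: error grows like |rho|^i
print(iterate(alpha=0.5))             # damped: converges to u_star = 2
```

With \(\alpha = 0.5\), the error is multiplied by \(\alpha\rho + (1-\alpha) = -0.25\) per step, so the damped iteration contracts even though \(|\rho| > 1\); by the derivation above, the same update is the minimizer of Equation 17 with \(\eta = (1-\alpha)/\alpha\).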
Summary
| Method | Coupling \(\Pi^i_{0,T}\) | Regression target \(\sigma_t^{-1} \xi\) | Limitation |
|---|---|---|---|
| AS | \(\delta_{x_0} \otimes \mathbb{P}^{u_i}_T\) | \(\nabla \log [\pi/\mathbb{P}_T](X_T)\) | Dirac prior |
| AS + corrector | \(\mathbb{P}^{u_i}_{0,T}\) | \(\nabla \log [\pi/\hat\varphi_T](X_T)\) | Alternating opt. |
| BMS | \(\mathbb{P}^{u_i}_0 \otimes \mathbb{P}^{u_i}_T\) | Equation 15 | None (single obj.) |
All three converge to a fixed point \(u^\star\) transporting \(p_0\) to \(\pi\). The matching loss Equation 6 is a forward KL objective: \(u^\star = \mathop{\mathrm{argmin}}_u D_{\text{KL}}(\Pi^\star \,\|\, \mathbb{P}^u)\). This follows from the Girsanov KL decomposition: \(D_{\text{KL}}(\Pi^\star \| \mathbb{P}^u) = \text{(irreducible variance)} + \mathbb{E}_{\Pi^\star}[\int \frac{1}{2}\|\xi - u\|^2 dt]\), so minimizing the matching loss over \(u\) is equivalent to minimizing \(D_{\text{KL}}(\Pi^\star \| \mathbb{P}^u)\). Forward KL is mode-covering (it penalizes placing zero mass where \(\Pi^\star\) has mass); since the Markovian projection preserves time marginals, mode coverage at the path level implies mode coverage at the terminal marginal level, which drives mode diversity in practice.
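The mode-covering property of forward KL can be illustrated on a toy two-mode target (a sketch of my own, restricted to Gaussian approximating families, where forward KL reduces to moment matching):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-mode target: 0.5 N(-3, 1) + 0.5 N(3, 1).
xs = np.where(rng.random(200_000) < 0.5, -3.0, 3.0) + rng.normal(size=200_000)

# Forward KL over Gaussians = moment matching (mode-covering): the fitted
# Gaussian inflates its variance to cover both modes.
mu, s = xs.mean(), xs.std()

def logq(x, m, sd):
    return -0.5 * ((x - m) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

# Density at the left mode: the forward-KL fit keeps real mass there,
# while a mode-seeking fit pinned to the right mode (N(3, 1)) does not.
print(np.exp(logq(-3.0, mu, s)), np.exp(logq(-3.0, 3.0, 1.0)))
```

The forward-KL fit assigns non-negligible density to both modes, while the single-mode fit starves the left one by many orders of magnitude; this is the behavior the summary attributes to the forward KL objective.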