Consider a controlled SDE on \([0,T]\) with scalar noise schedule \(\sigma_t > 0\),
\[ dX_t = \sigma_t \, u(X_t, t) \, dt + \sigma_t \, dW_t, \quad X_0 \sim p_0. \tag{1}\]
The goal: find \(u\) such that \(X_T \sim \pi\), where \(\pi(x) = \rho(x)/\mathcal{Z}\) is a target density known up to normalization. Adjoint sampling solves this with a Dirac prior and memoryless condition; adding a corrector for arbitrary priors requires alternating optimization. The Bridge Matching Sampler (BMS) identifies a single coupling, the independent coupling, that makes the regression target fully tractable and removes the need for alternation.
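Equation 1 is straightforward to simulate by Euler–Maruyama. A minimal numpy sketch (the function name and the toy choices \(u \equiv 0\), \(\sigma_t \equiv 1\) are illustrative, not part of the method):

```python
import numpy as np

def simulate_sde(u, sigma, x0, T=1.0, n_steps=200, rng=None):
    """Euler-Maruyama for dX_t = sigma_t u(X_t, t) dt + sigma_t dW_t.

    u: callable (x, t) -> control, broadcastable to x's shape
    sigma: callable t -> scalar noise level sigma_t
    x0: (n, d) array of initial samples drawn from p_0
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    dt = T / n_steps
    for k in range(n_steps):
        t = k * dt
        s = sigma(t)
        x = x + s * u(x, t) * dt + s * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# Toy check: with u = 0 and sigma = 1, X_T ~ N(x0, T).
xT = simulate_sde(lambda x, t: 0.0, lambda t: 1.0,
                  np.zeros((50_000, 1)), T=1.0, rng=np.random.default_rng(0))
```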
Nelson’s relation
Let \(\mathbb{P}^u\) denote the path measure of Equation 1 with time marginals \(p_t\). Write the Euler discretization with step \(\delta\):
\[ X_{t+\delta} = X_t + \sigma_t \, u(X_t,t) \, \delta + \sigma_t \sqrt{\delta} \, \mathbf{n}, \qquad \mathbf{n}\sim \mathcal{N}(0,I). \tag{2}\]
The forward conditional mean is \(\mathbb{E}[X_{t+\delta} \mid X_t = x] = x + \sigma_t \, u(x,t) \, \delta\). Now compute the backward conditional mean \(\mathbb{E}[X_t \mid X_{t+\delta} = y]\) using Bayes’ rule, exactly as in the reverse diffusions note. For \(\delta \ll 1\):
\[ \mathbb{P}(X_t \in dx \mid X_{t+\delta} = y) \;\propto\; p_t(x) \, \exp {\left\{ -\frac{\|y - x - \sigma_t \, u(x,t) \, \delta\|^2}{2 \sigma_t^2 \, \delta} \right\}} . \]
Expanding \(p_t(x) \approx p_t(y) \exp(\left< \nabla \log p_t(y), x - y \right>)\) and completing the square, the conditional mean is
\[ \mathbb{E}[X_t \mid X_{t+\delta} = y] = y - \sigma_t \, u(y,t) \, \delta + \sigma_t^2 \, \nabla \log p_t(y) \, \delta + O(\delta^2). \]
Define \(v(y,t)\) as the drift of the time-reversed process (as in the reverse diffusions note). Read it off from the backward conditional mean above: the backward displacement per unit time is \(-\sigma_t \, u(y,t) + \sigma_t^2 \, \nabla \log p_t(y)\), and this must equal \(\sigma_t \, v(y,t)\). Hence \(v(y,t) = \sigma_t \nabla \log p_t(y) - u(y,t)\): the backward drift points toward the prior (via the score) and against the forward drift.
This is Nelson’s relation:
\[ \textcolor{blue}{u(x,t) + v(x,t) = \sigma_t \, \nabla \log p_t(x).} \tag{3}\]
Intuitively, \(v = \sigma_t \nabla \log p_t - u\) is the drift that would reverse the forward process. From the reverse diffusions note, the time-reversed SDE has drift \(-\sigma_t u + \sigma_t^2 \nabla \log p_t\); dividing by \(\sigma_t\) gives \(v = -u + \sigma_t \nabla \log p_t\), confirming Nelson’s relation.
Equivalently, the time-reversed process satisfies \(d\overleftarrow{X}_s = \sigma_{T-s} v(\overleftarrow{X}_s, T-s) ds + \sigma_{T-s} d\overleftarrow{B}_s\) (cf. the reverse diffusions note).
This holds for any Markov diffusion of the form Equation 1 with scalar noise schedule \(\sigma_t\) and marginals \(p_t\).
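As a Monte Carlo sanity check on Equation 3, take the driftless case \(u \equiv 0\), \(\sigma_t \equiv 1\), \(p_0 = \mathcal{N}(0,1)\), so \(p_t = \mathcal{N}(0, 1+t)\) and Nelson's relation gives \(v(x,t) = -x/(1+t)\). Simulating the time-reversed SDE from \(p_T\) should then recover \(p_0\) (a sketch with loose tolerances, not a proof):

```python
import numpy as np

# Forward reference: dX = dW, X_0 ~ N(0, 1), so p_t = N(0, 1 + t).
# Nelson: v(x, t) = grad log p_t(x) - u(x, t) = -x / (1 + t).
rng = np.random.default_rng(1)
T, n_steps, n = 1.0, 400, 100_000
dt = T / n_steps
y = rng.standard_normal(n) * np.sqrt(1.0 + T)   # Y_0 ~ p_T
for k in range(n_steps):
    t = T - k * dt                               # forward time at this step
    v = -y / (1.0 + t)                           # backward drift from Nelson's relation
    y = y + v * dt + np.sqrt(dt) * rng.standard_normal(n)
# y now approximates samples from p_0 = N(0, 1)
```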
Reciprocal class and Markovian projection
Let \(\mathbb{P}\) denote the reference (uncontrolled) process \(dX_t = \sigma_t \, dW_t\). Write \(\mathbb{P}_{|0,T}(\cdot | x_0, x_T)\) for the law of the reference process conditioned on \(X_0 = x_0, X_T = x_T\). A path measure \(\Pi\) belongs to the reciprocal class \(\mathcal{R}(\mathbb{P})\) if it has the form \(\Pi = \Pi_{0,T} \, \mathbb{P}_{|0,T}\), where \(\Pi_{0,T}\) is an endpoint coupling and \(\mathbb{P}_{|0,T}\) is the reference bridge (Brownian bridge for Brownian \(\mathbb{P}\)); see the Schrödinger bridges note.
A reciprocal measure is generally non-Markovian: the bridge drift depends on \(X_T\). The Markovian projection finds a Markovian drift \(u^\star\) whose time marginals match those of \(\Pi^\star\). This is an \(L^2\) projection: if \(\xi(X,t)\) is the path-dependent drift of \(\Pi^\star\), then
\[ u^\star(x,t) = \mathbb{E}_{\Pi^\star} {\left[ \xi(X,t) \mid X_t = x \right]} . \tag{4}\]
Why? For any Markovian \(u\), expand \(\mathbb{E}_{\Pi^\star}[\|\xi - u(X_t,t)\|^2]\) and use the tower property:
\[ \mathbb{E}_{\Pi^\star} {\left[ \|\xi - u\|^2 \right]} = \mathbb{E}_{\Pi^\star} {\left[ \|\xi - u^\star\|^2 \right]} + \mathbb{E}_{\Pi^\star} {\left[ \|u^\star - u\|^2 \right]} . \]
The cross-term vanishes because \(\mathbb{E}_{\Pi^\star}[\xi - u^\star \mid X_t] = 0\) by definition of \(u^\star\). So \(u^\star\) minimizes the matching loss
\[ u^\star = \mathop{\mathrm{argmin}}_{u} \; \mathbb{E}_{\Pi^\star} {\left[ \int_0^T \frac{1}{2} \| \xi(X,t) - u(X_t,t) \|^2 \, dt \right]} . \tag{5}\]
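The argument behind Equations 4 and 5 is ordinary least-squares regression: fitting a function of \(X_t\) to a noisy target recovers the conditional mean. A tiny linear-Gaussian demo (the setup \(\xi = 2 X_t + \text{noise}\) is purely illustrative):

```python
import numpy as np

# xi plays the role of a path-dependent target; regressing it onto
# functions of X_t recovers the conditional mean E[xi | X_t].
rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)             # stand-in for X_t samples
xi = 2.0 * x + rng.standard_normal(x.size)   # target with E[xi | x] = 2 x

# Least-squares fit of u(x) = a x + b, minimizing mean (xi - u(x))^2.
A = np.stack([x, np.ones_like(x)], axis=1)
coef, *_ = np.linalg.lstsq(A, xi, rcond=None)
# coef ~ [2.0, 0.0]: the L2 minimizer is the conditional expectation
```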
Fixed-point iteration
All three methods (adjoint sampling with Dirac prior, adjoint sampling with corrector, BMS) follow the same template. Starting from some control \(u_0\):
- Simulate the current SDE \(\mathbb{P}^{u_i}\) to generate endpoint pairs \((X_0, X_T)\).
- Reciprocal projection: form a coupling \(\Pi^i_{0,T}\) from the endpoints, define \(\Pi^i = \Pi^i_{0,T} \, \mathbb{P}_{|0,T}\).
- Markovianize: update \(u_{i+1}\) by regressing onto the bridge drift via Equation 5.
If \(u_i = u^\star\), then \(\Pi^i = \Pi^\star\) and \(u_{i+1} = u^\star\), so \(u^\star\) is a fixed point. What distinguishes the methods is the coupling formed in the reciprocal projection step, which determines the regression target \(\xi\).
Target score identity
To get a tractable \(\xi\), we need the score \(\nabla \log \Pi^\star_t(x)\). Define the cumulative variance \(\nu_t = \int_0^t \sigma_s^2 \, ds\) and \(\gamma_t = \nu_t/\nu_T\). The bridge \(\mathbb{P}_{|0,T}\) is Gaussian: for \(t \in (0,T)\),
\[ X_t \mid (X_0, X_T) \;\sim\; \mathcal{N} {\left( (1-\gamma_t) X_0 + \gamma_t X_T, \;\; \nu_T \gamma_t(1-\gamma_t) \, I \right)} . \tag{6}\]
The marginal density of \(\Pi^\star\) at time \(t\) is
\[ \Pi^\star_t(x) = \int \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T. \tag{7}\]
Differentiate \(\log \Pi^\star_t(x)\):
\[ \nabla_x \log \Pi^\star_t(x) = \frac{\int \nabla_x \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T}{\Pi^\star_t(x)}. \]
Since \(\mathbb{P}_{t|0,T}\) is Gaussian with mean \((1-\gamma_t)x_0 + \gamma_t \, x_T\),
\[ \nabla_x \log \mathbb{P}_{t|0,T}(x \mid x_0, x_T) = -\frac{x - (1-\gamma_t)x_0 - \gamma_t \, x_T}{\nu_T\gamma_t(1-\gamma_t)}. \]
Next, swap the \(x\)-gradient for an \(x_0\)-gradient. Since the Gaussian mean is linear in \(x_0\) with coefficient \((1-\gamma_t)\), shifting \(x\) by \(\varepsilon\) is equivalent to shifting \(x_0\) by \(-\varepsilon/(1-\gamma_t)\). That is, \(\nabla_x \mathbb{P}_{t|0,T} = -\frac{1}{1-\gamma_t} \nabla_{x_0} \mathbb{P}_{t|0,T}\). So
\[ \nabla_x \log \Pi^\star_t(x) = \frac{1}{1-\gamma_t} \frac{\int \mathbb{P}_{t|0,T}(x \mid x_0, x_T) \, \nabla_{x_0} \Pi^\star_{0,T}(x_0, x_T) \, dx_0 \, dx_T}{\Pi^\star_t(x)}, \]
where we integrated by parts, moving \(\nabla_{x_0}\) from \(\mathbb{P}_{t|0,T}\) onto \(\Pi^\star_{0,T}\) (boundary terms vanish). Write \(\Pi^\star_{0,T|t}\) for the conditional distribution of \((X_0, X_T)\) given \(X_t = x\) under \(\Pi^\star\). Recognizing the conditional expectation:
\[ \nabla_x \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T}(X_0,X_T) \;\Big|\; X_t = x \right]} . \]
The same argument with integration by parts in \(x_T\) gives
\[ \nabla_x \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T}(X_0,X_T) \;\Big|\; X_t = x \right]} . \]
Since both expressions equal the same score, so does any convex combination: weight \(1-c(t)\) on the first and \(c(t)\) on the second. This gives the generalized target score identity: for any \(c(t) \in (0,1]\),
\[ \nabla \log \Pi^\star_t(x) = \mathbb{E}_{\Pi^\star_{0,T|t}} {\left[ \frac{1-c(t)}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T} + \frac{c(t)}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T} \;\Big|\; X_t = x \right]} . \tag{8}\]
General regression target
Now combine everything. For a reciprocal process \(\Pi^\star = \Pi^\star_{0,T} \mathbb{P}_{|0,T}\), the bridge from \(X_0\) to \(X_T\) has backward drift \(\sigma_t \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t | X_0)\) (the score of the forward transition kernel, pointing back toward \(X_0\)). Since \(X_0\) is random under \(\Pi^\star\), the Markovian backward drift averages over \(X_0\):
\[ v^\star(x,t) = \mathbb{E}_{\Pi^\star_{0|t}} {\left[ \sigma_t \, \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid X_0) \;\Big|\; X_t = x \right]} . \tag{9}\]
Since \(\mathbb{P}_{t|0}\) is Gaussian \(\mathcal{N}(X_0, \nu_t I)\), this is \(-\sigma_t(X_t - X_0)/\nu_t\) averaged over \(X_0 \mid X_t\). Using Nelson’s relation \(u^\star = \sigma_t \nabla \log \Pi^\star_t - v^\star\) from Equation 3, and substituting Equation 8 for the score and Equation 9 for \(v^\star\), the non-Markovian drift \(\xi\) that we need to regress onto satisfies
\[ \sigma_t^{-1} \, \xi(X,t) = \frac{1-c(t)}{1-\gamma_t} \nabla_{X_0} \log \Pi^\star_{0,T}(X_0,X_T) + \frac{c(t)}{\gamma_t} \nabla_{X_T} \log \Pi^\star_{0,T}(X_0,X_T) - \nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid X_0). \tag{10}\]
Here \((X_0, X_t, X_T)\) are all determined by the bridge path: \(X_0, X_T\) from the coupling, \(X_t\) from Equation 6. The Markovianization step fits \(u(X_t,t)\) to \(\mathbb{E}[\xi(X,t) \mid X_t]\) via Equation 5.
The tractability of Equation 10 depends entirely on the coupling scores \(\nabla \log \Pi^\star_{0,T}\).
Three couplings, three algorithms
Half-bridge / adjoint sampling. Set \(\Pi^\star_{0,T} = \delta_{x_0} \otimes \pi\) (Dirac prior, memoryless condition). A Dirac prior has no density, so the \(\nabla_{X_0}\) term in Equation 10 must not appear: choose \(c(t) = 1\), which makes its coefficient \((1-c)/(1-\gamma_t)\) vanish identically. With \(x_0 = 0\), Equation 10 reduces to
\[ \sigma_t^{-1} \, \xi(X,t) = \frac{1}{\gamma_t} \nabla \log \pi(X_T) + \frac{X_t}{\nu_t}. \]
The term \(X_t/\nu_t\) is \(-\nabla_{X_t} \log \mathbb{P}_{t|0}(X_t \mid 0)\), minus the score of the base transition from \(0\). A further integration by parts in \(X_T\) (the same argument used for the target score identity) shows that this target has the same conditional expectation given \(X_t\), hence the same Markovian projection, as the simpler \(X_T\)-only target
\[ \sigma_t^{-1} \, \xi(X,t) = \nabla_{X_T} \log \frac{\pi(X_T)}{\mathbb{P}_T(X_T)}, \tag{11}\]
where \(\mathbb{P}_T = \mathcal{N}(0, \nu_T I)\) (the terminal marginal of the reference). Simple, but requires Dirac prior and large \(\sigma_t\) for exploration.
Full Schrödinger bridge / adjoint sampling with corrector. Set \(\Pi^\star_{0,T} = \hat\varphi_0(x_0) \, \mathbb{P}_{T|0}(x_T \mid x_0) \, \varphi_T(x_T)\), the Schrödinger bridge coupling. The drift becomes
\[ \sigma_t^{-1} \, \xi(X,t) = \nabla_{X_T} \log \frac{\pi(X_T)}{\hat\varphi_T(X_T)}, \tag{12}\]
where \(\hat\varphi_T\) is the backward Schrödinger potential. This allows arbitrary priors, but \(\hat\varphi_T\) is unknown and must be learned alongside \(u\), requiring alternating IPF-style updates.
Independent coupling / BMS. Set
\[ \Pi^\star_{0,T} = p_0 \otimes \pi. \tag{13}\]
Plug into Equation 10. The coupling scores factor trivially: \(\nabla_{X_0} \log \Pi^\star_{0,T} = \nabla \log p_0(X_0)\) and \(\nabla_{X_T} \log \Pi^\star_{0,T} = \nabla \log \pi(X_T)\). The regression target becomes
\[ \sigma_t^{-1} \, \xi(X,t) = \frac{1-c(t)}{1-\gamma_t} \nabla \log p_0(X_0) + \frac{c(t)}{\gamma_t} \nabla \log \pi(X_T) + \frac{X_t - X_0}{\nu_t}. \tag{14}\]
Every term on the right is known: \(\nabla \log p_0\) is the prior score (Gaussian), \(\nabla \log \pi = \nabla \log \rho\) is the target score (computable), and \((X_t - X_0)/\nu_t\) is minus the score of the Gaussian transition kernel \(\mathbb{P}_{t|0}\). No unknown potentials, no alternation.
The independent coupling: why it works
The independent coupling \(p_0 \otimes \pi\) satisfies the boundary constraints by construction: marginalizing over \(X_T\) gives \(p_0\), marginalizing over \(X_0\) gives \(\pi\). If \(u^\star\) is the Markovian projection of \((p_0 \otimes \pi) \mathbb{P}_{|0,T}\), then the SDE driven by \(u^\star\) has \(X_0 \sim p_0\) and \(X_T \sim \pi\) (the projection preserves time marginals), so \(u^\star\) is a fixed point. The Schrödinger bridge coupling minimizes path-space KL; the independent coupling sacrifices this optimality for a fully tractable regression target.
At each iteration, the coupling is \(\Pi^i_{0,T} = \mathbb{P}^{u_i}_0 \otimes \mathbb{P}^{u_i}_T\): independently resample \(X_0\) and \(X_T\) from their marginals under the current SDE. In practice, simulate trajectories, then randomly pair the initial and terminal samples.
Sampling the bridge is cheap: given \((x_0, x_T)\), draw \(X_t\) from Equation 6 and evaluate Equation 14. No full trajectory simulation needed during regression.
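One regression batch is therefore cheap to materialize: draw endpoint pairs, sample the bridge point via Equation 6, and evaluate every term of the Equation 10 target explicitly. A numpy sketch (the standard-normal prior, the toy target \(\rho\), and all names are illustrative stand-ins; in the actual iteration \(X_0, X_T\) come from simulating the current SDE and re-pairing):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 4096, 2
nu_T = 1.0
c = 0.5                                   # any c(t) in (0, 1]

# Illustrative choices (not fixed by the method): standard-normal prior,
# unnormalized target rho(x) proportional to exp(-||x - 1||^2 / 2).
grad_log_p0 = lambda x: -x
grad_log_rho = lambda x: -(x - 1.0)

# Independent coupling: pair prior samples with (stand-in) target samples.
x0 = rng.standard_normal((n, d))
xT = 1.0 + rng.standard_normal((n, d))

# Sample the bridge point X_t (Equation 6) at an intermediate time.
nu = 0.3 * nu_T                           # cumulative variance nu_t
gamma = nu / nu_T
xt = ((1 - gamma) * x0 + gamma * xT
      + np.sqrt(nu_T * gamma * (1 - gamma)) * rng.standard_normal((n, d)))

# Regression target of Equation 10 for this coupling: every term is explicit.
target = ((1 - c) / (1 - gamma) * grad_log_p0(x0)
          + c / gamma * grad_log_rho(xT)
          + (xt - x0) / nu)               # = -grad_{X_t} log P_{t|0}(X_t | X_0)
```

A network \(u(X_t, t)\) would then be fit to `target` (times \(\sigma_t\)) by least squares, per Equation 5.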
Damped iteration
The undamped iteration \(u_{i+1} = \Phi(u_i)\) can overshoot in high dimensions. The damped version uses step size \(\alpha \in (0,1]\):
\[ u_{i+1} = \alpha \, \Phi(u_i) + (1-\alpha) \, u_i. \tag{15}\]
Setting \(\eta = (1-\alpha)/\alpha\), this solves
\[ u_{i+1} = \mathop{\mathrm{argmin}}_u \; {\left\{ \mathbb{E}_{\Pi^i} {\left[ \int_0^T \frac{1}{2} \| \xi - u(X_t,t) \|^2 \, dt \right]} \;+\; \textcolor{blue}{\eta \, \mathbb{E}_{\Pi^i} {\left[ \int_0^T \frac{1}{2} \| u_i(X_t,t) - u(X_t,t) \|^2 \, dt \right]} } \right\}} . \tag{16}\]
The \( \textcolor{blue}{\text{second term}}\) penalizes deviation from the previous iterate. Each step balances fitting new bridge data against staying close to \(u_i\), preventing mode collapse from aggressive updates.
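For a single point \((x,t)\), Equation 16 is scalar least squares with a proximal term, and its closed-form solution reproduces the damped update of Equation 15. A quick check (the numbers are illustrative):

```python
import numpy as np

# Scalar instance of Equation 16: minimize mean (xi - u)^2 + eta * (u_prev - u)^2.
rng = np.random.default_rng(3)
xi = rng.standard_normal(10_000) + 1.5   # regression targets at a fixed (x, t)
u_prev = 0.2                             # previous iterate u_i at that point
alpha = 0.3
eta = (1 - alpha) / alpha

# Closed-form minimizer of the proximal objective:
u_next = (xi.mean() + eta * u_prev) / (1 + eta)

# Damped update of Equation 15, with Phi(u_i) = the undamped least-squares fit:
u_damped = alpha * xi.mean() + (1 - alpha) * u_prev
# u_next == u_damped (up to float rounding)
```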
Summary
| Method | Coupling \(\Pi^i_{0,T}\) | Regression target \(\sigma_t^{-1} \xi\) | Limitation |
|---|---|---|---|
| AS | \(\delta_{x_0} \otimes \mathbb{P}^{u_i}_T\) | \(\nabla \log [\pi/\mathbb{P}_T](X_T)\) | Dirac prior |
| AS + corrector | \(\mathbb{P}^{u_i}_{0,T}\) | \(\nabla \log [\pi/\hat\varphi_T](X_T)\) | Alternating opt. |
| BMS | \(\mathbb{P}^{u_i}_0 \otimes \mathbb{P}^{u_i}_T\) | Equation 14 | None (single obj.) |
All three converge to a fixed point \(u^\star\) transporting \(p_0\) to \(\pi\). The matching loss Equation 5 is a forward KL objective: \(u^\star = \mathop{\mathrm{argmin}}_u D_{\text{KL}}(\Pi^\star \,\|\, \mathbb{P}^u)\). Forward KL is mode-covering (it penalizes placing zero mass where \(\Pi^\star\) has mass), which drives mode diversity in practice.