Sampling via controlled Brownian motion
Consider a controlled Brownian motion starting at the origin,
\[ dX_t = \sigma_t \, u(t, X_t) \, dt + \sigma_t \, dW_t, \quad X_0 = 0, \quad t \in [0,1], \tag{1}\]
where \(\sigma_t > 0\) is a noise schedule and \(u: [0,1] \times \mathbb{R}^d \to \mathbb{R}^d\) is the control. With \(u \equiv 0\), this is a scaled Brownian motion with marginals \(p^\text{base}_t(x) = \mathcal{N}(0, \nu_t I)\) where \(\nu_t = \int_0^t \sigma_s^2 \, ds\). The goal is to find \(u\) so that the terminal distribution \(p^u_1\) equals a target Boltzmann distribution \(\pi(x) \propto \exp(-E(x)/\tau)\), where \(E\) is an energy function and \(\tau > 0\) is a temperature. As described in the SOC notes, minimizing the KL divergence from the controlled path measure \(\mathbb{P}^u\) to the Schrodinger bridge \(\mathbb{P}^\star(\boldsymbol{X}) = \mathbb{P}(\boldsymbol{X} \mid X_1) \, \pi(X_1)\) gives the SOC problem
\[ \min_u \; J(u) \; = \; \min_u \; \mathbb{E}_{\mathbb{P}^u} {\left[ \int_0^1 \tfrac{1}{2} \|u(t, X_t)\|^2 \, dt + g(X_1) \right]} , \tag{2}\]
where \(g(x) = \log p^\text{base}_1(x) + E(x)/\tau\) is the terminal cost (up to an additive constant \(\log \mathcal{Z}\) that does not affect the optimal control). Indeed, \(\pi \propto \exp(-E/\tau)\) so \(\log(p^\text{base}_1/\pi) = \log p^\text{base}_1 + E/\tau + \log \mathcal{Z}\).
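Before moving on, the base marginals \(p^\text{base}_t = \mathcal{N}(0, \nu_t I)\) are easy to check numerically. A minimal sketch, assuming a hypothetical schedule \(\sigma_t = 1 + t\): simulate Equation 1 with \(u \equiv 0\) by Euler-Maruyama and compare the terminal variance to \(\nu_1 = \int_0^1 (1+s)^2 \, ds = 7/3\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 20_000, 200
dt = 1.0 / n_steps
ts = np.linspace(0.0, 1.0, n_steps + 1)

# hypothetical noise schedule sigma_t = 1 + t (any positive schedule works)
sigma = lambda t: 1.0 + t

# simulate dX = sigma_t dW (u = 0) by Euler-Maruyama
X = np.zeros(n_paths)
for k in range(n_steps):
    X += sigma(ts[k]) * np.sqrt(dt) * rng.standard_normal(n_paths)

# nu_1 = int_0^1 (1 + s)^2 ds = 7/3
nu1 = 7.0 / 3.0
print(X.var(), nu1)  # both ≈ 2.33
```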
The Girsanov theorem gives \(D_{\text{KL}}(\mathbb{P}^u, \mathbb{P}) = \frac{1}{2} \mathbb{E}\int_0^1 \|u\|^2 \, dt\), so the SOC objective trades off terminal cost against control effort. The HJB notes use a maximization convention; our minimization Equation 2 corresponds to \(f = 0\) and \(g_\text{HJB} = -g\). The optimal control is \(u^\star(t,x) = \sigma_t \nabla \log h(t,x)\) where
\[ h(t,x) = \mathbb{E} {\left[ \exp(-g(X_1)) \mid X_t = x \right]} \tag{3}\]
under the base process. This is the Doob h-transform with \(g_{\text{Doob}} = -g\): the function \(h\) tilts the path measure toward low-cost terminal states. The optimal control \(u^\star\) points towards regions where \(h\) is large, i.e. where the energy \(E\) is low. The function \(\log h\) acts as the reward-to-go; the SOC cost is \(J(u^\star) = -\log h(0,0)\), and at optimality \(D_{\text{KL}}(\mathbb{P}^{u^\star}, \mathbb{P}^\star) = 0\).
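To make the h-transform concrete, here is a 1D Monte Carlo sketch under hypothetical choices: a constant schedule \(\sigma = 1\), \(\tau = 1\), and quadratic energy \(E(x) = (x-2)^2/2\), so the target is \(\mathcal{N}(2, 1)\). For this energy \(\exp(-g(x_1)) \propto \exp(2 x_1)\), so \(\log h\) is linear in \(x\) and the finite-difference estimate of \(u^\star = \sigma \, \partial_x \log h\) recovers the constant 2.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, tau = 1.0, 1.0                      # constant schedule: nu_t = t, base X_1 ~ N(0, 1)
E = lambda x: 0.5 * (x - 2.0) ** 2         # hypothetical quadratic energy, so pi = N(2, 1)
g = lambda x: -0.5 * x**2 + E(x) / tau     # log p_base_1 + E/tau, dropping constants

z = rng.standard_normal(100_000)           # common random numbers for the finite difference

def h(t, x):
    # Monte Carlo for Eq. (3): X_1 | X_t = x ~ N(x, 1 - t) under the base process
    x1 = x + np.sqrt(1.0 - t) * z
    return np.exp(-g(x1)).mean()

# u*(t, x) = sigma * d/dx log h(t, x), estimated by a central finite difference
t, x, eps = 0.5, 0.0, 1e-3
u_star = sigma * (np.log(h(t, x + eps)) - np.log(h(t, x - eps))) / (2 * eps)
print(u_star)  # ≈ 2.0: the control pushes toward the low-energy region x = 2
```

The common random numbers make the finite difference exact here: with shared \(z\), the \(x\)-dependence of \(h\) factors out as \(\exp(2x)\), so the Monte Carlo noise cancels in the log-ratio.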
The adjoint is constant
The adjoint matching framework turns the SOC problem into a regression. For a controlled SDE with base drift \(b\) and running cost \(f\), the adjoint method gives the lean adjoint (the \(u\)-independent part of the full adjoint ODE, as derived in the adjoint matching paper):

\[ \frac{d}{dt} \tilde{a}(t; \boldsymbol{X}) = -\nabla_x f - (\nabla_x b)^\top \tilde{a}(t; \boldsymbol{X}), \quad \tilde{a}(1; \boldsymbol{X}) = \nabla g(X_1). \tag{4}\]
Now set \( \textcolor{blue}{b = 0}\) and \( \textcolor{blue}{f = 0}\) (our base process is pure Brownian motion, and the running cost is just the control penalty which does not enter the lean adjoint). Both forcing terms vanish: \(\frac{d}{dt}\tilde{a} = 0\). So \(\tilde{a}\) is constant along each trajectory:
\[ \textcolor{blue}{\tilde{a}(t; \boldsymbol{X}) = \nabla g(X_1)}, \quad \text{for all } t \in [0,1]. \tag{5}\]
No backward ODE to solve. The regression target at every time \(t\) is just \(\nabla g(X_1)\). The adjoint matching loss becomes
\[ L_\text{AM}(u) = \mathbb{E}_{\boldsymbol{X} \sim \mathbb{P}^{\bar{u}}} {\left[ \int_0^1 \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \, dt \right]} , \tag{6}\]
where \(\bar{u} = \texttt{stopgrad}(u)\). This regresses the control \(u(t, X_t)\) onto \(-\sigma_t \, \nabla g(X_1)\) for each intermediate state \(X_t\) along a trajectory. At the fixed point, \(u^\star(t,x) = -\sigma_t \mathbb{E}_{\mathbb{P}^{u^\star}}[\nabla g(X_1) \mid X_t = x]\) (the expectation is under the optimally controlled process; under \(\mathbb{P}^{u^\star}\), the conditional distribution of \(X_1 \mid X_t\) includes the \(\exp(-g)\) tilt). Comparing with \(u^\star = \sigma_t \nabla \log h\) from Equation 3, the adjoint target \(\nabla g(X_1)\) is a stochastic estimate of \(-\nabla \log h(t, X_t)\): the regression averages many noisy terminal gradients to recover the gradient of \(\log h\).
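A worked example makes the fixed point concrete (the quadratic energy here is our hypothetical choice, not part of the general setup). Take a constant schedule \(\sigma_t = 1\), \(\tau = 1\), and \(E(x) = \tfrac{1}{2}\|x - m\|^2\), so \(\pi = \mathcal{N}(m, I)\) and \(p^\text{base}_1 = \mathcal{N}(0, I)\). Then
\[ g(x) = -\tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|x - m\|^2 + \text{const} = -m^\top x + \text{const}, \]
so \(\nabla g(X_1) = -m\) on every trajectory, the regression target \(-\sigma_t \nabla g(X_1)\) is the constant \(m\), and the fixed point is the constant control \(u^\star(t,x) = m\). Indeed, the drift \(m\) over \([0,1]\) transports the base terminal \(\mathcal{N}(0, I)\) to \(\mathcal{N}(m, I) = \pi\).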
Why is the adjoint constant? A direct argument.
With \(b = 0\), a trajectory of the base process is \(X_t = \int_0^t \sigma_s \, dW_s\), so \(X_1 = X_t + \int_t^1 \sigma_s \, dW_s\) where the two pieces are independent. Along a fixed trajectory (the Brownian increments after \(t\) are held constant), perturbing \(X_t\) by \(\delta X_t\) shifts \(X_1\) by the same \(\delta X_t\). Hence \(\nabla_{X_t} g(X_1) = \nabla g(X_1)\) along the trajectory, confirming Equation 5.
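This pathwise argument is easy to verify numerically: freeze the Brownian increments after time \(t\), perturb \(X_t\), and compare the finite-difference derivative of \(g(X_1)\) with \(\nabla g(X_1)\). The terminal cost below is an arbitrary smooth stand-in, and \(\sigma = 1\) is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
n_steps = 100
dt = 1.0 / n_steps
g = lambda x: np.cos(x) + 0.1 * x**2     # arbitrary smooth terminal cost (hypothetical)
dg = lambda x: -np.sin(x) + 0.2 * x

# freeze the Brownian increments after time t (constant sigma = 1 assumed)
t_idx = 30
dW = np.sqrt(dt) * rng.standard_normal(n_steps)
tail = dW[t_idx:].sum()                  # X_1 = X_t + (frozen future increments)

x_t = 0.7
eps = 1e-6
# pathwise derivative of g(X_1) w.r.t. X_t with the noise held fixed
fd = (g(x_t + eps + tail) - g(x_t - eps + tail)) / (2 * eps)
print(fd, dg(x_t + tail))  # the two agree: the lean adjoint is constant in t
```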
From trajectories to pairs
The loss Equation 6 only depends on the pair \((X_t, X_1)\), not the full trajectory:
\[ L_\text{AM}(u) = \int_0^1 \mathbb{E}_{(X_t, X_1) \sim p^{\bar{u}}_{t,1}} {\left[ \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \right]} \, dt. \tag{7}\]
This is still expensive: sampling \((X_t, X_1) \sim p^{\bar{u}}_{t,1}\) requires simulating the controlled SDE. The next section removes this cost.
Reciprocal projection
At the optimal solution \(u^\star\), the controlled path measure equals the Schrodinger bridge: \(\mathbb{P}^{u^\star} = \mathbb{P}^\star\). Since \(\mathbb{P}^\star(\boldsymbol{X}) = \mathbb{P}(\boldsymbol{X} \mid X_1) \, \pi(X_1)\), marginalizing over all times except \(t\) and \(1\) gives
\[ p^{u^\star}_{t,1}(x_t, x_1) = p^\text{base}_{t \mid 1}(x_t \mid x_1) \, \pi(x_1). \tag{8}\]
If you know \(X_1\), the intermediate state \(X_t\) is distributed as a Brownian bridge conditioned on the endpoint. Away from optimality, this factorization does not hold for \(p^u_{t,1}\) because the controlled dynamics create correlations between \(X_t\) and \(X_1\) that differ from those of the base process. The reciprocal projection replaces \(p^{\bar{u}}_{t,1}(x_t, x_1)\) by
\[ p^\text{base}_{t \mid 1}(x_t \mid x_1) \, p^{\bar{u}}_1(x_1). \tag{9}\]
We keep the terminal marginal \(p^{\bar{u}}_1\) from the current control but fill in the interior with the Brownian bridge. This is a projection onto the reciprocal class of the base process: the set of path measures that share the bridge structure of \(\mathbb{P}\).
What does \(p^\text{base}_{t \mid 1}\) look like? The base process \(X_t = \int_0^t \sigma_s \, dW_s\) has \(X_t \sim \mathcal{N}(0, \nu_t I)\) and \(X_1 \sim \mathcal{N}(0, \nu_1 I)\). Since \(X_1 = X_t + \int_t^1 \sigma_s \, dW_s\) with independent increments, \(\mathop{\mathrm{Cov}}(X_t, X_1) = \nu_t I\). Standard Gaussian conditioning gives
\[ X_t \mid X_1 = x_1 \; \sim \; \mathcal{N} {\left( \frac{\nu_t}{\nu_1} x_1, \;\; \frac{\nu_t(\nu_1 - \nu_t)}{\nu_1} \, I \right)} . \tag{10}\]
For constant \(\sigma_t = \sigma\), this simplifies to \(\nu_t = \sigma^2 t\) and \(X_t \mid X_1 \sim \mathcal{N}(t \, x_1, \; \sigma^2 t(1-t) \, I)\), the usual Brownian bridge. Sampling from Equation 10 is a single Gaussian draw.
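A minimal numpy sketch of this draw (the helper name `sample_bridge` is ours), checked against the constant-\(\sigma\) case:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_bridge(x1, nu_t, nu_1, rng):
    """Draw X_t | X_1 = x1 under the base process (Eq. 10): one Gaussian draw."""
    mean = (nu_t / nu_1) * x1
    var = nu_t * (nu_1 - nu_t) / nu_1
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x1))

# constant sigma = 1: nu_t = t, the classical Brownian bridge
x1 = np.full(100_000, 3.0)
xt = sample_bridge(x1, nu_t=0.25, nu_1=1.0, rng=rng)
print(xt.mean(), xt.var())  # ≈ 0.75 and ≈ 0.1875
```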
Substituting Equation 9 into Equation 7 gives the Reciprocal Adjoint Matching (RAM) loss:
\[ L_\text{RAM}(u) = \int_0^1 \lambda(t) \, \mathbb{E}_{X_1 \sim p^{\bar{u}}_1, \; X_t \sim p^\text{base}_{t \mid 1}(\cdot \mid X_1)} {\left[ \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \right]} \, dt, \tag{11}\]
where \(\lambda(t) = 1/\sigma_t^2\) is a time weighting that normalizes the regression target magnitude across time (does not change the optimum).
Sampling the pair \((X_t, X_1)\) now proceeds as: (1) generate \(X_1\) from the current model by running the controlled SDE forward (expensive, done infrequently), (2) for each \(X_1\), sample \(t \sim \text{Uniform}([0,1])\) and \(X_t \sim p^\text{base}_{t \mid 1}(\cdot \mid X_1)\) via Equation 10 (one cheap Gaussian draw). Many gradient updates per energy evaluation.
Why the projection helps
Define the projection operator \(\Pi\) that maps a control \(u\) to the optimal control for the Schrodinger bridge with terminal marginal \(p^u_1\):
\[ \Pi(u) = \mathop{\mathrm{argmin}}_v \; \mathbb{E}_{\mathbb{P}^v} {\left[ \int_0^1 \tfrac{1}{2} \|v\|^2 \, dt + \log \frac{p^\text{base}_1(X_1)}{p^u_1(X_1)} \right]} . \tag{12}\]
The projection \(\Pi(u)\) solves the SOC problem of reaching the current terminal distribution \(p^u_1\) with minimal effort. Two properties follow from the definition.
First, \(J(u) \geq J(\Pi(u))\): projecting never increases the SOC cost.
Proof sketch.
Split the terminal cost \(g = g_1 + g_2\) where \(g_1(x) = \log(p^\text{base}_1(x) / p^u_1(x))\) and \(g_2(x) = \log(p^u_1(x) / \pi(x))\). Then
\[ J(u) = \mathbb{E}_{\mathbb{P}^u} {\left[ \tfrac{1}{2}\|u\|^2 + g_1(X_1) \right]} + \mathbb{E}_{p^u_1} {\left[ g_2(X_1) \right]} . \]
The first term equals \(D_{\text{KL}}\!\big(\mathbb{P}^u,\; \mathbb{P}(\cdot \mid X_1) \, p^u_1(X_1)\big)\) by Girsanov, and is minimized over all controls sharing terminal marginal \(p^u_1\) by \(\Pi(u)\). Since \(\Pi(u)\) preserves the terminal marginal, \(p^{\Pi(u)}_1 = p^u_1\), the second term is unchanged. Hence \(J(\Pi(u)) \leq J(u)\).

Second, after projection, RAM and AM coincide: \(L_\text{RAM}(\Pi(u)) = L_\text{AM}(\Pi(u))\). This holds because \(\Pi(u)\) is itself a Schrodinger bridge (with terminal marginal \(p^u_1\)), so its joint \((X_t, X_1)\) already factorizes as bridge times marginal, and the reciprocal projection is the identity.
The algorithm
The full Adjoint Sampling (Havens et al., 2025) algorithm alternates between two phases.
Outer loop (expensive): simulate the controlled SDE to produce terminal samples \(\{X_1^{(i)}\}\). Evaluate \(\nabla g(X_1^{(i)})\) for each sample and store the pairs in a replay buffer \(\mathcal{B}\).
Inner loop (cheap): draw \((X_1, \nabla g)\) from the buffer, sample \(t \sim \text{Uniform}([0,1])\) and \(X_t\) from Equation 10, and update \(u\) by gradient descent on the RAM loss Equation 11.
The inner loop runs for many iterations without touching the energy function. The buffer is refreshed when the control has drifted enough that \(p^u_1\) no longer matches the stored samples.
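The two phases can be sketched end to end on a deliberately degenerate toy (all choices below are ours, not the paper's): with \(\sigma = 1\), \(\tau = 1\), and \(E(x) = (x-2)^2/2\), we have \(\nabla g(x) = -x + (x - 2) = -2\) for every \(x\), so the regression target is the constant 2, the optimal control is \(u^\star \equiv 2\), and a single scalar parameter \(\theta\) represents it exactly. A real implementation would use a neural control and minibatched autograd.

```python
import numpy as np

rng = np.random.default_rng(3)

# Degenerate toy: sigma = 1, tau = 1, E(x) = (x - 2)^2 / 2, so pi = N(2, 1).
# Then grad g(x) = -x + (x - 2) = -2 everywhere and u*(t, x) = 2.
sigma = 1.0
grad_g = lambda x: -x + (x - 2.0)

def outer_loop(theta, n=4096, n_steps=100):
    """Expensive phase: simulate dX = theta dt + dW and evaluate grad g at X_1."""
    dt = 1.0 / n_steps
    x = np.zeros(n)
    for _ in range(n_steps):
        x += theta * dt + np.sqrt(dt) * rng.standard_normal(n)
    return x, grad_g(x)                      # contents of the replay buffer

def inner_step(theta, x1, grads, lr=0.5, batch=256):
    """Cheap phase: one SGD step on the RAM loss, with X_t from Eq. (10)."""
    idx = rng.integers(len(x1), size=batch)
    t = rng.uniform(size=batch)
    # Brownian bridge draw X_t | X_1 ~ N(t * x1, t (1 - t)) for sigma = 1
    xt = t * x1[idx] + np.sqrt(t * (1.0 - t)) * rng.standard_normal(batch)
    # residual of regressing u(t, X_t) = theta onto -sigma * grad g(X_1);
    # a general control would evaluate u at (t, xt) here
    resid = theta + sigma * grads[idx]
    return theta - lr * resid.mean()         # gradient of 0.5 * mean(resid**2)

theta = 0.0
for _ in range(5):                           # outer iterations (energy evaluations)
    x1, grads = outer_loop(theta)
    for _ in range(50):                      # many cheap updates from the buffer
        theta = inner_step(theta, x1, grads)
print(theta)  # ≈ 2.0, the optimal constant control
```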
The alternating scheme has a clean fixed-point interpretation. Each outer step implicitly performs the reciprocal projection \(\Pi\), and each inner step minimizes the AM loss on the projected control. Concretely, if \(u_i\) is the current control and we fully converge the RAM loss with \(X_1\) samples from \(p^{u_i}_1\), the update satisfies
\[ u_{i+1} = \Pi(u_i) - \eta \, \frac{\delta L_\text{AM}}{\delta u}(\Pi(u_i)) \]
for a step size \(\eta > 0\).
The fixed point \(u = \Pi(u)\) with \(\frac{\delta L_\text{AM}}{\delta u}(u) = 0\) is exactly the optimal control \(u^\star\). In practice, the inner loop is not run to convergence and the buffer mixes samples from several prior iterations, smoothing the optimization.
The same regression target \(-\sigma_t \nabla g(X_1)\) appears in PDDS (Phillips et al., 2024) and TSM (De Bortoli et al., 2024). The difference is in the expectation: those methods sample \(X_1\) from the target \(\pi\) (using SMC or importance sampling), while Adjoint Sampling samples \(X_1\) from the current model \(p^u_1\). This makes Adjoint Sampling on-policy with a moving target that converges to the fixed point, rather than a single regression against an approximate sample from \(\pi\).