Motivation
Adjoint sampling learns a control drift \(u(t,x)\) that steers a Brownian motion in \(\mathbb{R}^d\) toward a target \(\pi(x) \propto \exp(-E(x))\). Some problems require sampling paths rather than points. Transition path sampling (TPS) is the canonical example: a molecular system has two metastable states \(A\) and \(B\), and the goal is to sample reactive trajectories \(\mathbf{x}: [0,L] \to \mathbb{R}^d\) connecting them. The target is a Gibbs measure over paths, \[ \pi[\mathbf{x}] \propto \exp {\left( -S[\mathbf{x}] \right)} , \qquad S[\mathbf{x}] = \int_0^L E(\mathbf{x}(s)) \, ds. \]
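The discretized action is just a quadrature of the energy along the path. A minimal sketch with a trapezoidal rule and a toy double-well energy (the grid size and energy function are illustrative assumptions, not from the FAS paper):

```python
import numpy as np

def discretized_action(path, E, L):
    """Trapezoidal approximation of S[x] = integral of E(x(s)) ds over [0, L].

    path : (K+2, d) array of nodes x_0, ..., x_{K+1} on a uniform grid
    E    : callable mapping a (d,) state to a scalar energy
    """
    h = L / (path.shape[0] - 1)                 # grid spacing
    e = np.array([E(x) for x in path])
    return h * (0.5 * e[0] + e[1:-1].sum() + 0.5 * e[-1])

# Toy double-well energy in d = 1 with minima at x = -1 and x = +1; the
# straight-line path from A to B must climb over the barrier at x = 0.
E = lambda x: float((x[0]**2 - 1.0)**2)
path = np.linspace(-1.0, 1.0, 12)[:, None]      # linear interpolation A -> B
S = discretized_action(path, E, L=1.0)          # close to 8/15 for this path
```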
The Functional Adjoint Sampler (FAS) (bae2025functional?) applies adjoint sampling to this setting. After discretization, the “path” is just a vector in \(\mathbb{R}^{dK}\), so FAS is adjoint sampling in \(\mathbb{R}^{dK}\) with an OU reference process in place of Brownian motion. The infinite-dimensional formulation in the original paper is a presentation choice, not a mathematical necessity. Strip away the machinery and the core idea is: adjoint sampling on paths instead of points, an OU reference instead of Brownian motion, and Dirichlet boundary conditions to pin the endpoints.
(The other notes use \([0,1]\); here we use \([0,T]\) to match the FAS paper.)
Setup: SOC on path space
Discretize the path into \(K\) interior nodes: \(\mathbf{x} = (x_1, \ldots, x_K) \in \mathbb{R}^{dK}\), with boundary conditions \(x_0 = a \in A\) and \(x_{K+1} = b \in B\) fixed. The target density on the interior nodes is \(\pi(\mathbf{x}) \propto \exp(-S(\mathbf{x})) \, \nu(\mathbf{x})\) for some reference measure \(\nu\). As in adjoint sampling, the terminal cost absorbs the mismatch: \(g(\mathbf{x}) = S(\mathbf{x}) + \log q_T(\mathbf{x}_0, \mathbf{x}) - \log \nu_\infty(\mathbf{x})\), where \(q_T\) is the Radon-Nikodym derivative of the OU marginal at time \(T\) against the invariant measure \(\nu_\infty\). The log-RND term \(\log q_T\) corrects for the mismatch between the OU marginal and the invariant measure, analogous to the path-space reweighting in the Jarzynski notes.
(Sign convention: \(g\) is a cost, minimized; consistent with adjoint sampling. The adjoint matching note uses a reward \(r = -g\).)
The reference process in adjoint sampling is Brownian motion. For paths, the natural reference is an Ornstein-Uhlenbeck (OU) process. Compared to Brownian motion, the OU reference adds a mean-reverting drift \(-A\mathbf{x}\). Two reasons to prefer it: (a) the eigenbasis parameterization with Dirichlet eigenfunctions enforces boundary conditions exactly, and (b) the OU invariant measure concentrates on smooth paths, providing a better prior for path sampling than Brownian motion (which generates rough paths). The reference SDE on \(\mathbb{R}^{dK}\) is \[ d\mathbf{X}_t = -A \, \mathbf{X}_t \, dt + \sigma_t \, dW_t^Q, \tag{1}\] where \(A = -\Delta_h\) is the negative discrete Laplacian (positive-definite, eigenvalues \(\lambda_k > 0\)): the \(K \times K\) tridiagonal matrix with \(2\) on the diagonal and \(-1\) on the off-diagonals, incorporating Dirichlet boundary conditions. Here \(W_t^Q\) is a \(Q\)-Wiener process with \(\text{Cov}(dW^Q) = Q\,dt\) and \(Q = A^{-s}\) for some \(s > d/2\).
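The ingredients of Equation 1 are small enough to write down explicitly. A sketch that builds the tridiagonal \(A\), its closed-form Dirichlet eigenpairs, and the covariance \(Q = A^{-s}\) (the values of \(K\) and \(s\) are illustrative):

```python
import numpy as np

K, s = 8, 1.5     # interior nodes; covariance exponent (s > d/2, here d = 1)

# Discrete Dirichlet Laplacian A = -Laplacian_h: tridiagonal, 2 on the
# diagonal, -1 on the off-diagonals.
A = 2.0 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)

# Closed-form eigenpairs of this matrix: lam_k = 2 - 2 cos(k pi / (K+1)),
# with discrete sine eigenvectors phi_k(j) proportional to sin(k pi j / (K+1)).
k = np.arange(1, K + 1)
lam = 2.0 - 2.0 * np.cos(k * np.pi / (K + 1))
j = np.arange(1, K + 1)
Phi = np.sqrt(2.0 / (K + 1)) * np.sin(np.outer(j, k) * np.pi / (K + 1))

# Trace-class covariance operator Q = A^{-s}, diagonal in the eigenbasis.
Q = Phi @ np.diag(lam**-s) @ Phi.T
```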
The Laplacian \(A\) has eigenvectors \(\varphi_k\) (discrete sine functions) with eigenvalues \(\lambda_k > 0\). Expanding \(\mathbf{X}_t = \sum_k c_k(t) \, \varphi_k\), each mode evolves independently: \[ dc_k(t) = -\lambda_k \, c_k(t) \, dt + \sigma_t \, \lambda_k^{-s/2} \, dB_k(t), \tag{2}\] where \(B_k\) are independent scalar Brownian motions and \(\lambda_k^{-s/2}\) is the noise amplitude from the covariance operator \(Q = A^{-s}\). Each coefficient follows a scalar OU process. In the eigenbasis, the problem decomposes into \(K\) independent 1D SDEs, each a standard scalar SOC problem.
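The mode-wise dynamics in Equation 2 can be simulated directly with Euler–Maruyama. A minimal sketch assuming a constant noise schedule \(\sigma_t \equiv 1\) and illustrative grid sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
K, s, T, n_steps = 8, 1.5, 1.0, 500
dt = T / n_steps
sigma = 1.0                                       # constant sigma_t (assumption)

# Eigenvalues of the tridiagonal Dirichlet Laplacian.
k = np.arange(1, K + 1)
lam = 2.0 - 2.0 * np.cos(k * np.pi / (K + 1))

# Euler-Maruyama on Equation 2: K independent scalar OU modes, each with
# noise amplitude lam_k^{-s/2} from the covariance operator Q = A^{-s}.
c = np.zeros(K)
for _ in range(n_steps):
    noise = rng.standard_normal(K)
    c += -lam * c * dt + sigma * lam**(-s / 2) * np.sqrt(dt) * noise

# Reconstruct interior node values from the eigen-coefficients; the pinned
# endpoints are identically zero in this basis.
j = np.arange(1, K + 1)
Phi = np.sqrt(2.0 / (K + 1)) * np.sin(np.outer(j, k) * np.pi / (K + 1))
x = Phi @ c
```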
The controlled SDE adds a drift: \[ d\mathbf{X}_t = {\left[ -A \, \mathbf{X}_t + \sigma_t \, Q^{1/2} \, u(t, \mathbf{X}_t) \right]} \, dt + \sigma_t \, dW_t^Q. \tag{3}\] Same structure as adjoint sampling, with base drift \(b(\mathbf{x}) = -A\mathbf{x}\) instead of \(b = 0\). The optimal control \(u^\star = \sigma_t \nabla V\) is a Doob h-transform of the OU process, conditioning it to produce paths with low action.
The adjoint on path space
The lean adjoint from the adjoint matching notes satisfies \(\dot{\tilde{a}}(t) = -(\nabla_{\mathbf{x}} b)^\top \tilde{a}(t)\) with \(\tilde{a}(T) = \nabla g(\mathbf{X}_T)\).
Here \(b(\mathbf{x}) = -A\mathbf{x}\), so \(\nabla_{\mathbf{x}} b = -A\) and the lean adjoint ODE becomes: \[ \dot{\tilde{a}}(t) = A^\top \tilde{a}(t), \qquad \tilde{a}(T) = \nabla g(\mathbf{X}_T). \tag{4}\] This is a linear ODE with solution \[ \tilde{a}(t) = e^{ \textcolor{red}{-(T-t)}A^\top} \nabla g(\mathbf{X}_T). \tag{5}\]
Since \(A\) is symmetric positive-definite with eigenvalues \(\lambda_k\), the matrix exponential \(e^{-(T-t)A^\top}\) has eigenvalues \(e^{-(T-t)\lambda_k}\). It is diagonal in the eigenbasis: just multiply each mode by the scalar \(e^{-(T-t)\lambda_k}\), no matrix operations needed. In the eigenbasis: \[ \tilde{a}_k(t) = e^{-(T-t)\lambda_k} \cdot [\nabla g(\mathbf{X}_T)]_k. \] Low-frequency modes (small \(\lambda_k\)) get weight close to 1, nearly identical to the Brownian case. High-frequency modes (large \(\lambda_k\)) are exponentially damped: their adjoint contribution at early times is negligible.
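In code the per-mode form of Equation 5 is a one-liner. A sketch using the tridiagonal eigenvalues, with a placeholder terminal gradient (all-ones, purely illustrative):

```python
import numpy as np

K, T = 8, 1.0
k = np.arange(1, K + 1)
lam = 2.0 - 2.0 * np.cos(k * np.pi / (K + 1))   # eigenvalues of A

def lean_adjoint(t, grad_g_modes):
    """Equation 5 in the eigenbasis: per-mode exponential damping of the
    terminal gradient, a_k(t) = exp(-(T - t) lam_k) * [grad g]_k."""
    return np.exp(-(T - t) * lam) * grad_g_modes

g_modes = np.ones(K)              # placeholder for grad g in eigen-coordinates
a0 = lean_adjoint(0.0, g_modes)   # early times: high frequencies damped hardest
aT = lean_adjoint(T, g_modes)     # at t = T the adjoint equals grad g exactly
```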
Why is the adjoint no longer constant? For Brownian motion (\(b = 0\)), perturbing \(X_t\) by \(\delta X_t\) shifts \(X_T\) by the same amount (independent increments), so \(\tilde{a}(t) = \nabla g(X_T)\) for all \(t\). For OU, the drift \(-A\mathbf{x}\) pulls perturbations back toward zero. A perturbation \(\delta X_t\) at time \(t\) decays by \(e^{-(T-t)A}\) before reaching \(T\). The terminal cost’s sensitivity to that perturbation is therefore \(e^{-(T-t)A^\top} \nabla g(X_T)\), not \(\nabla g(X_T)\). High-frequency modes are strongly damped by the OU reference, so the control has less leverage over them at early times: perturbing them costs energy but has minimal terminal effect.
The FAS loss is the adjoint matching loss with this non-constant adjoint: \[ L_\text{FAS}(u) = \mathbb{E} {\left[ \int_0^T \big\|u(t, \mathbf{X}_t) + \sigma_t \, Q^{1/2} \, \textcolor{blue}{e^{-(T-t)A^\top}} \, \nabla g(\mathbf{X}_T)\big\|^2 \, dt \right]} , \tag{6}\] where the expectation is over trajectories of the controlled SDE Equation 3 (with stop-gradient). The \(Q^{1/2}\) factor appears because the control enters the SDE as \(\sigma_t Q^{1/2} u\); the Girsanov change of measure produces \(\|Q^{1/2} u\|^2\), not \(\|u\|^2\), so the adjoint target picks up \(Q^{1/2}\). The \( \textcolor{blue}{e^{-(T-t)A^\top}}\) factor is the OU semigroup, which damps the terminal gradient according to how much the base dynamics attenuate perturbations.
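Working in eigen-coordinates, both \(Q^{1/2}\) and the semigroup in Equation 6 are diagonal, so the regression target is componentwise. A minimal sketch of the per-time-point residual, assuming a constant \(\sigma_t\) and eigen-coordinate inputs (function names are mine, not from the paper):

```python
import numpy as np

K, s, T = 8, 1.5, 1.0
k = np.arange(1, K + 1)
lam = 2.0 - 2.0 * np.cos(k * np.pi / (K + 1))   # eigenvalues of A

def fas_target(t, sigma_t, grad_g_modes):
    """Optimal regression target for u(t, X_t) in the eigenbasis:
    u = -sigma_t * Q^{1/2} * exp(-(T - t) A) * grad g(X_T),
    with Q^{1/2} = diag(lam_k^{-s/2})."""
    return -sigma_t * lam**(-s / 2) * np.exp(-(T - t) * lam) * grad_g_modes

def fas_loss_integrand(u_modes, t, sigma_t, grad_g_modes):
    """Squared residual from Equation 6 at a single time point."""
    return float(np.sum((u_modes - fas_target(t, sigma_t, grad_g_modes))**2))
```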
Boundary conditions for free
The Dirichlet eigenfunctions of \(A\) satisfy \(\varphi_k(0) = \varphi_k(L) = 0\). Any linear combination \(\mathbf{x} = \sum_k c_k \, \varphi_k\) automatically satisfies \(\mathbf{x}(0) = 0\) and \(\mathbf{x}(L) = 0\).
To enforce \(\mathbf{x}(0) = a\) and \(\mathbf{x}(L) = b\), write \(\mathbf{x}(s) = \bar{\mathbf{x}}(s) + \sum_k c_k \, \varphi_k(s)\) where \(\bar{\mathbf{x}}\) is any fixed path connecting \(a\) to \(b\) (e.g., linear interpolation). The OU process evolves the \(c_k\) coefficients; the boundary values are fixed by \(\bar{\mathbf{x}}\). No penalty terms, no projections. The controlled process respects boundary conditions by construction.
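The shift-by-a-reference-path construction is easy to verify numerically. A sketch with scalar endpoints and a linear interpolant (endpoint values and coefficient draws are arbitrary):

```python
import numpy as np

K = 8
a, b = -1.0, 2.0                      # endpoint values x(0) = a, x(L) = b (d = 1)
grid = np.linspace(0.0, 1.0, K + 2)   # K interior nodes plus the two endpoints

# Fixed reference path: linear interpolation from a to b.
x_bar = a + (b - a) * grid

# Dirichlet sine eigenfunctions phi_k(s) = sin(k pi s) vanish at s = 0 and s = 1,
# so any combination added to x_bar leaves the boundary values untouched.
k = np.arange(1, K + 1)
Phi_full = np.sin(np.pi * np.outer(grid, k))

c = np.random.default_rng(1).standard_normal(K)   # arbitrary eigen-coefficients
x = x_bar + Phi_full @ c                          # boundaries hold by construction
```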
Discretization-invariant sampling. Train the control \(u\) on a coarse grid (small \(K\)). At inference, sample at arbitrary resolution by adding more eigenmodes. The high-frequency modes (\(k > K\)) receive zero control and simply follow the OU reference, which already assigns them low variance (\(\lambda_k^{-s}\) decays with \(k\)). This is a standard property of Galerkin truncation for linear reference processes; FAS inherits it rather than inventing it.
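The refinement step can be sketched as follows: keep the (trained) coefficients of the low modes, draw the extra high modes from the uncontrolled reference, and evaluate all modes on a finer grid. The coarse coefficients below are random stand-ins for a trained sampler's output, and the continuum eigenvalues \(\lambda_k = (k\pi)^2\) are used as a convenient stand-in at fine resolution; the reference assigns mode \(k\) stationary variance proportional to \(\lambda_k^{-s}/(2\lambda_k)\), which decays rapidly:

```python
import numpy as np

rng = np.random.default_rng(2)
K_train, K_fine, s = 8, 64, 1.5

# Modes 1..K_train: coefficients from the trained (controlled) sampler.
c_low = rng.standard_normal(K_train)

# Modes K_train+1..K_fine: zero control, so they follow the OU reference;
# stationary variance sigma^2 lam^{-s} / (2 lam) with sigma = 1 (assumption).
k_all = np.arange(1, K_fine + 1)
lam = (np.pi * k_all)**2
c_high = rng.standard_normal(K_fine - K_train) * np.sqrt(
    lam[K_train:]**(-s) / (2.0 * lam[K_train:]))

# Evaluate all modes on an arbitrarily fine grid.
c = np.concatenate([c_low, c_high])
s_grid = np.linspace(0.0, 1.0, 257)
x = np.sin(np.pi * np.outer(s_grid, k_all)) @ c   # path at fine resolution
```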
Connection to existing framework
Set \(A = 0\): the OU reference becomes Brownian motion, \(Q^{1/2} = I\), \(e^{0} = I\), and the FAS loss Equation 6 reduces to the adjoint sampling loss \(\|u + \sigma_t \nabla g(X_T)\|^2\). Standard adjoint sampling is the special case with no spatial structure in the base dynamics.
The reciprocal projection from adjoint sampling carries over directly. Replace the Brownian bridge with an OU bridge: given terminal state \(\mathbf{X}_T\), sample \(\mathbf{X}_t \mid \mathbf{X}_T\) from the OU bridge. Since each mode \(c_k\) is an independent scalar OU process, the bridge decomposes into \(K\) independent 1D Gaussian conditionals:
OU bridge formula in the eigenbasis
For a scalar OU process \(dc_k = -\lambda_k c_k \, dt + \sigma_t \lambda_k^{-s/2} \, dB_k\) started at \(c_k(0) = 0\), the bridge is standard Gaussian conditioning (cf. the bridge formula in the adjoint sampling notes): \[ c_k(t) \mid c_k(T) \;\sim\; \mathcal{N} {\left( \frac{\mathop{\mathrm{Cov}}(c_k(t), c_k(T))}{v_k(T)} \, c_k(T), \;\; v_k(t) - \frac{\mathop{\mathrm{Cov}}(c_k(t), c_k(T))^2}{v_k(T)} \right)} , \] with \(v_k(t) = \lambda_k^{-s} \int_0^t \sigma_r^2 \, e^{-2\lambda_k(t-r)} \, dr\) and \(\mathop{\mathrm{Cov}}(c_k(t), c_k(T)) = e^{-\lambda_k(T-t)} v_k(t)\).
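With a constant noise schedule \(\sigma_t \equiv \sigma\), the variance integral has the closed form \(v_k(t) = \lambda_k^{-s}\sigma^2(1 - e^{-2\lambda_k t})/(2\lambda_k)\), so the bridge moments can be computed exactly. A sketch for a single mode (the constant-\(\sigma\) assumption and parameter values are illustrative):

```python
import numpy as np

def ou_bridge_moments(t, T, lam, s, sigma=1.0):
    """Mean coefficient and variance of c_k(t) | c_k(T) for a scalar OU mode
    started at 0, assuming a constant noise schedule sigma_t = sigma.
    The conditional mean is mean_coef * c_k(T)."""
    def v(tau):  # marginal variance v_k(tau), closed form for constant sigma
        return lam**(-s) * sigma**2 * (1.0 - np.exp(-2.0 * lam * tau)) / (2.0 * lam)
    cov = np.exp(-lam * (T - t)) * v(t)          # Cov(c_k(t), c_k(T))
    mean_coef = cov / v(T)
    var = v(t) - cov**2 / v(T)
    return mean_coef, var

mean_coef, var = ou_bridge_moments(t=0.3, T=1.0, lam=2.0, s=1.5)
```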
The buffer-based training from adjoint sampling transfers directly: simulate forward to fill a replay buffer, then sample from the OU bridge for cheap inner-loop gradient steps.