Adjoint Sampling

Tags: SDE, Markov, sampling, Schrodinger bridge
Published: 22 03 2026
Modified: 22 03 2026

Sampling via controlled Brownian motion

Consider a controlled Brownian motion starting at the origin,

\[ dX_t = \sigma_t \, u(t, X_t) \, dt + \sigma_t \, dW_t, \quad X_0 = 0, \quad t \in [0,1], \tag{1}\]

where \(\sigma_t > 0\) is a noise schedule and \(u: [0,1] \times \mathbb{R}^d \to \mathbb{R}^d\) is the control. With \(u \equiv 0\), this is a scaled Brownian motion with marginals \(p^\text{base}_t(x) = \mathcal{N}(0, \nu_t I)\) where \(\nu_t = \int_0^t \sigma_s^2 \, ds\). The goal is to find \(u\) so that the terminal distribution \(p^u_1\) equals a target Boltzmann distribution \(\pi(x) \propto \exp(-E(x)/\tau)\), where \(E\) is an energy function and \(\tau > 0\) is a temperature. As described in the SOC notes, minimizing the KL divergence from the controlled path measure \(\mathbb{P}^u\) to the Schrodinger bridge \(\mathbb{P}^\star(\boldsymbol{X}) = \mathbb{P}(\boldsymbol{X} \mid X_1) \, \pi(X_1)\) gives the SOC problem
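The dynamics in Equation 1 are straightforward to simulate. Below is a minimal Euler-Maruyama sketch (assuming numpy; the schedule \(\sigma_t = 1 + t\) is an arbitrary example, not one from the paper):

```python
import numpy as np

def simulate(u, sigma, n_steps=200, n_paths=4096, d=2, seed=0):
    """Euler-Maruyama for dX = sigma_t u(t, X) dt + sigma_t dW, X_0 = 0."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    X = np.zeros((n_paths, d))
    for k in range(n_steps):
        t = k * dt
        s = sigma(t)
        dW = rng.normal(scale=np.sqrt(dt), size=X.shape)
        X = X + s * u(t, X) * dt + s * dW
    return X

sigma = lambda t: 1.0 + t                      # example schedule
zero_u = lambda t, x: np.zeros_like(x)         # uncontrolled base process
X1 = simulate(zero_u, sigma)
print(X1.var())                                # ≈ nu_1 = int_0^1 (1+t)^2 dt = 7/3
```

With \(u \equiv 0\) the empirical terminal variance matches \(\nu_1 = \int_0^1 (1+t)^2\,dt = 7/3\), as the base-marginal formula predicts.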

\[ \min_u \; J(u) \; = \; \min_u \; \mathbb{E}_{\mathbb{P}^u} {\left[ \int_0^1 \tfrac{1}{2} \|u(t, X_t)\|^2 \, dt + g(X_1) \right]} , \tag{2}\]

where \(g(x) = \log p^\text{base}_1(x) + E(x)/\tau\) is the terminal cost (up to an additive constant \(\log \mathcal{Z}\) that does not affect the optimal control). Indeed, \(\pi \propto \exp(-E/\tau)\) so \(\log(p^\text{base}_1/\pi) = \log p^\text{base}_1 + E/\tau + \log \mathcal{Z}\).

The Girsanov theorem gives \(D_{\text{KL}}(\mathbb{P}^u, \mathbb{P}) = \frac{1}{2} \mathbb{E}_{\mathbb{P}^u}\int_0^1 \|u\|^2 \, dt\), so the SOC objective trades off terminal cost against control effort. The HJB notes use a maximization convention; our minimization in Equation 2 corresponds to \(f = 0\) and \(g_\text{HJB} = -g\). The optimal control is \(u^\star(t,x) = \sigma_t \nabla \log h(t,x)\) where

\[ h(t,x) = \mathbb{E} {\left[ \exp(-g(X_1)) \mid X_t = x \right]} \tag{3}\]

under the base process. This is the Doob h-transform with \(g_{\text{Doob}} = -g\): the function \(h\) tilts the path measure toward low-cost terminal states. The optimal control \(u^\star\) points towards regions where \(h\) is large, i.e. where the energy \(E\) is low. The function \(\log h\) acts as the reward-to-go; the SOC cost is \(J(u^\star) = -\log h(0,0)\), and at optimality \(D_{\text{KL}}(\mathbb{P}^{u^\star}, \mathbb{P}^\star) = 0\).
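For a quadratic energy, \(h\) is available in closed form, which gives a quick sanity check (a toy 1D example, not from the paper):

```python
import numpy as np

# Toy setup: E(x) = (x - m)^2 / 2, tau = 1, constant sigma_t = 1, so
# pi = N(m, 1). Then exp(-g(y)) ∝ exp(m*y), and averaging over
# X_1 | X_t = x ~ N(x, 1 - t) gives log h(t, x) = m*x + c(t). Hence
# u*(t, x) = grad log h = m: a constant drift toward the mode.
m = 1.5
rng = np.random.default_rng(1)
n, steps = 200_000, 100
dt = 1.0 / steps
X = np.zeros(n)
for _ in range(steps):                   # simulate dX = m dt + dW
    X += m * dt + rng.normal(scale=np.sqrt(dt), size=n)
print(X.mean(), X.var())                 # ≈ 1.5 and 1.0, i.e. X_1 ~ pi = N(m, 1)
```

The controlled terminal samples land on the Boltzmann target exactly, as the h-transform predicts.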

The adjoint is constant

The adjoint matching framework turns the SOC problem into a regression. For a controlled SDE with base drift \(b\) and running cost \(f\), the adjoint method gives the lean adjoint (the \(u\)-independent part of the full adjoint ODE, as derived in adjoint matching):

\[ \frac{d}{dt} \tilde{a}(t; \boldsymbol{X}) = -\nabla_x f - (\nabla_x b)^\top \tilde{a}(t; \boldsymbol{X}), \quad \tilde{a}(1; \boldsymbol{X}) = \nabla g(X_1). \tag{4}\]

Now set \( \textcolor{blue}{b = 0}\) and \( \textcolor{blue}{f = 0}\) (our base process is pure Brownian motion, and the running cost is just the control penalty which does not enter the lean adjoint). Both forcing terms vanish: \(\frac{d}{dt}\tilde{a} = 0\). So \(\tilde{a}\) is constant along each trajectory:

\[ \textcolor{blue}{\tilde{a}(t; \boldsymbol{X}) = \nabla g(X_1)}, \quad \text{for all } t \in [0,1]. \tag{5}\]

No backward ODE to solve. The regression target at every time \(t\) is just \(\nabla g(X_1)\). The adjoint matching loss becomes

\[ L_\text{AM}(u) = \mathbb{E}_{\boldsymbol{X} \sim \mathbb{P}^{\bar{u}}} {\left[ \int_0^1 \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \, dt \right]} , \tag{6}\]

where \(\bar{u} = \texttt{stopgrad}(u)\). This regresses the control \(u(t, X_t)\) onto \(-\sigma_t \, \nabla g(X_1)\) for each intermediate state \(X_t\) along a trajectory. At the fixed point, \(u^\star(t,x) = -\sigma_t \mathbb{E}_{\mathbb{P}^{u^\star}}[\nabla g(X_1) \mid X_t = x]\) (the expectation is under the optimally controlled process; under \(\mathbb{P}^{u^\star}\), the conditional distribution of \(X_1 \mid X_t\) includes the \(\exp(-g)\) tilt). Comparing with \(u^\star = \sigma_t \nabla \log h\) from Equation 3, the adjoint target \(\nabla g(X_1)\) is a stochastic estimate of \(-\nabla \log h(t, X_t)\): the regression averages many noisy terminal gradients to recover the gradient of \(\log h\).
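The fixed-point property is easy to verify in a toy case (a 1D quadratic example with assumptions as labeled in the comments): the residual inside the AM loss vanishes identically at the optimal control.

```python
import numpy as np

# Toy case: E(x) = (x - m)^2 / 2, tau = 1, sigma_t = 1, so p_base_1 = N(0, 1)
# and g(x) = -x^2/2 + (x - m)^2/2 + const, giving grad g(x) = -m, a constant.
# The optimal control is then the constant drift u*(t, x) = m.
m = 1.5
grad_g = lambda x: -m * np.ones_like(x)
u_star = lambda t, x: m * np.ones_like(x)

rng = np.random.default_rng(0)
X1 = rng.normal(m, 1.0, size=1000)            # samples from pi = N(m, 1)
t = rng.uniform(size=1000)
resid = u_star(t, X1) + 1.0 * grad_g(X1)      # sigma_t = 1
print(np.abs(resid).max())                    # 0.0: the AM residual vanishes at u*
```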

Why is the adjoint constant? A direct argument.

With \(b = 0\), a trajectory of the base process is \(X_t = \int_0^t \sigma_s \, dW_s\), so \(X_1 = X_t + \int_t^1 \sigma_s \, dW_s\) where the two pieces are independent. Along a fixed trajectory (the Brownian increments after \(t\) are held constant), perturbing \(X_t\) by \(\delta X_t\) shifts \(X_1\) by the same \(\delta X_t\). Hence \(\nabla_{X_t} g(X_1) = \nabla g(X_1)\) along the trajectory, confirming Equation 5.

From trajectories to pairs

The loss Equation 6 only depends on the pair \((X_t, X_1)\), not the full trajectory:

\[ L_\text{AM}(u) = \int_0^1 \mathbb{E}_{(X_t, X_1) \sim p^{\bar{u}}_{t,1}} {\left[ \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \right]} \, dt. \tag{7}\]

This is still expensive: sampling \((X_t, X_1) \sim p^{\bar{u}}_{t,1}\) requires simulating the controlled SDE. The next section removes this cost.

Reciprocal projection

At the optimal solution \(u^\star\), the controlled path measure equals the Schrodinger bridge: \(\mathbb{P}^{u^\star} = \mathbb{P}^\star\). Since \(\mathbb{P}^\star(\boldsymbol{X}) = \mathbb{P}(\boldsymbol{X} \mid X_1) \, \pi(X_1)\), marginalizing over all times except \(t\) and \(1\) gives

\[ p^{u^\star}_{t,1}(x_t, x_1) = p^\text{base}_{t \mid 1}(x_t \mid x_1) \, \pi(x_1). \tag{8}\]

If you know \(X_1\), the intermediate state \(X_t\) is distributed as a Brownian bridge conditioned on the endpoint. Away from optimality, this factorization does not hold for \(p^u_{t,1}\) because the controlled dynamics create correlations between \(X_t\) and \(X_1\) that differ from those of the base process. The reciprocal projection replaces \(p^{\bar{u}}_{t,1}(x_t, x_1)\) by

\[ p^\text{base}_{t \mid 1}(x_t \mid x_1) \, p^{\bar{u}}_1(x_1). \tag{9}\]

We keep the terminal marginal \(p^{\bar{u}}_1\) from the current control but fill in the interior with the Brownian bridge. This is a projection onto the reciprocal class of the base process: the set of path measures that share the bridge structure of \(\mathbb{P}\).

What does \(p^\text{base}_{t \mid 1}\) look like? The base process \(X_t = \int_0^t \sigma_s \, dW_s\) has \(X_t \sim \mathcal{N}(0, \nu_t I)\) and \(X_1 \sim \mathcal{N}(0, \nu_1 I)\). Since \(X_1 = X_t + \int_t^1 \sigma_s \, dW_s\) with independent increments, \(\mathop{\mathrm{Cov}}(X_t, X_1) = \nu_t I\). Standard Gaussian conditioning gives

\[ X_t \mid X_1 = x_1 \; \sim \; \mathcal{N} {\left( \frac{\nu_t}{\nu_1} x_1, \;\; \frac{\nu_t(\nu_1 - \nu_t)}{\nu_1} \, I \right)} . \tag{10}\]

For constant \(\sigma_t = \sigma\), this simplifies to \(\nu_t = \sigma^2 t\) and \(X_t \mid X_1 \sim \mathcal{N}(t \, x_1, \; \sigma^2 t(1-t) \, I)\), the usual Brownian bridge. Sampling from Equation 10 is a single Gaussian draw.
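In code, sampling from Equation 10 is one line of Gaussian arithmetic (a sketch assuming numpy and a constant schedule):

```python
import numpy as np

def nu(t, sigma2=1.0):
    """Accumulated noise nu_t = int_0^t sigma_s^2 ds, here for constant sigma."""
    return sigma2 * t

def sample_bridge(x1, t, rng, sigma2=1.0):
    """One Gaussian draw of X_t | X_1 = x1 under the base process, Eq. (10)."""
    nu_t, nu_1 = nu(t, sigma2), nu(1.0, sigma2)
    mean = (nu_t / nu_1) * x1
    var = nu_t * (nu_1 - nu_t) / nu_1
    return mean + np.sqrt(var) * rng.normal(size=x1.shape)

rng = np.random.default_rng(0)
x1 = np.full((100_000, 1), 2.0)
xt = sample_bridge(x1, t=0.25, rng=rng)
# For sigma = 1 and x1 = 2: mean = t*x1 = 0.5, var = t(1-t) = 0.1875
```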

Substituting Equation 9 into Equation 7 gives the Reciprocal Adjoint Matching (RAM) loss:

\[ L_\text{RAM}(u) = \int_0^1 \lambda(t) \, \mathbb{E}_{X_1 \sim p^{\bar{u}}_1, \; X_t \sim p^\text{base}_{t \mid 1}(\cdot \mid X_1)} {\left[ \tfrac{1}{2} \| u(t, X_t) + \sigma_t \, \nabla g(X_1) \|^2 \right]} \, dt, \tag{11}\]

where \(\lambda(t) = 1/\sigma_t^2\) is a time weighting that normalizes the regression target magnitude across time (does not change the optimum).

Sampling the pair \((X_t, X_1)\) now proceeds as: (1) generate \(X_1\) from the current model by running the controlled SDE forward (expensive, done infrequently), (2) for each \(X_1\), sample \(t \sim \text{Uniform}([0,1])\) and \(X_t \sim p^\text{base}_{t \mid 1}(\cdot \mid X_1)\) via Equation 10 (one cheap Gaussian draw). Many gradient updates per energy evaluation.

Why the projection helps

Define the projection operator \(\Pi\) that maps a control \(u\) to the optimal control for the Schrodinger bridge with terminal marginal \(p^u_1\):

\[ \Pi(u) = \mathop{\mathrm{argmin}}_v \; \mathbb{E}_{\mathbb{P}^v} {\left[ \int_0^1 \tfrac{1}{2} \|v\|^2 \, dt + \log \frac{p^\text{base}_1(X_1)}{p^u_1(X_1)} \right]} . \tag{12}\]

The projection \(\Pi(u)\) solves the SOC problem of reaching the current terminal distribution \(p^u_1\) with minimal effort. Two properties follow from the definition.

First, \(J(u) \geq J(\Pi(u))\): projecting never increases the SOC cost.

Proof sketch.

Split the terminal cost \(g = g_1 + g_2\) where \(g_1(x) = \log(p^\text{base}_1(x) / p^u_1(x))\) and \(g_2(x) = \log(p^u_1(x) / \pi(x))\). Then

\[ J(u) = \mathbb{E}_{\mathbb{P}^u} {\left[ \tfrac{1}{2}\|u\|^2 + g_1(X_1) \right]} + \mathbb{E}_{p^u_1} {\left[ g_2(X_1) \right]} . \]

The first term equals \(D_{\text{KL}}\!\big(\mathbb{P}^u,\; \mathbb{P}(\cdot \mid X_1) \, p^u_1(X_1)\big)\) by Girsanov, and is minimized over all controls sharing terminal marginal \(p^u_1\) by \(\Pi(u)\). Since \(\Pi(u)\) preserves the terminal marginal, \(p^{\Pi(u)}_1 = p^u_1\), the second term is unchanged. Hence \(J(\Pi(u)) \leq J(u)\).

Second, after projection, RAM and AM coincide: \(L_\text{RAM}(\Pi(u)) = L_\text{AM}(\Pi(u))\). This holds because \(\Pi(u)\) is itself a Schrodinger bridge (with terminal marginal \(p^u_1\)), so its joint \((X_t, X_1)\) already factorizes as bridge times marginal, and the reciprocal projection is the identity.

The algorithm

The full Adjoint Sampling (Havens et al., 2025) algorithm alternates between two phases.

Outer loop (expensive): simulate the controlled SDE to produce terminal samples \(\{X_1^{(i)}\}\). Evaluate \(\nabla g(X_1^{(i)})\) for each sample and store the pairs in a replay buffer \(\mathcal{B}\).

Inner loop (cheap): draw \((X_1, \nabla g)\) from the buffer, sample \(t \sim \text{Uniform}([0,1])\) and \(X_t\) from Equation 10, and update \(u\) by gradient descent on the RAM loss Equation 11.

The inner loop runs for many iterations without touching the energy function. The buffer is refreshed when the control has drifted enough that \(p^u_1\) no longer matches the stored samples.
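The two phases can be sketched as follows (hypothetical helper names; the Gaussian stand-ins replace the actual SDE simulation and energy-gradient oracle):

```python
import numpy as np

rng = np.random.default_rng(0)

def fill_buffer(sample_X1, grad_g, n=4096):
    """Outer loop (expensive, infrequent): one energy-gradient call per sample."""
    X1 = sample_X1(n)                       # stand-in for simulating the SDE
    return X1, grad_g(X1)                   # buffer of (X1, grad g(X1)) pairs

def ram_minibatch(buffer, batch=256, sigma2=1.0):
    """Inner loop (cheap): one RAM minibatch, no energy evaluations."""
    X1, G = buffer
    idx = rng.integers(len(X1), size=batch)
    x1, g = X1[idx], G[idx]
    t = rng.uniform(size=(batch, 1))
    nu_t, nu_1 = sigma2 * t, sigma2         # constant schedule: nu_t = sigma^2 t
    mean = (nu_t / nu_1) * x1               # Brownian bridge, Eq. (10)
    var = nu_t * (nu_1 - nu_t) / nu_1
    xt = mean + np.sqrt(var) * rng.normal(size=x1.shape)
    target = -np.sqrt(sigma2) * g           # regress u(t, xt) onto -sigma_t grad g(X1)
    return t, xt, target

# Toy usage: Gaussian stand-ins for model samples and the gradient oracle.
buf = fill_buffer(lambda n: rng.normal(size=(n, 2)), lambda x: x)
t, xt, target = ram_minibatch(buf)
```

A real implementation would fit a network \(u_\theta(t, x)\) to `target` by stochastic gradient descent; only the buffer-filling step touches the energy.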

The alternating scheme has a clean fixed-point interpretation. Each outer step implicitly performs the reciprocal projection \(\Pi\) (it freezes the terminal samples at \(p^{u_i}_1\) and fills in the interior with bridges), and each inner step descends the AM loss at the projected control. Concretely, with \(X_1\) samples from \(p^{u_i}_1\), a RAM gradient step with step size \(\eta\) reads

\[ u_{i+1} = \Pi(u_i) - \eta \, \frac{\delta L_\text{AM}}{\delta u}\big(\Pi(u_i)\big). \]

The fixed point \(u = \Pi(u)\) with \(\frac{\delta L_\text{AM}}{\delta u}(u) = 0\) is exactly the optimal control \(u^\star\).

Beyond the Dirac prior

The setup above forces \(X_0 = 0\). What about a Gaussian prior, or a harmonic oscillator prior for molecules? With a non-trivial prior \(\mu\), the base process couples \(X_0\) and \(X_1\), and the SOC solution produces a biased terminal marginal (the initial value function \(V_0(X_0)\) leaks into \(p^\star(X_1)\), as described in the adjoint matching note). The fix: solve a full Schrodinger bridge instead of a plain SOC problem.

The corrector

Recall from the SB notes that the Schrodinger bridge \(\mathbb{Q}^\star\) minimizing \(D_{\text{KL}}(\mathbb{Q}\,\|\, \mathbb{P})\) subject to \(\mathbb{Q}_0 = \mu\) and \(\mathbb{Q}_1 = \pi\) has path measure \[ \frac{d\mathbb{Q}^\star}{d\mathbb{P}}(\boldsymbol{X}) \;\propto\; \frac{\widehat{\varphi}_0(X_0)}{\mu(X_0)} \, \varphi_1(X_1), \tag{13}\] with time-dependent SB potentials \[ \varphi_t(x) = \mathbb{E}_\mathbb{P}[\varphi_1(X_1) \mid X_t = x], \qquad \widehat{\varphi}_t(x) = \int \mathbb{P}_{t \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) \, dy, \tag{14}\] normalized so that the SB marginals factorize as \(q_t = \varphi_t \, \widehat{\varphi}_t\). The forward potential \(\varphi_t\) is a Doob h-transform: the SB process has drift \(f_t + \sigma_t^2 \nabla \log \varphi_t\). The backward potential \(\widehat{\varphi}_t\) propagates information about the initial distribution \(\mu\) forward in time through the base transition kernel.

The SB solution \(\mathbb{Q}^\star\) is also a controlled diffusion, so it must solve some SOC problem. From Equation 13, the optimal joint density of endpoints satisfies \[ p^\star(X_0, X_1) \;\propto\; \mathbb{P}(X_1 \mid X_0) \, \widehat{\varphi}_0(X_0) \, \varphi_1(X_1). \tag{15}\] Compare this with the SOC joint: for a terminal cost \(g\), the SOC optimal joint is \(p^\star(X_0, X_1) = \mathbb{P}(X_0, X_1) \, e^{-g(X_1) + V_0(X_0)}\), where \(V_0(x) = -\log \mathbb{E}_\mathbb{P}[e^{-g(X_1)} \mid X_0 = x]\) is the initial value function (it normalizes the exponential tilt for each starting point). Matching the two expressions, with \(\mathbb{P}(X_0, X_1) = \mu(X_0) \, \mathbb{P}(X_1 \mid X_0)\), requires \[ e^{-g(X_1)} \propto \varphi_1(X_1), \qquad e^{V_0(X_0)} \propto \frac{\widehat{\varphi}_0(X_0)}{\mu(X_0)}. \tag{16}\] The first condition gives \(g(x) = -\log \varphi_1(x) + \text{const}\). To express this in terms of the corrector \(\widehat{\varphi}_1\), use the SB marginal constraint at \(t = 1\): the SB density at time 1 is \(q_1(x) = \varphi_1(x) \, \widehat{\varphi}_1(x) = \pi(x)\). So \(\varphi_1(x) = \pi(x) / \widehat{\varphi}_1(x)\). Substituting into \(g = -\log \varphi_1\): \[ \textcolor{blue}{g(x) = \log \frac{\widehat{\varphi}_1(x)}{\pi(x)}} + \text{const}. \tag{17}\] This is the modified terminal cost. Compared to the terminal cost \(g(x) = \log \frac{p^{\mathbb{P}}_1(x)}{\pi(x)}\) from Equation 2, the base marginal \(p^{\mathbb{P}}_1\) is replaced by \( \textcolor{blue}{\widehat{\varphi}_1}\): a corrector that accounts for the coupling between \(X_0\) and \(X_1\).

Does this corrector actually remove the \(V_0\) bias? Marginalize the endpoint coupling \(p^\star(X_0, X_1) \propto \mathbb{P}(X_1 \mid X_0) \, \widehat{\varphi}_0(X_0) \, \varphi_1(X_1)\) over \(X_0\): \[ \begin{aligned} p^\star(X_1) &\propto \varphi_1(X_1) \int \mathbb{P}(X_1 \mid X_0) \, \widehat{\varphi}_0(X_0) \, dX_0 \\ &= \varphi_1(X_1) \, \widehat{\varphi}_1(X_1) \;\propto\; \pi(X_1). \end{aligned} \tag{18}\] The second step uses \(\int \mathbb{P}_{1 \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) \, dy = \widehat{\varphi}_1(x)\), the propagation formula for the backward potential, and the last step uses the marginal constraint \(\varphi_1 \, \widehat{\varphi}_1 = \pi\). The corrector \(\widehat{\varphi}_1\) cancels the initial value function bias exactly.

Every SB problem decomposes as: SOC with the standard adjoint (the Doob h-transform piece \(\varphi_t\)) plus a corrector \(\widehat{\varphi}_1\) that absorbs the prior bias.

Adjoint matching with corrector

For Boltzmann sampling (\(\pi \propto e^{-E}\), \(f_t = 0\)), the modified terminal cost Equation 17 becomes \(g(x) = E(x) + \log \widehat{\varphi}_1(x) + \text{const}\). The lean adjoint is still constant (since \(b = 0\), the same argument from Equation 5 applies) and equals \(\nabla g(X_1) = (\nabla E + \textcolor{blue}{\nabla \log \widehat{\varphi}_1})(X_1)\). The AM loss Equation 6 gains one extra term, \(\nabla \log \widehat{\varphi}_1(X_1)\) alongside \(\nabla E(X_1)\): \[ L_\text{AM}(u) = \mathbb{E}_{t, \; \mathbb{P}_{t \mid 0,1} \, p^{\bar{u}}_{0,1}} {\left[ \tfrac{1}{2} \big\| u(t, X_t) + \sigma_t {\left( \nabla E + \textcolor{blue}{\nabla \log \widehat{\varphi}_1} \right)} (X_1) \big\|^2 \right]} , \tag{19}\] where \(t \sim \text{Uniform}([0,1])\), \(p^{\bar{u}}_{0,1}\) is the joint endpoint distribution under the current control, and \(\mathbb{P}_{t \mid 0,1}\) is the Brownian bridge kernel. When \(\mu = \delta_0\), the corrector is available in closed form: \(\widehat{\varphi}_1 \propto \mathbb{P}_{1 \mid 0}(\cdot \mid 0) = p^\text{base}_1\), so \(\nabla \log \widehat{\varphi}_1 = \nabla \log p^\text{base}_1\) and Equation 19 reduces to Equation 6.

The same reciprocal projection trick applies: sample \(X_1\) from the current model, then draw \(X_t\) from a Brownian bridge conditioned on \((X_0, X_1)\). No SDE simulation during training.

Corrector matching

The AM loss Equation 19 requires \(\nabla \log \widehat{\varphi}_1\), which is unknown. Start from the propagation formula in Equation 14, \(\widehat{\varphi}_t(x) = \int \mathbb{P}_{t \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) \, dy\), and differentiate: \[ \nabla \log \widehat{\varphi}_t(x) = \frac{\int \nabla_x \mathbb{P}_{t \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) \, dy}{\widehat{\varphi}_t(x)}. \] Pull the gradient inside the transition kernel: \[ \nabla \log \widehat{\varphi}_t(x) = \frac{\int \nabla_x \log \mathbb{P}_{t \mid 0}(x \mid y) \, \mathbb{P}_{t \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) \, dy}{\widehat{\varphi}_t(x)}. \] The ratio \(\mathbb{P}_{t \mid 0}(x \mid y) \, \widehat{\varphi}_0(y) / \widehat{\varphi}_t(x)\) is, by Bayes' rule, the posterior \(p^\star(X_0 = y \mid X_t = x)\). So the corrector score is a conditional expectation: \[ \nabla \log \widehat{\varphi}_t(x) = \mathbb{E}_{p^\star} {\left[ \nabla_x \log \mathbb{P}_{t \mid 0}(x \mid X_0) \mid X_t = x \right]} . \tag{20}\] This is a Tweedie-type formula: the corrector score equals the conditional expectation of the transition score \(\nabla_x \log \mathbb{P}_{t|0}(x \mid X_0)\) given \(X_t = x\). Since a conditional expectation is the minimizer of a least-squares regression: \[ \nabla \log \widehat{\varphi}_1 = \mathop{\mathrm{argmin}}_h \; \mathbb{E}_{(X_0, X_1) \sim p^{u^\star}_{0,1}} {\left[ \big\| h(X_1) - \nabla_{X_1} \log \mathbb{P}(X_1 \mid X_0) \big\|^2 \right]} . \tag{21}\] This is the corrector matching (CM) objective: regress \(h\) onto the score of the base transition kernel, evaluated at endpoint pairs from the current process.

For \(f_t = 0\), the base process is a scaled Brownian motion, so \(\mathbb{P}(X_1 \mid X_0) = \mathcal{N}(X_1; X_0, (\nu_1 - \nu_0) I)\). The transition score is \[ \nabla_{X_1} \log \mathbb{P}(X_1 \mid X_0) = -\frac{X_1 - X_0}{\nu_1 - \nu_0}, \tag{22}\] known in closed form. The CM regression target is just this Gaussian score, averaged over the posterior on \(X_0\).
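Both pieces are easy to check numerically. The sketch below (assuming numpy; the Gaussian joint is a stand-in for \(p^{u^\star}_{0,1}\), not samples from a trained model) implements the transition score of Equation 22 and runs the least-squares regression of Equation 21 with a linear model, recovering the known conditional expectation:

```python
import numpy as np

def transition_score(x1, x0, nu1):
    """Closed-form score of the base kernel N(x1; x0, nu1 I), Eq. (22)."""
    return -(x1 - x0) / nu1

# Stand-in joint: X0 ~ N(0, s0^2), X1 = X0 + sqrt(nu1) * Z. Then the
# conditional expectation of the transition score is linear in X1:
# E[score | X1] = -X1 / (s0^2 + nu1), so a linear least-squares fit
# (the minimizer over linear h of Eq. 21) must recover that slope.
rng = np.random.default_rng(0)
s0sq, nu1, n = 0.5, 1.5, 200_000
X0 = rng.normal(scale=np.sqrt(s0sq), size=n)
X1 = X0 + rng.normal(scale=np.sqrt(nu1), size=n)
Y = transition_score(X1, X0, nu1)
slope = np.polyfit(X1, Y, 1)[0]
print(slope)                 # ≈ -1/(s0^2 + nu1) = -0.5
```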

Alternating optimization = IPFP

The AM objective Equation 19 needs \(\nabla \log \widehat{\varphi}_1\). The CM objective Equation 21 needs samples from \(p^{u^\star}\). Neither can be solved alone. The fix: alternate.

  1. AM step: Solve Equation 19 with \(\nabla \log \widehat{\varphi}_1 \approx h^{(k-1)}\) to get \(u^{(k)}\).
  2. CM step: Solve Equation 21 with \(u^\star \approx u^{(k)}\) to get \(h^{(k)}\).

Initialize with \(h^{(0)} = 0\). The first AM stage regresses the control onto \(\sigma_t \nabla E(X_1)\): pure energy-guided transport with no corrector. Subsequent CM stages progressively learn the bias correction.

This alternation has a clean interpretation. The AM step solves a forward half-bridge: \(\min_\mathbb{Q} D_{\text{KL}}(\mathbb{Q}\,\|\, \mathbb{Q}^{(k-1)})\) subject to \(\mathbb{Q}_0 = \mu\), where \(\mathbb{Q}^{(k-1)}\) is the previous iterate. The CM step solves a backward half-bridge: \(\min_\mathbb{Q} D_{\text{KL}}(\mathbb{Q} \,\|\, \mathbb{P}^{u^{(k)}})\) subject to \(\mathbb{Q}_1 = \pi\). Alternating between these two projections is exactly IPFP/Sinkhorn on path space. Convergence to the SB solution follows from standard IPFP analysis (De Bortoli et al., 2021), since each step is a KL projection onto a convex set of path measures. In practice, 3-5 stages suffice.
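A discrete, static analogue of this alternation is ordinary Sinkhorn on a grid (a sketch assuming numpy, not the path-space algorithm itself): alternating the two marginal scalings converges to the entropic coupling between \(\mu\) and \(\pi\) under the Gaussian base kernel.

```python
import numpy as np

# Grid discretization: entropic coupling between mu and pi with Gaussian kernel.
x = np.linspace(-4.0, 4.0, 201)
nu1 = 1.0
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / nu1)    # base transition kernel
mu = np.exp(-0.5 * (x + 1)**2);        mu /= mu.sum()    # prior (example choice)
pi = np.exp(-0.5 * (x - 1)**2 / 0.25); pi /= pi.sum()    # target (example choice)

u = np.ones_like(x)
v = np.ones_like(x)
for _ in range(2000):                  # IPFP / Sinkhorn iterations
    u = mu / (K @ v)                   # forward half-bridge: fix the X0 marginal
    v = pi / (K.T @ u)                 # backward half-bridge: fix the X1 marginal
P = u[:, None] * K * v[None, :]        # endpoint coupling diag(u) K diag(v)
print(np.abs(P.sum(1) - mu).max(), np.abs(P.sum(0) - pi).max())   # both ≈ 0
```

The scaling vectors `u` and `v` play the role of the potentials \(\widehat{\varphi}_0\) and \(\varphi_1\).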

Comparison

| Method | Prior | Terminal cost \(g(x)\) | Corrector |
|---|---|---|---|
| Adjoint Sampling | \(\delta_0\) (memoryless) | \(\log \frac{p^{\mathbb{P}}_1(x)}{\pi(x)}\) | None (\(\widehat{\varphi}_1 \propto p^{\mathbb{P}}_1\), closed form) |
| ASBS | Arbitrary \(\mu\) | \(\log \frac{\widehat{\varphi}_1(x)}{\pi(x)}\) | Learned \(\nabla \log \widehat{\varphi}_1\) |

ASBS (Liu et al., 2025) reduces to Adjoint Sampling when \(\mu = \delta_0\): the corrector \(\widehat{\varphi}_1\) reduces to the known base marginal \(p^{\mathbb{P}}_1\), so the CM step is solved in closed form.

A concrete example of a useful prior: for molecular conformer generation, one can use a harmonic prior \(\mu(x) \propto \exp\big(-\frac{\alpha}{2} \sum_{(i,j) \in \text{bonds}} (\|x_i - x_j\| - r^0_{ij})^2\big)\), where \(r^0_{ij}\) are equilibrium bond lengths from the molecular graph. Particles start in a physically reasonable arrangement rather than at the origin, substantially reducing the transport cost.
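As a toy illustration (a 1D chain with nearest-neighbor bonds and translation pinned at \(x_0 = 0\), not the paper's construction), such a prior is easy to sample because the bond displacements of a chain are independent Gaussians:

```python
import numpy as np

# Toy 1D chain: mu(x) ∝ exp(-alpha/2 * sum_i (x_{i+1} - x_i - r0)^2).
# For a chain (a tree graph) the bond displacements d_i = x_{i+1} - x_i are
# independent N(r0, 1/alpha), so sampling reduces to a cumulative sum.
rng = np.random.default_rng(0)
alpha, r0, n_particles, n_samples = 10.0, 1.0, 5, 100_000
d = rng.normal(loc=r0, scale=1.0 / np.sqrt(alpha), size=(n_samples, n_particles - 1))
x = np.concatenate([np.zeros((n_samples, 1)), np.cumsum(d, axis=1)], axis=1)  # pin x_0 = 0
bonds = x[:, 1:] - x[:, :-1]
print(bonds.mean(), bonds.std())       # ≈ r0 = 1.0 and 1/sqrt(alpha) ≈ 0.316
```

The samples start near equilibrium bond lengths, which is exactly what makes the SB transport from this prior to the Boltzmann target cheap.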