Audit: functional_adjoint.qmd

Critical Issues (must fix)

  1. Sign error in the lean adjoint solution (lines 65–71, 78). The note writes \(\tilde{a}(t) = e^{(T-t)A^\top} \nabla g(\mathbf{X}_T)\). This is wrong. The lean adjoint ODE \(\dot{\tilde{a}}(t) = A^\top \tilde{a}(t)\) with terminal condition \(\tilde{a}(T) = \nabla g(\mathbf{X}_T)\) has solution

\[\tilde{a}(t) = e^{A^\top(t - T)} \tilde{a}(T) = e^{-(T-t)A^\top} \nabla g(\mathbf{X}_T).\]

The exponent should be \(-(T-t)A^\top\), not \(+(T-t)A^\top\). The FAS paper confirms this: it defines \(\mathcal{C}_t := \sigma_t Q^{1/2} e^{-(T-t)\mathcal{A}^\dag}\) (Section 3.2). Since \(A\) is symmetric positive-definite, the exponential \(e^{-(T-t)A}\) has eigenvalues \(e^{-(T-t)\lambda_k} < 1\), which are damping factors, not amplifying ones.

This error propagates into:

  • Line 70: \(\tilde{a}_k(t) = e^{(T-t)\lambda_k} \cdot [\nabla g(\mathbf{X}_T)]_k\) should be \(e^{-(T-t)\lambda_k}\).
  • Line 78 (FAS loss): The \(\blue{e^{(T-t)A^\top}}\) factor in the loss should be \(\blue{e^{-(T-t)A^\top}}\).
  • Line 80: “The only structural difference is the \(\blue{e^{(T-t)A^\top}}\) factor” – same correction.
  2. The entire intuitive paragraph about adjoint amplification is backwards (lines 68–74). The note claims: “High-frequency modes get amplified by the exponential. But these are exactly the modes that the OU reference suppresses most aggressively, so the net effect is balanced.” With the correct sign \(e^{-(T-t)\lambda_k}\), the opposite holds: the adjoint damps high-frequency modes. The OU drift pulls perturbations back toward zero, so a perturbation at time \(t\) is attenuated by \(e^{-(T-t)\lambda_k}\) before reaching \(T\). The adjoint does NOT need to compensate; it reflects the fact that perturbing high-frequency modes at early times has little effect on the terminal state.

    The paragraph on line 74 (“the control at time \(t\) must push \(e^{(T-t)A}\) times harder”) needs to be rewritten with the correct sign and the correct physical story: the OU drift makes early perturbations decay, so the adjoint reduces the weight on the terminal gradient at early times, especially for high-frequency modes. This is the opposite of what is currently written.

    Suggested replacement for lines 68–74: The matrix exponential \(e^{-(T-t)A^\top}\) acts as a smoother/damper. Since \(A\) is symmetric positive-definite with eigenvalues \(\lambda_k\), the exponential has eigenvalues \(e^{-(T-t)\lambda_k}\). In the eigenbasis: \(\tilde{a}_k(t) = e^{-(T-t)\lambda_k} \cdot [\nabla g(\mathbf{X}_T)]_k\). Low-frequency modes (small \(\lambda_k\)) get weight close to 1, nearly identical to the Brownian case. High-frequency modes (large \(\lambda_k\)) are exponentially damped: their adjoint contribution at early times is negligible.

    Why? The OU drift \(-A\mathbf{x}\) pulls perturbations back toward zero. A perturbation \(\delta X_t\) at time \(t\) decays by \(e^{-(T-t)A}\) before reaching \(T\). The terminal cost’s sensitivity to that perturbation is therefore \(e^{-(T-t)A^\top} \nabla g(X_T)\), not \(\nabla g(X_T)\). High-frequency modes are strongly damped by the OU reference, so the control has less leverage over them at early times: perturbing them costs energy but has minimal terminal effect.

  3. Controlled SDE parameterization mismatch (lines 50–52). The note writes the controlled SDE as \[d\mathbf{X}_t = [-A\mathbf{X}_t + \sigma_t Q^{1/2} u(t, \mathbf{X}_t)] \, dt + \sigma_t \, dW_t^Q\] and then says \(W_t^Q = Q^{1/2} W_t\). Comparing with the FAS paper’s controlled SDE (equation 4 in the paper): \[d\mathbf{X}^\alpha_t = [\mathcal{A}\mathbf{X}^\alpha_t + \sigma_t Q^{1/2} \alpha(\mathbf{X}^\alpha_t, t)] \, dt + \sigma_t \, d\mathbf{W}^Q_t\] the paper takes \(d\mathbf{W}^Q_t\) to be the \(Q\)-Wiener process with \(\text{Cov}(d\mathbf{W}^Q) = Q \, dt\). If \(W_t\) is a standard (cylindrical) Wiener process, then \(W_t^Q = Q^{1/2} W_t\) has covariance \(Q \, dt\), so this part is fine.

    The mode SDE on line 44 uses \(\sigma_t \lambda_k^{-s/2} dB_k\). For consistency with the noise term \(\sigma_t \, dW_t^Q = \sigma_t Q^{1/2} dW_t\), the \(k\)-th component in the eigenbasis should pick up \(\sigma_t (Q^{1/2} dW_t)_k = \sigma_t \lambda_k^{-s/2} dB_k\) (since \(Q = A^{-s}\) has eigenvalues \(\lambda_k^{-s}\), so \(Q^{1/2}\) has eigenvalues \(\lambda_k^{-s/2}\)). This checks out.

    The actual inconsistency: the reference SDE on line 38 is driven by \(\sigma_t \, dW_t\) (standard Wiener), while line 50 is driven by \(dW_t^Q = Q^{1/2} dW_t\). Pick one convention and stick with it; if the reference SDE uses \(dW_t^Q\), say so from the start.
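
A quick numerical sanity check of the corrected sign (illustrative \(A\), \(T\), and terminal gradient; this only verifies the lean adjoint ODE solution, nothing from the paper itself):

```python
import numpy as np
from scipy.linalg import expm

# The lean adjoint ODE for the OU drift b(x) = -Ax is
#   da/dt = A^T a(t),  a(T) = grad g(X_T).
# Integrating it backward from t = T must match the closed form
#   a(t) = expm(-(T - t) A^T) @ a(T),
# i.e. a damping factor, not an amplifying one.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive-definite (arbitrary)
T = 1.0
a_T = rng.standard_normal(4)         # stands in for grad g(X_T)

# Explicit Euler, stepping backward from t = T down to t = 0:
n = 5000
dt = T / n
a = a_T.copy()
for _ in range(n):                   # a(t - dt) ≈ a(t) - dt * A^T a(t)
    a = a - dt * (A.T @ a)

a_closed = expm(-T * A.T) @ a_T      # closed form at t = 0
assert np.allclose(a, a_closed, atol=1e-3)
assert np.linalg.norm(a_closed) < np.linalg.norm(a_T)   # damping, not amplification
print("backward integration matches expm(-(T-t) A^T); norm shrinks")
```

With the incorrect sign \(e^{+(T-t)A^\top}\), the second assertion would fail: the norm would blow up instead of shrinking.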

Sign Convention / Notation Issues

  1. Terminal cost \(g\) vs. adjoint_sampling.qmd convention (line 34). The note defines \(g(\mathbf{x}) = S(\mathbf{x}) + \log q_T(\mathbf{x}_0, \mathbf{x}) - \log \nu_\infty(\mathbf{x})\) and the lean adjoint has terminal condition \(\tilde{a}(T) = \nabla g(\mathbf{X}_T)\) (line 61). In adjoint_sampling.qmd, the terminal condition is also \(\tilde{a}(1) = \nabla g(X_1)\) (line 48), so this is consistent.

    However, in adjoint_matching.qmd the terminal condition is \(\tilde{a}(1) = -\nabla r(X_1)\) (line 195), because that note uses a reward \(r\) (maximization) rather than cost \(g\) (minimization), with \(g = -r\). The functional_adjoint note uses cost \(g\) (minimization) consistent with adjoint_sampling.qmd. This is fine, but it would help to add a parenthetical remark at line 57 noting the sign convention, since a reader coming from adjoint_matching.qmd will see \(-\nabla r\) there and \(+\nabla g\) here.

  2. \(Q^{1/2}\) factor in the FAS loss (line 78) vs. adjoint_sampling loss. The adjoint_sampling loss (line 62 of adjoint_sampling.qmd) is \(\|u + \sigma_t \nabla g(X_1)\|^2\). The FAS loss on line 78 is \(\|u + \sigma_t Q^{1/2} e^{(T-t)A^\top} \nabla g(\mathbf{X}_T)\|^2\). The \(Q^{1/2}\) factor appears because the control enters the SDE as \(\sigma_t Q^{1/2} u\), so the optimal control satisfies \(u^\star = -Q^{1/2} \cdot [\text{adjoint}]\), not \(u^\star = -[\text{adjoint}]\). This is the correct mapping from paper to note. But the note should explain where the \(Q^{1/2}\) comes from (it is the \(Q^{1/2}\) in front of \(u\) in the controlled SDE). Currently the \(Q^{1/2}\) appears without comment.

  3. Time interval \([0,T]\) vs \([0,1]\). The functional_adjoint note uses \([0,T]\) (lines 25, 61, 78). The adjoint_matching and adjoint_sampling notes use \([0,1]\). This is not a bug (the FAS paper uses \([0,T]\)), but note it for the reader since all cross-references point to notes that use \([0,1]\).
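
To make the \(Q^{1/2}\) bookkeeping in item 2 concrete, here is a minimal eigenbasis sketch (eigenvalues, \(s\), \(\sigma_t\), and the gradient are all arbitrary illustrative values; the sign-corrected exponent is used):

```python
import numpy as np
from scipy.linalg import expm, sqrtm

# With Q = A^{-s}, Q^{1/2} shares A's eigenvectors and has eigenvalues
# lam_k^{-s/2}, so the per-mode FAS loss target factors as
#   sigma_t * lam_k^{-s/2} * exp(-(T - t) * lam_k) * [grad g]_k.
lam = np.array([1.0, 4.0, 9.0, 16.0])    # eigenvalues of A (illustrative)
s = 2.0
A = np.diag(lam)                          # work directly in the eigenbasis
Q = np.diag(lam ** (-s))                  # Q = A^{-s}

Q_half = sqrtm(Q).real
assert np.allclose(np.diag(Q_half), lam ** (-s / 2))

# Per-mode target of the (sign-corrected) FAS loss at time t:
sigma_t, T, t = 0.5, 1.0, 0.3
grad_g = np.ones(4)                       # stands in for grad g(X_T)
target_matrix = sigma_t * Q_half @ expm(-(T - t) * A.T) @ grad_g
target_modes = sigma_t * lam ** (-s / 2) * np.exp(-(T - t) * lam) * grad_g
assert np.allclose(target_matrix, target_modes)
print("Q^{1/2} e^{-(T-t)A^T} acts mode-by-mode as lam^{-s/2} e^{-(T-t)lam}")
```

The extra \(\lambda_k^{-s/2}\) factor relative to the adjoint sampling loss is exactly the \(Q^{1/2}\) in front of \(u\) in the controlled SDE.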

Redundancy

  1. The lean adjoint derivation (lines 57–61) re-states material from adjoint_matching.qmd. Lines 57–58 explain that the lean adjoint satisfies \(\dot{\tilde{a}} = -(\nabla_x b)^\top \tilde{a}\) and that for \(b = 0\) the adjoint is constant. This is already derived in adjoint_matching.qmd (lines 192–200) and adjoint_sampling.qmd (lines 46–58). The note should simply reference those derivations and proceed directly to the OU case \(b(\mathbf{x}) = -A\mathbf{x}\). One sentence of recap is fine; lines 57–58 spend two sentences saying the same thing the reader already knows.

  2. Section “Connection to existing framework” (lines 94–111) partially repeats the motivation. Lines 94–96 (“Set \(A = 0\): … reduces to the adjoint sampling loss”) duplicate the observation already made on line 80 (“Compare with the adjoint sampling loss”). Pick one location. The “Connection” section’s value is in the OU bridge formula and buffer training; the \(A = 0\) reduction should be a one-liner there or in the earlier section, not both.

  3. The buffer-based training paragraph (line 111) repeats adjoint_sampling.qmd nearly verbatim. The outer-loop/inner-loop description (simulate forward, store in buffer, sample from bridge, update) is a paraphrase of adjoint_sampling.qmd lines 138–148. One sentence saying “the buffer-based training from adjoint sampling transfers directly, replacing the Brownian bridge with the OU bridge” suffices.

Overselling / Complexity

  1. “But some problems require sampling paths, not points” (line 21). This framing implies FAS addresses a qualitatively different problem. It does not. The discretized version is sampling in \(\mathbb{R}^{dK}\), which is exactly the finite-dimensional setting of adjoint sampling with an OU reference instead of Brownian motion. The infinite-dimensional framing is a limit of the finite-dimensional one. The note should be more upfront about this: the FAS paper’s main practical contribution is the eigenbasis parameterization with Dirichlet boundary conditions and the OU reference, not a fundamentally new mathematical framework.

  2. “Functional Adjoint Sampler (FAS) extends adjoint sampling to this setting” (line 29). Then: “Strip it away and the core idea is simple: adjoint sampling on paths instead of points, with an OU reference instead of Brownian motion.” Good, the note does acknowledge this. But the first sentence of line 29 still lends the paper more novelty than warranted. The note could be more pointed: “FAS is adjoint sampling with an OU reference process. The infinite-dimensional formulation is a presentation choice, not a mathematical necessity.”

  3. “Discretization-invariant sampling” paragraph (line 91). The claim “adding modes does not change the loss on the trained modes” is true for the eigenbasis decomposition, but this is a standard property of Galerkin truncation of linear SDEs, not specific to FAS. The note presents this as if it were a distinctive feature. A brief reality check would help: “This is a standard property of Galerkin truncation for linear reference processes; FAS inherits it rather than inventing it.”

  4. The OU bridge details block (lines 100–109). The formula is standard conditional Gaussian computation. It could be shortened to the final formula with a one-line reference to standard Gaussian conditioning (already derived in adjoint_sampling.qmd, ?@eq-bridge). Currently it takes 10 lines for something that is a textbook identity.
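
To back up the “textbook identity” point in item 4: for a single OU mode, generic Gaussian conditioning reproduces the classical bridge mean. A minimal check (constant \(\sigma\) here, a deliberate simplification of the note’s time-dependent \(\sigma_t\); all numbers are illustrative):

```python
import numpy as np

# For a scalar OU mode dX = -lam * X dt + sigma dW, conditioning the joint
# Gaussian (X_t, X_T) | X_0 on X_T gives the classical OU-bridge mean
#   m(t) = [sinh(lam*(T-t)) * x0 + sinh(lam*t) * xT] / sinh(lam*T).
lam, sigma, T, t = 2.0, 0.7, 1.0, 0.35
x0, xT = 1.3, -0.4

def v(h):
    # Var(X_h | X_0) for the OU transition kernel
    return sigma**2 * (1 - np.exp(-2 * lam * h)) / (2 * lam)

# Joint Gaussian of (X_t, X_T) given X_0, then condition on X_T:
mean_t = np.exp(-lam * t) * x0
mean_T = np.exp(-lam * T) * x0
cov_tT = np.exp(-lam * (T - t)) * v(t)        # Cov(X_t, X_T | X_0)
bridge_mean = mean_t + cov_tT / v(T) * (xT - mean_T)

# Classical closed form:
closed = (np.sinh(lam * (T - t)) * x0 + np.sinh(lam * t) * xT) / np.sinh(lam * T)
assert np.isclose(bridge_mean, closed)
print("Gaussian conditioning reproduces the sinh-form OU bridge mean")
```

This is the whole computation; the note's 10-line details block can indeed compress to the final formula plus a pointer to standard Gaussian conditioning.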

Missing Connections

  1. No mention of ASBS. The ASBS note (if it exists) solves the full Schrödinger bridge problem with arbitrary priors. FAS uses an OU prior, which is a special case. A cross-reference explaining how FAS relates to ASBS (fixed OU prior vs. learned prior in ASBS’s IPFP iterations) would help the reader place FAS in the broader picture.

  2. No connection to Doob h-transforms. The optimal control \(u^\star = \sigma_t \nabla V = \sigma_t \nabla \log h\) is a Doob h-transform of the OU process. The doob.qmd note derives this. Linking to it would give the reader the “aha”: FAS is learning a Doob h-transform of an OU process on the path space, conditioning it to produce paths with low action. One sentence.

  3. No discussion of when OU vs. Brownian reference matters. The note never explains when you would actually want the OU reference over standard Brownian motion. For point sampling in \(\mathbb{R}^d\), adjoint sampling with Brownian motion works fine. The OU reference is motivated by two things: (a) boundary conditions (Dirichlet eigenfunctions pin endpoints), and (b) the OU invariant measure concentrates on smooth paths, providing a better prior for path sampling. The note mentions (a) in the boundary section but never explicitly states (b) as a motivation for the OU choice.

  4. No mention of computational cost of \(e^{-(T-t)A^\top} \nabla g\). The Brownian case has constant adjoint (no matrix exponential needed). The OU case requires applying the matrix exponential at each time step. In the eigenbasis this is diagonal (cheap), but this should be stated. A reader might worry about the cost of matrix exponentials.

  5. Missing reference to Jarzynski/path-space weights. The terminal cost \(g\) includes the log-RND \(\log q_T(\mathbf{x}_0, \mathbf{x})\), which corrects for the mismatch between the OU marginal and the invariant measure. This is closely related to the Jarzynski-type reweighting in jarzynski.qmd. A cross-reference would help.
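
To make the cost remark in item 4 concrete: in the eigenbasis, \(e^{-(T-t)A^\top}\) is diagonal, so applying it is an \(O(K)\) elementwise exponential rather than a dense matrix exponential. A small illustration (spectrum and values are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# In the eigenbasis of A, applying e^{-(T-t) A^T} to the terminal gradient
# reduces to an elementwise exp over the K mode eigenvalues.
K = 128
lam = (np.pi * np.arange(1, K + 1)) ** 2 / 100.0   # illustrative spectrum
h = 0.25                                            # plays the role of T - t
grad_g = np.random.default_rng(1).standard_normal(K)

cheap = np.exp(-h * lam) * grad_g                   # diagonal: O(K) elementwise
dense = expm(-h * np.diag(lam)) @ grad_g            # same result, the expensive way
assert np.allclose(cheap, dense)
print("eigenbasis reduces the matrix exponential to an elementwise exp")
```

So the worry only arises if one leaves the eigenbasis; the note should say this in one sentence.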

Structural Feedback

  1. The note is short (122 lines) and reads more like a summary than a note. The existing notes (adjoint_matching: 227 lines, adjoint_sampling: ~250+ lines) derive results from scratch with detailed arguments. This note mostly states results and refers elsewhere. Given the user’s stated goal of extracting substance and main ideas, this brevity could be intentional, but the note currently sits in an awkward middle ground: too long for a “this is just adjoint sampling with OU” dismissal, too short for a self-contained treatment.

    Suggestion: Either (a) cut it to ~60 lines by removing all redundant material and making it explicitly a “here’s what’s different” addendum to adjoint_sampling.qmd, or (b) add the missing connections (Doob, ASBS, when-to-use-OU) to make it a full note.

  2. Section ordering is reasonable but “Connection to existing framework” should come earlier, right after the adjoint computation. The boundary conditions section (lines 83–91) interrupts the flow between the math and the algorithmic connection. Suggested order: Setup, Adjoint on path space, Connection to existing framework (including \(A = 0\) reduction), Boundary conditions, Discretization invariance.

  3. No historical figure portrait. The TODO on lines 13–17 is still there. Either add one or remove the placeholder. Lev Pontryagin (optimal control / maximum principle) or Richard Feynman (path integrals) would fit.

  4. No references section at bottom. The BibTeX comment block (lines 114–122) suggests the entry should go in ref.bib but has not been added. The note uses [@bae2025functional] but this will not render unless the bib entry exists.

Minor / Stylistic

  1. Line 29: “Strip it away and the core idea is simple” reads well but “strip it away” has no clear antecedent. Change to “Strip away the infinite-dimensional machinery and the core idea is simple.”

  2. Line 36: “it generates smooth paths and respects the boundary conditions” – the OU process does not inherently respect boundary conditions. It is the eigenbasis parameterization using Dirichlet eigenfunctions that enforces boundary conditions. The OU process itself is defined on all of \(\mathbb{R}^{dK}\). Reword.

  3. Line 40: “\(A\) is the discrete Laplacian” – technically \(A\) is the negative discrete Laplacian (so that its eigenvalues are positive). The FAS paper writes \(\mathcal{A} = -\Delta\). Clarify: “\(A = -\Delta_h\) is the negative discrete Laplacian.”

  4. Line 48: “The OU reference is Brownian motion with a drift that pulls each mode toward zero.” This phrasing is misleading; OU is not “Brownian motion with a drift” in any standard sense. OU is a mean-reverting diffusion. The statement conflates the two. Better: “Compared to Brownian motion, the OU reference adds a mean-reverting drift.”

  5. Line 52: “Same structure as adjoint sampling, with base drift \(b(\mathbf{x}) = -A\mathbf{x}\) instead of \(b = 0\).” Good summary sentence.

  6. Line 80: “the OU semigroup, which propagates the terminal gradient backward through the base dynamics” – with the sign correction, the semigroup damps the terminal gradient, not “propagates” it. Reword to: “the OU semigroup, which damps the terminal gradient according to how much the base dynamics attenuate perturbations.”

  7. Line 91: “the FAS loss ?@eq-fas-loss and the OU bridge (used for the reciprocal projection, see below) are both defined mode-by-mode” – the note never actually describes the reciprocal projection for OU. The “see below” points to the Connection section (line 97), which mentions it in passing. Either add a brief description or remove the forward reference.

  8. Missing date in frontmatter. The date is “2026-03-23” which matches today. Fine, but date-modified should be updated if edits are made.