Second-Pass Critique: adjoint_sampling_v2.qmd

Verdict

A clean, dense note that gets most things right, but the sign convention for \(V\) conflicts with HJB.qmd, and the “projection helps” section describes rather than computes.

Sign Convention Issues

Issue 1 (line 42): \(V(t,x) = -\log h(t,x)\) contradicts HJB.qmd.

HJB.qmd defines \(V = \log h\) (lines 200–205) and \(u^\star = \sigma^\top \nabla V = \sigma^\top \nabla \log h\). This note defines \(V = -\log h\) on line 42. Both notes agree that \(u^\star = \sigma_t \nabla \log h\), so the optimal control formula is correct regardless. But the value function symbol \(V\) now means opposite things across the two notes:

  • HJB.qmd: \(V = \log h = \sup_u \E[\int (f - \frac{1}{2}\|u\|^2) ds + g(X_T)]\) (maximization)
  • This note: \(V = -\log h\) (which equals \(\inf_u \E[\int \frac{1}{2}\|u\|^2 ds + g(X_1)]\) for the minimization formulation)

The note on line 36 says: “our minimization corresponds to \(f = 0\) and \(g_\text{HJB} = -g\)”. That is correct for translating the SOC cost. But then \(V_\text{HJB} = \log h_\text{HJB}\) where \(h_\text{HJB} = \E[\exp(g_\text{HJB}(X_1)) | X_t = x] = \E[\exp(-g(X_1)) | X_t = x]\), which matches the \(h\) defined on line 39. So \(V_\text{HJB} = \log h = -V_\text{this note}\). The note’s \(V\) is the negation of HJB.qmd’s \(V\).
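In symbols, the relation between the two conventions (a compact restatement of the facts just derived, not a new claim):

\[
V_\text{HJB} = \log h, \qquad V_\text{this note} = -\log h = -V_\text{HJB}, \qquad u^\star = \sigma_t \nabla \log h \quad \text{under both conventions.}
\]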

Recommendation: Either (a) use \(V = \log h\) consistently and write the minimization cost as \(J(u) = -V(0,0)\), or (b) keep \(V = -\log h\) but add an explicit note saying “we define \(V = -\log h\), the negation of the HJB notes convention, so that \(V\) equals the cost-to-go rather than the reward-to-go.” Currently the note just drops \(V = -\log h\) without flagging the sign flip.

Issue 2 (line 39): \(h\) definition is correct but notation could confuse.

The note defines \(h(t,x) = \E[\exp(-g(X_1)) | X_t = x]\) under the base process. This matches HJB.qmd’s convention where \(h = \E[\exp(g_\text{HJB})]\) with \(g_\text{HJB} = -g\). The formula is mathematically correct. However, the cross-reference on line 42 says “This is the same \(h\) from the Doob h-transform notes” but in the Doob notes, \(h(t,x) = \E[\exp(g(X_T)) | X_t = x]\) with a plus sign in the exponent. The \(h\) here has \(\exp(-g)\), not \(\exp(+g)\). The functions coincide only after identifying \(g_\text{here} = -g_\text{doob}\), which is not stated.

Issue 3 (line 34): Terminal cost sign is correct.

\(g(x) = \log p_1^\text{base}(x) + E(x)/\tau\). Since \(\pi \propto \exp(-E/\tau)\), we have \(\log(p_1^\text{base}/\pi) = \log p_1^\text{base} + E/\tau + \text{const}\), so \(g = \log(p_1^\text{base}/\pi) + \text{const}\). The SOC minimizes \(\E[\int \frac{1}{2}\|u\|^2 dt + g(X_1)]\). At optimality, \(\bbP^{u^\star} = \bbP^\star\) and in particular \(p_1^{u^\star} = \pi\), where \(\bbP^\star(\cdot) = \int \bbP(\cdot \mid X_1 = y)\,\pi(y)\,dy\) is the base bridge mixed over \(\pi\), so \(\kl(\bbP^{u^\star}, \bbP^\star) = 0\). This is all consistent. No issue here.

Notation Inconsistencies (cross-note)

1. SDE convention mismatch with adjoint_matching_v2.qmd.

  • adjoint_sampling_v2 (line 25): \(dX_t = \sigma_t u(t, X_t) dt + \sigma_t dW_t\) with \(X_0 = 0\). No base drift \(b\).
  • adjoint_matching_v2 (line 36): \(dX^u_t = [b(X^u_t, t) + \sigma(t) u(t, X^u_t)] dt + \sigma(t) dW_t\). Has base drift \(b\).

This difference is intentional (adjoint_matching handles the general pre-trained model case), but it means the “control” \(u\) plays a slightly different role. In adjoint_matching, \(u\) is an additive correction to an existing drift \(b\). In adjoint_sampling, \(u\) is the full drift (since \(b = 0\)). The connection paragraph on lines 46–47 handles this by saying “For a controlled SDE with base drift \(b\) and running cost \(f\),” which is fine.

2. Terminal cost symbol.

  • adjoint_sampling_v2: \(g(x) = \log p_1^\text{base}(x) + E(x)/\tau\)
  • adjoint_matching_v2: terminal cost is \(-r(X_1)\) where \(r\) is a reward
  • ASBS_v2 (line 46): \(g(x) = \log(\widehat{\varphi}_1(x)/\pi(x))\)
  • BMS_v2: uses \(\nabla \log \pi(X_T)\) directly

The adjoint_matching note uses reward \(r\) (maximization framing for fine-tuning), while adjoint_sampling uses cost \(g\) (minimization framing for sampling). ASBS uses the same \(g\) symbol as adjoint_sampling but with a different expression. These are all valid for their settings, but it would help the reader to have a sentence in adjoint_sampling saying “In the adjoint matching note, the terminal cost is \(g = -r\).”

3. Bridge notation.

  • adjoint_sampling_v2 (line 105): \(X_t | X_1 = x_1 \sim \normal(\frac{\nu_t}{\nu_1} x_1, \frac{\nu_t(\nu_1 - \nu_t)}{\nu_1} I)\)
  • BMS_v2 (line 104): \(X_t | (X_0, X_T) \sim \normal((1-\gamma(t)) X_0 + \gamma(t) X_T, \kappa(T) \gamma(t)(1-\gamma(t)) I)\) with \(\gamma(t) = \kappa(t)/\kappa(T)\)

For \(X_0 = 0\), BMS gives mean \(= \gamma(t) X_T = (\kappa(t)/\kappa(T)) X_T = (\nu_t/\nu_1) X_1\) and variance \(= \kappa(T) \gamma(1-\gamma) = \nu_1 \cdot (\nu_t/\nu_1)(1 - \nu_t/\nu_1) = \nu_t(1 - \nu_t/\nu_1) = \nu_t(\nu_1 - \nu_t)/\nu_1\). These match. Good.
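The algebra can be double-checked with exact rational arithmetic (a standalone sanity check, stdlib only; the specific \((\nu_t, \nu_1)\) values are arbitrary test points, not from either note):

```python
# Exact-arithmetic check that the BMS bridge (with X_0 = 0, kappa(T) = nu_1,
# gamma(t) = nu_t/nu_1) reduces to the adjoint_sampling_v2 bridge.
from fractions import Fraction

for nu_t, nu_1 in [(Fraction(1, 3), Fraction(2)), (Fraction(3, 4), Fraction(5, 2))]:
    gamma = nu_t / nu_1
    bms_mean_coef = gamma                     # BMS mean = gamma(t) * X_T
    as_mean_coef = nu_t / nu_1                # AS mean = (nu_t/nu_1) * x_1
    bms_var = nu_1 * gamma * (1 - gamma)      # kappa(T) * gamma * (1 - gamma)
    as_var = nu_t * (nu_1 - nu_t) / nu_1
    assert bms_mean_coef == as_mean_coef
    assert bms_var == as_var
```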

Pedagogy Gaps

1. The adjoint being constant (lines 46–57): Clear enough, but the details block could be sharper.

The details block on lines 68–72 argues that perturbing \(X_t\) by \(\delta X_t\) shifts \(X_1\) by the same amount because \(X_1 = X_t + \text{(independent future increments)}\). This is correct and clear. The main text derivation (set \(b = 0, f = 0\) in the lean adjoint ODE, get \(\dot{\tilde{a}} = 0\)) is transparent.

One gap: the note doesn’t explain what the “lean adjoint” is. It links to the adjoint matching note but a grad student reading this note first would stumble. Line 47 says “the adjoint method gives the lean adjoint ODE” but the lean adjoint is a specific simplification of the full adjoint (removing \(u\)-dependent terms). The word “lean” needs either a one-sentence definition or an explicit pointer like “the lean adjoint (the \(u\)-independent part of the full adjoint, as derived in adjoint matching)”.

2. The Brownian bridge computation (lines 102–108): Correct and well-motivated.

The Gaussian conditioning is shown: \(\cov(X_t, X_1) = \nu_t I\), independent increments argument. The formula follows by standard Gaussian conditioning. A grad student who knows Gaussian conditioning can follow this.

3. The reciprocal projection (lines 86–100): The “why” is partially missing.

The note states that at optimality \(p^{u^\star}_{t,1} = p^\text{base}_{t|1} \cdot \pi\), and then says the reciprocal projection replaces \(p^{\bar{u}}_{t,1}\) by \(p^\text{base}_{t|1} \cdot p^{\bar{u}}_1\). But it doesn’t explain why this replacement is valid for optimization. Line 100 says “This is a projection onto the reciprocal class of the base process” which is a description, not an argument. The justification comes later in the “projection helps” section, but that section is itself more description than computation (see below).

A grad student would ask: “If I replace the joint with something different, don’t I get a different loss with a different minimizer?” The answer is no (the minimizer is the same because at the fixed point the two coincide), but this should be said explicitly near line 98.

4. The projection improvement (lines 121–140): Describes rather than computes.

This is the weakest section stylistically. The “proof sketch” in the details block on lines 134–137 is a verbal argument, not a computation. Compare with HJB.qmd, where every result falls out of a few lines of algebra. Here the reader gets “Split the terminal cost… By definition of \(\Pi(u)\), the first term is minimized… the second term depends only on \(X_1\)…” This reads like a referee report, not like the author’s other notes.

Suggested fix: write out \(J(u) = \E_{\bbP^u}[\int_0^1 \frac{1}{2}\|u\|^2 dt + g(X_1)]\), decompose \(g = g_1 + g_2\) where \(g_1 = \log(p^\text{base}_1/p^u_1)\) and \(g_2 = \log(p^u_1/\pi)\), note that \(\E_{\bbP^u}[\int_0^1 \frac{1}{2}\|u\|^2 dt + g_1(X_1)] = \kl(\bbP^u, \bbP^\text{base}(\cdot \mid X_1)\, p^u_1(X_1))\) by Girsanov, and this is minimized by \(\Pi(u)\) by definition. Then \(J(u) = \kl(\ldots) + \E_{p^u_1}[\log(p^u_1/\pi)]\), and the second term is unchanged under \(\Pi\) since \(\Pi\) preserves the terminal marginal.
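Concretely, the suggested display might read (one way to typeset it, using the note's own macros):

\[
J(u) = \underbrace{\E_{\bbP^u}\Big[\int_0^1 \tfrac{1}{2}\|u\|^2\, dt + \log\frac{p^\text{base}_1(X_1)}{p^u_1(X_1)}\Big]}_{\kl(\bbP^u,\ \bbP^\text{base}(\cdot \mid X_1)\, p^u_1(X_1)) \ \text{by Girsanov}} \;+\; \underbrace{\E_{p^u_1}\Big[\log\frac{p^u_1(X_1)}{\pi(X_1)}\Big]}_{\text{unchanged under } \Pi}.
\]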

Remaining Style Issues

1. “Note the sign” (line 36).

The phrase “Note the sign” followed by a colon and explanation is acceptable but slightly preamble-ish. In HJB.qmd style, you’d just write the correspondence and move on: “The minimization problem corresponds to \(f = 0, g_\text{HJB} = -g\) in the HJB convention.”

2. Bold section-within-algorithm (line 147).

“Outer loop (expensive):” and “Inner loop (cheap):” is fine, but the algorithm description on lines 144–159 is more textbook than the author’s usual style. Compare with doob.qmd, where the computation speaks for itself. Consider trimming the verbal description and making the fixed-point iteration equation (lines 155–157) the centerpiece.

3. “The gain is computational” (line 118).

A bit of an announcement. Better: “Sampling the pair now proceeds as:…”

4. Missing historical figure portrait (lines 13–17).

TODO comment is still there. All existing notes (HJB, doob, girsanov, adjoint) have portraits.

5. Last paragraph comparison with PDDS/TSM (line 161).

Good content, but the phrase “This makes Adjoint Sampling on-policy with a moving target that converges to the fixed point, rather than a single regression against an approximate sample from \(\pi\)” is slightly long. Consider splitting.

Mathematical Errors or Concerns

1. The AM loss sign (line 62).

\(\cL_\text{AM}(u) = \E[\int \frac{1}{2}\|u(t, X_t) + \sigma_t \nabla g(X_1)\|^2 dt]\). The optimal control should satisfy \(u^\star(t,x) = -\sigma_t \E[\nabla g(X_1) | X_t = x]\) (stated on line 65). Let’s verify: \(u^\star = \sigma_t \nabla \log h\) where \(h(t,x) = \E[\exp(-g(X_1)) | X_t = x]\). So \(\nabla \log h = -\E[\nabla g(X_1) \exp(-g(X_1)) | X_t = x] / h(t,x)\). This is NOT the same as \(-\E[\nabla g(X_1) | X_t = x]\); that equality holds only under \(\bbP^\star\), not under \(\bbP\). Under \(\bbP^\star\), the conditional density of \(X_1 | X_t = x\) already includes the \(\exp(-g)\) tilt, so the expectation of \(\nabla g(X_1)\) under \(\bbP^\star\) does give \(-\nabla \log h\).

The regression target in the AM loss is computed under \(\bbP^{\bar{u}}\), not \(\bbP\). At the fixed point where \(\bar{u} = u^\star\), the distribution \(\bbP^{u^\star}\) is tilted so that \(X_1 | X_t = x\) has the right conditional. So the statement “At the fixed point, \(u^\star(t,x) = -\sigma_t \E[\nabla g(X_1) | X_t = x]\)” is correct, but only because the expectation is under \(\bbP^{u^\star}\), not the base \(\bbP\). The note should clarify which measure the expectation is under. Line 65 says “the adjoint target \(\nabla g(X_1)\) is a stochastic estimate of \(-\nabla \log h(t, X_t)\),” which is imprecise: it’s a stochastic estimate under \(\bbP^{u^\star}\), not under \(\bbP\).
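A toy 1D Monte Carlo makes the distinction concrete (assumed setup, not from the note: \(X_1 \mid X_t = x \sim \normal(m, s^2)\) under the base process, \(g(y) = y^2/2\) so \(\nabla g(y) = y\), and the tilted Gaussian mean has the closed form \(m/(1+s^2)\)):

```python
# The plain conditional mean E[grad g(X_1) | X_t] under the base measure
# differs from the exp(-g)-tilted mean, which is what -grad log h actually is.
# All numeric values are made-up test values.
import math
import random

random.seed(0)
m, s, N = 1.0, 0.8, 200_000
ys = [random.gauss(m, s) for _ in range(N)]
w = [math.exp(-y * y / 2) for y in ys]                   # exp(-g(y)) tilt

plain = sum(ys) / N                                      # E[grad g | X_t] (base)
tilted = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)  # = -grad log h

# Gaussian tilted by exp(-y^2/2): mean shrinks from m to m/(1 + s^2).
assert abs(plain - m) < 0.01
assert abs(tilted - m / (1 + s * s)) < 0.01
assert plain - tilted > 0.3    # the two conditional expectations genuinely differ
```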

2. Brownian bridge formula (line 105): Correct.

\(X_t | X_1 = x_1 \sim \normal(\frac{\nu_t}{\nu_1} x_1, \frac{\nu_t(\nu_1 - \nu_t)}{\nu_1} I)\).

Verification: \(X_t \sim \normal(0, \nu_t I)\), \(X_1 \sim \normal(0, \nu_1 I)\), \(\cov(X_t, X_1) = \nu_t I\) (since \(X_t\) contributes \(\nu_t\) and the independent future contributes \(\nu_1 - \nu_t\) to \(X_1\)). Gaussian conditioning: \(X_t | X_1 = x_1\) has mean \(\nu_t \nu_1^{-1} x_1 = (\nu_t/\nu_1) x_1\) and variance \(\nu_t I - \nu_t^2 \nu_1^{-1} I = \nu_t(1 - \nu_t/\nu_1) I = \nu_t(\nu_1 - \nu_t)/\nu_1 \cdot I\). Matches.
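The same verification can be run numerically (a standalone sanity check; the values \(\nu_t = 0.3\), \(\nu_1 = 1\) are arbitrary):

```python
# Simulate X_t ~ N(0, nu_t) and X_1 = X_t + independent N(0, nu_1 - nu_t);
# the linear regression of X_t on X_1 should recover slope nu_t/nu_1 and
# residual variance nu_t (nu_1 - nu_t) / nu_1, matching the bridge formula.
import random

random.seed(1)
nu_t, nu_1, N = 0.3, 1.0, 500_000
xt = [random.gauss(0, nu_t ** 0.5) for _ in range(N)]
x1 = [x + random.gauss(0, (nu_1 - nu_t) ** 0.5) for x in xt]

cov = sum(a * b for a, b in zip(xt, x1)) / N      # ~ cov(X_t, X_1) = nu_t
var1 = sum(b * b for b in x1) / N                 # ~ var(X_1) = nu_1
slope = cov / var1                                # ~ nu_t / nu_1
resid = sum((a - slope * b) ** 2 for a, b in zip(xt, x1)) / N

assert abs(cov - nu_t) < 0.01
assert abs(slope - nu_t / nu_1) < 0.01
assert abs(resid - nu_t * (nu_1 - nu_t) / nu_1) < 0.01
```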

3. The projection operator (line 126).

\(\Pi(u) = \argmin_v \E_{\bbP^v}[\int \frac{1}{2}\|v\|^2 + \log(p^\text{base}_1(X_1)/p^u_1(X_1))]\). Here the terminal cost \(\log(p^\text{base}_1/p^u_1)\) is the log-ratio with the current terminal marginal \(p^u_1\) as target. The SOC with this terminal cost has solution \(\Pi(u)\) whose terminal marginal is \(p^u_1\). This is a standard half-bridge construction. No issue.

4. RAM time weighting (line 116).

\(\lambda(t) = 1/\sigma_t^2\). The note says “normalizes the regression target magnitude across time (does not change the optimum).” This is correct: the regression target is \(-\sigma_t \nabla g(X_1)\), which has magnitude proportional to \(\sigma_t\). The squared loss \(\|u + \sigma_t \nabla g\|^2\) has a factor of \(\sigma_t^2\) in the regression target, so weighting by \(1/\sigma_t^2\) normalizes this. The minimum is the same conditional expectation regardless of \(\lambda(t)\).
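The weighting invariance is one line: for each fixed \(t\) and any \(\lambda(t) > 0\),

\[
\argmin_{u(t,\cdot)} \lambda(t)\, \E\big\|u(t, X_t) + \sigma_t \nabla g(X_1)\big\|^2 = -\sigma_t\, \E[\nabla g(X_1) \mid X_t = \cdot],
\]

since multiplying a squared loss by a positive scalar does not move its minimizer.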

5. Fixed-point equation (lines 153–157).

\(u_{i+1} = \Pi(u_i) - \frac{\delta \cL_\text{AM}}{\delta u}(\Pi(u_i))\). This mixes notation: \(\Pi(u_i)\) is a function (a control), not a parameter, and \(\frac{\delta \cL_\text{AM}}{\delta u}\) is a functional derivative. The expression is meant to be read as “the control obtained by one step of functional gradient descent on \(\cL_\text{AM}\) starting from \(\Pi(u_i)\).” This is fine conceptually but slightly informal. The paper (Theorem 1, reference paper line 260) states the same. No mathematical error.

Minor Issues

  • Line 20: ### Sampling via controlled Brownian motion uses ### as the first heading. HJB.qmd and doob.qmd also use ### for their first content heading (after the portrait). Consistent.
  • Line 28: Long paragraph. Consider a line break before “As described in the SOC notes.”
  • Line 34: “up to an additive constant \(\log \cZ\) that does not affect the optimal control” is good.
  • Line 59: “No backward ODE to solve.” Good punchline.
  • Line 100: “reciprocal class” is linked to the Schrodinger notes. Good.
  • Line 108: Parenthetical “(the usual Brownian bridge)” references doob.qmd. Good.
  • Line 110: “Reciprocal Adjoint Matching (RAM)” is bolded. In HJB.qmd, named results are not bolded; they just appear. Minor style difference.
  • Line 144: Citation [@havens2025adjoint] is correct but the BibTeX entry is in a comment block (lines 164–192), not in the actual ref.bib. This will fail to render. Same for the other BibTeX entries.
  • Line 62: The loss uses \(\cL_\text{AM}\), same symbol as the generator \(\cL\) from the macros. The macros define \(\cL = \mathcal{L}\), and the loss also uses \(\cL_\text{AM}\). HJB.qmd uses \(\cL\) for the generator and doesn’t have a loss function with the same symbol. This is a potential collision but likely renders fine since the subscript disambiguates.
  • Lines 67–72: The <details> block uses <p style="color: blue;">. HJB.qmd and doob.qmd also use this pattern. Consistent.
  • Macro usage: I checked for raw LaTeX that should use macros. The note uses \bbP^u, \bbP, \normal, \kl, \E, \sqBK, \BK, \cL, \cZ, \blue. No missing macro usage detected. Line 62 uses \texttt{stopgrad} which is fine (no macro for it).
  • Line 94: \(p^\text{base}_{t \mid 1}\) uses \mid for the conditional. This is standard. Some notes use | directly. Minor inconsistency but not worth flagging.