Critique: ASBS.qmd
Overall Assessment
The note correctly identifies the core contribution of ASBS (removing the memoryless condition via SB potentials, alternating AM/CM = IPFP) and is short enough to read in one sitting. But it reads like a well-organized summary rather than a note in the author’s distinctive style. It tells the reader what things are instead of deriving them from scratch, it leans on descriptions where the existing notes lean on computation, and it re-derives material already covered in shrodinger.qmd / HJB.qmd instead of just citing it. The voice is too clean, too structured, too “explainer-mode” compared to doob.qmd or HJB.qmd.
Major Issues
The note describes rather than derives. The most distinctive feature of HJB.qmd and doob.qmd is that they derive everything by computing: Euler discretization, Bayes’ rule, expand, collect terms, done. ASBS.qmd instead announces results and then justifies them post hoc. For example, the SB potentials @eq-sb-potentials are stated, then the “SOC characteristics” theorem is stated, and a details block confirms it works. The author’s style would be: start from the SB problem, write down the Radon-Nikodym derivative, ask “what does the optimal control look like?”, and compute it. The reader should discover the SOC reinterpretation through computation, not receive it as a theorem statement.
Too much re-derivation of material already in existing notes. Lines 25-54 re-derive the SOC setup, the optimal joint distribution, the memoryless condition, and why it forces a Dirac prior. All of this is already in HJB.qmd and adjoint_sampling.qmd. The ASBS note should assume the reader has read those notes and simply say “Recall from the SOC notes that the optimal joint is \(p^\star(X_0, X_1) = \bbP(X_0, X_1) e^{-g(X_1) + V_0(X_0)}\) and that the memoryless condition \(\bbP(X_0, X_1) = \bbP(X_0)\bbP(X_1)\) eliminates the \(V_0\) bias.” That is 2 sentences, not 30 lines.
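For reference, the two-sentence recap rests on a single marginalization. A sketch, assuming the ASBS note's sign convention \(V_0(x) = -\log \int \bbP(X_1 | X_0 = x)\, e^{-g(X_1)}\, dX_1\):

\[
p^\star(X_1) = \int \bbP(X_0, X_1)\, e^{-g(X_1) + V_0(X_0)}\, dX_0 = e^{-g(X_1)} \int \bbP(X_0)\, \bbP(X_1 | X_0)\, e^{V_0(X_0)}\, dX_0 .
\]

Under the memoryless condition \(\bbP(X_1 | X_0) = \bbP(X_1)\), so \(V_0\) is a constant and the integral collapses to \(p^\star(X_1) \propto \bbP(X_1)\, e^{-g(X_1)}\). Without it, the factor \(e^{V_0(X_0)}\) stays inside the integral, and that is the bias the recap needs to name.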
SB potentials and IPFP are already in shrodinger.qmd. Lines 57-78 re-present the SB factorization \(d\bbQ^\star/d\bbP \propto \widehat{\varphi}_0(X_0) \varphi_1(X_1)\) and the time-dependent potentials. These are defined and derived in the Schrodinger bridges note. Cross-reference them; do not re-derive.
The corrector matching derivation (lines 122-141) is the most interesting new computation in the note, but it is buried in a details block. This is the core new idea that distinguishes ASBS from adjoint sampling. It should be front and center, derived step by step in the main text, not hidden. The details block pattern is for routine side computations that support the main argument, not for the main argument itself.
Missing historical figure portrait. Every existing note has one. The TODO comment (lines 13-17) is still there. This is a formatting gap but also a style signature.
The “Global convergence” section (lines 162-172) adds almost nothing. It states a theorem that says “if each step is exact, IPFP converges” and then immediately concedes “in practice it’s approximate.” This is theory for theory’s sake. Either give the reader genuine intuition for why IPFP converges (contraction in KL, monotone decrease, connection to Sinkhorn for matrices) or cut it to one sentence: “Since this is IPFP on path-space, it converges to the SB solution when each step is solved exactly [@de2021diffusion].”
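To illustrate the Sinkhorn connection suggested above, here is a minimal NumPy sketch of the finite-state analogue: alternately rescale the rows and columns of a positive matrix until both marginals match. Each rescaling is a KL projection onto one marginal constraint, which is the matrix version of one IPFP half-step. The function name `sinkhorn`, the test matrix, and the uniform marginals are illustrative assumptions, not taken from the note or the paper.

```python
import numpy as np

def sinkhorn(K, r, c, n_iter=500):
    """Alternately rescale rows and columns of a positive matrix K
    until its marginals match r (row sums) and c (column sums)."""
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)      # half-step 1: match the row marginal
        v = c / (K.T @ u)    # half-step 2: match the column marginal
    return u[:, None] * K * v[None, :]

# Hypothetical example: a random positive "reference" matrix and uniform marginals.
rng = np.random.default_rng(0)
K = rng.uniform(0.1, 1.0, size=(4, 5))
r = np.full(4, 1 / 4)
c = np.full(5, 1 / 5)
P = sinkhorn(K, r, c)
```

In this picture the row step plays the role of one half-bridge and the column step the other; the note could use the analogy in a sentence or two without reproducing any code.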
No Euler-discretization / Bayes’ rule derivation anywhere. The signature move of the existing notes is absent. The corrector matching formula, the SOC reinterpretation, the connection between \(\widehat{\varphi}_1\) and base transition kernels: all of these could be derived from scratch using the author’s informal style. Instead they are stated.
Style Issues
Lines 19-22 (opening paragraph)
“The adjoint matching and adjoint sampling notes showed how to learn a controlled diffusion that transports samples to a target… The price for that simplicity was the memoryless condition…”
This reads like a paper introduction, not the author’s style. Compare the opening of HJB.qmd, which starts directly with “Consider a diffusion…” and immediately writes down the SDE. The ASBS note should open with something more like: “The adjoint sampling setup forces \(X_0 = 0\). What if we want a Gaussian prior, or a harmonic oscillator prior for molecules? The memoryless condition breaks, and the terminal distribution picks up a bias from the initial value function \(V_0(X_0)\). To fix this, solve a full Schrodinger bridge instead.”
Lines 22-23
“This note describes the Adjoint Schrodinger Bridge Sampler (ASBS) of @liu2025adjoint, which removes the memoryless restriction entirely.”
“This note describes” is exactly the kind of preamble the author never uses. Just do the thing.
Line 81
“The key theorem of @liu2025adjoint connects the SB problem back to SOC.”
The author would never announce “the key theorem.” The style is to set up the computation and let the result speak.
Lines 82-88 (bold theorem statement)
SOC characteristics of SB. The kinetic-optimal drift…
The author’s notes never use bold theorem statements. They derive results inline as natural consequences of computation. This should be a derivation, not a theorem box.
Line 111
“In other words, every SB problem decomposes into an SOC problem…”
Acceptable transition phrase (the author does use “In other words”), but the sentence summarizes what should have been shown by computation, not announced.
Lines 146-151 (alternating optimization description)
“Given a corrector approximation \(h^{(k-1)}\) from stage \(k-1\): 1. Adjoint matching step… 2. Corrector matching step…”
Numbered algorithmic steps are fine, but the surrounding prose is too procedural/textbook. The author’s style would derive the alternation as the natural thing to try given the circular dependency.
Lines 153-159 (half-bridge paragraph)
“This alternating scheme has a clean variational interpretation. At each stage: - The AM step solves a forward half bridge… - The CM step solves a backward half bridge…”
Bullet-point lists describing variational interpretations feel like a paper summary. The writing style guide explicitly says “Avoid LLM-style bullet point lists unless summarizing key takeaways.”
Lines 175-194 (Summary section)
The comparison table is good. But the rest reads like a paper abstract. “The broader picture:” followed by a bullet list is not how the author writes. The practical benefit sentence at the end (line 194) is fine but should be woven into the motivation earlier, not saved for a coda.
Missing Content
The reciprocal projection / Brownian bridge sampling trick. The paper specializes to \(f_t = 0\) and uses the same reciprocal projection as adjoint sampling: sample pairs from Brownian bridges rather than simulating the controlled SDE. This is a key practical ingredient that makes the method scalable. The adjoint_sampling.qmd note covers this for the memoryless case; the ASBS note should at minimum cross-reference it and explain that the same trick applies here.
What happens at stage \(k=1\)? The note mentions (line 151) that the first AM stage regresses onto \(\sigma_t \nabla E(X_1)\). This deserves more emphasis: the first stage is just energy-guided transport, the same as naive score-based sampling. The corrector then iteratively fixes the bias. This is the key intuition for why the method works in practice.
Harmonic prior example. The paper’s main practical selling point is domain-specific priors (harmonic oscillators for molecular systems). The note mentions this once in passing (line 194). A concrete 2-3 line example of what the harmonic prior looks like (\(\mu(x) \propto \exp(-\alpha/2 \sum_{i,j} \|x_i - x_j\|^2)\)) and why it helps (particles start in a physically reasonable arrangement rather than at the origin) would ground the abstract framework.
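A short sketch of what such an example could look like, assuming particle positions are stored as an array of shape `(n_particles, dim)` and reading the sum as running over all ordered pairs \((i, j)\). The function name `harmonic_log_density` and the value of `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np

def harmonic_log_density(x, alpha=1.0):
    """Unnormalized log-density  -alpha/2 * sum_{i,j} ||x_i - x_j||^2.

    x has shape (n_particles, dim); the sum runs over all ordered pairs.
    """
    diff = x[:, None, :] - x[None, :, :]   # (n, n, dim) pairwise differences
    return -0.5 * alpha * np.sum(diff ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
shift = np.array([10.0, -2.0, 0.5])

logp = harmonic_log_density(x)
logp_shifted = harmonic_log_density(x + shift)  # same value: only relative positions matter
```

The log-density depends only on pairwise differences, so it is invariant under a global translation. This means the density is normalizable only once the center of mass is fixed, a detail the note would need to state if it includes the formula.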
Connection to Nelson’s relation. The ASBS paper emphasizes that the forward and backward SB drifts are related through Nelson’s relation (time reversal of the SDE). This is what makes the alternating scheme a genuine IPFP and not just coordinate descent. The note mentions IPFP but does not explain why the AM and CM steps correspond to the two IPFP projections at a mechanical level.
The role of \(\widehat{\varphi}_0\) in the corrector matching. The details block (lines 130-141) derives \(\nabla \log \widehat{\varphi}_t(x) = \E_{p^\star}[\nabla_x \log \bbP_{t|0}(x | X_0) | X_t = x]\), and then says “Writing this conditional expectation as a least-squares regression target gives @eq-cm.” But the step from conditional expectation to least-squares regression is non-trivial (it is the same denoising score matching / Tweedie trick). This should be spelled out or cross-referenced to the reverse_and_tweedie.qmd note.
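The missing step is the usual \(L^2\) projection argument. A sketch, assuming the conditional-expectation identity quoted above and writing \(Y = \nabla_x \log \bbP_{t|0}(X_t | X_0)\) with \((X_0, X_t) \sim p^\star\):

\[
\E_{p^\star} \| h(X_t) - Y \|^2 = \E_{p^\star} \| h(X_t) - \E[Y | X_t] \|^2 + \E_{p^\star} \| Y - \E[Y | X_t] \|^2 .
\]

The cross term vanishes by the tower property, and the second term does not depend on \(h\). The minimizer is therefore \(h(x) = \E[Y | X_t = x] = \nabla \log \widehat{\varphi}_t(x)\), which is what the regression loss is meant to recover.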
Mode-seeking behavior / limitations. The paper discusses that SOC-based samplers are mode-seeking: the SOC objective is, up to a constant, the reverse KL \(\mathrm{KL}(p^u \,\|\, p^\star)\), with the expectation taken under the sampler's own path measure. This is an important practical caveat that affects the method’s ability to cover all modes. Worth a one-line mention.
Unnecessary Content
Lines 25-54 (memoryless condition recap). Almost entirely redundant with HJB.qmd and adjoint_sampling.qmd. Cut to 3-5 lines of cross-references plus a reminder of the key equation.
Lines 57-78 (SB potentials presentation). Redundant with shrodinger.qmd. Cross-reference and state only what is new: the connection between SB potentials and the corrector.
Lines 162-172 (Global convergence section). The theorem statement adds no understanding beyond “IPFP converges.” Either derive the convergence argument briefly (it follows from monotone decrease of KL, which is standard) or cut to one sentence.
Lines 175-194 (Summary and relation to prior work). The table is useful. The “broader picture” bullet list is a paper-style summary that duplicates what the note already said. Cut the bullets, keep the table.
Line-by-Line Notes
- Line 10: The `[//]: # INCLUDE LATEX MACROS` comment is fine (matches existing notes).
- Line 20: “adding a control \(u_t(x)\) gives” – good, matches HJB style.
- Line 33: The SDE uses \(\sigma_t u_t(X_t)\) inside the drift, matching the \(f_t = 0\) case. Good.
- Line 39: “\(u^\star_t(x) = \sigma_t \nabla \log h(t,x)\)” – note this uses \(\sigma_t\) (scalar) not \(\sigma_t^2\). In HJB.qmd line 56, the formula is \(u^\star = \sigma^\top \nabla \log h\). For scalar \(\sigma\) and the convention \(dX = (\sigma u) dt + \sigma dW\) used here, the optimal control is \(u^\star = \sigma \nabla \log h\), so the drift is \(\sigma^2 \nabla \log h\). Double-check this against the convention in HJB.qmd. The HJB note uses \(dX = b\,dt + \sigma(dW + u\,dt)\), giving drift correction \(\sigma u = \sigma \cdot \sigma^\top \nabla \log h = \sigma\sigma^\top \nabla \log h\). The ASBS note uses \(dX = (f + \sigma u)dt + \sigma dW\), so drift correction is \(\sigma u\). With \(u = \sigma \nabla \log h\), the drift correction is \(\sigma^2 \nabla \log h\). This matches. OK.
- Line 41: “\(V_0(x) = -\log \int \bbP(X_1 | X_0 = x) e^{-g(X_1)} dX_1\)” – this is correct but uses a different sign convention than HJB.qmd where \(V = \log h\) and \(h = \E[e^{g}]\). Here \(h = \E[e^{-g}]\) because the terminal cost enters with a minus sign in the SOC objective. Notation is internally consistent but could confuse readers going between notes.
- Line 52: The claim that \(\mu = \delta_0\) with \(f_t = 0\) is the “standard way to enforce memorylessness” is correct for the adjoint sampling context but slightly misleading; VP-SDE with large noise is another standard approach (mentioned in the paper). A parenthetical “(or large-noise processes like VP-SDE)” would be honest.
- Lines 66-67: The SB Radon-Nikodym derivative \(d\bbQ^\star/d\bbP \propto \widehat{\varphi}_0(X_0) \varphi_1(X_1)\) is stated without derivation. In shrodinger.qmd this is derived. Cross-reference.
- Line 79: “The forward potential \(\varphi_t\) is a Doob h-transform: it generates the optimal forward drift \(\sigma_t^2 \nabla \log \varphi_t(x)\).” – Good, but “generates” is vague. Say “the SB process has forward drift \(f_t + \sigma_t^2 \nabla \log \varphi_t\)” to be precise.
- Line 86: \(\eqref{eq-controlled-sde}\) – this reference format may not render correctly in Quarto. The existing notes use `@eq-controlled-sde` consistently. Check rendering.
- Line 96: The details block uses \(\exp\BK{-\log \frac{\widehat{\varphi}_1(X_1)}{\pi(X_1)} + V_0(X_0)}\). The sign of \(V_0\) here needs careful attention. In line 41, \(V_0\) is defined as \(-\log \int \bbP_{1|0}(y|x) e^{-g(y)} dy\). With \(g = \log(\widehat{\varphi}_1/\pi)\), we get \(V_0(x) = -\log \int \bbP_{1|0}(y|x) \pi(y)/\widehat{\varphi}_1(y) dy\). Then line 103 uses \(\mu(X_0) e^{V_0(X_0)} = \widehat{\varphi}_0(X_0)\). This step is not obvious and requires using the SB boundary conditions. The details block should spell this out more carefully.
- Lines 118-119: @eq-am has \(\bbP_{t|0,1}\) and \(p^{\bar{u}}_{0,1}\). The subscript notation is ambiguous. Is \(\bbP_{t|0,1}\) the base bridge kernel \(\bbP(X_t | X_0, X_1)\)? And \(p^{\bar{u}}_{0,1}\) the joint endpoint distribution under the current control? This should be defined explicitly.
- Line 120: “When \(\mu = \delta_0\) (memoryless case), \(\widehat{\varphi}_1 = \bbP_1\)” – this claim deserves a one-line justification. With \(\mu = \delta_0\), the boundary condition \(\varphi_0 \widehat{\varphi}_0 = \mu\) forces \(\widehat{\varphi}_0 \propto \delta_0\) (not a constant function, as one might first guess). Since \(\widehat{\varphi}_t\) is the forward propagation \(\widehat{\varphi}_t(x) = \int \bbP_{t|0}(x | y)\, \widehat{\varphi}_0(y)\, dy\) rather than a conditional expectation, this gives \(\widehat{\varphi}_1(x) \propto \bbP_{1|0}(x | 0) = \bbP_1(x)\). The claim therefore holds up to a normalization constant, which the note should say explicitly.
- Line 124: @eq-cm has \(\nabla_{X_1} \log \bbP(X_1 | X_0)\). For \(f_t = 0\), \(\bbP(X_1 | X_0)\) is Gaussian, so this score is known in closed form: \((X_0 - X_1)/(\nu_1 - \nu_0)\) (up to scaling). Stating this explicitly would make the formula concrete and connect to the Brownian bridge score.
- Line 164: The citation @de2021diffusion is for the convergence of IPFP on path-space. Good.
- Line 179-183: The comparison table is useful and well-formatted. Keep this.
- Lines 197-234: BibTeX entries in an HTML comment. These should be added to ref.bib, not left as comments. The note uses `@liu2025adjoint` etc. in citations, so the bib file needs these entries.
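For the Line 96 item above, the missing step could be spelled out roughly as follows. This is a sketch that assumes the SB system in its standard form, \(\varphi_0(x) = \int \bbP_{1|0}(y|x)\, \varphi_1(y)\, dy\) with boundary conditions \(\varphi_0 \widehat{\varphi}_0 = \mu\) and \(\varphi_1 \widehat{\varphi}_1 = \pi\); the notation should be checked against shrodinger.qmd.

\[
V_0(x) = -\log \int \bbP_{1|0}(y|x)\, \frac{\pi(y)}{\widehat{\varphi}_1(y)}\, dy = -\log \int \bbP_{1|0}(y|x)\, \varphi_1(y)\, dy = -\log \varphi_0(x),
\]

\[
\mu(x)\, e^{V_0(x)} = \frac{\mu(x)}{\varphi_0(x)} = \widehat{\varphi}_0(x).
\]

The first line uses the terminal boundary condition, the second the initial one. That is the whole argument, and it fits in the details block without lengthening it much.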
Suggested Structure for v2
The v2 should be roughly 60-70% the length of the current version, with more computation and less description. Here is a suggested outline:
Opening (5-8 lines)
State the problem directly: adjoint sampling forces \(X_0 = 0\). We want informative priors. The memoryless condition creates a bias \(V_0(X_0)\) in the terminal marginal (cross-reference HJB.qmd, adjoint_sampling.qmd for details). Question: can we fix this bias while keeping the adjoint matching framework?
From SOC to SB: the corrector (main derivation)
Start from the Schrodinger bridge factorization \(d\bbQ^\star/d\bbP \propto \widehat{\varphi}_0(X_0) \varphi_1(X_1)\). Derive, by direct computation, that the SB problem is equivalent to an SOC problem with modified terminal cost \(g(x) = \log(\widehat{\varphi}_1(x)/\pi(x))\). Show by explicit marginalization that \(\widehat{\varphi}_1\) cancels the \(V_0\) bias (this is currently in a details block; promote it to main text). This is the core “aha.”
The two matching objectives (derive, do not state)
Derive the AM objective by applying adjoint matching (cross-reference adjoint_matching.qmd) to the modified SOC problem. Derive the CM objective from the definition of \(\widehat{\varphi}_t\) using Bayes’ rule/denoising score matching. Make the connection to the Tweedie formula explicit: the corrector score is a conditional expectation of the bridge score.
State explicitly what the CM regression target looks like for \(f_t = 0\): it is the Gaussian bridge score, which is known in closed form.
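A small numerical check of that closed form, reading "Gaussian bridge score" as the score of the base transition kernel \(\nabla_{X_1} \log \bbP(X_1 | X_0)\) from the Line 124 note. The sketch assumes \(f_t = 0\), so that \(X_1 | X_0 \sim \mathcal{N}(X_0, s^2 I)\) with \(s^2 = \int_0^1 \sigma_t^2\, dt\); the constant schedule \(\sigma_t = 2\) and the helper names are illustrative assumptions, not the note's.

```python
import numpy as np

def transition_score(x1, x0, var):
    """Closed-form score grad_{x1} log N(x1; x0, var * I) for the f_t = 0 base process."""
    return (x0 - x1) / var

def log_transition(x1, x0, var):
    """Log-density of N(x1; x0, var * I)."""
    d = x1.size
    return -0.5 * np.sum((x1 - x0) ** 2) / var - 0.5 * d * np.log(2 * np.pi * var)

# Assumed schedule: sigma_t = 2 on [0, 1], so var = integral of sigma_t^2 = 4.
var = 4.0
x0 = np.array([0.3, -1.2])
x1 = np.array([1.0, 0.5])

# Central finite differences as an independent check of the closed form.
eps = 1e-5
fd = np.array([
    (log_transition(x1 + eps * e, x0, var) - log_transition(x1 - eps * e, x0, var)) / (2 * eps)
    for e in np.eye(x1.size)
])
score = transition_score(x1, x0, var)
```

The regression target in the CM loss is then just `(x0 - x1) / var` evaluated on sampled endpoint pairs, which is worth stating in the note in one line.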
Alternating optimization = IPFP (keep short)
Explain the circular dependency. The natural fix is alternation. Note that \(h^{(0)} = 0\) gives pure energy-guided transport at stage 1. Identify the alternation with IPFP on path-space (cross-reference shrodinger.qmd for IPFP), note convergence in one sentence.
Comparison table and practical remarks (keep)
Keep the table comparing AS and ASBS. Add a concrete example of a useful prior (harmonic oscillator, 2-3 lines). Mention mode-seeking limitation in one sentence.
Cut entirely
- The memoryless condition recap (lines 25-54): cross-reference only
- SB potentials re-derivation (lines 57-78): cross-reference only
- Global convergence section: fold into one sentence in the IPFP paragraph
- “Summary and relation to prior work” prose (keep the table)
- Bold theorem statement format