Second-Pass Critique: ASBS_v2.qmd
Verdict
Major improvement over v1. The note now derives rather than announces. The corrector matching derivation is in main text (not buried in a details block). The opening is direct and problem-driven. The SOC-to-SB identification is computed, not stated as a theorem. Cross-references replaced most re-derivation. At 142 lines this is the shortest of the four v2 notes, as intended.
Remaining issues are mostly at the equation level: one sign convention subtlety that could confuse readers moving between notes, a few notation mismatches with the other v2 notes and with shrodinger.qmd, and some broken cross-reference paths. No showstopper mathematical errors found, but two claims need tightening.
Sign Convention Issues
1. SOC convention mismatch between ASBS_v2 and HJB.qmd.
HJB.qmd defines the SOC problem as a maximization:
\[V(t,x) = \sup_u \E\left[\int_t^T (f - \tfrac{1}{2}\|u\|^2) ds + g(X_T) \mid X_t = x\right]\]
with \(h(t,x) = \E[\exp(\int_t^T f\,ds + g(X_T)) \mid X_t = x]\), \(V = \log h\), and optimal control \(u^\star = \sigma^\top \nabla V = \sigma^\top \nabla \log h\).
ASBS_v2 (line 39) uses the minimization convention from adjoint_sampling_v2: \(\min_u \E[\int \frac{1}{2}\|u\|^2 dt + g(X_1)]\). In this convention, \(h(t,x) = \E[e^{-g(X_1)} \mid X_t = x]\) (note the minus sign on \(g\)) and \(V = -\log h\). The optimal control is \(u^\star = \sigma \nabla \log h = -\sigma \nabla V\), which has the opposite sign from the HJB.qmd convention.
The note tries to bridge the two by writing (line 39): “the SOC optimal joint is \(p^\star(X_0, X_1) = \bbP(X_0, X_1) e^{-g(X_1) + V_0(X_0)}\)”. This is correct for the minimization convention where \(V_0(X_0) = -\log \E[e^{-g(X_1)} \mid X_0]\) is the time-0 value function (consistent with \(V = -\log h\) above). But the cross-reference to the “SOC joint” is misleading because HJB.qmd writes \(\frac{d\bbP^\star}{d\bbP} \propto \exp(g(X_T))\) (maximization), not \(\exp(-g(X_1))\).
Fix: add a parenthetical clarifying the sign flip. Something like: “the SOC optimal joint is \(p^\star(X_0, X_1) \propto \bbP(X_0, X_1) e^{-g(X_1) + V_0(X_0)}\) (the HJB notes use a maximization convention where \(g\) has the opposite sign).”
Alternatively, adopt the same convention as adjoint_sampling_v2, which already handles this (line 36 of that note: “the [HJB notes] use a maximization convention… our minimization corresponds to \(f = 0\) and \(g_{\text{HJB}} = -g\)”). The ASBS note should include the same clarification.
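If the parenthetical route is taken, the one-line dictionary between the two conventions (standard, with \(f = 0\)) is:
\[
g_{\text{HJB}} = -g, \qquad
h = \E\big[e^{g_{\text{HJB}}(X_1)} \mid X_t\big] = \E\big[e^{-g(X_1)} \mid X_t\big], \qquad
V_{\text{HJB}} = \log h = -V, \qquad
u^\star = \sigma^\top \nabla \log h \ \text{in both conventions}.
\]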
2. The identification \(e^{-g(X_1)} = \varphi_1(X_1)\) (line 42).
This is the crucial equation connecting SOC to SB. With the minimization convention, the SOC tilts the path measure by \(e^{-g(X_1) + V_0(X_0)}\). The SB tilts by \(\widehat{\varphi}_0(X_0) \varphi_1(X_1)\). Matching gives \(e^{-g} = \varphi_1\) (up to constants absorbed into \(\widehat{\varphi}_0\)). Then \(g = -\log \varphi_1\).
Next, \(\varphi_1 = \pi / (p_1^{\bbP} \widehat{\varphi}_1)\) from the SB marginal condition. So \(g = -\log[\pi / (p_1^{\bbP} \widehat{\varphi}_1)] = \log(p_1^{\bbP} \widehat{\varphi}_1 / \pi) = \log \widehat{\varphi}_1 - \log \pi + \log p_1^{\bbP}\).
The note writes (line 45-46): \(g(x) = \log(\widehat{\varphi}_1(x)/\pi(x)) + \text{const}\), absorbing \(\log p_1^{\bbP}\) into the constant. Careful: \(\log p_1^{\bbP}(x)\) is a function of \(x\), not a constant, so it cannot simply be dropped from the terminal cost; it is exactly the term that makes the adjoint-sampling special case come out right (see Mathematical Errors, item 3). Either keep it explicitly, \(g = \log(p_1^{\bbP} \widehat{\varphi}_1 / \pi) + \text{const}\), or state that \(\widehat{\varphi}_1\) is normalized to absorb the reference marginal (the \(q_t = \widehat{\varphi}_t \varphi_t\) convention), in which case dropping it is legitimate.
The sign convention is consistent here; only the \(p_1^{\bbP}\) bookkeeping needs attention.
3. The second identification \(e^{V_0(X_0)} = \widehat{\varphi}_0(X_0)/\mu(X_0)\) (line 42).
Line 58 says this “is automatically satisfied by the SB boundary conditions at \(t = 0\).” This is too hand-wavy, and the derivation behind it needs care. With \(g = -\log \varphi_1\) and the minimization-convention value function, \(V_0(x) = -\log \E[e^{-g(X_1)} \mid X_0 = x] = -\log \E[\varphi_1(X_1) \mid X_0 = x] = -\log \varphi_0(x)\), so \(e^{V_0} = 1/\varphi_0\). The SB boundary condition at \(t = 0\) reads \(q_0(x) = p_0^{\bbP}(x) \widehat{\varphi}_0(x) \varphi_0(x) = \mu(x)\); with \(p_0^{\bbP} = \mu\) (the base starts from \(\mu\)) this gives \(\widehat{\varphi}_0 \varphi_0 = 1\), hence \(e^{V_0} = 1/\varphi_0 = \widehat{\varphi}_0\).
So the boundary condition does deliver the identification, but as \(e^{V_0} = \widehat{\varphi}_0\) (up to the constant traded against \(g\)), not \(e^{V_0} = \widehat{\varphi}_0/\mu\). The extra \(1/\mu\) appears only if the potentials are normalized to absorb the reference marginal (\(q_t = \widehat{\varphi}_t \varphi_t\) rather than shrodinger.qmd’s \(q_t = p_t^{\text{ref}} \widehat{\varphi}_t \varphi_t\)), which suggests the note is mixing the two SB normalizations; its line 43 uses the latter.
There is also a genuine normalization freedom: the SB factorization \(d\bbQ^\star/d\bbP \propto \widehat{\varphi}_0(X_0) \varphi_1(X_1)\) and the SOC factorization \(p^\star(X_0, X_1) \propto \bbP(X_0, X_1) e^{-g(X_1) + V_0(X_0)}\) each hold up to a multiplicative constant, so the factor-by-factor matching is determined only up to reciprocal constants. The claims on line 42 should therefore be stated as proportionalities, \(e^{-g} \propto \varphi_1\) and \(e^{V_0} \propto \widehat{\varphi}_0\), in one fixed normalization.
None of this breaks the downstream results (only the product tilt \(\widehat{\varphi}_0(X_0)\varphi_1(X_1)\) is ever used), but the equalities as written are imprecise and could confuse a careful reader moving between conventions.
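Since the signs are easy to get turned around, here is a discrete sanity check I ran for items 2-3 (a sketch for this review, not code from the note; the state space, \(\mu\), the base transition matrix, and \(\pi\) are arbitrary toy choices, and the potentials use the shrodinger.qmd normalization):

```python
import numpy as np

# Discrete sanity check for the SOC <-> SB identification (items 2-3 above).
rng = np.random.default_rng(1)
n = 6
mu = rng.dirichlet(np.ones(n))            # prior / initial marginal
T = rng.dirichlet(np.ones(n), size=n)     # base transition matrix P(X_1 | X_0)
pi = rng.dirichlet(np.ones(n))            # target terminal marginal
P01 = mu[:, None] * T                     # base joint P(X_0, X_1)
p1 = P01.sum(axis=0)                      # base terminal marginal p_1^P

# IPFP on the potentials: enforce phi0_hat * phi0 = 1 at t = 0 and
# p1 * phi1_hat * phi1 = pi at t = 1 (shrodinger.qmd normalization).
phi1 = np.ones(n)
for _ in range(1000):
    phi0 = T @ phi1                       # phi_0 = E[phi_1(X_1) | X_0]
    phi0_hat = 1.0 / phi0                 # boundary condition at t = 0
    phi1_hat = (P01.T @ phi0_hat) / p1    # phi_1_hat = E[phi_0_hat(X_0) | X_1]
    phi1 = pi / (p1 * phi1_hat)           # boundary condition at t = 1
phi0 = T @ phi1
phi0_hat = 1.0 / phi0

Q_sb = P01 * phi0_hat[:, None] * phi1[None, :]   # SB coupling

# SOC started from mu with terminal cost g = -log(phi_1): h-transform joint.
g = -np.log(phi1)
h0 = T @ np.exp(-g)                              # h(0,x) = E[e^{-g(X_1)} | X_0 = x]
Q_soc = P01 * np.exp(-g)[None, :] / h0[:, None]

print(np.allclose(Q_soc, Q_sb))                  # SOC joint equals the SB coupling
print(np.allclose(Q_soc.sum(axis=0), pi))        # terminal marginal is pi
print(np.allclose(1.0 / h0, phi0_hat))           # e^{V_0} = phi_0_hat (no 1/mu factor)
```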
Notation Inconsistencies (cross-note)
1. SDE convention. ASBS_v2 does not write the controlled SDE explicitly. It inherits the convention from adjoint_sampling_v2: \(dX_t = \sigma_t u(t, X_t) dt + \sigma_t dW_t\) with \(X_0 \sim \mu\). This matches BMS_v2 (line 22). Good.
adjoint_matching_v2 uses a different convention: \(dX_t = [b(X_t, t) + \sigma(t) u(t, X_t)] dt + \sigma(t) dW_t\) with non-trivial base drift \(b\). ASBS_v2 specializes to \(b = 0\) (Brownian motion base), which is consistent.
No issue here.
2. SB potential notation: \(\varphi_t\) vs \(\widehat{\varphi}_t\).
In shrodinger.qmd (line 122-132), the notation is:
- \(\varphi_t(x) = \E[g(X_1) \mid X_t = x]\) (forward, propagates terminal potential)
- \(\widehat{\varphi}_t(x) = \E[f(X_0) \mid X_t = x]\) (backward, propagates initial potential)
With boundary conditions \(\varphi_1 = g\) and \(\widehat{\varphi}_0 = f\).
In ASBS_v2 (line 30-32), the notation is:
- \(\varphi_t(x) = \E_{\bbP}[\varphi_1(X_1) \mid X_t = x]\) (forward)
- \(\widehat{\varphi}_t(x) = \E_{\bbP}[\widehat{\varphi}_0(X_0) \mid X_t = x]\) (backward)
This is consistent with shrodinger.qmd. Good.
3. The marginal density factorization.
shrodinger.qmd (line 122) writes: \(q_t(x) = p_t^{\text{ref}}(x) \widehat{\varphi}_t(x) \varphi_t(x)\).
ASBS_v2 (line 43) writes: “the SB density at time 1 is \(q_1(x) = p_1^{\bbP}(x) \widehat{\varphi}_1(x) \varphi_1(x) = \pi(x)\).”
Consistent. Good.
4. The SB drift.
shrodinger.qmd (line 138) writes: drift correction is \(\sigma \sigma^\top \nabla_x \log \varphi_t(X_t)\).
ASBS_v2 (line 33) writes: “the SB process has drift \(f_t + \sigma_t^2 \nabla \log \varphi_t\).”
For scalar \(\sigma_t\), \(\sigma \sigma^\top = \sigma_t^2\). Consistent.
5. Terminal cost notation across the 4 v2 notes.
- adjoint_matching_v2: \(g = r\) (a reward rather than a cost); it minimizes \(\cJ(u) = \E[\int \frac{1}{2}\|u\|^2 dt - r(X_1)]\), i.e. maximizes expected reward with a control penalty
- adjoint_sampling_v2: \(g(x) = \log p_1^{\text{base}}(x) + E(x)/\tau\)
- ASBS_v2: \(g(x) = \log(\widehat{\varphi}_1(x)/\pi(x))\)
- BMS_v2: uses the general coupling framework, no explicit \(g\)
adjoint_matching_v2 uses \(r\) for reward, others use \(g\) for terminal cost. The sign flip is documented in adjoint_sampling_v2. ASBS_v2 does not re-state this convention, relying on the reader having read adjoint_sampling_v2 first. Acceptable for the draft, but a one-sentence reminder on line 39 would help.
6. Bridge notation.
adjoint_sampling_v2 uses \(\bbP_{t \mid 0,1}\) for the bridge kernel and \(p^{\bar{u}}_{0,1}\) for the endpoint joint.
ASBS_v2 uses \(\bbP_{t \mid 0,1}\) and \(p^{\bar{u}}_{0,1}\) in ?@eq-am (line 67-68). Consistent.
BMS_v2 uses \(\bbP_{t|0,T}\) and \(\Pi^\star_{0,T}\) (different notation for the coupling, but BMS has its own framework). Acceptable.
7. \(p^{\text{base}}_1\) vs \(p_1^{\bbP}\).
adjoint_sampling_v2 writes \(p^{\text{base}}_1\). ASBS_v2 writes \(p^{\bbP}_1\) on line 43 but \(p^{\text{base}}_1\) on lines 47 and 117. Mix of the two notations within the same note. Pick one. Since \(\bbP\) is the base path measure, \(p_1^{\bbP}\) is cleaner and matches the path-measure notation used throughout. Replace the two occurrences of \(p^{\text{base}}_1\) on lines 47 and 117.
Pedagogy Gaps
1. The SB-to-SOC decomposition is now clear.
The computation on lines 35-60 is well-structured: start from the SB Radon-Nikodym derivative, match it with the SOC joint, identify the modified terminal cost, then verify the bias cancellation by explicit marginalization. The reader can follow the logic.
2. The corrector matching derivation (lines 76-98) is clear and in main text.
The progression from definition (?@eq-sb-potentials) to gradient (?@eq-cm-start) to Bayes’ rule (?@eq-cm-cond-exp) to regression (?@eq-cm) is well-paced. The connection to the Tweedie formula is stated on line 88. Good.
3. One pedagogical gap: WHY the alternation converges.
Line 110 says “Convergence to the SB solution follows from standard IPFP analysis.” This is a bit abrupt. The reader understands that AM = forward half-bridge and CM = backward half-bridge. But why alternating these two projections converges is not intuitive from the ASBS note alone. The cross-reference to shrodinger.qmd helps, but shrodinger.qmd also does not prove IPFP convergence (it describes IPFP and cites papers). A single sentence of intuition would help: “Each half-bridge is an I-projection (KL projection) onto a convex set of couplings; by the Pythagorean inequality for I-projections, \(\kl(\bbQ^\star \| \bbQ^{(k)})\) is non-increasing, so the iterates converge to the SB solution.” A toy run illustrating the monotone decrease is sketched below.
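A discrete toy run backing up the suggested sentence (sketch written for this review; the reference coupling and marginals are arbitrary):

```python
import numpy as np

# Toy discrete IPFP: each half-step is an I-projection onto a convex
# marginal-constraint set, and KL(Q_star || Q_k) is non-increasing.
rng = np.random.default_rng(0)
n = 5
mu = rng.dirichlet(np.ones(n))                        # initial marginal
nu = rng.dirichlet(np.ones(n))                        # target marginal
K = rng.dirichlet(np.ones(n), size=n) * mu[:, None]   # reference coupling

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def ipfp_step(Q):
    Q = Q * (mu / Q.sum(axis=1))[:, None]             # fix the row marginal
    Q = Q * (nu / Q.sum(axis=0))[None, :]              # fix the column marginal
    return Q

# Converge once to get a numerical stand-in for the SB solution.
Q_star = K.copy()
for _ in range(1000):
    Q_star = ipfp_step(Q_star)

# Re-run from scratch and watch KL(Q_star || Q_k) decrease monotonically.
Q = K.copy()
for k in range(8):
    Q = ipfp_step(Q)
    print(k, kl(Q_star, Q))
```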
4. Missing: what happens concretely at \(k=1\).
Line 108 says: “The first AM stage then regresses the control onto \(\sigma_t \nabla E(X_1)\): pure energy-guided transport with no corrector.” This is good but could be expanded by one sentence. With \(h^{(0)} = 0\), the first stage ignores the prior entirely, starting all trajectories from \(\mu\) and pushing them toward low-energy regions using \(\nabla E\) alone. The corrector at stage 2 then says “you’re overshooting some regions because the prior already had mass there.” This two-sentence picture would anchor the abstract alternation in concrete intuition.
5. The reciprocal projection trick (line 71) is handled by cross-reference, and the note already spells out the recipe: “sample \(X_1\) from the current model, then draw \(X_t\) from a Brownian bridge conditioned on \((X_0, X_1)\). No SDE simulation needed during training.” A reader who skipped adjoint_sampling_v2 still gets the idea. No gap here.
Remaining Style Issues
1. Opening (line 20) is good.
“The [adjoint sampling] setup forces \(X_0 = 0\). What if we want a Gaussian prior…?” Direct, problem-driven, matches the v1 critique’s suggestion almost verbatim. No “This note describes” preamble. Passes.
2. No theorem-announcement style remaining.
The bold “SOC characteristics of SB” theorem from v1 is gone. Results are derived inline. Good.
3. The numbered list on lines 105-107 is borderline.
1. **AM step:** Solve @eq-am with...
2. **CM step:** Solve @eq-cm with...
This is an algorithmic description, not a bullet-point summary, so it is acceptable. The writing style guide says “avoid LLM-style bullet point lists unless summarizing key takeaways.” An alternating algorithm is naturally described as two steps, and the existing notes (e.g., shrodinger.qmd lines 97-98) use numbered lists for sampling procedures. Pass.
4. Line 92: “This is the corrector matching (CM) objective”.
Bold on “corrector matching” is fine as a term introduction. Matches the pattern in the other v2 notes (e.g., adjoint_matching_v2 line 185: “The Adjoint Matching loss”).
5. Line 110: “provided each step is solved exactly. In practice, 3-5 stages suffice.”
This is a nice practical remark. No style issue.
6. Line 122-123: the harmonic oscillator example.
“For molecular conformer generation, one can use a harmonic oscillator prior…” This is concrete and useful. Well placed.
7. Line 124: mode-seeking caveat.
“Like all SOC-based samplers, ASBS minimizes \(\kl(\bbQ \| \bbP)\), which is mode-seeking.” The conclusion is plausible, but the justification is off target. \(\kl(\bbQ \| \bbP)\) is only the divergence to the base process (the control-cost term); minimizing it keeps \(\bbQ\) inside the support of \(\bbP\) and says nothing by itself about the modes of \(\pi\). The relevant statement is that the full SOC objective is a reverse KL to the tilted target: \(\E_{\bbQ}[\int \tfrac{1}{2}\|u\|^2 dt + g(X_1)] = \kl(\bbQ \| \bbP^{g}) + \text{const}\) with \(d\bbP^{g}/d\bbP \propto e^{-g(X_1)}\), and reverse-KL fitting is mode-seeking whenever the control family cannot represent the full Doob \(h\)-transform: under-fitted modes lose mass rather than being covered.
Either state it that way or drop the caveat entirely. As written, “\(\kl(\bbQ \| \bbP)\) is mode-seeking” attributes the behavior to the wrong divergence.
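For reference, the identity behind the corrected statement (standard, via Girsanov and the definition of \(\bbP^{g}\); not in the note):
\[
\E_{\bbQ}\!\left[\int_0^1 \tfrac{1}{2}\|u_t\|^2\,dt + g(X_1)\right]
= \kl(\bbQ \,\|\, \bbP) + \E_{\bbQ}[g(X_1)]
= \kl(\bbQ \,\|\, \bbP^{g}) - \log \E_{\bbP}\big[e^{-g(X_1)}\big],
\qquad
\frac{d\bbP^{g}}{d\bbP} = \frac{e^{-g(X_1)}}{\E_{\bbP}[e^{-g(X_1)}]}.
\]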
8. No emojis, no em-dashes. Clean.
9. Historical figure portrait: still a TODO comment (lines 13-17).
All existing notes have one. This is a formatting gap flagged in v1. Still not addressed.
Mathematical Errors or Concerns
1. The identification on line 42 has a normalization issue (detailed in Sign Convention Issues, item 3).
The identifications \(e^{-g(X_1)} = \varphi_1(X_1)\) and \(e^{V_0(X_0)} = \widehat{\varphi}_0(X_0)/\mu(X_0)\) should be stated as proportionalities in one fixed potential normalization: \(e^{-g} \propto \varphi_1\) and \(e^{V_0} \propto \widehat{\varphi}_0\) (the \(/\mu\) belongs to the other SB normalization). Downstream results only use the product tilt, so this is a precision fix, not a correctness fix.
2. Line 55: “The second-to-last step uses the backward propagation from ?@eq-sb-potentials: \(\widehat{\varphi}_1(x) = \int \bbP_{1|0}(x \mid y) \widehat{\varphi}_0(y) dy\).”
By the definition in ?@eq-sb-potentials, \(\widehat{\varphi}_1(x) = \E_{\bbP}[\widehat{\varphi}_0(X_0) \mid X_1 = x] = \int \bbP(X_0 = y \mid X_1 = x)\, \widehat{\varphi}_0(y)\, dy\): it conditions on \(X_1\), so Bayes’ rule is needed to express it through the forward kernel. Explicitly,
\[\widehat{\varphi}_1(x) = \int \frac{p_{\bbP}(X_0 = y, X_1 = x)}{p_1^{\bbP}(x)}\, \widehat{\varphi}_0(y)\, dy = \int \frac{\bbP_{1|0}(x \mid y)\, p_0^{\bbP}(y)}{p_1^{\bbP}(x)}\, \widehat{\varphi}_0(y)\, dy.\]
What the note writes, \(\int \bbP_{1|0}(x \mid y)\, \widehat{\varphi}_0(y)\, dy = \int \bbP(X_1 = x \mid X_0 = y)\, \widehat{\varphi}_0(y)\, dy\), omits the \(p_0^{\bbP}(y)/p_1^{\bbP}(x)\) weight and is not \(\widehat{\varphi}_1(x)\) in general.
In ?@eq-debias (line 51-57), the computation is:
\[p^\star(X_1) \propto \varphi_1(X_1) \int \bbP(X_0, X_1) \widehat{\varphi}_0(X_0)\, dX_0 = \varphi_1(X_1) \int \bbP(X_1 \mid X_0)\, p_0^{\bbP}(X_0)\, \widehat{\varphi}_0(X_0)\, dX_0.\]
The integral \(\int \bbP(X_1 \mid X_0)\, p_0^{\bbP}(X_0)\, \widehat{\varphi}_0(X_0)\, dX_0\) is not \(\widehat{\varphi}_1(X_1)\) by definition; it equals \(p_1^{\bbP}(X_1) \cdot \widehat{\varphi}_1(X_1)\) by Bayes’ rule. So the second-to-last step in ?@eq-debias should read:
\[\varphi_1(X_1) \int \bbP(X_1 \mid X_0) p_0^{\bbP}(X_0) \widehat{\varphi}_0(X_0) dX_0 = \varphi_1(X_1) \cdot p_1^{\bbP}(X_1) \cdot \widehat{\varphi}_1(X_1) \propto \pi(X_1),\]
using \(\varphi_1 \widehat{\varphi}_1 p_1^{\bbP} = \pi\). The note does say “absorbing \(p_1^{\bbP}\) into the proportionality” on line 57, but the intermediate step \(\widehat{\varphi}_1(x) = \int \bbP_{1|0}(x \mid y) \widehat{\varphi}_0(y) dy\) written on line 55 is wrong. It should be \(p_1^{\bbP}(x) \cdot \widehat{\varphi}_1(x) = \int \bbP_{1|0}(x \mid y) p_0^{\bbP}(y) \widehat{\varphi}_0(y) dy\), or more carefully: \(\int \bbP(X_1 = x \mid X_0 = y) \mu(y) \widehat{\varphi}_0(y) dy = p_1^{\bbP}(x) \widehat{\varphi}_1(x)\).
This is a real error in the exposition. The parenthetical explanation of the “backward propagation” step is incorrect as stated. The computation in ?@eq-debias itself is correct, since the final step only needs the boundary condition \(\varphi_1 \widehat{\varphi}_1 p_1^{\bbP} = \pi\); but the verbal explanation mis-identifies what \(\widehat{\varphi}_1\) equals.
Fix: Replace line 55’s parenthetical with: “The second-to-last step uses \(\int \bbP_{1|0}(x \mid y) \mu(y) \widehat{\varphi}_0(y) dy = p_1^{\bbP}(x) \widehat{\varphi}_1(x)\) (from the definition ?@eq-sb-potentials and Bayes’ rule).”
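For completeness, the one-line Bayes check of the corrected step (with \(p_0^{\bbP} = \mu\)):
\[
\int \bbP_{1\mid 0}(x \mid y)\, \mu(y)\, \widehat{\varphi}_0(y)\, dy
= p_1^{\bbP}(x) \int \bbP_{0 \mid 1}(y \mid x)\, \widehat{\varphi}_0(y)\, dy
= p_1^{\bbP}(x)\, \widehat{\varphi}_1(x).
\]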
3. Line 69: “When \(\mu = \delta_0\), we have \(\widehat{\varphi}_0 = \text{const}\), hence \(\widehat{\varphi}_1(x) \propto p_1^{\text{base}}(x)\)”.
With \(\mu = \delta_0\), \(X_0 = 0\) deterministically. The SB boundary condition at \(t=0\) requires \(q_0 = \mu = \delta_0\), which the base process already satisfies. The backward potential \(\widehat{\varphi}_0\) must satisfy \(p_0^{\bbP}(x) \widehat{\varphi}_0(x) \varphi_0(x) = \delta_0(x)\). Since \(p_0^{\bbP}(x) = \delta_0(x)\), we need \(\widehat{\varphi}_0(0) \varphi_0(0) = 1\), i.e., \(\widehat{\varphi}_0\) is only constrained at \(x = 0\). Taking \(\widehat{\varphi}_0 = \text{const}\) is one valid choice (corresponding to the “trivial” backward potential).
Then \(\widehat{\varphi}_1(x) = \E[\widehat{\varphi}_0(X_0) \mid X_1 = x] = \text{const}\) (since \(\widehat{\varphi}_0\) is constant). But the note claims \(\widehat{\varphi}_1(x) \propto p_1^{\text{base}}(x)\).
This seems wrong. If \(\widehat{\varphi}_0 = c\), then \(\widehat{\varphi}_1(x) = c\) as well. The terminal cost becomes \(g(x) = \log(c/\pi(x)) + \text{const} = -\log \pi(x) + \text{const}\). For Boltzmann \(\pi \propto e^{-E}\), this is \(g = E + \text{const}\).
But adjoint_sampling_v2 uses \(g(x) = \log p_1^{\text{base}}(x) + E(x)/\tau\), which has an extra \(\log p_1^{\text{base}}\) term. Where does this come from? It comes from writing \(\log(p_1^{\text{base}}/\pi) = \log p_1^{\text{base}} + E + \log Z\).
The reconciliation: in the adjoint sampling setup with \(\mu = \delta_0\) and \(b = 0\), the base process is a scaled Brownian motion. The terminal cost for the SOC problem is \(g(x) = \log(p_1^{\text{base}}(x)/\pi(x))\). In the ASBS framework, this should correspond to \(g(x) = \log(\widehat{\varphi}_1(x)/\pi(x))\), so \(\widehat{\varphi}_1 = p_1^{\text{base}}\) (up to constants). But we just argued \(\widehat{\varphi}_1 = \text{const}\) when \(\widehat{\varphi}_0 = \text{const}\).
The issue is normalization. The SB marginal condition says \(p_1^{\bbP}(x) \widehat{\varphi}_1(x) \varphi_1(x) = \pi(x)\). If \(\widehat{\varphi}_1 = c\) (constant), then \(\varphi_1(x) = \pi(x)/(c \cdot p_1^{\bbP}(x))\). The terminal cost is \(g = -\log \varphi_1 = \log(c \cdot p_1^{\bbP}/\pi) = \log p_1^{\bbP} - \log \pi + \log c\). So \(g = \log p_1^{\text{base}} + E + \text{const}\), which does match adjoint_sampling_v2.
So the note’s claim “\(\widehat{\varphi}_1(x) \propto p_1^{\text{base}}(x)\)” is wrong; the correct statement is “\(\widehat{\varphi}_1 = \text{const}\), and the \(p_1^{\text{base}}\) factor re-emerges through \(\varphi_1\), giving a terminal cost \(g = \log(p_1^{\text{base}}/\pi) + \text{const}\) which matches the adjoint sampling objective.”
Fix: Replace “hence \(\widehat{\varphi}_1(x) \propto p_1^{\text{base}}(x)\)” with “hence \(\widehat{\varphi}_1 = \text{const}\), so \(\varphi_1(x) = \pi(x)/(p_1^{\bbP}(x) \cdot \text{const})\) and the terminal cost \(g = -\log \varphi_1 = \log(p_1^{\text{base}}(x)/\pi(x)) + \text{const}\) recovers the adjoint sampling objective.” Note this requires ?@eq-modified-terminal to keep the \(\log p_1^{\bbP}\) term (see the terminal-cost item under Sign Convention Issues).
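If it helps, the special-case chain the fix is pointing at, collected in one display:
\[
\widehat{\varphi}_0 \equiv c \;\Rightarrow\; \widehat{\varphi}_1 \equiv c,
\qquad
\varphi_1 = \frac{\pi}{c\, p_1^{\bbP}},
\qquad
g = -\log \varphi_1 = \log\frac{p_1^{\text{base}}}{\pi} + \log c,
\]
which matches \(g = \log p_1^{\text{base}} + E/\tau\) for \(\pi \propto e^{-E/\tau}\).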
4. Line 90: the CM regression target (eq-cm).
The equation says \(\nabla \log \widehat{\varphi}_1 = \argmin_h \E_{(X_0, X_1) \sim p^{u^\star}_{0,1}} [\|h(X_1) - \nabla_{X_1} \log \bbP(X_1 \mid X_0)\|^2]\). The expectation is over \(p^{u^\star}_{0,1}\), the joint endpoint distribution of the optimal process. But in practice (line 106), the CM step uses the current iterate \(u^{(k)}\). This distinction is made on line 106 (“\(u^\star \approx u^{(k)}\)”) but ?@eq-cm itself writes \(p^{u^\star}\). Consider writing \(p^{u}\) in the equation and noting the fixed-point condition explicitly.
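To make the fixed-point caveat concrete, a minimal numerical sketch of the CM regression as quoted (written for this review, not from the note; it assumes a 1-d Brownian base with unit \(\sigma\), and the joint sampler is a stand-in for endpoint samples from the current iterate \(p^{u^{(k)}}_{0,1}\)):

```python
import numpy as np

# Corrector-matching (CM) regression sketch for ?@eq-cm.
# Assumption: base process dX = sigma dW on [0, 1], so P(X_1 | X_0) is
# N(X_0, sigma^2) and its score in X_1 is -(x1 - x0) / sigma**2.
rng = np.random.default_rng(0)
sigma = 1.0

def sample_joint(batch):
    # Stand-in for endpoint samples from the current controlled process;
    # in the algorithm these come from simulating u^(k).
    x0 = rng.normal(0.0, 1.0, size=batch)
    x1 = x0 + rng.normal(0.5, sigma, size=batch)
    return x0, x1

def cm_loss(h, batch=4096):
    x0, x1 = sample_joint(batch)
    target = -(x1 - x0) / sigma**2        # score of the base transition kernel
    return float(np.mean((h(x1) - target) ** 2))

# The minimizer is the conditional expectation h*(x1) = E[target | X_1 = x1];
# per ?@eq-cm this equals grad log phi_1_hat once the joint is the optimal one
# (the u^* vs u^(k) distinction flagged above).
print(cm_loss(lambda x1: np.zeros_like(x1)))
```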
5. Comparison table (lines 115-120): Adjoint Sampling corrector column says “None (\(\nabla \log p_1^{\text{base}}\))”.
Given the analysis in item 3, this is misleading. The corrector is absent (\(\widehat{\varphi}_1 = \text{const}\)), but what remains in the terminal cost is \(\nabla g = \nabla \log(p_1^{\text{base}}/\pi)\), which includes \(\nabla \log p_1^{\text{base}}\). So “\(\nabla \log p_1^{\text{base}}\)” is not a corrector; it is part of the terminal cost gradient. The parenthetical is confusing. Suggest: change to “None (\(\widehat{\varphi}_1 = \text{const}\))”.
Minor Issues
1. Broken cross-reference paths.
All ../shrodinger/shrodinger.qmd links should be ../shrodinger_bridge/shrodinger.qmd. The actual directory on disk is shrodinger_bridge/, not shrodinger/. This affects lines 20, 25, 43, 110. Systemic across all v2 notes.
Similarly, ../adjoint/adjoint.qmd (used in other v2 notes, not ASBS_v2) should be ../adjoint_method/adjoint.qmd.
2. Missing image for historical figure portrait (lines 13-17).
TODO comment still present. All gold-standard notes (HJB, doob, girsanov, shrodinger) have a portrait. Suggested: Erwin Schrodinger (the note already suggests this in the comment).
3. BibTeX entries in HTML comment (lines 127-141).
These need to be added to ref.bib. The note uses @liu2025adjoint and @de2021diffusion in the text, so citations will not render until the bib entries exist.
4. Line 65: “the lean adjoint is constant (since \(f_t = 0\), same as in [adjoint sampling]).”
Correct, but the parenthetical relies on the reader knowing that the lean adjoint ODE is \(\dot{\tilde{a}} = -(\nabla_x b)^\top \tilde{a}\) with terminal condition \(\nabla g(X_1)\), and that \(b = 0\) implies \(\tilde{a} = \text{const}\). A more self-contained phrasing: “the lean adjoint is constant along each trajectory: \(\tilde{a}(t) = \nabla g(X_1)\) for all \(t\) (since the base drift is zero, the lean adjoint ODE \(\dot{\tilde{a}} = 0\) is trivial).”
5. Line 97: “which is the Brownian bridge score, known in closed form.”
The term “Brownian bridge score” is slightly imprecise. It is the score of the base transition density \(\bbP(X_1 \mid X_0)\), which is Gaussian. The Brownian bridge score (the score of \(\bbP(X_t \mid X_0, X_1)\) with respect to \(X_t\)) is a different thing. Suggest: “the score of the base transition kernel” or “the Gaussian transition score.”
Actually, looking again at line 98, the note says “The CM regression target is just this Gaussian score, averaged over the posterior on \(X_0\).” This is correct. The issue is only the label “Brownian bridge score” on line 97. Minor but worth fixing for precision.
6. Line 67: double bars in the AM loss.
The notation \(\| u_t(X_t) + \sigma_t (\nabla E + \nabla \log \widehat{\varphi}_1)(X_1) \|^2\) is clear but dense. Consider breaking into two lines for readability, with the target on a separate line.
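One possible two-line layout (same content as the note's line 67, just split; \(\cL_{\text{AM}}\) is my placeholder name):
\[
\cL_{\text{AM}}(u) \;=\; \E\,\big\| u_t(X_t) - v_t(X_1) \big\|^2,
\qquad
v_t(X_1) \;=\; -\,\sigma_t \big(\nabla E + \nabla \log \widehat{\varphi}_1\big)(X_1).
\]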
7. Line 43: long sentence with nested parenthetical.
“To express this in terms of the corrector \(\widehat{\varphi}_1\), use the SB marginal constraint at \(t = 1\): the SB density at time 1 is \(q_1(x) = p_1^{\bbP}(x) \widehat{\varphi}_1(x) \varphi_1(x) = \pi(x)\) (from the [SB notes]), so \(\varphi_1(x) = \pi(x) / (p_1^{\bbP}(x) \widehat{\varphi}_1(x))\).”
This is one sentence doing three things (invoking the marginal constraint, stating the density, solving for \(\varphi_1\)). Split into two sentences for readability.