Second-Pass Critique: BMS_v2.qmd

Verdict

Major improvement over v1. Nelson’s relation is now derived from scratch via Euler discretization and Bayes’ rule, matching the style of reverse_and_tweedie.qmd. The TSI gets a proper details block with integration by parts. The note computes things rather than announcing them. The three coupling substitutions are shown explicitly. The opening is mathematical, not expository.

Remaining issues are mostly second-order: a sign convention ambiguity in the backward drift definition, a few notation mismatches with the other v2 notes, one broken cross-reference path, and some passages where the derivation skips a step that would help a careful reader. No major structural rewrites needed. This is close to publishable.

Sign Convention Issues

1. Backward drift sign convention: implicit, never stated

Line 50 says “if \(v(y,t)\) denotes the backward drift (the drift of the time-reversed process)” but never writes the backward SDE. In reverse_and_tweedie.qmd, the time-reversed process has drift \(-\mu(\overleftarrow{X}_t) + \sigma^2 \nabla \log p_{T-t}(\overleftarrow{X}_t)\). The question is: does \(v\) include the sign flip from time reversal, or is it defined so that the backward SDE reads \(dY_t = -\sigma_t v(Y_t, t) dt + \sigma_t d\overleftarrow{B}_t\)?

Lines 52-53 write \(-\sigma_t v(y,t) = -\sigma_t u(y,t) + \sigma_t^2 \nabla \log p_t(y)\), which gives \(v = u - \sigma_t \nabla \log p_t\). Then Nelson’s relation is \(u + v = \sigma_t \nabla \log p_t\) (line 59), which requires \(v = \sigma_t \nabla \log p_t - u\). These are contradictory unless I’m misreading the intermediate step.

Let me re-derive. The backward conditional mean at line 47 is: \[\E[X_t \mid X_{t+\delta} = y] = y - \sigma_t u(y,t) \delta + \sigma_t^2 \nabla \log p_t(y) \delta.\] The backward process goes from \(t+\delta\) to \(t\), so the “backward drift” should be read from \(Y_s = X_{T-s}\). If we define \(v\) so the backward Euler step is \(X_t = X_{t+\delta} - \sigma_t v(X_{t+\delta}, t) \delta + \text{noise}\), then \(v(y,t) = u(y,t) - \sigma_t \nabla \log p_t(y)\).

But then \(u + v = 2u - \sigma_t \nabla \log p_t\), which is NOT Nelson’s relation.

The issue is in the sign convention at line 53. The backward SDE in the BMS paper (eq. 2) uses \(dY_t = -\sigma(t) v(Y_t, t) dt + \sigma(t) d\overleftarrow{B}_t\), i.e., the backward drift carries a minus sign in the SDE. If the backward conditional mean is \(\E[X_t \mid X_{t+\delta}=y] = y - \sigma_t v \delta\), then from line 47 we identify \(\sigma_t v = \sigma_t u - \sigma_t^2 \nabla \log p_t\), giving \(v = u - \sigma_t \nabla \log p_t\), hence \(u + v = 2u - \sigma_t \nabla \log p_t\). Still wrong.

Resolution: The derivation at lines 47-53 has the backward conditional mean correct, but the extraction of \(v\) needs the backward SDE to read \(dY = +\sigma_t v dt + \sigma_t d\overleftarrow{B}\) (no minus sign), where \(v\) is the “backward drift” defined with the opposite sign convention from the paper. The note never commits to a backward SDE, which is the source of ambiguity.

Recommendation: Write the backward SDE explicitly after ?@eq-euler-fwd. State: “The time-reversed process satisfies \(d\overleftarrow{X}_s = \sigma_{T-s} v(\overleftarrow{X}_s, T-s) ds + \sigma_{T-s} d\overleftarrow{B}_s\), where \(v\) is the backward drift.” Then the backward conditional mean gives \(v(y,t) = -u(y,t) + \sigma_t \nabla \log p_t(y)\), hence \(u + v = \sigma_t \nabla \log p_t\). The sign at line 53 should be \(+\sigma_t v(y,t) = -\sigma_t u(y,t) + \sigma_t^2 \nabla \log p_t(y)\) (positive on the LHS, not negative), or equivalently the backward conditional mean gives a drift of \(+v\), not \(-v\). Currently line 53 has a minus on the LHS which forces \(v = u - \sigma_t \nabla \log p_t\) and breaks Nelson.

This is a real sign error. Line 53 should read: \[+\sigma_t v(y,t) = -\sigma_t u(y,t) + \sigma_t^2 \nabla \log p_t(y)\] or equivalently, the backward Euler reads \(X_t = X_{t+\delta} + \sigma_t v \delta + \text{noise}\) (going backwards, the drift is \(+\sigma_t v\); the sign flip is already absorbed into the definition of backward time). Compare reverse_and_tweedie.qmd line 57: the reversed SDE is \(d\overleftarrow{X} = [-\mu + \sigma^2 \nabla \log p] dt + \sigma dB\), so with \(\mu = \sigma u\) the backward drift is \(-\mu + \sigma^2 \nabla \log p = +\sigma v\) with \(v = -u + \sigma \nabla \log p\).

Actually, wait. Let me re-read lines 50-53 more carefully.

Line 50: “So if \(v(y,t)\) denotes the backward drift…” Line 53: “\(-\sigma_t v(y,t) = -\sigma_t u(y,t) + \sigma_t^2 \nabla \log p_t(y)\)”

The note is reading off \(v\) from the backward conditional mean \(\E[X_t \mid X_{t+\delta}=y] = y \underbrace{- \sigma_t u \delta + \sigma_t^2 \nabla \log p_t \delta}_{\text{this is } -\sigma_t v \delta}\). So the convention is that the backward Euler step is \(X_t \approx X_{t+\delta} - \sigma_t v(X_{t+\delta}, t) \delta + \text{noise}\), matching the backward SDE \(dY = -\sigma v dt + \sigma d\overleftarrow{B}\) (the paper’s convention).

Then \(-\sigma_t v = -\sigma_t u + \sigma_t^2 \nabla \log p_t\) gives \(v = u - \sigma_t \nabla \log p_t\), hence \(u + v = 2u - \sigma_t \nabla \log p_t\). This does NOT give Nelson’s relation \(u + v = \sigma_t \nabla \log p_t\).

The sign error is confirmed. To get Nelson’s relation, the backward SDE should carry a \(+\) sign: \(dY = +\sigma v dt + \sigma d\overleftarrow{B}\), which means the backward Euler step is \(X_t \approx X_{t+\delta} + \sigma_t v \delta + \text{noise}\) and the backward conditional mean is \(\E[X_t \mid X_{t+\delta} = y] = y + \sigma_t v \delta\). Then \(\sigma_t v = -\sigma_t u + \sigma_t^2 \nabla \log p_t\), giving \(v = -u + \sigma_t \nabla \log p_t\), hence \(u + v = \sigma_t \nabla \log p_t\). Correct.

Alternatively, if you want the backward Euler step to carry the minus sign, \(X_t \approx X_{t+\delta} - \sigma_t v \delta\), then Nelson’s relation should read \(u - v = \sigma_t \nabla \log p_t\) (not \(u + v\)). The paper’s combination of \(u + v = \sigma \nabla \log p_t\) with \(dY = -\sigma v dt + \sigma d\overleftarrow{B}\) is consistent only if \(v\) is the “reverse drift” in the sign convention of the reversed SDE in reverse_and_tweedie.qmd, i.e., \(v = -u + \sigma \nabla \log p\), so that the paper’s SDE has drift \(-\sigma v = \sigma(u - \sigma \nabla \log p)\). One reading that would reconcile the two, which I have not verified against the paper, is that \(dt\) in the paper’s backward SDE is forward time with a backward Itô increment; a step from \(t+\delta\) down to \(t\) then subtracts that drift and gives \(X_t \approx X_{t+\delta} + \sigma_t v \delta\), which is the plus-sign Euler step.

Bottom line: The derivation at lines 47-59 has a sign issue. The extraction of Nelson’s relation from the backward conditional mean is not self-consistent with the definition of \(v\) and the minus sign at line 53. This needs to be fixed by either: (a) Defining \(v\) so that the backward Euler is \(X_t = X_{t+\delta} + \sigma_t v \delta\) (no minus), then \(v = -u + \sigma_t \nabla \log p_t\) and \(u + v = \sigma_t \nabla \log p_t\). Or: (b) Keeping the minus-sign Euler step at line 53, in which case the derivation yields \(u - v = \sigma_t \nabla \log p_t\), and the paper’s \(u + v\) formula requires \(v\) to be defined with the sign already absorbed.

This must be clarified. The paper’s eq. (4) defines Nelson as \(u^* + v^* = \sigma \nabla \log \Pi^*_t\), with the backward SDE being \(dY = -\sigma v dt + \sigma d\overleftarrow{B}\). That means their \(v\) has the property that \(v = -u + \sigma \nabla \log p_t\) (the backward drift without the minus sign from the SDE). So the extraction at line 53 should give \(v = -u + \sigma_t \nabla \log p_t\), not \(v = u - \sigma_t \nabla \log p_t\). The sign on the LHS of line 53 is wrong.
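The two sign conventions can be checked numerically on a case where the backward conditional mean is available in closed form. The sketch below is not taken from the note: it assumes a one-dimensional marginal \(p_t = \normal(0, s^2)\), a linear control \(u(x) = -\theta x\), a constant \(\sigma\), and a single Euler step of size \(\delta\); the names `theta`, `s2`, `v_plus`, `v_minus` are hypothetical. It only confirms which reading of \(v\) reproduces \(u + v = \sigma \nabla \log p_t\) to first order in \(\delta\).

```python
import numpy as np

# Hypothetical 1-D setup (an assumption for illustration, not from the note):
# X_t ~ N(0, s2), control u(x) = -theta * x, constant sigma, Euler step delta.
sigma, theta, s2, delta, y = 0.7, 1.3, 2.0, 1e-6, 0.9

def u(x):
    return -theta * x

def score(x):
    # grad log p_t for p_t = N(0, s2)
    return -x / s2

# Forward Euler step: X_{t+delta} = (1 - a) X_t + sigma * sqrt(delta) * Z.
a = sigma * theta * delta
var_next = (1 - a) ** 2 * s2 + sigma**2 * delta

# Exact Gaussian conditioning gives E[X_t | X_{t+delta} = y] = slope * y.
# We compute (slope - 1) * y directly to avoid floating-point cancellation.
mean_shift = ((1 - a) * a * s2 - sigma**2 * delta) / var_next * y

# Convention (a): X_t ~ X_{t+delta} + sigma * v * delta
v_plus = mean_shift / (sigma * delta)
# Convention (b): X_t ~ X_{t+delta} - sigma * v * delta
v_minus = -mean_shift / (sigma * delta)

target = sigma * score(y)
print("(a) u + v:", u(y) + v_plus, "   sigma * score:", target)
print("(b) u - v:", u(y) - v_minus, "   sigma * score:", target)
print("(b) u + v:", u(y) + v_minus, "   2u - sigma * score:", 2 * u(y) - target)
```

Under convention (a) the sum \(u + v\) matches \(\sigma \nabla \log p_t\); under convention (b) it is the difference \(u - v\) that matches, and \(u + v\) collapses to \(2u - \sigma \nabla \log p_t\).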

2. The \(v^*\) in ?@eq-backward-drift

Line 162: \(v^\star(x,t) = \E_{\Pi^\star_{0|t}}[\sigma_t \nabla_{X_t} \log \bbP_{t|0}(X_t \mid X_0) \mid X_t = x]\). Since \(\bbP_{t|0}\) is Gaussian \(\normal(X_0, \kappa(t) I)\), the gradient \(\nabla_{X_t} \log \bbP_{t|0} = -(X_t - X_0)/\kappa(t)\). So \(v^\star\) is \(-\sigma_t(X_t - X_0)/\kappa(t)\) averaged over \(X_0 \mid X_t\). This is negative (pointing from \(X_t\) back toward \(X_0\)), which makes sense for a backward drift.

But then Nelson gives \(u^\star = \sigma_t \nabla \log \Pi^\star_t - v^\star\). Since \(-v^\star(x,t) = \sigma_t \E[(X_t - X_0)/\kappa(t) \mid X_t = x]\) points away from \(X_0\), the forward drift is the score term plus a correction directed away from the starting point. This is consistent if the sign conventions are right.

However, line 165 says “Using Nelson’s relation \(u^\star = \sigma_t \nabla \log \Pi^\star_t - v^\star\) from ?@eq-nelson”. This is \(u = \sigma \nabla \log p - v\), which rearranges to \(u + v = \sigma \nabla \log p\). Correct if \(v\) has the right sign from the derivation.

The issue is still with lines 52-53. Everything downstream from Nelson’s relation (line 59 onward) is self-consistent, assuming ?@eq-nelson is correct. The bug is in the derivation of ?@eq-nelson from the backward conditional mean. Fix lines 52-53 and the rest follows.

3. Forward SDE convention

The SDE at line 22 is \(dX = \sigma_t u dt + \sigma_t dW\), matching adjoint_sampling_v2.qmd (line 25) and ASBS_v2.qmd implicitly. Good. But adjoint_matching_v2.qmd uses \(dX = [b + \sigma u] dt + \sigma dW\) (line 36-37), which has a base drift \(b\). The BMS note has \(b = 0\) throughout, which is fine since it focuses on the sampling problem. Consistent within the 4-note series.

Nelson’s Relation Derivation Quality

What works

The Euler discretization at ?@eq-euler-fwd is clean and mirrors reverse_and_tweedie.qmd.
The Bayes’ rule step at line 41 correctly conditions on \(X_{t+\delta} = y\).
The Taylor expansion of \(p_t(x)\) around \(y\) is the right move (line 44).
The “completing the square” step is implied but clear enough.
The resulting conditional mean at line 47 matches the standard formula.
The one-line intuition at line 62 (“Forward drift pushes mass toward the target…”) is good.

What needs fixing

The sign error at line 53 (see above). This is the critical issue.
The backward SDE is never written. The note says “if \(v(y,t)\) denotes the backward drift” but the reader does not know whether \(v\) appears with a plus or minus sign in the backward SDE. Write it explicitly. Compare reverse_and_tweedie.qmd ?@eq-rev-diffusion which writes the full backward SDE.
The completing-the-square step is skipped. Lines 44-47 go from “expanding \(p_t(x)\) and completing the square” to the conditional mean. A reader who wants to verify needs to see the one-line Gaussian integral. Put this in a details block if space is a concern, but the v1 critique specifically asked for followable derivations.
Scope of validity. Line 62 says “This holds for any Markov diffusion of the form ?@eq-controlled-sde with marginals \(p_t\).” True, but worth noting that the derivation assumed a scalar \(\sigma_t\) (not state-dependent diffusion). The existing notes (girsanov.qmd, doob.qmd) use state-dependent \(\sigma(X_t)\). The restriction to scalar schedule should be stated as an assumption.

Notation Inconsistencies (cross-note)

1. Path measure notation: \(\bbP^u\) vs \(\bbP^{u_i}\)

BMS_v2 uses \(\bbP^{u_i}\) (line 92, 213, 239-241). adjoint_sampling_v2 uses \(\bbP^u\) and \(\bbP^{\bar{u}}\) with \(\bar{u} = \texttt{stopgrad}(u)\) (line 62-63). ASBS_v2 uses \(\bbP^{\bar{u}}\) in its AM loss (line 67). BMS_v2 never uses stopgrad notation since it is describing the fixed-point iteration abstractly, which is fine. Consistent enough.

2. Coupling notation

BMS_v2 line 93: \(\Pi^i_{0,T}\) for the coupling at iteration \(i\). Uses \(\Pi^\star_{0,T}\) for the target coupling.
ASBS_v2: uses \(\Pi^\star_T\) for the target marginal (line 28), \(p^{u^\star}_{0,1}\) for the endpoint joint.
adjoint_sampling_v2: uses \(p^{\bar{u}}_1\) for terminal marginal, \(p^\text{base}_{t \mid 1}\) for bridge.
BMS_v2: uses \(\bbP_{t|0,T}\) for bridge kernel, \(\Pi^\star_{0,T|t}\) for conditional coupling.

The \(\Pi^\star_{0,T|t}\) notation at line 137 is used without definition. It means “the conditional distribution of \((X_0, X_T)\) given \(X_t = x\) under \(\Pi^\star\).” Define it on first use.

3. Bridge notation

BMS_v2 ?@eq-bridge-marginal uses \(\kappa(t) = \int_0^t \sigma_s^2 ds\) and \(\gamma(t) = \kappa(t)/\kappa(T)\). adjoint_sampling_v2 uses \(\nu_t = \int_0^t \sigma_s^2 ds\) (line 102). These are the same quantity with different names. This is an inconsistency across the 4 notes. Either unify the notation (use \(\kappa\) everywhere or \(\nu\) everywhere) or at least note the equivalence.

4. Time horizon

BMS_v2 uses \([0,T]\) (line 19). adjoint_sampling_v2 uses \([0,1]\) (line 25). adjoint_matching_v2 uses \([0,1]\) (line 24). ASBS_v2 uses \([0,1]\) implicitly.

BMS using \([0,T]\) is fine (the paper uses general \(T\)), but it creates a cosmetic mismatch. Not critical.

5. \(\hat\varphi\) notation

Line 186 uses \(\hat\varphi_0, \hat\varphi_T, \varphi_T\) for SB potentials. ASBS_v2 uses \(\widehat{\varphi}_0, \widehat{\varphi}_1, \varphi_1\) (lines 27-32). The hat notation matches. The subscript (\(T\) vs \(1\)) reflects the time horizon difference. Consistent modulo time horizon.

Pedagogy Gaps

1. The Markovian projection derivation (lines 75-85) is good

The \(L^2\) decomposition via the tower property is shown in three lines. The cross-term argument is clear. This is a significant improvement over v1, which appealed to Gyöngy (1986) by name.

2. The TSI derivation needs a small fix

The details block at lines 125-148 is a genuine computation. The step \(\nabla_x \bbP_{t|0,T} = -\frac{1}{1-\gamma} \nabla_{x_0} \bbP_{t|0,T}\) (line 128) is not obvious. It holds because the Gaussian bridge density depends on \(x\) and \(x_0\) only through \(x - (1-\gamma)x_0 - \gamma x_T\), so shifting \(x\) by \(\epsilon\) is equivalent to shifting \(x_0\) by \(-\epsilon/(1-\gamma)\). State this explicitly: “Since the Gaussian mean is linear in \(x_0\) with coefficient \((1-\gamma)\), shifting \(x\) by \(\epsilon\) is equivalent to shifting \(x_0\) by \(-\epsilon/(1-\gamma)\).”
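A finite-difference check of the gradient identity, as a minimal sketch. It assumes a one-dimensional bridge marginal \(\normal((1-\gamma)x_0 + \gamma x_T, \kappa(T)\gamma(1-\gamma))\); the variance formula is the standard Brownian-bridge one and is an assumption here, since the identity only needs the variance to be independent of \(x_0\). The function name `bridge_density` and all numerical values are hypothetical.

```python
import numpy as np

def bridge_density(x, x0, xT, gamma, kappa_T):
    # Assumed bridge marginal: N((1 - gamma) x0 + gamma xT, kappa_T * gamma * (1 - gamma))
    mean = (1 - gamma) * x0 + gamma * xT
    var = kappa_T * gamma * (1 - gamma)
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical test point and schedule values.
x, x0, xT, gamma, kappa_T, h = 0.4, -0.3, 1.1, 0.35, 2.0, 1e-5

# Central finite differences in x and in x0.
grad_x = (bridge_density(x + h, x0, xT, gamma, kappa_T)
          - bridge_density(x - h, x0, xT, gamma, kappa_T)) / (2 * h)
grad_x0 = (bridge_density(x, x0 + h, xT, gamma, kappa_T)
           - bridge_density(x, x0 - h, xT, gamma, kappa_T)) / (2 * h)

print("grad_x                  :", grad_x)
print("-grad_x0 / (1 - gamma)  :", -grad_x0 / (1 - gamma))
```

The two printed values should agree up to finite-difference error, which is the shift argument written as a derivative.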

3. The jump from ?@eq-tsi to ?@eq-xi-general (lines 157-171) is the weakest link

Line 159 says the backward drift has the form ?@eq-backward-drift, citing “the Doob h-transform and the bridge structure.” But ?@eq-backward-drift appears from thin air. The reader needs to know: for a reciprocal process \(\Pi^\star = \Pi^\star_{0,T} \bbP_{|0,T}\), the backward drift at time \(t\) conditioned on the past is the conditional expectation of the bridge drift \(\sigma_t \nabla_{X_t} \log \bbP_{t|0}(X_t \mid X_0)\) given \(X_t\), averaged over \(X_0\). This is Proposition 3 in the paper. Show this in 2-3 lines or at least explain the logic: “The bridge process from \(X_0\) to \(X_T\) has drift \(\sigma_t \nabla_{X_t} \log \bbP_{T|t}(X_T \mid X_t)\). Its time-reversed version, conditioned on \(X_0\), has drift \(\sigma_t \nabla_{X_t} \log \bbP_{t|0}(X_t \mid X_0)\). The Markovian backward drift \(v^\star\) is the conditional expectation of this over the posterior on \(X_0\).”

Currently the note just writes ?@eq-backward-drift and expects the reader to accept it. This is the one place where “stated, not derived” still applies.

4. The three coupling substitutions (lines 176-206) are clear

Each coupling is stated, the coupling scores are identified, and the resulting \(\xi\) is shown. The adjoint sampling case (lines 178-184) should explain why \(c(t) = \gamma(t)\) is chosen. That choice does not remove the \(X_0\) term, since \(\frac{1-c}{1-\gamma} = 1\) when \(c = \gamma\); the term is absent because \(X_0 = x_0\) is deterministic under the Dirac prior, so \(\nabla_{X_0} \log \delta_{x_0}\) never enters. Currently the choice of \(c\) is stated without explanation.

5. “Why independent coupling works” (lines 209-216) is good but incomplete

Line 211 says “Any coupling satisfying these boundary constraints yields a valid fixed point.” This is the key claim and it is not justified. The argument is: if the Markovian projection \(u^\star\) of \(\Pi^\star = (p_0 \otimes \pi) \bbP_{|0,T}\) has marginals matching \(\Pi^\star_t\), then running the SDE with \(u^\star\) gives \(X_0 \sim p_0\) and \(X_T \sim \pi\) (since \(\Pi^\star_0 = p_0\) and \(\Pi^\star_T = \pi\)). The coupling is then \(\bbP^{u^\star}_0 \otimes \bbP^{u^\star}_T = p_0 \otimes \pi = \Pi^\star_{0,T}\), so \(u^\star\) is a fixed point. Spell this out.

Remaining Style Issues

1. Line 26-27: “The central tool is Nelson’s relation, which we derive next.”

This is announcing. The existing notes do not announce what they will derive. Just start deriving. Delete this sentence or fold into the transition: “…removes the need for alternation, using Nelson’s relation as the central tool.”

2. Line 62: “This holds for any Markov diffusion of the form ?@eq-controlled-sde with marginals \(p_t\).”

Slightly passive. Rephrase: “Nelson’s relation holds whenever the controlled SDE ?@eq-controlled-sde has marginals \(p_t\).”

3. Line 69: “A reciprocal measure is generally non-Markovian: the bridge drift depends on \(X_T\).”

Good. Direct.

4. Line 211: “The Schrodinger bridge coupling minimizes the path-space KL to the reference, so it is optimal in that sense.”

“In that sense” is hedging. Say: “The Schrodinger bridge coupling minimizes path-space KL to the reference. The independent coupling sacrifices this optimality for a fully tractable regression target.”

5. Line 243: “Forward KL is mode-covering (it penalizes placing zero mass where \(\Pi^\star\) has mass), which partly explains the good mode diversity in practice.”

“Partly explains” is hedging. Either explain the mechanism or state it as a fact: “Forward KL is mode-covering: it penalizes placing zero mass where \(\Pi^\star\) has mass. This drives mode diversity.”

6. No experimental results mentioned

Good. The v1 had a line about \(d=2500\) sampling which was correctly flagged. v2 has none. Clean.

7. No “by a classical result” or “it can be shown”

Good. Every formula is either derived or referenced to another note.

8. Colored text usage

\(\blue{\cdot}\) is used at line 59 for Nelson’s relation (good, it is the key result) and at line 230, 232 for the damping regularizer (good, it highlights the new term). Consistent with existing notes.

Mathematical Errors or Concerns

1. Sign error in Nelson derivation (critical, detailed above)

Line 53: the minus sign on the LHS should be positive, or the convention for \(v\) needs to change. As written, the derivation gives \(v = u - \sigma_t \nabla \log p_t\), but ?@eq-nelson requires \(v = \sigma_t \nabla \log p_t - u\). Fix the sign or redefine \(v\).

2. ?@eq-backward-drift: check the sign

Line 162 has \(v^\star = \E[\sigma_t \nabla_{X_t} \log \bbP_{t|0}(X_t \mid X_0) \mid X_t = x]\). Since \(\bbP_{t|0}\) is \(\normal(X_0, \kappa(t) I)\), the score is \(\nabla_{X_t} \log \bbP_{t|0} = -(X_t - X_0)/\kappa(t)\). So \(v^\star(x,t) = -\sigma_t \E[(X_t - X_0)/\kappa(t) \mid X_t = x]\). This is negative (pointing backward from \(X_t\) toward \(X_0\)), consistent with \(v\) being the backward drift. Correct, assuming the sign convention from ?@eq-nelson is fixed.
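The Gaussian score used in this check can be confirmed by finite differences. The sketch below assumes \(\bbP_{t|0} = \normal(x_0, \kappa(t) I)\) as stated above; the dimension, the test points, and the helper names `log_gauss` and `fd_grad` are hypothetical.

```python
import numpy as np

def log_gauss(x, x0, kappa):
    # log density of N(x0, kappa * I) evaluated at x
    d = x.size
    return -0.5 * np.sum((x - x0) ** 2) / kappa - 0.5 * d * np.log(2 * np.pi * kappa)

def fd_grad(f, x, h=1e-5):
    # central finite-difference gradient of a scalar function f at x
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Hypothetical values of X_t, X_0 and kappa(t).
x = np.array([0.5, -1.2, 2.0])
x0 = np.array([0.1, 0.3, -0.4])
kappa = 1.7

numeric = fd_grad(lambda z: log_gauss(z, x0, kappa), x)
closed_form = -(x - x0) / kappa

print("finite difference:", numeric)
print("-(x - x0) / kappa:", closed_form)
# A positive inner product with (x0 - x) means the score points from x toward x0.
print("points toward x0 :", float(np.dot(closed_form, x0 - x)) > 0)
```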

3. ?@eq-xi-general: verify

Line 168: \(\sigma_t^{-1} \xi = \frac{1-c}{1-\gamma} \nabla_{X_0} \log \Pi^\star_{0,T} + \frac{c}{\gamma} \nabla_{X_T} \log \Pi^\star_{0,T} - \nabla_{X_t} \log \bbP_{t|0}(X_t \mid X_0)\).

This should follow from \(u^\star = \sigma_t \nabla \log \Pi^\star_t - v^\star\) (Nelson), substituting the TSI for \(\nabla \log \Pi^\star_t\) and ?@eq-backward-drift for \(v^\star\). The TSI gives \(\sigma_t \nabla \log \Pi^\star_t(x) = \sigma_t \E[\frac{1-c}{1-\gamma} \nabla_{X_0} \log \Pi^\star_{0,T} + \frac{c}{\gamma} \nabla_{X_T} \log \Pi^\star_{0,T} \mid X_t = x]\). Subtracting \(v^\star = \E[\sigma_t \nabla_{X_t} \log \bbP_{t|0} \mid X_t = x]\) gives the conditional expectation of the RHS of ?@eq-xi-general. Since \(u^\star = \E[\xi \mid X_t]\), the non-Markovian drift \(\xi\) is the integrand before taking the conditional expectation. Correct.
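The TSI step in this verification can be sanity-checked in a fully Gaussian case, where every conditional expectation is closed form. The sketch below is an illustration under assumptions that are not in the note: one dimension, zero-mean Gaussian \(p_0\) and \(\pi\) with hypothetical variances `s0_sq` and `sT_sq`, the independent coupling \(p_0 \otimes \pi\), and a bridge variance \(\kappa(T)\gamma(1-\gamma)\). It checks that the weighted combination of endpoint scores reproduces \(\nabla \log \Pi^\star_t\) for several values of \(c\).

```python
import numpy as np

# Hypothetical zero-mean Gaussian endpoints (assumption for illustration only).
s0_sq, sT_sq = 0.8, 1.5        # variances of p_0 and pi
kappa_T, gamma = 2.0, 0.3      # kappa(T) and gamma(t) at a fixed t
x = 0.6                        # conditioning value X_t = x

# X_t = (1 - gamma) X_0 + gamma X_T + bridge noise, with X_0 and X_T independent.
bridge_var = kappa_T * gamma * (1 - gamma)
V = (1 - gamma) ** 2 * s0_sq + gamma**2 * sT_sq + bridge_var   # Var(X_t)

# Gaussian conditioning: E[X_0 | X_t = x] and E[X_T | X_t = x].
E_x0 = (1 - gamma) * s0_sq / V * x
E_xT = gamma * sT_sq / V * x

def tsi_rhs(c):
    # E[(1-c)/(1-gamma) * grad log p_0(X_0) + c/gamma * grad log pi(X_T) | X_t = x]
    # using the linear Gaussian scores grad log p_0(z) = -z/s0_sq, grad log pi(z) = -z/sT_sq.
    return (1 - c) / (1 - gamma) * (-E_x0 / s0_sq) + c / gamma * (-E_xT / sT_sq)

lhs = -x / V   # grad log Pi_t(x) for the Gaussian marginal N(0, V)
for c in (0.1, gamma, 1.0):
    print(f"c = {c:.2f}: TSI rhs = {tsi_rhs(c):.12f}, grad log Pi_t = {lhs:.12f}")
```

In this Gaussian case the right-hand side does not depend on \(c\), which matches the claim that \(c(t)\) is a free weighting.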

4. ?@eq-xi-as: verify the AS specialization

\(\Pi^\star_{0,T} = \delta_{x_0} \otimes \pi\). Then \(\nabla_{X_0} \log \Pi^\star_{0,T}\) is ill-defined (Dirac). Taking \(c(t) = \gamma(t)\) kills the \(X_0\) term: \(\frac{1-c}{1-\gamma} = \frac{1-\gamma}{1-\gamma} = 1\) … wait, that does NOT kill the \(X_0\) term. To kill it, we need \(1 - c(t) = 0\), i.e., \(c(t) = 1\). But line 178 says “Taking \(c(t) = \gamma(t)\).”

Let me check: with \(c = \gamma\), the coefficient of \(\nabla_{X_0}\) is \(\frac{1-\gamma}{1-\gamma} = 1\) and the coefficient of \(\nabla_{X_T}\) is \(\frac{\gamma}{\gamma} = 1\). So both terms survive. That does NOT remove the Dirac issue.

Actually, for the Dirac prior \(\delta_{x_0}\), there is no \(X_0\) randomness at all: \(X_0 = x_0\) deterministically. The coupling is \(\delta_{x_0}(dx_0) \otimes \pi(dx_T)\). Then \(\nabla_{X_0} \log \Pi^\star_{0,T}\) does not appear in the conditional expectation because \(X_0 = x_0\) is fixed. The \(\nabla_{X_0}\) term is simply absent, not killed by \(c\).

But that is not what the note says. Line 178 says “Then \(\nabla_{X_0} \log \Pi^\star_{0,T}\) is undefined (Dirac), and \(\nabla_{X_T} \log \Pi^\star_{0,T} = \nabla \log \pi(X_T)\). Taking \(c(t) = \gamma(t)\)…” This suggests that \(c = \gamma\) is chosen to handle the Dirac issue, but the real reason the \(X_0\) term disappears is that \(X_0\) is deterministic.

The remaining terms after dropping the \(X_0\) gradient are \(\frac{c}{\gamma} \nabla \log \pi(X_T) - \nabla_{X_t} \log \bbP_{t|0}(X_t \mid x_0)\). With \(c = \gamma\) and \(x_0 = 0\): \(\nabla \log \pi(X_T) - \nabla_{X_t} \log \bbP_{t|0}(X_t \mid 0) = \nabla \log \pi(X_T) + X_t/\kappa(t)\).

But ?@eq-xi-as says \(\sigma_t^{-1} \xi = \nabla_{X_T} \log[\pi(X_T)/\bbP_T(X_T)]\). For \(\bbP_T = \normal(0, \kappa(T) I)\) this equals \(\nabla \log \pi(X_T) + X_T/\kappa(T)\), so matching the two expressions term by term requires \(X_t/\kappa(t) = X_T/\kappa(T)\), which does not hold pathwise. The two can agree at most after taking the conditional expectation given \(X_t\). The note skips the algebra that reduces the general formula to the AS formula. This substitution needs to be shown or at least sketched.

5. ?@eq-xi-bms: correct

With \(\Pi^\star_{0,T} = p_0 \otimes \pi\): \(\nabla_{X_0} \log(p_0 \otimes \pi) = \nabla \log p_0(X_0)\), \(\nabla_{X_T} \log(p_0 \otimes \pi) = \nabla \log \pi(X_T)\). Substituting into ?@eq-xi-general: \(\sigma_t^{-1} \xi = \frac{1-c}{1-\gamma} \nabla \log p_0(X_0) + \frac{c}{\gamma} \nabla \log \pi(X_T) - (X_t - X_0)/\kappa(t)\). Matches ?@eq-xi-bms. Correct.

6. Damped iteration variational characterization

?@eq-damped-var: Expanding the first-order condition of the two-term objective with weight 1 and \(\eta\): \(\E_{\Pi^i}[\xi - u \mid X_t] + \eta(u_i - u) = 0\), giving \(u = \frac{1}{1+\eta} \E[\xi \mid X_t] + \frac{\eta}{1+\eta} u_i = \alpha \Phi(u_i) + (1-\alpha) u_i\) with \(\alpha = 1/(1+\eta)\), i.e., \(\eta = (1-\alpha)/\alpha\). Matches ?@eq-damped. Correct.
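The first-order condition can also be checked numerically at a single fixed \((x, t)\). The sketch below assumes the two-term objective has the form \(\tfrac12 \E[(\xi - u)^2] + \tfrac{\eta}{2}(u - u_i)^2\), which is reconstructed from the first-order condition quoted above rather than copied from the note; the samples `xi` and the values of `u_i` and `eta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical samples of xi at one fixed (x, t); their mean plays the role of Phi(u_i).
xi = rng.normal(loc=1.0, scale=2.0, size=1000)
u_i, eta = -0.5, 3.0

def grad_objective(u):
    # d/du of 0.5 * mean((xi - u)^2) + 0.5 * eta * (u - u_i)^2  (assumed objective)
    return -(xi.mean() - u) + eta * (u - u_i)

# Damped update with alpha = 1 / (1 + eta).
alpha = 1.0 / (1.0 + eta)
u_next = alpha * xi.mean() + (1 - alpha) * u_i

print("gradient at damped update:", grad_objective(u_next))
print("eta vs (1 - alpha)/alpha :", eta, (1 - alpha) / alpha)
```

The gradient vanishes at the damped update, so the damped iterate is the minimizer of the assumed regularized objective.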

Minor Issues

Line 13-17: TODO for historical portrait. Edward Nelson (1932-2014) is a good choice for this note. The TODO should be resolved before publication.
Cross-reference path, line 159: the link [Doob h-transform](../doob_transforms/doob.qmd) points to ../doob_transforms/, but the file is at doob.qmd in the same directory. The correct path should be doob.qmd or whatever the actual deployed path will be; as written, ../doob_transforms/doob.qmd will break. Same issue at line 67 for ../shrodinger/shrodinger.qmd and line 38 for ../reverse_and_tweedie/reverse_and_tweedie.qmd. These paths assume the deployed website structure, not the draft directory. Verify that these resolve in the final build.
Line 67: \(\bbP_{|0,T}\) is the reference bridge, but this notation is never defined. Line 67 says “where \(\Pi_{0,T}\) is an endpoint coupling and \(\bbP_{|0,T}\) is the reference bridge.” The subscript notation \(|0,T\) meaning “conditioned on endpoints” should be stated explicitly: “\(\bbP_{|0,T}(\cdot \mid x_0, x_T)\) denotes the law of the reference process conditioned on \(X_0 = x_0, X_T = x_T\).”
Line 101: \(\kappa(t) = \int_0^t \sigma_s^2 ds\) is defined here but used earlier implicitly (the variance in the Euler step is \(\sigma_t^2 \delta\)). No issue, just note that \(\kappa\) is the cumulative variance.
Line 150: \(c(t) \in (0,1]\) is arbitrary. The note never explains what good choices of \(c\) are. The paper discusses \(c(t) = \gamma(t)\) for simplicity in the AS case. Mention that the choice \(c = 1\) gives a formula involving only \(\nabla_{X_T}\), while \(c = 0\) (excluded) would involve only \(\nabla_{X_0}\), and intermediate values trade off between the two.
Line 181: ?@eq-xi-as has \(\bbP_T(X_T)\) but \(\bbP_T\) is the terminal marginal of the uncontrolled reference (scaled BM). For \(X_0 = 0\), this is \(\normal(0, \kappa(T) I)\). Not defined in the BMS note. Add a parenthetical.
Line 186: ASBS coupling is written as \(\hat\varphi_0(x_0) \bbP_{T|0}(x_T \mid x_0) \varphi_T(x_T)\). This matches ASBS_v2 ?@eq-sb-rn (the SB Radon-Nikodym). Fine.
Line 239: Summary table. The AS row has coupling \(\delta_{x_0} \otimes \bbP^{u_i}_T\). This should be \(\delta_{x_0} \otimes p^{u_i}_T\) (lowercase \(p\) for the marginal density, or just state it is the terminal marginal of the \(i\)-th iterate). Using \(\bbP^{u_i}_T\) is ambiguous: is it the path measure at time \(T\) (which is a distribution on \(\bbR^d\), not on paths)? Notation could be cleaner.
Line 243: forward KL claim. “The matching loss ?@eq-matching-loss is a forward KL objective: \(u^\star = \argmin_u \kl(\Pi^\star \mid \bbP^u)\).” This is stated without derivation. It follows from the Markovian projection being the forward KL minimizer (this is in the BMS paper, Lemma 1). Either derive it in 2 lines (using the chain rule of KL) or cross-reference where this is proven.
Inconsistency in \(\kappa\) vs \(\nu\) with adjoint_sampling_v2.qmd (mentioned above). The cumulative variance is \(\nu_t\) in adjoint_sampling_v2 and \(\kappa(t)\) in BMS_v2. Pick one and use it across all notes.