Alexandre Thiéry - Jarzynski and Crooks

Consider a sequence of densities on \(\mathbb{R}^D\) indexed by a time parameter \(t \in [0,T]\),

\[ \pi_t(x) \; = \; \frac{ e^{-U_t(x)}}{Z_t} \]

where \(U_t: \mathbb{R}^D \to \mathbb{R}\) is a time-dependent potential function and \(Z_t\) is the normalizing constant. We are really interested in the final density \(\pi_T\); the bridging sequence of densities \(\pi_t\) is just a tool to get there, starting from an initial and tractable density \(\pi_0\). If one initializes a particle \(X_0 \sim \pi_0\) and evolves it according to the Langevin dynamics

\[ dX_t \; = \; -\nabla U_t(X_t) \, dt + \sqrt{2} \, dW_t \]

one can hope that the distribution of \(X_T\) will be close to \(\pi_T\). This would be the case if one evolved the particle according to \(dX_t \; = \; -\gamma \nabla U_t(X_t) \, dt + \sqrt{2 \gamma} \, dW_t\) and let \(\gamma \to \infty\), since in that case \(X_t\) would be distributed according to \(\pi_t\) for all \(t\). Can one correct the distribution of \(X_T\) with importance sampling weights?

I like the approach presented in (Vargas et al. 2024) and these notes are my attempt to understand it. One very fruitful idea that has been used in a number of works in the Monte-Carlo literature is to look at a probability distribution of interest as the marginal of a joint distribution and to carry out computations and build numerical methods on the joint distribution (Del Moral, Doucet, and Jasra 2006). Indeed, there is a lot of flexibility in the choice of the joint distribution.

Here, we can also consider the diffusion process \(Y_t\) that runs backward in time, is initialized according to \(\pi_T\), and follows the same Langevin dynamics. Again, one expects the distribution of \(Y_t\) to be close to \(\pi_t\). It is more intuitive to discuss a discretized version of the process. For a time discretization \(\delta = T/N\), we have

\[ \left\{ \begin{aligned} x_{t + \delta} &= x_t - \nabla U_t(x_t) \, \delta + \sqrt{2 \delta} \, \xi_t\\ y_{t} &= y_{t + \delta} - \nabla U_{t + \delta}(y_{t + \delta}) \, \delta + \sqrt{2 \delta} \, \xi_t \end{aligned} \right. \]

where \(\xi_t \sim \mathcal{N}(0,I)\) are i.i.d. standard Gaussian random variables. Let us continue with these discretized versions and denote by \(\mathbb{P}^{X}\) and \(\mathbb{P}^{Y}\) the probability measures associated with the discretized processes. The crucial remark is that the marginal distribution of \(\mathbb{P}^Y\) at time \(T\) is our distribution of interest \(\pi_T\). For a discretized path \(\underline{z} = (z_0, z_{\delta}, \ldots, z_{T})\), writing \(t_k = k \delta\), we have, up to a Gaussian normalizing constant that is the same for both processes:

\[ \begin{aligned} \mathbb{P}^X(\underline{z}) &\propto \pi_0(z_0) \, \exp {\left\{ -\frac{1}{4 \delta} \sum_{k=0}^{N-1} \|z_{t_{k+1}} - [z_{t_k} - \nabla U_{t_k}(z_{t_k})\,\delta]\|^2 \right\}} \\ \mathbb{P}^Y(\underline{z}) &\propto \pi_T(z_T) \, \exp {\left\{ -\frac{1}{4 \delta} \sum_{k=0}^{N-1} \|z_{t_{k}} - [z_{t_{k+1}} - \nabla U_{t_{k+1}}(z_{t_{k+1}})\,\delta]\|^2 \right\}} . \end{aligned} \]
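The two discretized recursions above can be simulated directly. The following is a minimal NumPy sketch, not taken from the references: the function names `simulate_forward` and `simulate_backward`, and the toy potential \(U_t(x) = \|x - t\|^2/2\) (a unit Gaussian whose mean moves from 0 to 1), are assumptions made only for illustration.

```python
import numpy as np

def simulate_forward(grad_U, x0, T, N, rng):
    """Euler-Maruyama for dX = -grad U_t(X) dt + sqrt(2) dW, started at x0.
    Returns the whole path, with shape (N + 1,) + x0.shape."""
    delta = T / N
    path = np.empty((N + 1,) + x0.shape)
    path[0] = x0
    for k in range(N):
        xi = rng.standard_normal(x0.shape)
        path[k + 1] = path[k] - grad_U(k * delta, path[k]) * delta + np.sqrt(2 * delta) * xi
    return path

def simulate_backward(grad_U, yT, T, N, rng):
    """Same recursion run from time T down to 0, with the gradient evaluated at t + delta."""
    delta = T / N
    path = np.empty((N + 1,) + yT.shape)
    path[N] = yT
    for k in range(N - 1, -1, -1):
        xi = rng.standard_normal(yT.shape)
        t_next = (k + 1) * delta
        path[k] = path[k + 1] - grad_U(t_next, path[k + 1]) * delta + np.sqrt(2 * delta) * xi
    return path

# Hypothetical toy potential (an assumption): U_t(x) = |x - t|^2 / 2, so pi_t = N(t, I).
def grad_U(t, x):
    return x - t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5000, 2))          # X_0 ~ pi_0 = N(0, I)
xs = simulate_forward(grad_U, x0, T=1.0, N=200, rng=rng)
yT = 1.0 + rng.standard_normal((5000, 2))    # Y_T ~ pi_T = N(1, I)
ys = simulate_backward(grad_U, yT, T=1.0, N=200, rng=rng)

print("mean of x_T:", xs[-1].mean(), "(pi_T has mean 1)")
print("mean of y_0:", ys[0].mean(), "(pi_0 has mean 0)")
```

For this toy potential the mean of the continuous-time forward process at \(T = 1\) is \(e^{-1} \approx 0.37\) rather than 1: the particle lags behind the moving target, which is the kind of mismatch the importance weights are meant to correct.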

One can compute the ratio \(\mathbb{P}^Y(\underline{z}) / \mathbb{P}^X(\underline{z})\) and examine its limit as \(N \to \infty\). Algebra gives:

\[ \frac{d \mathbb{P}^Y}{d \mathbb{P}^X}(\underline{z}) = \frac{\pi_T(z_T)}{\pi_0(z_0)} \, \exp {\left\{ \sum_{k=0}^{N-1} \left< z_{t_{k+1}} - z_{t_k}, \frac{\nabla U_{t_k}(z_{t_k}) + \nabla U_{t_{k+1}}(z_{t_{k+1}})}{2} \right> + \mathcal{O}(\delta) \right\}} . \]
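The algebra behind this ratio can be sketched as follows, using the shorthand \(\Delta_k = z_{t_{k+1}} - z_{t_k}\) and \(g_k = \nabla U_{t_k}(z_{t_k})\) (notation introduced only for this sketch). The \(k\)-th term of the log-ratio of the two Gaussian factors is

\[ \frac{1}{4\delta} \left( \|\Delta_k + g_k \, \delta\|^2 - \|\Delta_k - g_{k+1} \, \delta\|^2 \right) = \left< \Delta_k, \frac{g_k + g_{k+1}}{2} \right> + \frac{\delta}{4} \left( \|g_k\|^2 - \|g_{k+1}\|^2 \right). \]

The \(\|\Delta_k\|^2\) terms cancel, and the last term telescopes when summed over \(k\), leaving the remainder \(\frac{\delta}{4} \left( \|\nabla U_0(z_0)\|^2 - \|\nabla U_T(z_T)\|^2 \right)\), which vanishes as \(\delta \to 0\).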

One could probably use some Stratonovich calculus to study this, but I always forget these things, so let’s use Itô instead. Taylor-expanding around \((t_k, z_{t_k})\), write

\[\frac{\nabla U_{t_k}(z_{t_k}) + \nabla U_{t_{k+1}}(z_{t_{k+1}})}{2} \approx \nabla U_{t_k}(z_{t_k}) + \frac{1}{2} \mathrm{Hess}_{U_{t_k}} (z_{t_k}) (z_{t_{k+1}} - z_{t_k}) + \frac12 \, \partial_t \nabla U_{t_k}(z_{t_k}) \, \delta. \]

The term \(\frac12 \, \partial_t \nabla U_{t_k}(z_{t_k}) \, \delta\) is too small to matter in the limit \(N \to \infty\): paired with the increment \(z_{t_{k+1}} - z_{t_k}\), which is of order \(\sqrt{\delta}\), it contributes \(\mathcal{O}(\delta^{3/2})\) per step and hence \(\mathcal{O}(\sqrt{\delta})\) in total. The Itô formula \(d U_t(z_t) = \partial_t U_t(z_t) \, dt + \left< \nabla U_t(z_t), dz_t \right> + \frac{1}{2} \left< dz_t, \mathrm{Hess}_{U_t}(z_t) \, dz_t \right>\) then shows that in the limit \(N \to \infty\):

\[ \frac{d \mathbb{P}^Y}{d \mathbb{P}^X}(\underline{z}) = \frac{\pi_T(z_T)}{\pi_0(z_0)} \, \exp {\left\{ U_T(z_T) - U_0(z_0) - \int_0^T \partial_t U_t(z_t) \, dt \right\}} . \]

Since \(\pi_t(z_t) = \exp(-U_t(z_t)) / Z_t\), this gives the Crooks relation:

\[ \frac{d \mathbb{P}^Y}{d \mathbb{P}^X}(\underline{z}) = \frac{Z_0}{Z_T} \, \exp {\left\{ - \int_0^T \partial_t U_t(z_t) \, dt \right\}} . \]

Integrating over trajectories of \(X_t\), since \(\mathbb{E}_{X}[(d \mathbb{P}^Y / d \mathbb{P}^X)(X)] = 1\), one obtains the Jarzynski equality \[ \frac{Z_T}{Z_0} \; = \; \mathbb{E}_{X} {\left\{ \exp {\left\{ - \int_0^T \partial_t U_t(X_t) \, dt \right\}} \right\}} \]

which is indeed also central to sequential Monte-Carlo methods. As described in (Vargas et al. 2024), the same approach can be used to slightly generalize the Crooks relation. Indeed, suppose that one instead considers the dynamics:

\[ dX_t \; = \; -\nabla U_t(X_t) \, dt \textcolor{blue}{+ b_t(X_t) \, dt} + \sqrt{2} \, dW_t \]

where \(b_t: \mathbb{R}^D \to \mathbb{R}^D\) is a time-dependent control function. One can consider the backward dynamics \(Y_t\) that is initialized according to \(\pi_T\) and follows the dynamics \(dY_t = -\nabla U_t(Y_t) \, dt \textcolor{red}{-} b_t(Y_t) \, dt + \sqrt{2} \, dW_t\) backward in time, i.e.

\[ \left\{ \begin{aligned} x_{t + \delta} &= x_t - \nabla U_t(x_t) \, \delta + b_t(x_t) \, \delta + \sqrt{2 \delta} \, \xi_t\\ y_{t} &= y_{t + \delta} - \nabla U_{t + \delta}(y_{t + \delta}) \, \delta \textcolor{red}{-} b_{t + \delta}(y_{t + \delta}) \, \delta + \sqrt{2 \delta} \, \xi_t. \end{aligned} \right. \]

The minus sign for the backward dynamics \(Y_t\) is natural since one knows that this exactly gives the backward dynamics of the forward process \(X_t\) in the case when \(X_t \sim \pi_t(dx)\) for all times \(0 \leq t \leq T\). One can then follow the exact same steps, using that the quadratic variation is \(dz_t \, dz_t^\top = 2 \, I \, dt\), to obtain that

\[ \frac{d \mathbb{P}^Y}{d \mathbb{P}^X}(\underline{z}) = \frac{Z_0}{Z_T} \, \exp {\left\{ \int_0^T -\partial_t U_t(z_t) \textcolor{blue}{+ \nabla \cdot b_t(z_t) - \left< \nabla U_t(z_t), b_t(z_t) \right>} \, dt \right\}} . \]
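To see where the additional terms come from, here is a sketch of the per-step computation, with the shorthand \(\Delta_k = z_{t_{k+1}} - z_{t_k}\), \(g_k = \nabla U_{t_k}(z_{t_k})\) and \(b_k = b_{t_k}(z_{t_k})\) (notation introduced only for this sketch). The \(k\)-th term of the log-ratio of the Gaussian factors is now

\[ \frac{1}{4\delta} \left( \|\Delta_k + (g_k - b_k) \, \delta\|^2 - \|\Delta_k - (g_{k+1} + b_{k+1}) \, \delta\|^2 \right) = \left< \Delta_k, \frac{g_k + g_{k+1}}{2} \right> + \frac12 \left< \Delta_k, b_{k+1} - b_k \right> + \frac{\delta}{4} \left( \|g_k - b_k\|^2 - \|g_{k+1} + b_{k+1}\|^2 \right). \]

The first term is the same as in the uncontrolled case. For the second, \(b_{k+1} - b_k \approx J_{b}(z_{t_k}) \, \Delta_k\), where \(J_b\) denotes the Jacobian of \(b_{t_k}\), so that \(\frac12 \left< \Delta_k, J_b \, \Delta_k \right>\) contributes \(\nabla \cdot b_{t_k}(z_{t_k}) \, \delta\) in the limit. For the third, \(\|g_k - b_k\|^2 - \|g_{k+1} + b_{k+1}\|^2 = -4 \left< g_k, b_k \right> + \left( \|g_k + b_k\|^2 - \|g_{k+1} + b_{k+1}\|^2 \right)\): the bracket telescopes to a vanishing remainder, and what is left contributes \(-\left< \nabla U_{t_k}(z_{t_k}), b_{t_k}(z_{t_k}) \right> \delta\). Together these give the generalized relation displayed above.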

This for example shows that, for \(dX_t \; = \; -\nabla U_t(X_t) \, dt \textcolor{blue}{+ b_t(X_t) \, dt} + \sqrt{2} \, dW_t\) initialized according to \(\pi_0\), we have:

\[ \frac{Z_T}{Z_0} \; = \; \mathbb{E}_{X} {\left\{ \exp {\left\{ \int_0^T -\partial_t U_t(X_t) \textcolor{blue}{+ \nabla \cdot b_t(X_t) - \left< \nabla U_t(X_t), b_t(X_t) \right>} \, dt \right\}} \right\}} . \]
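As a numerical sanity check of this identity (and of the Jarzynski equality, which is the case \(b_t = 0\)), here is a minimal NumPy sketch. Everything specific in it is an assumption made for illustration: the helper `log_weights`, the toy potential \(U_t(x) = \|x - t\,m\|^2/2\) for which \(Z_T/Z_0 = 1\), and the constant control \(b_t(x) = \kappa \, m\). The time integral is approximated by a left-point Riemann sum along Euler-Maruyama paths, so the estimates carry a small discretization bias.

```python
import numpy as np

def log_weights(grad_U, dt_U, b, div_b, x0, T, N, rng):
    """Left-point Riemann sum of [-d_t U_t + div b_t - <grad U_t, b_t>](X_t) dt
    along Euler-Maruyama paths of dX = (-grad U_t + b_t) dt + sqrt(2) dW."""
    delta = T / N
    x = x0.copy()
    log_w = np.zeros(x.shape[0])
    for k in range(N):
        t = k * delta
        g, drift = grad_U(t, x), b(t, x)
        log_w += (-dt_U(t, x) + div_b(t, x) - np.sum(g * drift, axis=1)) * delta
        x = x + (-g + drift) * delta + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
    return log_w

# Hypothetical toy target (an assumption): U_t(x) = |x - t m|^2 / 2, so pi_t = N(t m, I)
# and Z_t does not depend on t, i.e. Z_T / Z_0 = 1.
m = np.array([1.0, 0.5])
grad_U = lambda t, x: x - t * m
dt_U = lambda t, x: -(x - t * m) @ m

def make_control(kappa):
    # b_t(x) = kappa * m: a constant push in the direction of motion, with zero divergence.
    b = lambda t, x: np.broadcast_to(kappa * m, x.shape)
    div_b = lambda t, x: np.zeros(x.shape[0])
    return b, div_b

rng = np.random.default_rng(1)
x0 = rng.standard_normal((20000, 2))   # X_0 ~ pi_0 = N(0, I)
estimates = {}
for kappa in (0.0, 0.5):               # kappa = 0 is the plain Jarzynski estimator
    b, div_b = make_control(kappa)
    lw = log_weights(grad_U, dt_U, b, div_b, x0, T=1.0, N=500, rng=rng)
    estimates[kappa] = (np.mean(np.exp(lw)), np.std(lw))
    print(f"kappa={kappa}: estimate of Z_T/Z_0 = {estimates[kappa][0]:.3f}, "
          f"std of log-weights = {estimates[kappa][1]:.3f}")
```

Both choices of \(\kappa\) should give an estimate close to 1, with less spread in the log-weights when the control pushes the particles along with the moving target.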

This generalization of the Crooks relation is also explored in (Albergo and Vanden-Eijnden 2024), where an alternative derivation is given by directly exploiting the Fokker-Planck equation. Crucially, (Albergo and Vanden-Eijnden 2024) note that, if the control function \(b_t: \mathbb{R}^D \to \mathbb{R}^D\) is chosen so that

\[ -\partial_t U_t(x) + \nabla \cdot b_t(x) - \left< \nabla U_t(x), b_t(x) \right> = \frac{d}{dt} \, \log Z_t \tag{1}\]

then the term \(\int_0^T -\partial_t U_t(X_t) + \nabla \cdot b_t(X_t) - \left< \nabla U_t(X_t), b_t(X_t) \right> \, dt\) is deterministic and equals \(\log(Z_T/Z_0)\) for every trajectory, which gives a zero-variance estimator of the free energy difference \(\log(Z_T/Z_0)\). Of course, it is a formidable challenge to solve the high-dimensional PDE (1), and (Albergo and Vanden-Eijnden 2024) propose interesting PINN-based methods to do so.
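A one-dimensional Gaussian example, chosen here only for illustration, makes the zero-variance property concrete. Take \(U_t(x) = a_t x^2 / 2\), so that \(Z_t = \sqrt{2\pi / a_t}\), and try a linear control \(b_t(x) = c_t \, x\). The left-hand side of (1) is \(-\tfrac12 \dot{a}_t x^2 + c_t - a_t c_t x^2\), and the choice \(c_t = -\dot{a}_t / (2 a_t)\) cancels the \(x^2\) terms and leaves \(c_t = \frac{d}{dt} \log Z_t\). The NumPy sketch below (the helper `log_weights` and the schedule \(a_t = 1 + t\) are assumptions) compares the log-weights with and without this control.

```python
import numpy as np

def log_weights(a, a_dot, c, x0, T, N, rng):
    """Log-weights for U_t(x) = a(t) x^2 / 2 under the linear control b_t(x) = c(t) x,
    using a left-point Riemann sum along Euler-Maruyama paths (1D particles)."""
    delta = T / N
    x = x0.copy()
    log_w = np.zeros_like(x)
    for k in range(N):
        t = k * delta
        grad_U = a(t) * x
        b = c(t) * x
        # Integrand: -d_t U + div b - <grad U, b>, with div b = c(t) in one dimension.
        log_w += (-0.5 * a_dot(t) * x**2 + c(t) - grad_U * b) * delta
        x = x + (-grad_U + b) * delta + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
    return log_w

a = lambda t: 1.0 + t          # hypothetical schedule: precision grows from 1 to 2
a_dot = lambda t: 1.0
no_control = lambda t: 0.0
exact_control = lambda t: -a_dot(t) / (2.0 * a(t))   # solves (1) for this family

rng = np.random.default_rng(2)
x0 = rng.standard_normal(20000)                      # X_0 ~ pi_0 = N(0, 1)
lw_plain = log_weights(a, a_dot, no_control, x0, T=1.0, N=500, rng=rng)
lw_ctrl = log_weights(a, a_dot, exact_control, x0, T=1.0, N=500, rng=rng)

exact = -0.5 * np.log(2.0)                           # log(Z_T / Z_0) = log(1 / sqrt(2))
print("exact log(Z_T/Z_0):        ", exact)
print("plain:   log of mean weight", np.log(np.mean(np.exp(lw_plain))), " std", lw_plain.std())
print("control: log of mean weight", np.log(np.mean(np.exp(lw_ctrl))), " std", lw_ctrl.std())
```

With the exact control, every particle carries the same log-weight up to floating-point rounding, and the only remaining error is the Riemann-sum approximation of \(\int_0^T c_t \, dt\).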

Some References:

The original papers by Jarzynski (Jarzynski 1997) and Crooks (Crooks 1999).
The book (Stoltz, Rousset, et al. 2010) is excellent!
The two papers that prompted these notes: (Vargas et al. 2024) and (Albergo and Vanden-Eijnden 2024).

References

Albergo, Michael S, and Eric Vanden-Eijnden. 2024. “NETS: A Non-Equilibrium Transport Sampler.” arXiv Preprint arXiv:2410.02711.

Crooks, Gavin E. 1999. “Entropy Production Fluctuation Theorem and the Nonequilibrium Work Relation for Free Energy Differences.” Physical Review E 60 (3). APS: 2721.

Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. 2006. “Sequential Monte Carlo Samplers.” Journal of the Royal Statistical Society Series B: Statistical Methodology 68 (3). Oxford University Press: 411–36.

Jarzynski, Christopher. 1997. “Nonequilibrium Equality for Free Energy Differences.” Physical Review Letters 78 (14). APS: 2690.

Stoltz, Gabriel, Mathias Rousset, et al. 2010. Free Energy Computations: A Mathematical Perspective. World Scientific.

Vargas, Francisco, Shreyas Padhy, Denis Blessing, and Nikolas Nüsken. 2024. “Transport Meets Variational Inference: Controlled Monte Carlo Diffusions.” ICLR 2024.