Reversing a diffusion
Imagine a scalar diffusion process $\{X_t\}_{0 \leq t \leq T}$ defined on the interval $[0,T]$,
$$dX_t = \mu(X_t)\,dt + \sigma\,dW_t,$$
where $\mu(X_t)$ is the drift term, $\sigma$ is the diffusion coefficient, and $W_t$ is a Wiener process. Denote the distribution of this process at time $t$ ($0 \leq t \leq T$) by $p_t(dx)$, with initial distribution $p_0(dx)$. Now, what happens if we reverse time and examine the process backward? In other words, consider the time-reversed process $\overleftarrow{X}_s$ defined as
$$\overleftarrow{X}_s = X_{T-s}, \qquad 0 \leq s \leq T.$$
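Intuitively, the process $\overleftarrow{X}_s$ is also a diffusion on the interval $[0,T]$, now with initial distribution $\overleftarrow{X}_0 \sim p_T(dx)$. To guess its dynamics, consider an Euler discretization of the forward process:
$$X_{t+\delta} = X_t + \mu(X_t)\,\delta + \sigma\sqrt{\delta}\,\mathbf{n},$$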
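where $\mathbf{n} \sim \mathcal{N}(0,1)$ is a noise term independent of $X_t$, and $\delta \ll 1$ is a time increment. Rearranging terms and making the approximation $\mu(X_t) \approx \mu(X_{t+\delta})$ gives
$$X_t \approx X_{t+\delta} - \mu(X_{t+\delta})\,\delta - \sigma\sqrt{\delta}\,\mathbf{n}.$$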
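This seems to suggest that the time-reversed process follows the dynamics $d\overleftarrow{X}_s = -\mu(\overleftarrow{X}_s)\,ds + \sigma\,dB_s$ started from $\overleftarrow{X}_0 \sim p_T(dx)$. This conclusion is incorrect: it would imply that the time reversal of a standard Brownian motion ($\mu \equiv 0$, $\sigma = 1$) started at zero is again a Brownian motion, started from $p_T(dx) = \mathcal{N}(0,T)$. That is clearly not the case, since every reversed path must end at $\overleftarrow{X}_T = X_0 = 0$, whereas a Brownian motion started from $\mathcal{N}(0,T)$ has distribution $\mathcal{N}(0,2T)$ at time $T$. The flaw in the argument is the assumption that the noise term $\mathbf{n}$ is independent of $X_{t+\delta}$: it is independent of $X_t$, but not of $X_{t+\delta}$, which invalidates the reversed Euler step.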
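The failure can be checked numerically in the Brownian case. The sketch below is a minimal illustration, assuming $\mu \equiv 0$, $\sigma = 1$, $X_0 = 0$ and arbitrary illustrative values for the horizon, step count and number of paths; the function name `brownian_reversal_demo` is hypothetical. It simulates forward paths, flips them in time to obtain the true reversal, and separately runs the naive dynamics $d\overleftarrow{X}_s = dB_s$ from $\mathcal{N}(0,T)$.

```python
import numpy as np


def brownian_reversal_demo(T=1.0, n_steps=100, n_paths=20000, seed=0):
    """Contrast the true time reversal of Brownian motion with the naive reversed-Euler guess.

    Assumed forward process (illustration only): dX = dW, X_0 = 0, i.e. mu = 0, sigma = 1.
    Returns sample variances of the reversed process at s = 0 and s = T,
    and of the naive dynamics at s = T.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Forward Euler steps (exact for Brownian motion): cumulative Gaussian increments.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

    # True reversal: flip the time axis so that reversed_paths[:, k] = X_{T - k*dt}.
    reversed_paths = paths[:, ::-1]
    start_var = reversed_paths[:, 0].var()  # Var(X_T), close to T
    end_var = reversed_paths[:, -1].var()   # Var(X_0) = 0: every reversed path ends at 0

    # Naive guess: start from p_T = N(0, T) and run dY = -mu(Y) ds + sigma dB = dB.
    y = rng.normal(0.0, np.sqrt(T), size=n_paths)
    for _ in range(n_steps):
        y = y + rng.normal(0.0, np.sqrt(dt), size=n_paths)
    naive_end_var = y.var()                 # close to 2T, not 0

    return start_var, end_var, naive_end_var


start_var, end_var, naive_end_var = brownian_reversal_demo()
print(f"reversed start: {start_var:.3f}, reversed end: {end_var:.3f}, naive end: {naive_end_var:.3f}")
```

Both processes start with variance close to $T$, but the true reversal collapses to zero at $s = T$ while the naive dynamics spread to roughly $2T$.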
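Deriving the dynamics of the backward process rigorously is not straightforward (Anderson 1982) (Haussmann and Pardoux 1986). What follows is a heuristic derivation that estimates the mean and variance of $X_t$ given $X_{t+\delta} = x_{t+\delta}$, assuming $\delta \ll 1$. Here, $x_{t+\delta}$ is treated as a fixed value, and we are only interested in the conditional distribution of $X_t$ given $x_{t+\delta}$. Bayes' law gives
$$\mathbb{P}(X_t \in dx \mid x_{t+\delta}) \propto p_t(x)\,\exp\left(-\frac{\left(x_{t+\delta} - [x + \mu(x)\,\delta]\right)^2}{2\sigma^2\delta}\right),$$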
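where the exponential term is the transition density of the forward diffusion over a step $\delta \ll 1$. Using the first-order approximation of $\log p_t$ around $x_{t+\delta}$,
$$p_t(x) \approx p_t(x_{t+\delta})\,\exp\left(\nabla \log p_t(x_{t+\delta})\,(x - x_{t+\delta})\right),$$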
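approximating $\mu(x) \approx \mu(x_{t+\delta})$ inside the exponential, completing the square in $x$, and eliminating multiplicative constants and higher-order error terms, we obtain
$$\mathbb{P}(X_t \in dx \mid x_{t+\delta}) \propto \exp\left(-\frac{\left(x - [x_{t+\delta} - \mu(x_{t+\delta})\,\delta + \sigma^2\,\nabla \log p_t(x_{t+\delta})\,\delta]\right)^2}{2\sigma^2\delta}\right).$$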
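The Gaussian approximation can be sanity-checked by computing the exact posterior of $X_t$ given $x_{t+\delta}$ on a fine grid and comparing its moments with the formula. This is only a sketch under illustrative assumptions: $p_t$ is taken to be a two-component Gaussian mixture with a closed-form score, the drift is the hypothetical choice $\mu(x) = -x + \sin x$, and all helper names (`p_t`, `score_t`, `posterior_moments`, `predicted_mean`) are made up for this example.

```python
import numpy as np

# Hypothetical ingredients chosen for illustration (not from the text):
# a two-component Gaussian mixture for p_t and a nonlinear drift mu.
WEIGHTS = (0.5, 0.5)
MEANS = (-1.0, 1.5)
STDS = (0.5, 0.8)


def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def p_t(x):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(WEIGHTS, MEANS, STDS))


def score_t(x):
    # d/dx log p_t(x) = p_t'(x) / p_t(x), using N'(x) = -(x - m) / s^2 * N(x).
    dp = sum(w * normal_pdf(x, m, s) * (-(x - m) / s**2) for w, m, s in zip(WEIGHTS, MEANS, STDS))
    return dp / p_t(x)


def mu(x):
    return -x + np.sin(x)


def posterior_moments(y, delta, sigma, n_grid=20001):
    """Mean and variance of X_t given X_{t+delta} = y, by brute-force Bayes on a grid.

    Posterior density is proportional to p_t(x) * exp(-(y - x - mu(x) delta)^2 / (2 sigma^2 delta)).
    """
    sd = sigma * np.sqrt(delta)
    u = np.linspace(-12.0, 12.0, n_grid) * sd  # offsets x - y, covering +-12 posterior sds
    x = y + u
    w = p_t(x) * np.exp(-((y - x - mu(x) * delta) ** 2) / (2.0 * sigma**2 * delta))
    w = w / w.sum()
    mean_u = np.sum(w * u)
    var = np.sum(w * (u - mean_u) ** 2)
    return y + mean_u, var


def predicted_mean(y, delta, sigma):
    # Mean of the Gaussian approximation: y - mu(y) delta + sigma^2 grad log p_t(y) delta.
    return y - mu(y) * delta + sigma**2 * score_t(y) * delta


delta, sigma = 1e-3, 1.0
for y in (-1.0, 0.3, 2.0):
    m, v = posterior_moments(y, delta, sigma)
    print(f"y={y:+.1f}  grid mean={m:.6f}  formula={predicted_mean(y, delta, sigma):.6f}  "
          f"var/(sigma^2 delta)={v / (sigma**2 * delta):.4f}")
```

For small $\delta$ the grid mean should agree with the formula up to terms of order $\delta^2$, and the variance ratio should be close to 1; dropping the score term leaves an error of order $\delta$.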
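For $\delta \ll 1$, the conditional Gaussian obtained above, with mean $x_{t+\delta} - \mu(x_{t+\delta})\,\delta + \sigma^2\,\nabla \log p_t(x_{t+\delta})\,\delta$ and variance $\sigma^2\delta$, is the one-step transition of the reverse diffusion
$$d\overleftarrow{X}_s = -\mu(\overleftarrow{X}_s)\,ds + \sigma^2\,\nabla \log p_{T-s}(\overleftarrow{X}_s)\,ds + \sigma\,dB_s.$$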
The notation $B_s$ emphasizes that this Brownian motion is distinct from the one driving the forward diffusion. The additional drift term $\sigma^2\,\nabla \log p_{T-s}(\overleftarrow{X}_s)$ is intuitive: it pushes the reverse diffusion toward regions where the forward diffusion spent a significant amount of time, i.e., where $p_{T-s}(dx)$ is large. The popular “denoising diffusion models” (Ho, Jain, and Abbeel 2020) can be seen as discretizations of this backward process, combined with various techniques for estimating the additional drift term from data.
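To see the reverse diffusion recover the initial distribution, one can pick a forward process whose marginals are known in closed form, so that the score needs no estimation. The sketch below assumes an Ornstein–Uhlenbeck forward drift $\mu(x) = -x$ and a Gaussian $p_0 = \mathcal{N}(m_0, v_0)$, so that $p_t = \mathcal{N}(m_0 e^{-t},\, v_0 e^{-2t} + \tfrac{\sigma^2}{2}(1 - e^{-2t}))$ and $\nabla \log p_t(x) = -(x - m_t)/v_t$. The helper names `ou_marginal` and `reverse_sample` and all parameter values are hypothetical; in a denoising diffusion model the exact score would be replaced by a learned estimate.

```python
import numpy as np


def ou_marginal(t, m0, v0, sigma):
    """Mean and variance of X_t for dX = -X dt + sigma dW with X_0 ~ N(m0, v0)."""
    decay = np.exp(-t)
    mean = m0 * decay
    var = v0 * decay**2 + 0.5 * sigma**2 * (1.0 - decay**2)
    return mean, var


def reverse_sample(T=1.0, m0=2.0, v0=0.25, sigma=1.0, n_steps=1000,
                   n_paths=20000, use_score=True, seed=0):
    """Euler-Maruyama discretization of the reverse diffusion
        dY_s = [-mu(Y_s) + sigma^2 * grad log p_{T-s}(Y_s)] ds + sigma dB_s,
    with the assumed forward drift mu(x) = -x, started from Y_0 ~ p_T.
    With use_score=False it runs the naive dynamics dY = -mu(Y) ds + sigma dB instead.
    """
    rng = np.random.default_rng(seed)
    ds = T / n_steps
    mT, vT = ou_marginal(T, m0, v0, sigma)
    y = rng.normal(mT, np.sqrt(vT), size=n_paths)
    for k in range(n_steps):
        t = T - k * ds  # forward time corresponding to reverse time s = k * ds
        drift = y       # -mu(y) = +y for the OU forward drift
        if use_score:
            m_t, v_t = ou_marginal(t, m0, v0, sigma)
            drift = drift + sigma**2 * (-(y - m_t) / v_t)  # exact Gaussian score of p_t
        y = y + drift * ds + sigma * np.sqrt(ds) * rng.normal(size=n_paths)
    return y


y_rev = reverse_sample()
y_naive = reverse_sample(use_score=False)
print(f"with score: mean={y_rev.mean():.3f}, var={y_rev.var():.3f}")
print(f"naive:      mean={y_naive.mean():.3f}, var={y_naive.var():.3f}")
```

With the score term, the final samples should have mean close to $m_0 = 2$ and variance close to $v_0 = 0.25$. In this linear example the naive dynamics happen to recover the mean, but their variance grows to about 6.6, far from $v_0$.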
References
Anderson, Brian DO. 1982. “Reverse-Time Diffusion Equation Models.” Stochastic Processes and Their Applications 12 (3): 313–26.
Efron, Bradley. 2011. “Tweedie’s Formula and Selection Bias.” Journal of the American Statistical Association 106 (496): 1602–14.
Haussmann, Ulrich G, and Etienne Pardoux. 1986. “Time Reversal of Diffusions.” The Annals of Probability, 1188–1205.
Ho, Jonathan, Ajay Jain, and Pieter Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” Advances in Neural Information Processing Systems 33: 6840–51.
Vincent, Pascal. 2011. “A Connection Between Score Matching and Denoising Autoencoders.” Neural Computation 23 (7): 1661–74.