Averaging
Consider a pair of (coupled) Markov processes \(X_t \in \mathcal{X}\) and \(Y_t \in \mathcal{Y}\) with dynamics that can informally be described as
\[ \left\{ \begin{align} dX^{\varepsilon}/dt &= F(X^{\varepsilon},Y^{\varepsilon}, W^X)\\ dY^{\varepsilon}/dt &= \textcolor{red}{\varepsilon^{-1}} \, G(X^{\varepsilon},Y^{\varepsilon}, W^Y)\\ \end{align} \right. \]
for two independent “noise” terms \(W^X\) and \(W^Y\) and a time-scale parameter \(\varepsilon\ll 1\). We assume that \(X\) is a slow component that moves by \(\mathcal{O}(\delta)\) in on the time interval \([t, t+\delta]\). The scaling \(\varepsilon^{-1}\) in the dynamics of fast process \(Y^{\varepsilon}\) indicates that we expect the process \(Y\) to evolve on a time scale of order \(\mathcal{O}(\varepsilon)\). We are interested in the limit \(\varepsilon\to 0\) and hope to “average out” the fast process \(Y^{\varepsilon}\) and be able to describe the slow (and interesting) process \(X^{\varepsilon}\) without referring to the fast process. Informally, we would like to describe the process \(X^{\varepsilon}\), in the limit \(\varepsilon\to 0\), as following an effective Markovian dynamics
\[ dX/dt = \overline{F}(X, W^X). \]
For describing the averaging phenomenon, we typically assume some ergodicity conditions on the fast process \(Y\). Here, we assume that for each fixed \(x \in \mathcal{X}\), the fast process process \(Y^{[x]}\) with fixed slow-component \(x \in \mathcal{X}\), i.e.
\[ dY^{[x]}/dt = G(x,Y^{[x]}, W^Y) \]
is ergodic with respect to some probability distribution \(\rho_x(dy)\). Although the averaging phenomenon is quite general, it is somewhat easier to illustrate it for diffusion processes. In this case, let us assume that the slow process is given by
\[ dX^{\varepsilon} = \mu(X^{\varepsilon}, Y^{\varepsilon}) \, dt + \sigma(X^{\varepsilon}, Y^{\varepsilon}) \, dW^x. \]
For \(X^{\varepsilon}_{t} = x\) and for a time increment \(\delta \ll 1\), since the process \(X^{\varepsilon}\) can be considered constant we have
\[ \begin{align} X^{\varepsilon}_{t+\delta} - x &\approx \; {\left( \frac{ \int_{t}^{t+\delta} \mu(x, Y^{\varepsilon}) \, dt}{\delta} \right)} \, \delta + \ {\left( \frac{ \int_{t}^{t+\delta} \sigma^2(x, Y^{\varepsilon}) }{\delta} \right)} ^{1/2} \, \mathcal{N}(0, \delta). \end{align} \]
This can be regarded as a time-discretization of the averaged process
\[ dX \, = \; \overline{\mu}(X) \, dt + \overline{\sigma}(X) \, dW \]
for averaged drift and volatility functions give by
\[ \left\{ \begin{align} \overline{\mu}(x) &= \int \mu(x,y) \, \rho_x(dy) \\ \overline{\sigma}^2(x) &= \int \sigma^2(x,y) \, \rho_x(dy). \end{align} \right. \tag{1}\]
One standard approach for proving this type of results is to write the Kolmogorov equations
\[\frac{d}{dt}\varphi^{\varepsilon}(x,y,t) = \mathcal{L}^{\varepsilon} \varphi^{\varepsilon}(x,y,t)\] for \(\varphi^{\varepsilon}(x,y,t) = \mathop{\mathrm{\mathbb{E}}}[\varphi(X^{\varepsilon}_{t}, Y^{\varepsilon}_{t}, t) | X^{\varepsilon}_{0}=x, Y^{\varepsilon}_{0}=y]\) and perform a multiscale expansion (Hinch 1991) (Pavliotis and Stuart 2008) (Weinan 2011)
\[ \varphi^{\varepsilon}(x,y,t) = A(x,t) + \varepsilon B(x,y,t) + \mathcal{O}(\varepsilon^2). \tag{2}\]
Indeed, the first order term \(A(x,t)\) is expected to not depend on the initial condition \(y\) since the process \((X^{\varepsilon}_t, Y^{\varepsilon}_t)\) forgets \(Y^{\varepsilon}_0 = y\) on time scales of order \(\varepsilon\) and we are interested in the regime \(\varepsilon\to 0\). From Equation 2 one can obtain the dynamics of the averaged process described by the function \(A(x,t)\). One finds that \(A\) is described by the averaged generator of the slow component, i.e. averaging \(\mathcal{L}^{X^{\varepsilon}}\) under \(\rho_x(dy)\); this exactly gives Equation 1 in the case of diffusions. A typical example could be as follows:
\[ \left\{ \begin{align} dX^{\varepsilon} &= \mu(X^{\varepsilon}, Y^{\varepsilon}) dt + \sigma(X^{\varepsilon}, Y^{\varepsilon}) \, dW^X\\ dY^{\varepsilon} &= - \textcolor{red}{\varepsilon^{-1}} \frac{ (Y^{\varepsilon} - X^{\varepsilon}) }{\sigma^2} \, dt + \sqrt{2 \textcolor{red}{\varepsilon^{-1}} } \, dW^Y.\\ \end{align} \right. \]
The fast process \(Y^{\varepsilon}_t\) is a Ornstein-Uhlenbeck process sped-up by a factor \(1/\varepsilon\) that will very rapidly oscillate around \(X^{\varepsilon}_t\), with Gaussian fluctuations with variance \(\sigma^2>0\), ie:
\[ \rho_x(dy) \; = \; \frac{ e^{-(y-x)^2/2} }{\sqrt{2\pi \sigma^2}}\, dy. \]
This averaging phenomenon is relatively straightforward and not extremely surprising. More interesting is the homogenization phenomenon described in the next Section.
Homogenization
Consider the presence of an additional intermediate time scale \(\varepsilon^{-1/2}\), \[ \left\{ \begin{align} dX^{\varepsilon}/dt &= \textcolor{blue}{\varepsilon^{-1/2} H(X^{\varepsilon},Y^{\varepsilon})} \, + \,F(X^{\varepsilon},Y^{\varepsilon}, W^X)\\ dY^{\varepsilon}/dt &= \textcolor{red}{\varepsilon^{-1}} \, G(X^{\varepsilon},Y^{\varepsilon}, W^Y)\\ \end{align} \right. \] with the same assumption that for any fixed \(x \in \mathcal{X}\) the process \(dY^{[x]}/dt = G(x,Y^{[x]}, W^Y)\) is ergodic with respect to the probability distribution \(\rho_x(dy)\). The same reasoning as in the averaging case shows that averaging the term \(F(X^{\varepsilon},Y^{\varepsilon}, W^X)\) is relatively straightforward and has the exact same expression: it suffices to average under \(\rho_x(dy)\). This means that one can study instead
\[ \left\{ \begin{align} dX^{\varepsilon}/dt &= \textcolor{blue}{\varepsilon^{-1/2} H(X^{\varepsilon},Y^{\varepsilon})} \, + \, \overline{F}(X^{\varepsilon}, W^X)\\ dY^{\varepsilon}/dt &= \textcolor{red}{\varepsilon^{-1}} \, G(X^{\varepsilon},Y^{\varepsilon}, W^Y)\\ \end{align} \right. \]
with, informally, \(\overline{F}(x,w) = \int F(x,y,w) \, \rho_x(dy)\). The new interesting phenomenon is coming from the intermediate time scale \(\varepsilon^{-1/2}\). Contrarily to the averaging phenomenon of the previous section that was only relying on a Law of Large Numbers, dealing with the intermediate time-scale requires exploiting a CLT and quantifying the rate of mixing of the fast process \(Y^{[x]}\) Note that since \(\varepsilon^{-1/2} \gg 1\), for the dynamics to not explode one needs the centering condition:
\[ \int_{\mathcal{Y}} H(x,y) \, \rho_x(dy) = 0 \qquad \textrm{for all } x \in \mathcal{X}. \tag{3}\]
Because of the centering condition*, the term \( \textcolor{blue}{\varepsilon^{-1/2} H(X^{\varepsilon},Y^{\varepsilon})}\) will contribute an additional noise term in the effective dynamics of the slow process. To describe this additional noise term, assume an ergodic central limit theorem (CLT) for the fast process \(dY^{[x]}/dt = G(x,Y^{[x]}, W^Y)\): for a test function \(\varphi: \mathcal{Y}\to \mathbb{R}\) with zero expectation under \(\rho_x(dy)\) we have:
\[ \lim_{t \to \infty} \; T^{-1/2} \int_{t=0}^T \, \varphi(Y^{[x]}_t) \, dt \; = \; \mathcal{N}(0, V_x[\varphi]) \tag{4}\]
for asymptotic variance \(V_x[\varphi] \geq 0\). For a time increment \(\delta > 0\) and assuming \(X^{\varepsilon}_{t}=x\) we have
\[ \begin{align} X^{\varepsilon}_{t+\delta} - x &\approx \textcolor{blue}{ \varepsilon^{-1/2} \, \int_{u=t}^{t+\delta} H(X^{\varepsilon}_u,Y^{\varepsilon}_u)} \, du \, + \, \int_{u=t}^{t+\delta} \overline{F}(x, W^X_u) \, du. \end{align} \tag{5}\]
The second integral term is an averaging term that can be treated easily. Approximating the process \(t \mapsto Y^{\varepsilon}_t\) by \(t \mapsto Y^{[x]}_{t \varepsilon^{-1}}\), the first integral on the RHS of Equation 5 can be approximated as
\[ \begin{align} \underbrace{\varepsilon^{-1/2} \int_{u=t}^{t+\delta} H(x,Y^{[x]}_{u \, \varepsilon^{-1}}) \, du}_{\textrm{CLT}} \, + \underbrace{\int_{u=t}^{t+\delta} \varepsilon^{-1/2} \partial_x H(x,Y^{[x]}_{u \, \varepsilon^{-1}}) \, (X^{\varepsilon}_u - x) \, \, du}_{\textrm{(drift)}}. \end{align} \]
After a time-rescaling, one can readily see that the first term is described by the CLT of Equation 4,
\[ \varepsilon^{-1/2} \int_{u=t}^{t+\delta} H(x,Y^{[x]}_{u \, \varepsilon^{-1}}) \, du \approx V_x[H(x, \cdot)]^{1/2} \mathcal{N}(0, \delta). \]
The second term is further approximated as
\[ \begin{align} \varepsilon^{-1} \, &\int_{u=t}^{t+\delta}\int_{v=t}^{t+\delta} \partial_x H(x,Y^{[x]}_{u \, \varepsilon^{-1}}) \, H(x,Y^{[x]}_{v \, \varepsilon^{-1}}) \, 1_{v<u} \, du \, dv\\ &= {\left( \frac{1}{\delta \varepsilon^{-1}} \int_{u=t}^{t+\delta \varepsilon^{-1}}\int_{v=t}^{t+\delta \varepsilon^{-1}} \partial_x H(x,Y^{[x]}_{u}) \, H(x,Y^{[x]}_{v}) \, 1_{v<u} \right)} \, \delta, \end{align} \]
the second equality coming from the time-rescaling \(t \mapsto t \varepsilon\). The process \(Y^{[x]}\) mixes on scale \(\mathcal{O}(1)\) so that the term inside bracket \( {\left( \ldots \right)} \) converges to its expectation. Setting \(T = \delta \, \varepsilon^{-1} \to \infty\), one obtains
\[ \begin{align} I(x) &= \frac{1}{T} \iint_{[0,T]^2} \partial_x H(x,Y^{[x]}_{u}) \, H(x,Y^{[x]}_{v}) \, 1_{v<u} \, du \, dv \\ &\to \int \rho_x(dy) \, H(x,y) \, {\left\{ \int_{s=0}^{\infty} \mathop{\mathrm{\mathbb{E}}}[\partial_x H(\hat{x}, Y^{[x]}_s) \, | Y^{[x]}_0=y] \, ds \right\}} . \end{align} \]
In conclusion, the fast-slow system
\[ \left\{ \begin{align} dX^{\varepsilon} &= \textcolor{blue}{\varepsilon^{-1/2} \, H(X^{\varepsilon}, Y^{\varepsilon})} \, dt + \mu(X^{\varepsilon}, Y^{\varepsilon}) \, dt + \sigma(X^{\varepsilon}, Y^{\varepsilon}) \, dW^x\\ dY^{\varepsilon} &= \textcolor{red}{\varepsilon^{-1}} \, G(X^{\varepsilon},Y^{\varepsilon}, W^Y) \, dt \end{align} \right. \]
can be described in the regime \(\varepsilon\to 0\) by the effective dynamics
\[ dX = \textcolor{blue}{I(X) \, dt + \Gamma^{1/2}(X) \, dW^{H}} + \overline{\mu}(X) \, dt + \overline{\sigma}(X) \, dW^X. \]
for two independent Brownian motions \(W^X\) and \(W^H\). The volatility terms \( \textcolor{blue}{\Gamma(x)}\) comes from the CLT and the drift term \( \textcolor{blue}{I(x)}\) comes from the self-interaction term:
\[ \left\{ \begin{align} \Gamma(x) &= \lim_{T \to \infty} \; \mathop{\mathrm{Var}} {\left\{ T^{-1/2} \int_{0}^T H(x, Y^{[x]}_t) \, dt \right\}} \\ % I(x) &= \lim_{T \to \infty} \; \frac{1}{T} \iint_{0<u<v<T} H(x, Y^{[x]}_u) \, \partial_x H(x, Y^{[x]}_v) \, du \, dv. \end{align} \right. \tag{6}\]
For the drift function, the scaling \(T^{-1} \int_{0<u<v<T} (\ldots)\) may look a bit surprising at first sight as one may expect \(T^{-2} \int_{0<u<v<T} (\ldots)\) instead. Note that since the process \(Y^{[x]}\) mixes on a time scale \(\mathcal{O}(1)\) and the centering condition \(\int H(x, y) \rho_x(dy)=0\) holds, the expectation \(\mathop{\mathrm{\mathbb{E}}}[H(x, Y^{[x]}_u) \, \partial_x H(x, Y^{[x]}_v)]\) goes to zero as soon as \(|u-v| \gg 1\). This means that only the subset \(|u-v| = \mathcal{O}(1)\) of \([0,T]^2\) really matters in that double integral, hence the \((1/T)\) normalization factor.
Closed form solution & Poisson equation:
The drift and volatility terms \(\Gamma(x)\) and \(I(x)\) quantify the mixing properties of the fast process \(Y^{[x]}\). While formulas Equation 6 are intuitive, they can be difficult to deal with if one needs the exact expressions of the drift and volatility functions. Instead, they can also be expressed in terms of the solution to an appropriate Poisson equations.
\[ \begin{align} I(x) &= \frac{1}{T} \iint_{[0,T]^2} H(x,Y^{[x]}_{v}) \, \partial_x H(x,Y^{[x]}_{u}) \, 1_{v<u} \, du \, dv \\ &\to \int \rho_x(dy) \, H(x,y) \, \partial_{\hat{x}} {\left\{ \int_{s=0}^{\infty} \mathop{\mathrm{\mathbb{E}}}[H(\hat{x}, Y^{[x]}_s) \, |Y^{[x]}_0=y] \, ds \right\}} \\ &= -\int \rho_x(dy) \, H(x,y) \, \partial_{x} \Phi(x,y)\\ &= -\left< H(x, \cdot), \partial_x \Phi(x, \cdot) \right>_{\rho_x} \end{align} \tag{7}\]
where the function \(\Phi(x,y)\) is solution to the Poisson equation
\[ \mathcal{L}^{Y^{[x]}} \Phi(x, \cdot) = H(x, \cdot) \]
for all \(x \in \mathcal{X}\) and \(\mathcal{L}^{Y^{[x]}}\) is the generator of the fast process \(dY^{[x]}/dt = G(x,Y^{[x]}, W^Y)\). The last equality in Equation 7 follows from the integral representation of the Poisson equation. Similarly, and also as explained here, the asymptotic variance term can also be expressed in terms of the function \(\Phi\),
\[ \begin{align} \Gamma(x) &= \lim_{T \to \infty} \; \mathop{\mathrm{Var}} {\left\{ T^{-1/2} \int_{0}^T H(x, Y^{[x]}_t) \, dt \right\}} \\ &= -2 \int_{\mathcal{Y}} \Phi(x, y) \, H(x, y) \, \rho_x(dy)\\ &= -2 \left< \Phi, \mathcal{L}^{Y^{[x]}} \Phi \right>_{\rho_x}. \end{align} \]
Example: integrated OU process
Consider a slow process obtained by integrating an OU process,
\[ \left\{ \begin{align} dX^{\varepsilon} &= \varepsilon^{-1/2} \, Y^{\varepsilon} \, dt\\ dY^{\varepsilon} &= -\lambda \varepsilon^{-1}\, Y^{\varepsilon} \, dt + \sqrt{2 \lambda/\varepsilon} \, dW^Y, \end{align} \right. \]
where \(\lambda > 0\) is just a fixed time-scaling parameter. The fast OU process mixes on time scales of order \(\mathcal{O}(\varepsilon)\) and has a standard Gaussian distribution as invariant distribution. Homogenization gives that in the regime \(\varepsilon\to 0\), the slow process can be approximated as
\[ dX = \sqrt{2/\lambda} \, dW \tag{8}\]
since the asymptotic variance is
\[ \mathop{\mathrm{Var}} {\left\{ T^{-1/2} \int_{t=0}^{T} Y_t \, dt \right\}} \to 2 \, \int_{0}^{\infty} C(r) \, dr = 2/\lambda \]
where \(C(r) = \mathop{\mathrm{\mathbb{E}}}[Y_t Y_{t+r}] = \exp[-\lambda r]\) is the autocorrelation function of the fast OU process, as explained here. The fact that the effective diffusion is (twice) the integrated autocorrelation of the fast process is an example of Green-Kubo relations.
Example: Overdamped Langevin Dynamics
This example does not exactly fall within the homogenization result described in the previous section, but almost. Consider a potential \(U\) and the slow-fast dynamics:
\[ \left\{ \begin{align} dX^{\varepsilon} &= \varepsilon^{-1/2} \, Y^{\varepsilon} \, dt\\ dY^{\varepsilon} &= - \varepsilon^{-1}\, [Y^{\varepsilon}+\varepsilon^{1/2}\nabla U(X^{\varepsilon})] \, dt + \sqrt{2 /\varepsilon} \, dW^Y. \end{align} \right. \]
For any fixed value of \(x \in \mathcal{X}\), the fast OU-dynamics
\[ dY = -[Y^{\varepsilon}+\varepsilon^{1/2}\nabla U(x)] \, dt + \sqrt{2} \, dW^Y \]
converges to a Gaussian distribution with mean \(-\nabla U(x)\) and unit variance. The same arguments as the previous section immediately give that, starting from \(X^{\varepsilon}_0=x\), we have
\[ \varepsilon^{-1/2} \, \int_{0}^{\delta} Y^{\varepsilon} \, dt \; \to \; -\nabla U(x) \, \delta + \sqrt{2 \delta} \, \mathcal{N}(0,1). \]
The \(\sqrt{2}\) terms comes from the OU asymptotic variance. this shows that the slow process converges as \(\varepsilon\to 0\) to the overdamped Langevin dynamics
\[ dX = -\nabla U(X) \; + \; \sqrt{2} \, dW. \]
Example: Stratonovich Corrections
Consider a function \(f: \mathbb{R}\to \mathbb{R}\) and the slow-fast system
\[ dX^{\varepsilon} = \varepsilon^{-1/2} \, f(X^{\varepsilon}) \, Y^{\varepsilon} \, dt \]
where \(dY^{\varepsilon} = -(\lambda/\varepsilon) Y^{\varepsilon} + \sqrt{2 \lambda / \varepsilon}\) is a fast OU process mixing on scales of order \(\mathcal{O}(\varepsilon)\) and with standard centred Gaussian invariant distribution \(\rho(dy)\).The discussion leading to Equation 8 suggest that the term \(\varepsilon^{-1/2} \, Y^{\varepsilon} \, dt\) can be heuristically be thought of as \((2/\lambda)^{1/2} \, dW\), which would imply that the effective dynamics for the slow-process is
\[ dX = \sqrt{2/\lambda} \, f(X) \, dW. \]
We will see that this heuristic is wrong! In order to obtain the effective dynamics of the slow process as \(\varepsilon\to 0\), since the generator of the fast-OU reads \(\mathcal{L}\varphi= \lambda [ -y\,\varphi_y + \varphi_{yy}]\), one can solve the Poisson equation \(\mathcal{L}\Psi(x,y) = f(x)y\) to obtain that \(\Phi(x,y) = -f(x)y/\lambda\). One already knows that \(\mathop{\mathrm{Var}}[T^{-1}\int_{[0,T]} Y_t \, dt] = 2/\lambda\). The drift term is given by
\[ \begin{align} I(x) &= \int f(x) \partial_x \Phi(x,y) \, \rho(dy)\\ &= \lambda^{-1} \int f(x) f'(x) y^2 \, \rho(dy)\\ &= \lambda^{-1} f(x) f'(x). \end{align} \]
Putting everything together gives that the effective slow dynamics reads
\[ \begin{align} dX &= \textcolor{blue}{ \lambda^{-1} f'(X) f(X) \, dt } + \sqrt{2/\lambda} \, f(X) \, dW\\ &= \sqrt{2/\lambda} \, f(X) \textcolor{red}{\circ} dW \end{align} \]
where \( \textcolor{red}{\circ}\) denotes Stratonovich integration.
Readings
The book (Pavliotis and Stuart 2008) is beautiful, and I quite like the section on multiscale expansion in (Weinan 2011). For proving this type of results with the “martingale problem” approach (Stroock and Varadhan 1997), the lectures (Papanicolaou 1977) are nicely done.