Let \(q(dx) \equiv \mathcal{N}(\mu,\Gamma)\) be the Gaussian distribution with mean \(\mu \in \mathbb{R}^D\) and covariance \(\Gamma \in \mathbb{R}^{D \times D}\). For a direction \(u \in \mathbb{R}^D\), consider the distribution \(q^{u}(dx) \equiv \mathcal{N}(\mu + \Gamma^{1/2} \, u, \Gamma)\), i.e. the same Gaussian distribution but shifted by \(\Gamma^{1/2} \, u\). A direct computation gives

\[ \frac{q^{u}(x)}{q(x)} = \exp {\left\{ - \frac{1}{2} \| u\|^2 + \left< u, \, \Gamma^{-1/2}(x-\mu) \right> \right\}} . \tag{1}\]
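For completeness, here is a sketch of the algebra behind Equation 1. It assumes that \(\Gamma\) is symmetric positive definite and that \(\Gamma^{1/2}\) denotes its symmetric square root (assumptions not spelled out above), and it uses the shorthand \(y = x - \mu\) and \(\|v\|^2_{\Gamma^{-1}} = v^\top \Gamma^{-1} v\). Both Gaussians share the covariance \(\Gamma\), so the normalizing constants cancel and

\[ \log \frac{q^{u}(x)}{q(x)} = -\frac{1}{2} \| y - \Gamma^{1/2} u \|^2_{\Gamma^{-1}} + \frac{1}{2} \| y \|^2_{\Gamma^{-1}} = \left< \Gamma^{1/2} u, \, \Gamma^{-1} y \right> - \frac{1}{2} \| \Gamma^{1/2} u \|^2_{\Gamma^{-1}} = \left< u, \, \Gamma^{-1/2} (x - \mu) \right> - \frac{1}{2} \| u \|^2 . \]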

We will see that, not very surprisingly, a similar change-of-probability result holds in continuous time. On the time interval \([0,T]\), let \(W_t\) be a standard Brownian motion in \(\mathbb{R}^D\) and \(X_t\) be the solution to the SDE

\[ dX_t \; = \; b(X_t) \, dt + \sigma(X_t) \, dW_t \tag{2}\]

for some drift \(b: \mathbb{R}^D \to \mathbb{R}^D\), diffusion \(\sigma: \mathbb{R}^D \to \mathbb{R}^{D \times D}\), and initial distribution \(\mu_0(dx_0)\). This SDE defines a probability measure \(\mathbb{P}\) on the path-space \(C([0,T]; \mathbb{R}^D)\), the space of continuous functions from \([0,T]\) to \(\mathbb{R}^D\). Consider a perturbation drift function \(u: \mathbb{R}^D \to \mathbb{R}^D\) and the associated perturbed SDE given by

\[ dX_t^u \; = \; b(X_t^u) \, dt + \sigma(X_t^u) \, {\left\{ dW_t + \textcolor{blue}{u(X_t^u) \, dt} \right\}} . \tag{3}\]

This perturbed SDE, started from the same initial distribution \(\mu_0(dx_0)\), defines a probability measure \(\mathbb{P}^u\) on the path-space \(C([0,T]; \mathbb{R}^D)\), and it is often useful to understand the Radon-Nikodym derivative of \(\mathbb{P}^u\) with respect to \(\mathbb{P}\). I have never really liked the way this is usually derived, and I can never really remember the result. It takes only a few lines of algebra to re-derive these results, at least informally. For this purpose, consider a simple Euler discretization of the SDE with time-step \(\delta = T/N\) for \(N \gg 1\). Consider a discretized path \((x_0, x_{\delta}, \ldots, x_{T})\) of Equation 2 obtained by iterating the update

\[ x_{t_{k+1}} \; = \; x_{t_k} + b(x_{t_k})\,\delta + \sigma(x_{t_k}) \, (\Delta W_{t_k}) \]

with \(t_k = k\delta\) and \(\Delta W_{t_k} = W_{t_{k+1}} - W_{t_k}\). The probability density of observing such a path reads \[ \frac{1}{\mathcal{Z}} \, \mu_0(x_0) \, \exp {\left\{ -\frac{1}{2 \delta} \sum_{k=0}^{N-1} \|x_{t_{k+1}} - [x_{t_k} + b(x_{t_k})\,\delta]\|^2_{\Gamma^{-1}(x_{t_k}) } \right\}} \]

with \(\Gamma(x) \equiv \sigma(x) \sigma^\top(x)\) the volatility matrix, assumed invertible, and \(\mathcal{Z}\) a normalization term that is the same for the perturbed SDE and therefore cancels in the ratio below. One obtains a similar expression for a discretized path of the perturbed SDE (Equation 3), with \(b\) replaced by \(b + \sigma u\). Expanding the squares and using \(\sigma^\top \Gamma^{-1} \sigma = I\), the ratio of these two quantities equals

\[ \frac{d \widetilde{\mathbb{P}}^{u}}{d \widetilde{\mathbb{P}}}(x) = \exp {\left\{ \sum_{k=0}^{N-1} \left[ -\frac{\delta}{2} \|u(x_{t_k})\|^2 + \left< x_{t_{k+1}}-x_{t_k}-b(x_{t_k})\delta, \, \sigma(x_{t_k}) \, u(x_{t_k}) \right>_{\Gamma^{-1}(x_{t_k})} \right] \right\}} , \]

where the tilde notation denotes the discretized version of the measures. Since

\[ x_{t_{k+1}}-x_{t_k}-b(x_{t_k})\delta = \sigma(x_{t_k}) \, \Delta W_{t_k}, \]

under \(\widetilde{\mathbb{P}}\) and as \(N \to \infty\), for a path \(dx_t \; = \; b(x_t) \, dt + \sigma(x_t) \, dW_t\), we have

\[ \frac{d \widetilde{\mathbb{P}}^{u}}{d \widetilde{\mathbb{P}}}(x) \to \exp {\left\{ -\frac 12 \, \int_0^T \|u(x_t)\|^2 \, dt + \int_{0}^T u^\top(x_t) \, dW_t \right\}} . \]
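The identification of the discrete sum with its noise-driven form can be checked numerically. The following is a minimal sketch in the scalar case \(D = 1\), using only the Python standard library. The function name `euler_log_ratio` and the particular choices of \(b\), \(\sigma\) and \(u\) are illustrative assumptions, not part of the derivation above. It simulates one Euler path under \(\widetilde{\mathbb{P}}\) and accumulates the log-ratio in two ways: from the path increments, and from the Brownian increments.

```python
import math
import random

def euler_log_ratio(b, sigma, u, x0, T, n_steps, seed=0):
    """Simulate one scalar Euler path under P and accumulate log(dP^u/dP).

    Returns the log-ratio computed (a) from the path increments, as in the
    discretized sum, and (b) from the Brownian increments, as in its limit.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    from_increments = 0.0
    from_noise = 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))
        x_next = x + b(x) * dt + sigma(x) * dW   # Euler step of the unperturbed SDE
        gamma = sigma(x) ** 2                    # scalar volatility Gamma = sigma^2
        resid = x_next - x - b(x) * dt           # equals sigma(x) * dW under P
        # <resid, sigma u>_{Gamma^{-1}} reduces to resid * sigma * u / Gamma in 1D
        from_increments += -0.5 * dt * u(x) ** 2 + resid * sigma(x) * u(x) / gamma
        from_noise += -0.5 * dt * u(x) ** 2 + u(x) * dW
        x = x_next
    return from_increments, from_noise

# Illustrative (hypothetical) coefficients; sigma stays bounded away from zero.
lr_increments, lr_noise = euler_log_ratio(
    b=lambda x: -x,
    sigma=lambda x: 1.0 + 0.5 * math.sin(x),
    u=math.cos,
    x0=0.3, T=1.0, n_steps=1000,
)
print(lr_increments, lr_noise)
```

The two accumulated values should agree up to floating-point rounding, since in one dimension \(\left< \sigma \Delta W, \sigma u \right>_{\Gamma^{-1}} = u \, \Delta W\) exactly.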

Similarly, under \(\widetilde{\mathbb{P}}^u\) and as \(N \to \infty\), for a path \(dx^{u}_t \; = \; b(x^u_t) \, dt + \sigma(x^u_t) \, {\left( dW_t + u(x^u_t) \, dt \right)} \), we have

\[ \frac{d \widetilde{\mathbb{P}}}{d \widetilde{\mathbb{P}}^u}(x^u) \to \exp {\left\{ -\frac 12 \, \int_0^T \|u(x^u_t)\|^2 \, dt - \int_{0}^T u^\top(x^u_t) \, dW_t \right\}} . \]
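One informal way to check that these two limits are consistent is to evaluate the first formula along a path of the perturbed SDE. Along such a path, \(dx^u_t - b(x^u_t) \, dt = \sigma(x^u_t) \, \{ dW_t + u(x^u_t) \, dt \}\), so the noise increment appearing in the first formula is \(dW_t + u(x^u_t) \, dt\), where \(W_t\) is the Brownian motion driving the perturbed path. Substituting gives

\[ \log \frac{d \mathbb{P}^{u}}{d \mathbb{P}}(x^u) = -\frac 12 \, \int_0^T \|u(x^u_t)\|^2 \, dt + \int_{0}^T u^\top(x^u_t) \, {\left\{ dW_t + u(x^u_t) \, dt \right\}} = \frac 12 \, \int_0^T \|u(x^u_t)\|^2 \, dt + \int_{0}^T u^\top(x^u_t) \, dW_t , \]

which is exactly minus the exponent in the limit of \(d \widetilde{\mathbb{P}} / d \widetilde{\mathbb{P}}^u (x^u)\).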

These results remain identical for time-dependent drift and volatility functions, as is clear from this non-rigorous argument. The above two formulas for \(d\mathbb{P}^u/d\mathbb{P}(x)\) and \(d\mathbb{P}/d\mathbb{P}^u(x)\) may be slightly confusing since they are not immediately recognizable as inverses of each other. Another way to write these results, very similar to Equation 1 and often used in the physics literature, is as follows:

\[ \frac{d \mathbb{P}^{u}}{d \mathbb{P}}(x) = \exp {\left\{ -\frac 12 \, \int_0^T \|u(x_t)\|^2 \, dt + \int_{0}^T u^\top(x_t) \, \sigma^{-1}(x_t) \, {\left( dx_t - b(x_t) \, dt \right)} \right\}} . \]

From this expression, it is slightly easier to see the relationship between \(d\mathbb{P}^u/d\mathbb{P}(x)\) and \(d\mathbb{P}/d\mathbb{P}^u(x)\). As described below, these change-of-measure formulae are often useful when performing importance sampling on path-space. As a sanity check, consider a scalar Brownian motion \(dX = \sigma \, dW\) and its drifted version \(dX^u = \sigma \, (dW + u \, dt)\), with constant \(\sigma\) and \(u\). In that case \(d\mathbb{P}^u/d\mathbb{P} = \exp\{ -u^2 T / 2 + u \, W_T \}\), which indeed has unit expectation under \(\mathbb{P}\): this is equivalent to the fact that \(\mathop{\mathrm{\mathbb{E}}}[\exp(a \, \xi)] = \exp(a^2/2)\) for a standard Gaussian random variable \(\xi\), applied with \(a = u \sqrt{T}\). Finally, note that the Kullback-Leibler divergence between \(\mathbb{P}\) and \(\mathbb{P}^u\) has a particularly simple form. Since \(\mathop{\mathrm{D_{\text{KL}}}}(\mathbb{P}, \mathbb{P}^u) = \mathop{\mathrm{\mathbb{E}}}_{\mathbb{P}} {\left[ -\log {\left\{ \frac{d \mathbb{P}^{u}}{d \mathbb{P}}(X) \right\}} \right]} \) and the stochastic integral \(\int_0^T u^\top(X_t) \, dW_t\) has zero expectation under \(\mathbb{P}\) (under suitable integrability conditions), one obtains

\[ \mathop{\mathrm{D_{\text{KL}}}}(\mathbb{P}, \mathbb{P}^u) = \frac12 \mathop{\mathrm{\mathbb{E}}}_{\mathbb{P}} {\left[ \int_0^T \|u(X_t)\|^2 \, dt \right]} . \]
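Both the unit-expectation property and the Kullback-Leibler formula are easy to check by Monte Carlo in the scalar case with constant \(u\). The sketch below uses only the Python standard library. The function name `check_scalar_girsanov` and the numerical values are illustrative assumptions. It relies on the fact that, for constant \(u\), the Radon-Nikodym derivative depends on the path only through \(W_T \sim \mathcal{N}(0, T)\), so no time discretization is needed and \(\sigma\) does not enter.

```python
import math
import random

def check_scalar_girsanov(u, T, n_samples, seed=0):
    """Monte Carlo check for dX = sigma dW versus dX^u = sigma (dW + u dt).

    With constant u, dP^u/dP = exp(-u^2 T / 2 + u W_T), with W_T ~ N(0, T) under P.
    Returns (estimate of E_P[dP^u/dP], estimate of KL(P, P^u)).
    """
    rng = random.Random(seed)
    sum_ratio = 0.0
    sum_neg_log = 0.0
    for _ in range(n_samples):
        w_T = rng.gauss(0.0, math.sqrt(T))
        log_ratio = -0.5 * u * u * T + u * w_T
        sum_ratio += math.exp(log_ratio)
        sum_neg_log += -log_ratio
    return sum_ratio / n_samples, sum_neg_log / n_samples

mean_ratio, kl_estimate = check_scalar_girsanov(u=0.5, T=1.0, n_samples=200_000)
kl_exact = 0.5 * 0.5**2 * 1.0   # (1/2) * u^2 * T
print(mean_ratio, kl_estimate, kl_exact)
```

The first estimate should be close to one and the second close to \(u^2 T / 2\), up to Monte Carlo error.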

### Importance Sampling on path-space

Consider a functional \(\Phi: C([0,T]; \mathbb{R}^D) \to \mathbb{R}\) on path-space; a typical example is

\[ \Phi(x) = \exp {\left\{ \int_0^T f(x_t) \, dt \, + \, g(x_T) \right\}} . \]

Suppose that we would like to evaluate the expectation of \(\Phi\) under the measure \(\mathbb{P}\). Naive Monte-Carlo (MC) would require sampling \(M\) trajectories from Equation 2 and computing the average of \(\Phi\) on these trajectories. To reduce the variance of this naive MC estimator, one can instead use importance sampling: sample \(M\) trajectories \(x^{1,u}, \ldots, x^{M,u}\) from the measure \(\mathbb{P}^u\) and compute the average

\[ \frac{1}{M} \, \sum_{i=1}^M \Phi(x^{i,u}) \, W(x^{i,u}) \]

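with weights given by the Radon-Nikodym derivative \(d\mathbb{P}/d\mathbb{P}^u\) evaluated along each sampled trajectory, where \(dW_t\) denotes the increments of the Brownian motion driving the \(i\)-th perturbed trajectory (not to be confused with the weight \(W\)):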

\[ W(x^{i,u}) \; = \; \exp {\left\{ -\frac 12 \, \int_0^T \|u(x^{i,u}_t)\|^2 \, dt - \int_{0}^T u^\top(x^{i,u}_t) \, dW_t \right\}} . \]
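As an illustration, here is a minimal sketch of this estimator for a scalar SDE, written with the Python standard library only. The function name `weighted_samples`, the Euler discretization and the toy problem are assumptions made for the example. It simulates paths of Equation 3, accumulates the log-weight along each path, and returns the weighted samples \(\Phi(x^{i,u}) \, W(x^{i,u})\). The toy problem takes \(b = 0\), \(\sigma = 1\), \(f = 0\) and \(g(x) = c \, x\), for which \(\mathop{\mathrm{\mathbb{E}}}_{\mathbb{P}}[\Phi] = \exp(c^2 T / 2)\) is known in closed form.

```python
import math
import random
import statistics

def weighted_samples(phi, b, sigma, u, x0, T, n_steps, n_paths, seed=0):
    """Importance-sampling samples phi(x) * W(x), with x drawn from P^u.

    Scalar Euler discretization of dX = b(X) dt + sigma(X) (dW + u(X) dt).
    The average of the returned list estimates E_P[phi].
    """
    rng = random.Random(seed)
    dt = T / n_steps
    samples = []
    for _ in range(n_paths):
        x, log_w, path = x0, 0.0, [x0]
        for _ in range(n_steps):
            dW = rng.gauss(0.0, math.sqrt(dt))
            ux = u(x)
            # log-weight increment: -1/2 u^2 dt - u dW, with dW driving this path
            log_w += -0.5 * ux * ux * dt - ux * dW
            x = x + b(x) * dt + sigma(x) * (dW + ux * dt)
            path.append(x)
        samples.append(phi(path) * math.exp(log_w))
    return samples

# Toy problem (hypothetical values): standard Brownian motion, Phi = exp(c * x_T).
c, T = 2.0, 1.0
exact = math.exp(0.5 * c * c * T)
phi = lambda path: math.exp(c * path[-1])
zero = lambda x: 0.0
one = lambda x: 1.0

naive = weighted_samples(phi, zero, one, zero, 0.0, T, 50, 2000)           # u = 0
tilted = weighted_samples(phi, zero, one, lambda x: c, 0.0, T, 50, 2000)   # u = c

print("exact          :", exact)
print("naive  mean/std:", statistics.fmean(naive), statistics.pstdev(naive))
print("tilted mean/std:", statistics.fmean(tilted), statistics.pstdev(tilted))
```

In this particular toy example, the constant choice \(u \equiv c\) makes every weighted sample equal to \(\exp(c^2 T/2)\) up to rounding, since the \(W_T\)-dependence of \(\Phi\) and of the weight cancel exactly. For a general functional \(\Phi\), no such simple choice should be expected.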

Choosing the optimal “control” function \(u\) that minimizes the variance of the estimator is not entirely straightforward, although this previous note already gives the answer. More on this in another note.