# Table of Contents

- Mean and Expectation
- Standard Deviation, Variances and Covariances
- Random Vectors
- Gaussian distributions

## Mean and Expectation

The mean or expectation of a random variable \(X\) is denoted as \(\mathbb{E}[X]\). For a discrete random variable,

\[ \mathbb{E}[X] = \sum_{x} x \cdot \mathbb{P}(X = x) \]

and for a continuous random variable,

\[ \mu_X = \mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx \]

where \(f(x)\) is the probability density function of \(X\). It’s important to note that in some cases, particularly for distributions with “heavy” tails, the mean might not be well-defined. This situation arises in distributions like the Cauchy distribution, where the tails of the distribution do not decay rapidly enough to yield a finite expectation. In such cases, the integrals or sums used to define the mean do not converge.
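This failure of convergence is easy to observe numerically. The following minimal sketch (sample sizes and seed are arbitrary choices) compares the running mean of standard normal samples, which settles near the true mean 0, with the running mean of standard Cauchy samples, which never settles because the expectation is undefined:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Running mean of standard normal samples: converges to the true mean 0.
normal = rng.standard_normal(N)
normal_running = np.cumsum(normal) / np.arange(1, N + 1)

# Running mean of standard Cauchy samples: the tails are so heavy that the
# expectation does not exist, and the running mean keeps jumping around.
cauchy = rng.standard_cauchy(N)
cauchy_running = np.cumsum(cauchy) / np.arange(1, N + 1)

print("normal running mean (last value):", normal_running[-1])
print("cauchy running mean (last value):", cauchy_running[-1])
```

Rerunning the Cauchy experiment with different seeds gives wildly different final values, while the normal experiment always ends close to 0.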

Independence is a fundamental concept in probability theory, referring to the relationship between two random variables. Two random variables, \(X\) and \(Y\), are said to be independent if the occurrence of an event related to \(X\) does not influence the probability of an event related to \(Y\), and vice versa. Mathematically, \(X\) and \(Y\) are independent if and only if for every pair of events \(A\) and \(B\), the probability that both \(X\) belongs to \(A\) and \(Y\) belongs to \(B\) is the product of their individual probabilities. This can be expressed as:

\[ \mathbb{P}(X \in A \; \text{ and } \; Y \in B) = \mathbb{P}(X \in A) \cdot \mathbb{P}(Y \in B). \]

This can equivalently be expressed as the fact that, for any two (say, bounded and measurable) functions \(F(\cdot)\) and \(G(\cdot)\), the following identity holds

\[ \mathbb{E}[F(X) \cdot G(Y)] \; = \; \mathbb{E}[F(X)] \cdot \mathbb{E}[G(Y)]. \]

This definition implies that knowing the outcome of \(X\) provides no information about the outcome of \(Y\), and this lack of influence is a key characteristic of independent random variables. One extremely important remark is that, for two random variables \(X\) and \(Y\), the expectation of the sum equals the sum of the expectation,

\[ \mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y], \]

as soon as all these quantities exist. This holds even if the two random variables are **not** independent.
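Linearity of expectation without independence can be checked empirically. In this sketch (the choice of distribution and the dependence \(Y = X^2\) are illustrative), the sample averages satisfy the identity even though \(Y\) is a deterministic function of \(X\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# X uniform on [0, 1]; Y = X**2 is strongly dependent on X.
x = rng.uniform(0.0, 1.0, n)
y = x ** 2

# E[X + Y] matches E[X] + E[Y] (here 1/2 + 1/3 = 5/6), even though
# X and Y are far from independent.
print(np.mean(x + y), np.mean(x) + np.mean(y))
```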

## Standard Deviation, Variances and Covariances

The variance of a random variable \(X\), denoted as \(\text{Var}(X)\), measures the spread of its values. It is defined as

\[ \begin{align} \text{Var}(X) &= \mathbb{E}[(X - \mu_X )^2] = \mathbb{E}[X^2] - \mu_X ^2\\ &= \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x) \, dx \end{align} \]

The standard deviation is the square root of the variance, denoted as \(\sigma_X = \sqrt{\text{Var}(X)}\). The notion of covariance measures the linear relationship between two random variables \(X\) and \(Y\). It is defined as

\[ \begin{align} \text{Cov}(X, Y) &= \mathbb{E}[(X - \mu_X )(Y - \mu_Y )] = \mathbb{E}[X \, Y] - \mu_X \, \mu_Y\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y-\mu_Y) \cdot f(x, y) \, dx \, dy \end{align} \]

where \(f(x,y)\) is the joint density of the pair of random variables \((X,Y)\). The correlation is defined as a normalized version of the covariance,

\[ \text{Corr}(X,Y) = \textrm{Cov}\left\{ \frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right\} \]

and always satisfies \(-1 \leq \text{Corr}(X,Y) \leq 1\), as is easily proved (exercise). Note that if \(X\) and \(Y\) are independent, then \(\text{Cov}(X, Y) = 0\). However, zero covariance does not imply independence and it is a good exercise to construct such a counter-example. Standard manipulations reveal that for two random variables \(X\) and \(Y\) we have

\[ \text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2 \, \text{Cov}(X, Y), \]

which is the probabilistic analogue of the identity \((x+y)^2 = x^2 + y^2 + 2xy\). Importantly, if the two random variables \(X\) and \(Y\) are independent, the variance of the sum equals the sum of the variances, \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)\). This also shows that for \(N\) independent and identically distributed random variables \(X_1, \ldots, X_N\), we have that

\[ \text{Var}\left\{ \frac{X_1 + \ldots + X_N}{N} \right\} \; = \; \frac{\text{Var}(X)}{N}, \]

where \(\text{Var}(X)\) denotes the common variance of the \(X_i\).
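The \(1/N\) scaling is easy to verify by simulation. In this sketch (distribution, trial count, and \(N\) are arbitrary choices), each trial averages \(N = 100\) i.i.d. exponential samples with variance 1, so the variance of the average should be close to \(1/100\):

```python
import numpy as np

rng = np.random.default_rng(2)
trials, N = 20_000, 100

# Each row: N i.i.d. exponential samples (scale 1, so variance 1);
# take the average of each row.
samples = rng.exponential(scale=1.0, size=(trials, N))
averages = samples.mean(axis=1)

# The variance of the average should be close to Var(X)/N = 1/100.
print(np.var(averages))
```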

## Random Vectors

A random vector is a vector of random variables. For a random vector \(\mathbf{X}\) in \(\mathbb{R}^d\), the mean \(\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}] \in \mathbb{R}^d\) is a vector in \(\mathbb{R}^d\), each component of which is the mean of one of the \(d\) components of \(\mathbf{X}\). The covariance matrix \(\Sigma \in \mathbb{R}^{d \times d}\) of \(\mathbf{X}\) is a \(d \times d\) matrix defined by

\[ \Sigma_{ij} = \text{Cov}(X_i, X_j) = \mathbb{E}[(X_i - \mu_{X_i}) \, (X_j - \mu_{X_j})] \]

where \(X_i\) and \(X_j\) are the \(i\)-th and \(j\)-th components of \(\mathbf{X}\), respectively. Each element \(\Sigma_{ij}\) represents the covariance between the \(i\)-th and \(j\)-th components of the vector \(\mathbf{X}\). If the components are independent, the covariance matrix is diagonal. Furthermore, the covariance matrix, when it exists, is always a symmetric and positive semi-definite matrix.
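Both properties (symmetry and positive semi-definiteness) can be observed on an empirical covariance matrix. This sketch uses a made-up linear mixing of independent normals to create correlated components:

```python
import numpy as np

rng = np.random.default_rng(3)

# Build a 3-dimensional random vector with correlated components by
# mixing independent standard normals through a (made-up) matrix.
z = rng.standard_normal((10_000, 3))
x = z @ np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.0]])

# Empirical covariance matrix (rows of x are observations).
sigma = np.cov(x, rowvar=False)

# Symmetric, and positive semi-definite (all eigenvalues >= 0).
print(np.allclose(sigma, sigma.T))
print(np.linalg.eigvalsh(sigma).min())
```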

## Gaussian distributions

The Gaussian distribution, also known as the normal distribution, holds a central place in statistics, probability, and applied mathematics for several reasons. Firstly, its mathematical properties are well understood and conducive to analytical work. Secondly, the Central Limit Theorem states that the sum of a large number of independent, identically distributed variables approximately follows a Gaussian distribution, regardless of the original distribution; this makes it a fundamental tool for inferential statistics. Furthermore, Gaussian distributions arise naturally in numerous contexts, since random noise and measurement errors often tend to be normally distributed. Lastly, since Gaussian distributions are extremely tractable, their properties allow for convenient modeling in various fields.

### Univariate case:

A univariate Gaussian (or normal) distribution for a random variable \(X\) is characterized by its mean \(\mu\) and variance \(\sigma^2\). Its probability density function is given by

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2}\right\} \]

The constant \(1/\sqrt{2\pi\sigma^2}\) is often written as \(1/\mathcal{Z}\) where \(\mathcal{Z} = \sqrt{2\pi\sigma^2}\) is referred to as the “normalization factor”. It ensures that the density \(f(x)\) integrates to one.
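The normalization can be checked numerically. This sketch (the values of \(\mu\) and \(\sigma\) are arbitrary) evaluates the density on a wide grid and approximates the integral with a simple Riemann sum, which should come out very close to 1:

```python
import numpy as np

mu, sigma = 1.5, 0.7
Z = np.sqrt(2 * np.pi * sigma ** 2)   # normalization factor

def f(x):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / Z

# Riemann sum of the density over a grid wide enough that the tails
# outside it are negligible; the result should be close to 1.
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20_001)
dx = x[1] - x[0]
total = (f(x) * dx).sum()
print(total)
```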

### Multivariate case:

A multivariate Gaussian distribution for a random vector \(\mathbf{X}\) in \(\mathbb{R}^n\) is characterized by a mean vector \(\boldsymbol{\mu}\) and a covariance matrix \(\Sigma\). Its probability density function is

\[ f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left\{ -\frac{1}{2} \langle (\mathbf{x} - \boldsymbol{\mu}), \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\rangle \right\} \]

where \(|\Sigma|\) is the determinant of the covariance matrix \(\Sigma\). Crucially, the **inverse** of the covariance matrix appears inside the quadratic form \(\langle (\mathbf{x} - \boldsymbol{\mu}), \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\rangle\). In general, zero correlation between variables does not imply their independence. However, this principle has a notable exception in the case of Gaussian distributions: if two vectors \(X, Y\) are such that the joint vector \((X,Y)\) is Gaussian, then zero correlation implies independence. Note that it is still possible for \(X\) and \(Y\) to both be Gaussian vectors with zero correlation and yet not be independent; this may happen when the joint vector \((X,Y)\) is not Gaussian (although both \(X\) and \(Y\) are).
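The density formula above translates directly into code. This sketch (the 2-dimensional \(\boldsymbol{\mu}\) and \(\Sigma\) are made-up example values) evaluates it term by term; note the inverse covariance matrix in the quadratic form. At \(\mathbf{x} = \boldsymbol{\mu}\) the exponential equals 1, so the density there is exactly the normalization constant \(1/\sqrt{(2\pi)^n |\Sigma|}\):

```python
import numpy as np

# Made-up 2-dimensional example parameters.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

def gaussian_density(x, mu, Sigma):
    n = mu.size
    diff = x - mu
    # Note the *inverse* of the covariance matrix in the quadratic form.
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

print(gaussian_density(np.array([0.5, 0.5]), mu, Sigma))
print(gaussian_density(mu, mu, Sigma))   # value at the mean: 1 / norm
```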