Almost sure convergence explained using sequences of random variables and pathwise convergence intuition

What is Almost Sure Convergence?

Recall that a random variable is a function from the sample space $\Omega$ to the real numbers. That is, it assigns a numerical value to each possible outcome of an experiment.

Once the experiment is run and an outcome $\omega$ is selected, we then input this into the random variable (since this is a function) to produce a number.

We explain what it means for a sequence of random variables $X_1, X_2, \dots$ to converge almost surely to a “target” random variable $X$, where all of these random variables have the same sample space $\Omega$ (i.e. relate to the same experiment).

As above, once $\omega$ is drawn from $\Omega$, we have a number $X(\omega)$ corresponding to the “target” random variable $X$, and a further sequence of numbers $X_1(\omega), X_2(\omega), X_3(\omega), \dots$.

This notion of convergence says that we can be “almost sure” that this sequence $X_1(\omega), X_2(\omega), X_3(\omega) \dots$ converges to the number $X(\omega)$, in the ordinary sense of a sequence converging to a limit (as explained in the slide).

Being “almost sure” means that the probability of drawing an outcome $\omega$ such that this happens is equal to 1.

That is:

$$\mathbb{P}( \{ \omega \in \Omega : \lim\limits_{n \to \infty}X_n(\omega) = X(\omega) \}) = 1 $$

If this holds, we can write $ X_n \overset{a.s.}{\longrightarrow} X$.

This is the strongest of the three main notions of convergence, and implies the other two. That is, we have:

$$\text{Almost Sure Convergence} \Rightarrow \text{Convergence in Probability} \Rightarrow \text{Convergence in Distribution}$$

Moreover, an analogous statement to the “weak law of large numbers” (relating to convergence in probability) also holds here.

That is; let $X_1, X_2, X_3, \dots $ be independent random variables, all with the same distribution, and with finite mean and variance $ \mathbb{E}(X_n) = \mu $ and $\mathbb{V}\text{ar}(X_n) = \sigma^2 $ for all $n$.

Now let $\bar{X}_n$ be the sample mean of the first $n$. Then we have that:

$$ \bar{X}_n \overset{a.s.}{\longrightarrow} \mu$$

This is called the “strong law of large numbers”.

Background: