Convergence in probability explained with intuitive definition and epsilon condition

What is Convergence in Probability?

Consider a sequence of random variables $X_1, X_2, X_3, \dots$. Roughly speaking, we say that this sequence converges in probability to a “target” random variable $X$ if, as $n$ tends to $\infty$, the probability that $X_n$ and $X$ are “far apart” becomes very small.

To formalise this, we need to be clearer about what “far apart” means. For instance, we might decide that $X_n$ and $X$ are “far apart” if they differ by more than $0.1$; that is, $|X_n-X|>0.1$.

However, this represents a particular choice whose suitability depends on context. If you’re designing some trainers and $X_n$ represents a random person’s foot width in centimetres, this level of precision may be too ambitious to hope for. On the other hand, if you are designing components for a NASA mission to Mars, this may be too loose for your requirements.

In our definition, then, we generalise to replace the $0.1$ with any error bound, which we denote $\varepsilon$.

We then insist that, whatever error bound $\varepsilon$ we choose, the probability of $X_n$ and $X$ being “far apart” still becomes negligible for large $n$.

Formally, we have that the sequence $X_1, X_2, X_3, \dots$ converges to $X$ in probability if:

$$ \text{For all } \varepsilon>0,\ \ \lim_{n \to \infty} \mathbb{P}(\left| X_n-X \right| >\varepsilon) = 0 $$

If this convergence holds, we can write $ X_n \overset{p}{\longrightarrow} X$.

Although this notation looks like an ordinary limit, we should emphasise that we are talking about the convergence of a sequence of probabilities here, namely, $\mathbb{P}(\left| X_n-X \right|>\varepsilon)$, and not the sequence of random variables $X_1, X_2, \dots$ itself.

That is, convergence in probability is a property of the sequence:

$$\mathbb{P}(\left| X_1-X \right|>\varepsilon), \ \ \mathbb{P}(\left| X_2-X \right|>\varepsilon), \ \ \mathbb{P}(\left| X_3-X \right|>\varepsilon), \dots$$
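As a small worked example (our own illustration, not part of the definition), suppose that $X_n$ takes the value $1$ with probability $1/n$ and the value $0$ otherwise, and let the target be the constant $X = 0$. For any error bound $0 < \varepsilon < 1$, the only way for $X_n$ to be “far” from $0$ is for it to equal $1$, so:

$$\mathbb{P}(\left| X_n-0 \right|>\varepsilon) = \mathbb{P}(X_n = 1) = \frac{1}{n} \longrightarrow 0 \quad \text{as } n \to \infty$$

For $\varepsilon \geq 1$ the probability is exactly $0$ for every $n$. The sequence of probabilities therefore tends to $0$ whatever $\varepsilon > 0$ we choose, which is precisely the condition for $X_n \overset{p}{\longrightarrow} 0$.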

Convergence in probability is the second-strongest of the three main notions of convergence: it is implied by one and implies the other. Specifically, we have:

$$\text{Almost Sure Convergence} \Rightarrow \text{Convergence in Probability} \Rightarrow \text{Convergence in Distribution}$$
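In general, the second arrow cannot be reversed. A standard counterexample, sketched here as an illustration, takes $X$ to be a standard normal random variable and sets $X_n = -X$ for every $n$. By symmetry, each $X_n$ has exactly the same distribution as $X$, so $X_n$ converges to $X$ in distribution trivially. Yet, writing $\Phi$ for the standard normal distribution function:

$$\mathbb{P}(\left| X_n-X \right|>\varepsilon) = \mathbb{P}(2\left| X \right|>\varepsilon) = 2\left(1-\Phi\left(\tfrac{\varepsilon}{2}\right)\right) > 0$$

This probability does not depend on $n$, so it cannot tend to $0$, and $X_n$ does not converge to $X$ in probability.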

However, if the “target” random variable $X$ is a constant, then convergence in distribution is equivalent to convergence in probability.

Intuitively, if $X$ is just a constant, and $X_n$ begins to “behave like” that same constant as $n \to \infty$ (convergence in distribution), then $X_n$ also becomes likely to be close to it (convergence in probability).
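This intuition can be made precise with a short argument. The sketch below assumes the usual definition of convergence in distribution, which is not stated in this section: the distribution functions $F_n(x) = \mathbb{P}(X_n \leq x)$ converge to the distribution function of the target at every point where the latter is continuous. If the target is a constant $c$, its distribution function jumps from $0$ to $1$ at $c$ and is continuous everywhere else, so for any $\varepsilon > 0$ we have $F_n(c-\varepsilon) \to 0$ and $F_n(c+\varepsilon) \to 1$. Then:

$$\mathbb{P}(\left| X_n-c \right|>\varepsilon) = \mathbb{P}(X_n < c-\varepsilon) + \mathbb{P}(X_n > c+\varepsilon) \leq F_n(c-\varepsilon) + 1 - F_n(c+\varepsilon) \longrightarrow 0$$

So convergence in distribution to a constant forces convergence in probability to that constant.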

Finally, we illustrate with an important example. Let $X_1, X_2, X_3, \dots $ be independent random variables, all with the same distribution, and with finite mean and variance $ \mathbb{E}(X_n) = \mu $ and $\mathbb{V}\text{ar}(X_n) = \sigma^2 $ for all $n$.

Now let $\bar{X}_n$ be the sample mean of the first $n$ of these variables; that is:

$$\bar{X}_n = \frac{X_1+X_2+\dots+X_n}{n}$$

Let our target random variable be just the constant $\mu$.

Then we have that:

$$ \bar{X}_n \overset{p}{\longrightarrow} \mu$$

This is called the “weak law of large numbers”.
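To see the weak law in action, we can estimate $\mathbb{P}(\left| \bar{X}_n-\mu \right|>\varepsilon)$ by simulation for a few values of $n$. The sketch below is only an illustration under assumptions of our own: each $X_i$ is taken to be uniform on $[0, 1]$ (so $\mu = 0.5$ and $\sigma^2 = 1/12$), and the function names, number of trials and seed are arbitrary choices. For comparison it also computes the bound $\sigma^2/(n\varepsilon^2)$, which follows from Chebyshev's inequality because $\mathbb{V}\text{ar}(\bar{X}_n) = \sigma^2/n$ for independent variables.

```python
import random


def estimate_tail_probability(n, epsilon, trials=1000, seed=0):
    """Monte Carlo estimate of P(|sample mean of n draws - mu| > epsilon).

    Assumption for illustration: each X_i is Uniform(0, 1), so mu = 0.5.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    mu = 0.5
    exceed = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - mu) > epsilon:
            exceed += 1
    return exceed / trials


def chebyshev_bound(n, epsilon, variance=1 / 12):
    """Chebyshev upper bound sigma^2 / (n * epsilon^2), capped at 1.

    The default variance 1/12 matches the Uniform(0, 1) assumption above.
    """
    return min(1.0, variance / (n * epsilon ** 2))


if __name__ == "__main__":
    epsilon = 0.05
    for n in (10, 100, 1000):
        estimate = estimate_tail_probability(n, epsilon)
        bound = chebyshev_bound(n, epsilon)
        print(f"n={n:5d}  estimated probability={estimate:.3f}  Chebyshev bound={bound:.3f}")
```

If the weak law holds as stated, the estimated probabilities should shrink towards $0$ as $n$ grows, and they should stay below the Chebyshev bound up to simulation noise. Choosing a smaller $\varepsilon$ slows this down but does not change the limit.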