What is Sampling?

When we pick an individual from a population to make an observation about them, we think of their response as a random variable. We then think of population characteristics as features of this random variable.

Suppose, for instance, we want to know the average height of women in the UK. Let’s call this unknown number $\mu$.

Now let $Y$ represent the height of a random woman, yet to be chosen from the UK population. Then we think of $\mu$ as the expectation of this “population” random variable – that is, $\mu=\mathbb{E}(Y)$.

A fundamental principle of statistics is that we think of $Y$ in the same way as the outcome of a randomisation device, such as the outcome of a fair die roll, $X$.

However, with $X$ we know the full distribution (each face has probability $\frac{1}{6}$), so we can calculate the expected value directly:

$$\mathbb{E}(X) = 1\times \frac{1}{6}+2\times \frac{1}{6}+3\times \frac{1}{6}+4\times \frac{1}{6}+5\times \frac{1}{6}+6\times \frac{1}{6}=3.5$$
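The calculation above can be sketched in a few lines of Python. This is only an illustration: the helper name `expected_value` and the dictionary representation of the PMF are our own choices, not part of the slide. Exact fractions are used so that no rounding error enters the result.

```python
from fractions import Fraction


def expected_value(pmf):
    """Return the sum of x * P(X = x) for a PMF given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())


# PMF of a fair six-sided die: each face 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Sanity check: the probabilities of a valid PMF sum to one.
assert sum(die_pmf.values()) == 1

die_mean = expected_value(die_pmf)
print(die_mean, float(die_mean))
```

The point of the sketch is that the whole computation depends on having `die_pmf` written down in advance, which is exactly what we lack for heights.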

But with the knowledge we have, there is no such procedure available for $\mathbb{E}(Y)$. For that, we’d have to know the full distribution (the PDF or PMF) of women’s heights in the UK, and finding that is a much harder problem than finding the single number $\mu$ we’re looking for!

We therefore resort to an informed guess; that is, we must ask a sample of women their heights, and use these data to estimate $\mu=\mathbb{E}(Y)$.
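As a rough sketch of this idea, the simulation below builds a made-up population, draws a random sample from it, and uses the sample mean as an estimate of $\mu$. Everything numerical here is an assumption for illustration only: the population is simulated from a normal distribution with mean 162 cm and standard deviation 7 cm, which is not real UK data, and the sample size of 200 is arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (NOT real UK data): 100,000 simulated heights in cm.
population = [rng.gauss(162.0, 7.0) for _ in range(100_000)]

# In practice mu is unknown; we can compute it here only because
# we built the whole population ourselves.
mu = statistics.fmean(population)

# Draw 200 individuals at random, without replacement, and record their heights.
sample = rng.sample(population, 200)

# The sample mean is our estimate of mu.
mu_hat = statistics.fmean(sample)

print(f"population mean: {mu:.2f} cm")
print(f"sample estimate: {mu_hat:.2f} cm")
```

In a real study only `sample` would be available, so the gap between `mu_hat` and `mu` could not be observed directly.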

Background