What is a Random Sample?
A random sample is a collection of independent random variables, all with the same distribution.
Suppose we are interested in the height of a woman in the UK. Then since we cannot calculate this using maths alone, it makes sense to ask some women their heights and use this data to form an estimate.
In statistics, we think of selecting a woman and asking her height in the same way we think of rolling a die.
If we think of the height of a randomly-chosen woman as a random variable $Y$, then the population mean we are looking for is modelled as $\mu = \mathbb{E}(Y)$. However, to figure out $\mu$, we would usually not ask just one woman but many.
If we plan to ask $n$ women their heights, we think of the responses we will get as $n$ random variables:
$$Y_1,Y_2, \dots, Y_n$$
As we have said, this collection forms a random sample if two conditions are met:
1.) They are independent
Intuitively, the height of one woman in the sample should not affect the height of the others.
So, we should pick women to ask individually, and not, for example, ask a group of friends – who may be similar in various ways such as their height.
2.) They each have the same distribution
They should all reflect the same “population” variable, corresponding to selecting a single woman at random from the UK population.
For instance, it should not be the case that women later in the list are any different height-wise than those asked earlier. Such a difference might e.g. arise if we collected parts of our example at different times of day in a rough neighbourhood, if taller women feel more confident in being out and about later at night.
We can think of these two assumptions as relating to the way we’ve collected our data. But later we will use them in mathematical proofs.
Our random sample $Y_1,Y_2, \dots, Y_n$ differs from our dataset $y_1,y_2, \dots ,y_n$, which is not random, but instead a collection of numbers obtained from some specific women we have already asked the heights of.
What is an Estimator?
An estimator is a random variable which is a function of a random sample $Y_1,Y_2, \dots, Y_n$. Typically it is used to estimate an unknown number like $\mu$.
Given some data $y_1, y_2, \dots, y_n$, the heights of $n$ particular women, it is natural to estimate the population average height by looking at their average:
$$\frac{y_1+y_2+\dots +y_n}{n}$$
The corresponding estimator looks similar, but with the random variables $Y_i$ instead of the numerical observations $y_i$:
$$\frac{Y_1+Y_2+\dots +Y_n}{n}$$
Sometimes we think of it as a formula or “method” for producing an estimate from a dataset. But we should also be conscious that it is a random variable in its own right, like the individual $Y_i$.
In particular, since estimators are random variables, we can investigate the properties of estimators using probability theory.