What is a Consistent Estimator?
Given an unknown $\theta$ and a sequence of estimators $\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3, \dots $, we say that the sequence is consistent if the probability of getting an estimate that is “far away” from $\theta$ tends to $0$ as $n \to \infty$.
Formally, we first choose an “error tolerance” $\varepsilon>0$, representing what we mean for an estimator to have “gone wrong”. That is, we think of $\hat{\theta}_n$ and $\theta$ as being “far apart” if $ | \hat{\theta}_n - \theta | \gt \varepsilon$.
Then we require that the probability of this event $| \hat{\theta}_n - \theta | \gt \varepsilon$ tends to $0$ as $ n \to \infty $:
$$\forall \varepsilon \gt 0, \ \ \lim_{n \to \infty} \mathbb{P}(| \hat{\theta}_n - \theta | \gt \varepsilon) = 0 $$
Intuitively, having more data makes the possibility of estimation “going wrong” increasingly unlikely.
For example, consider the series of sample means:
$$ \bar{Y}_1 = \frac{Y_1}{1}, \ \ \bar{Y}_2 = \frac{Y_1+Y_2}{2}, \ \ \bar{Y}_3 = \frac{Y_1+Y_2+Y_3}{3}, \ \ \dots $$
Then this sequence is consistent for the population mean $\mu=\mathbb{E}(Y_i)$ as the sample size $n \to \infty$.
Defining $\operatorname{Bias}_{\theta}(\hat{\theta})=\mathbb{E}(\hat{\theta})-\theta$, a sequence $\hat{\theta}_n$ is automatically consistent if both the bias and the variance tend to 0 as $n \to \infty$.
However, the converse is not true; there are estimators such that as $n$ increases they are increasingly likely not to “go wrong”, but when they do so this happens in an increasingly catastrophic way, meaning that the bias and/or variance do not vanish overall.
Consistency is the same as convergence in probability to the constant $\theta$:
$$ \hat{\theta}_n \overset{p}{\longrightarrow} \theta \ \text{ as } \ n \to \infty$$