In this video I answer the common question of why we divide by n-1 when calculating variance from a sample, known as Bessel's Correction. I focus on a conceptual understanding of why this adjustment is needed and why n-1 is the appropriate adjustment on average, rather than making up a population and possible samples to illustrate it. I show why squared deviations from x-bar (the mean of the sample) tend to underestimate the squared deviations from mu, then provide two arguments for why n-1 adjusts for this: one based on degrees of freedom, and the other based on estimating the average amount of bias in the sample variance.
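If you'd like to see the bias directly, here's a minimal simulation sketch (my own illustration, not something from the video; the normal population with mu = 100 and sigma = 15 is an arbitrary choice, so the true variance is 225):

```python
# A minimal sketch of the bias Bessel's Correction fixes: dividing the
# squared deviations from x-bar by n underestimates the population
# variance on average, while dividing by n-1 is correct on average.
import random

random.seed(0)
mu, sigma, n, trials = 100, 15, 5, 100_000

avg_div_n = avg_div_n1 = 0.0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from x-bar
    avg_div_n += (ss / n) / trials             # biased: divide by n
    avg_div_n1 += (ss / (n - 1)) / trials      # Bessel's Correction: divide by n-1

print(f"true variance:        {sigma ** 2}")      # 225
print(f"average of SS/n:      {avg_div_n:.1f}")   # ~180, i.e. (n-1)/n * 225
print(f"average of SS/(n-1):  {avg_div_n1:.1f}")  # ~225, unbiased on average
```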
Here's a more detailed explanation, which I didn't fully give in the video, of why the bias is cut in half when moving from a sample of 1 to a sample of 2, and why it keeps shrinking for larger samples:
We can think of this bias as relating to how bad our estimate of mu (and thus our estimates of the deviations) can possibly be.
If we think about using 1 score to estimate mu, it could be anywhere in the full range of x in the population. Now we can ask how much adding a 2nd score would improve our estimate of mu and our deviations. If we assume a normal distribution, the probability of any one score falling above (or below) mu is 50%, but the probability of 2 scores both falling below mu, or both falling above it, is only 25% in each case, as the sketch below confirms.
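Here's a quick check of those 50% / 25% figures (again my own sketch, using the same arbitrary normal population of mu = 100, sigma = 15):

```python
# One score lands below mu half the time, but two independent scores
# both land below mu (or both above) only a quarter of the time each.
import random

random.seed(1)
mu, trials = 100, 100_000
below_1 = both_below = both_above = 0
for _ in range(trials):
    a, b = random.gauss(mu, 15), random.gauss(mu, 15)
    below_1 += a < mu                    # one score below mu
    both_below += (a < mu) and (b < mu)  # 2 scores both below mu
    both_above += (a > mu) and (b > mu)  # 2 scores both above mu

print(below_1 / trials)     # ~0.50
print(both_below / trials)  # ~0.25
print(both_above / trials)  # ~0.25
```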
And the lower the first score is, the greater the probability the second score will be above that score and improve the estimate (though it could over-correct in the other direction).
If our first score happened to be the lowest possible value of x, then any 2nd score could only move our estimate of x-bar closer to the true population mean or keep it the same. We can't be any more wrong in our estimate of mu; getting the same extreme low value again wouldn't change x-bar, and even if the 2nd score were the highest possible value of x, this would just bring x-bar to the population mean because the distribution is symmetrical. So a 2nd score could improve our estimate up to being exactly correct, with 0% chance of over-estimating mu.
But as the first score falls closer to the true population mean, the probability that a 2nd score will improve the estimate of x-bar decreases, and the probability of worsening the estimate starts to increase (because it becomes more and more likely to get values below the first score, and high values could now over-correct, giving us estimates that are farther above mu than the first score was below it). But the maximum amount a 2nd score could possibly worsen the estimate is cut in half, because it can only pull x-bar half of the distance between itself and the 1st score.
So if the first score happened to be the true population mean (which we wouldn't actually know), a second score could only keep the estimate the same or worsen it, and by at most 50% in either direction of how bad the estimate could possibly have been with only one score by itself.
So then if we imagine all the different possible combinations of 2 scores and the probabilities of their pairings, having a second score will improve some estimates and worsen others (depending on how close the first score was to mu), but it cuts the worst-case error in half and makes extreme estimates much rarer. The extreme estimates are still possible (2 extremely low or high scores together), but the chance of drawing 2 extreme scores together is roughly the square of the chance of drawing 1, which is far less than half of it; the sketch below shows how quickly the extremes thin out.
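Here's a sketch of that thinning (same arbitrary population as above; "extreme" is arbitrarily defined here as x-bar landing more than 2 standard deviations from mu):

```python
# Compare how often x-bar lands far from mu with one score versus two.
import random

random.seed(2)
mu, sigma, trials = 100, 15, 100_000
cutoff = 2 * sigma

extreme_n1 = extreme_n2 = 0
for _ in range(trials):
    a, b = random.gauss(mu, sigma), random.gauss(mu, sigma)
    extreme_n1 += abs(a - mu) > cutoff            # x-bar from 1 score
    extreme_n2 += abs((a + b) / 2 - mu) > cutoff  # x-bar from 2 scores

print(extreme_n1 / trials)  # ~0.046 with n = 1
print(extreme_n2 / trials)  # ~0.005 with n = 2: far less than half as often
```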
This continues as n increases. If the current x-bar differs from mu, then the probability an additional score will shift it in the correct direction is always greater than 50% (because more than half of the scores in the population lie on the mu side of the current estimate), while the amount it might over-correct gets smaller and smaller (the 10th score can only pull the mean a maximum of 10% of its distance from the 9-score estimate, and so on).
To give a concrete example: if I had 2 people, one with an IQ of 100 and one with an IQ of 140, my estimate of mu (assume it's 100 in the population) could be off by 40 using just the 2nd score, but only off by 20 using both scores. If I had an average of 100 from 9 people and added a 10th person at 140, my estimate would only move to 104, compared to possibly being off by 40 if I had used the single 10th score by itself.
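And here's a tiny check of that arithmetic (the updated_mean helper is just something I made up for illustration; the IQ values are the hypothetical ones above):

```python
# The incremental-mean identity:
# new x-bar = old x-bar + (new score - old x-bar) / n
def updated_mean(old_mean, new_score, n):
    """x-bar after adding the n-th score to a sample whose first
    n - 1 scores averaged old_mean."""
    return old_mean + (new_score - old_mean) / n

print(updated_mean(100, 140, 2))   # 120.0: off by 20 instead of 40
print(updated_mean(100, 140, 10))  # 104.0: the 10th score moves x-bar only 10% of the gap
```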
And as we get to higher sample sizes, drawing an extreme sample becomes not just unlikely but eventually impossible, because the population runs out of extreme scores. If we have a sample size of 10,000 (drawn without replacement), we can't get all extremely high or low values on one side, not just because it's unlikely but because there simply aren't 10,000 such scores in the population, so very extreme estimates of mu become impossible to select.
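One last sketch of that finite-population point, using a made-up population of 20,000 scores: when sampling without replacement, the most extreme x-bar you could possibly get is the mean of the n most extreme scores, and that shrinks toward mu as n grows.

```python
# The largest achievable x-bar for each sample size n, taking all n
# scores from the top of the population.
import random

random.seed(3)
population = sorted((random.gauss(100, 15) for _ in range(20_000)), reverse=True)

for n in (1, 10, 1_000, 10_000):
    highest_possible_xbar = sum(population[:n]) / n
    print(n, round(highest_possible_xbar, 1))
# The largest achievable estimate of mu falls toward 100 as n
# approaches the population size.
```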
Hope this helps!