# Standard error of the mean (Central limit theorem)

### USMLE® Step 2 style question

A group of students is trying to determine the average height of 9th graders at their high school. The students survey the entire 9th grade student population and are calculating a parameter that measures how spread out the data points are from the average. Which of the following best describes this parameter?

Transcript

#### Content Reviewers:

Rishi Desai, MD, MPH

Let’s say you ask 1000 men for their weights and then plot those weights on a histogram, which is a type of plot that shows the distribution of measurements or data.

So let’s say that most men weighed close to the average - which in this case might be 170 pounds, or around 77 kilograms - while fewer men weighed a little bit more or a little bit less than the average, and even fewer men weighed much more or much less than the average.

If we draw a curve over the top of our histogram, we get the normal distribution curve, which is also called the bell curve, because it’s shaped like a bell.

The bell curve is symmetrical, with half the data on the left of the average and half the data on the right side of the average.

The area under the bell curve is equal to 1, or 100%, with the highest percentage of data in the middle section and the lowest percentage of data in the outer tails of the curve.

Typically, for population data, the average point in a bell curve is labeled with the Greek letter mu, and mu refers to the mean, median, and mode, because when data are normally distributed, the mean, median, and mode are all equal to each other.

The standard deviation is a measure of how spread out the data are from the average, and for population data it’s represented by the lowercase Greek letter sigma.

For example, let’s say the standard deviation of weight for our population of men is 29 pounds, or 13 kilograms.

In a normal distribution, about 68 percent of the data are found within one standard deviation of the mean.

That means that 68 percent of men will weigh somewhere between 170 minus 29, or 141 pounds, and 170 plus 29, or 199 pounds.

Also, 95 percent of the data are found within two standard deviations - so, since 29 times 2 is 58, then 95 percent of men will weigh somewhere between 170 minus 58, or 112 pounds, and 170 plus 58, or 228 pounds.

Finally, 99.7 percent of the data are found within three standard deviations, and since 29 times 3 is 87, 99.7% of men will weigh between 170 minus 87, or 83 pounds, and 170 plus 87, or 257 pounds.

This is called the empirical rule, or the 68-95-99.7 rule.
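
To check that arithmetic, here is a minimal Python sketch. It assumes the weights follow an exact normal distribution with the mean of 170 pounds and standard deviation of 29 pounds used above; the helper names `interval` and `coverage` are made up for this illustration.

```python
from statistics import NormalDist

# Values from the transcript: mean 170 lb, standard deviation 29 lb.
MEAN_LB = 170
SD_LB = 29

# Assumption: weights are modeled as exactly normal.
weights = NormalDist(mu=MEAN_LB, sigma=SD_LB)

def interval(k):
    """Return (low, high) for k standard deviations around the mean."""
    return MEAN_LB - k * SD_LB, MEAN_LB + k * SD_LB

def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations."""
    low, high = interval(k)
    return weights.cdf(high) - weights.cdf(low)

for k in (1, 2, 3):
    low, high = interval(k)
    print(f"within {k} SD: {low} to {high} lb, about {coverage(k):.1%} of men")
```

The 68, 95, and 99.7 figures are rounded; the fractions computed here are slightly more precise versions of the same numbers.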

Now, the shape of the bell curve depends on the size of the standard deviation.

A small standard deviation, like if it was only 5 pounds, tells you that most of the data are clustered around the average - and this makes the bell curve very tall and skinny.

On the other hand, a large standard deviation, like if it was 50 pounds, tells you that the data are spread far above and far below the average - and this makes the bell curve look very wide and flat.
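
One way to put numbers on "tall and skinny" versus "wide and flat" is to compare the height of the curve at the mean, and the share of men within 10 pounds of the mean, for different standard deviations. This is only a sketch that assumes an exact normal distribution; the 10-pound margin and the function names are arbitrary choices for illustration.

```python
from statistics import NormalDist

MEAN_LB = 170  # mean weight from the transcript

def peak_height(sd):
    """Height of the normal curve at its mean for a given standard deviation."""
    return NormalDist(mu=MEAN_LB, sigma=sd).pdf(MEAN_LB)

def share_within(sd, margin_lb):
    """Fraction of the data within +/- margin_lb pounds of the mean."""
    dist = NormalDist(mu=MEAN_LB, sigma=sd)
    return dist.cdf(MEAN_LB + margin_lb) - dist.cdf(MEAN_LB - margin_lb)

for sd in (5, 29, 50):
    print(f"SD {sd:>2} lb: peak height {peak_height(sd):.4f}, "
          f"{share_within(sd, 10):.0%} of men within 10 lb of the mean")
```

Because the area under the curve always stays at 1, the peak has to drop as the curve spreads out: a standard deviation ten times larger gives a peak ten times lower.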

It’s also possible that the population of 1000 men has a skewed distribution instead of a normal distribution, meaning one tail of the curve is longer than the other.

A right-skewed distribution means that the right tail is longer than the left tail, and a left-skewed distribution means that the left tail is longer than the right tail.

Typically, when the distribution is skewed, the mean, median, and mode are not equal.
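
A small made-up data set shows how the three measures can pull apart. The values below are hypothetical and are not taken from the weight example; they are chosen only so that one tail is clearly longer than the other.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data (not from the transcript): most values are
# small, and a few large values stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirroring the values around 20 flips the long tail to the left.
left_skewed = [20 - x for x in right_skewed]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

In this example the mean is pulled furthest toward the long tail, with the median in between and the mode near the peak. That ordering is common in skewed data, but it is not guaranteed for every data set.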

Oftentimes it’s impossible to collect measurements from every single person in the population, so we choose a sample, which is basically a smaller number of people that we think represent the larger group.

As a general rule, if we collect the sample randomly, meaning people are chosen solely by chance, then we expect that sample to have similar characteristics - like the same distribution of weight - as the population they’re chosen from.

And if the two groups have similar characteristics, we also expect the mean and the standard deviation to be about the same in the two groups.

For example, let’s say we randomly take a sample of 50 men from the total population of 1000 men.

If the population has a mean weight of 170 pounds and a standard deviation of 29 pounds, we also expect the sample to have a mean weight of 170 pounds and a standard deviation of 29 pounds.

But in some cases the sample we collect won’t have a mean of exactly 170.

For example, the men in a random sample might, by chance, weigh more on average than the population, so the sample mean will be higher than the population mean.
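
To see how much a sample mean can wander, here is a rough simulation sketch. The population of 1000 weights is generated by the code rather than measured, assuming a normal distribution with a mean of 170 pounds and a standard deviation of 29 pounds; the seed and the function name `sample_summary` are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Simulated population of 1000 weights. The normal shape, the mean of 170 lb
# and the SD of 29 lb come from the transcript; the individual values are made up.
population = [rng.gauss(170, 29) for _ in range(1000)]

def sample_summary(n):
    """Draw n men at random without replacement; return (mean, SD) of the sample."""
    sample = rng.sample(population, n)
    return mean(sample), stdev(sample)

print(f"population mean: {mean(population):.1f} lb")
for i in range(5):
    m, s = sample_summary(50)
    print(f"sample {i + 1}: mean {m:.1f} lb, SD {s:.1f} lb")
```

Each sample of 50 lands near 170 pounds but rarely on it exactly, and some samples come out above the population mean while others come out below.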

Summary
The central limit theorem states that if random samples are drawn repeatedly from a population and the mean is calculated for each sample, those sample means will form an approximately normal (Gaussian) curve. With large enough samples, this holds regardless of the shape of the original distribution. The standard deviation of this curve of sample means is called the standard error of the mean. The standard error of the mean does not measure the dispersion of the individual data points; it measures how precisely a sample mean estimates the population mean. It is directly proportional to the standard deviation and inversely proportional to the square root of the sample size.
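
The summary can be checked with a short simulation. This is a sketch under stated assumptions: the population is a hypothetical right-skewed (exponential) one generated by the code, samples are drawn with replacement for simplicity, and the sample sizes, seed, and function name `sample_means` are arbitrary choices.

```python
import math
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed, arbitrary choice

# Hypothetical right-skewed population (exponential, mean about 10).
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
sigma = pstdev(population)  # population standard deviation

def sample_means(n, repeats=2000):
    """Mean of each of `repeats` random samples of size n (with replacement)."""
    return [mean(rng.choices(population, k=n)) for _ in range(repeats)]

for n in (4, 25, 100):
    means = sample_means(n)
    observed = stdev(means)            # spread of the sample means
    predicted = sigma / math.sqrt(n)   # standard error of the mean
    print(f"n={n:>3}: SD of sample means {observed:.2f}, "
          f"sigma/sqrt(n) {predicted:.2f}")
```

The spread of the sample means should track the standard deviation divided by the square root of the sample size, so going from samples of 4 to samples of 100 cuts the standard error to about one fifth.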