Normal distribution and z-scores
Content Reviewers: Rishi Desai, MD, MPH
Let’s say you ask 1000 men for their weight, and then you plot their answers on a histogram, which is a plot that shows how often each value, or range of values, of a measurement occurs.
Let’s say that the average weight is 170 pounds, or about 77 kilograms, and that it turns out that most men weighed close to that amount, whereas fewer men weighed a little more or a little less than the average, and even fewer men weighed much more or much less than the average.
If we draw a smooth curve over the top of our histogram, we get a shape that closely follows the normal distribution curve, which is also called the bell curve because it’s shaped like a bell.
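A quick way to see the bell shape emerge is to simulate the survey. This is a minimal illustration, not real data: it assumes weights follow a normal distribution with a mean of 170 pounds and a standard deviation of 29 pounds (the example values used in this section), and the fixed seed and 20-pound bin width are arbitrary choices.

```python
import random
from collections import Counter

# Assumption: simulated (not real) weights in pounds, drawn from a normal
# distribution with mean 170 and standard deviation 29.
random.seed(42)  # arbitrary fixed seed so the run is reproducible
weights = [random.gauss(170, 29) for _ in range(1000)]

# Group weights into 20-pound bins; each bin covers [start, start + 20).
BIN_WIDTH = 20
counts = Counter(int(w // BIN_WIDTH) * BIN_WIDTH for w in weights)

# Print a sideways text histogram, one '#' per 5 men.
for start in sorted(counts):
    bar = "#" * (counts[start] // 5)
    print(f"{start:>4} to {start + BIN_WIDTH:<4} lb  {counts[start]:>4}  {bar}")
```

The bins near 170 pounds should hold the most men, with counts tapering off toward both ends, which is the pattern the smooth bell curve traces.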
The bell curve is symmetrical, with half the data on the left side of the average and half the data on the right side.
The area under the bell curve is equal to 1, or 100%, with the highest percentage of data in the middle section and the lowest percentage of data in the outer tails of the curve.
Typically, for population data, the average point in a bell curve is labeled with the Greek letter mu (μ), and mu refers to the mean, median, and mode, because when data are normally distributed, the mean, median, and mode are all equal to each other.
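The claims about total area, symmetry, and the center can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from Python’s standard library, assuming the example values mu = 170 pounds and sigma = 29 pounds that appear later in this section.

```python
from statistics import NormalDist

# Example population: mu = 170 lb, sigma = 29 lb (assumed values from the text).
weight = NormalDist(mu=170, sigma=29)

# Total area under the curve, approximated with the midpoint rule over
# mu +/- 10 sigma; the tails beyond that contribute a negligible amount.
n = 100_000
a = weight.mean - 10 * weight.stdev
b = weight.mean + 10 * weight.stdev
step = (b - a) / n
area = sum(weight.pdf(a + (i + 0.5) * step) for i in range(n)) * step

print(f"area under curve: {area:.6f}")
print(f"share below the mean: {weight.cdf(weight.mean)}")
print(f"mean, median, mode: {weight.mean}, {weight.median}, {weight.mode}")
```

The area comes out at 1 (to rounding), exactly half the distribution lies below the mean, and the mean, median, and mode all equal 170.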
The standard deviation is a measure of how spread out the data are around the average, and for population data it’s represented by the Greek letter sigma (σ).
For example, let’s say the standard deviation of weight for our sample of men is 29 pounds, or 13 kilograms.
In a normal distribution, about 68 percent of the data are within one standard deviation of the mean.
That means that about 68 percent of men will weigh somewhere between 170 minus 29, or 141 pounds, and 170 plus 29, or 199 pounds.
Also, about 95 percent of the data are found within two standard deviations of the mean. Since 29 times 2 is 58, about 95 percent of men will weigh somewhere between 170 minus 58, or 112 pounds, and 170 plus 58, or 228 pounds.
Finally, about 99.7 percent of the data are found within three standard deviations of the mean, and since 29 times 3 is 87, about 99.7 percent of men will weigh between 170 minus 87, or 83 pounds, and 170 plus 87, or 257 pounds.
This is called the empirical rule, or the 68-95-99.7 rule.
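The intervals above can be reproduced with a short sketch, again assuming mu = 170 and sigma = 29 pounds and using `statistics.NormalDist` from Python’s standard library; `within_k_sd` is a hypothetical helper name. The exact normal-distribution shares are about 68.27, 95.45, and 99.73 percent, which the empirical rule rounds to 68, 95, and 99.7.

```python
from statistics import NormalDist

MU, SIGMA = 170, 29  # example population mean and SD, in pounds
weight = NormalDist(MU, SIGMA)

def within_k_sd(k):
    """Return (low, high, share): the interval mu +/- k*sigma and the exact
    fraction of a normal distribution that falls inside it."""
    low, high = MU - k * SIGMA, MU + k * SIGMA
    return low, high, weight.cdf(high) - weight.cdf(low)

for k in (1, 2, 3):
    low, high, share = within_k_sd(k)
    print(f"within {k} SD: {low} to {high} lb, {share:.1%} of men")
```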
Now, the shape of the bell curve depends on the size of the standard deviation.
A small standard deviation, say only 5 pounds, tells you that most of the data are clustered tightly around the average, and this makes the bell curve very tall and skinny.
On the other hand, a large standard deviation, say 50 pounds, tells you that the data are spread far above and far below the average, and this makes the bell curve look very wide and flat.
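To see how sigma changes the shape, this sketch compares the peak height of the curve for the three standard deviations mentioned here (5, 29, and 50 pounds), assuming a mean of 170 pounds in each case. Because the area under the curve is always 1, a narrower curve has to be taller.

```python
import math
from statistics import NormalDist

# Peak height of a normal curve is pdf(mu) = 1 / (sigma * sqrt(2 * pi)).
# The total area is always 1, so a smaller sigma gives a taller, narrower curve.
peaks = {}
for sigma in (5, 29, 50):
    curve = NormalDist(mu=170, sigma=sigma)
    peaks[sigma] = curve.pdf(170)
    formula = 1 / (sigma * math.sqrt(2 * math.pi))
    print(f"sigma = {sigma:>2} lb: peak height {peaks[sigma]:.4f} "
          f"(formula: {formula:.4f}), ~95% of men within "
          f"{170 - 2 * sigma} to {170 + 2 * sigma} lb")
```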
Now, let’s say a man named Micah weighs 220 pounds, and he wants to know how close his weight is to the average weight.
We can calculate how much more he weighs than the average by subtracting the average weight, 170 pounds, from his weight, 220 pounds, which gives 50 pounds.
But telling Micah that he weighs 50 pounds over the average doesn’t really have much meaning, because he probably doesn’t know if 50 pounds is a lot higher or only a little higher than the average.
Instead, we might tell Micah his z-score, or standard score, which is a measure of how many standard deviations his weight is from the average weight.
Z-scores have no fixed limits, but in a normal distribution nearly all of them (about 99.7 percent, by the empirical rule) fall between negative 3, on the far end of the left tail, and positive 3, on the far end of the right tail.
In the normal distribution, the average value is the reference point, so the average value itself has a z-score of 0.
To figure out a z-score for an individual measurement - like Micah’s weight - we use the equation z equals the measurement minus the average measurement in the population, divided by the standard deviation for the population.
Usually, the individual measurement is represented by the letter x, so the equation can also be written z equals x minus mu, divided by sigma.
So, to figure out Micah’s z-score, we do 220 minus 170, divided by 29, which equals 1.72.
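Written as a worked equation with the symbols defined above:

```latex
z = \frac{x - \mu}{\sigma} = \frac{220 - 170}{29} = \frac{50}{29} \approx 1.72
```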
This means that Micah’s weight is 1.72 standard deviations above the population average.
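The same calculation as a small Python function. This is a minimal sketch in which `z_score` is a hypothetical helper name; as a cross-check it also calls `NormalDist.zscore`, which is part of Python’s standard library from version 3.9. A positive result means above the average and a negative result means below it.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

micah = z_score(220, mu=170, sigma=29)
print(round(micah, 2))  # 1.72

# Cross-check with the standard library (Python 3.9+).
print(round(NormalDist(mu=170, sigma=29).zscore(220), 2))  # 1.72

# A man weighing 141 lb sits exactly one standard deviation below the mean.
print(z_score(141, mu=170, sigma=29))  # -1.0
```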