Type I and type II errors
A cross-sectional study is performed to measure low density lipoprotein (LDL) levels in different patient groups. The mean LDL level is found to be 80 mg/dL in 150 normotensive hospitalized patients and 120 mg/dL in 150 hospitalized patients with stage I hypertension. The probability that the observed difference is due to chance alone is 1%. There is also a 10% probability of concluding that there is no difference in LDL levels when one truly exists. What is the power of this study?
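For a question like this one, the power follows from the type II error rate. As a minimal sketch: the 10% probability of missing a difference that truly exists is the type II error rate (often written as beta), and power is 1 minus beta. The function name below is hypothetical, chosen only for illustration.

```python
def power_from_type_ii_error_rate(beta: float) -> float:
    """Power is the probability of detecting a difference that truly exists."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be a probability between 0 and 1")
    return 1.0 - beta


# From the question: 10% chance of concluding "no difference" when one truly exists.
beta = 0.10
power = power_from_type_ii_error_rate(beta)
print(f"Power: {power:.0%}")  # prints "Power: 90%"
```

The 1% figure in the question relates to the type I error side of the study and does not enter the power calculation.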
Content Reviewers: Rishi Desai, MD, MPH
Contributors: Pauline Rowsome, BSc (Hons), Evan Debevec-McKenney
Let’s say that you’re trying to figure out if a certain medication, Medication A, lowers blood pressure better than the currently prescribed medication, Medication B. So you find 100 people with high blood pressure and give 50 of them Medication A and 50 of them Medication B, and after 6 months see which group has lower mean or average blood pressure.
For this study we would make two hypotheses.
The first hypothesis is called the null hypothesis, and it basically says there’s no difference between two variables that you care about.
For example, our null hypothesis would state that, after the 6-month study period, there’s no difference between the mean blood pressure of the group that takes Medication A and the mean blood pressure of the group that takes Medication B.
In other words, that there’s no relationship between medication type and blood pressure.
On the other hand, the alternate hypothesis would state that there is a difference between the mean blood pressure for the group that takes Medication A compared to the mean blood pressure for the group that takes Medication B.
Again, in other words, that there is a relationship between medication type and blood pressure.
In theory, there are four possible conclusions that can come from this study, and we can organize them in a 2 by 2 table, where the true relationship between medication and blood pressure is on top, and the study conclusions are on the side.
When a study doesn’t see a relationship between medication and blood pressure, represented here as an arrow with a red cross, and there really isn’t one, then this is called a true negative.
When the study finds that there is a relationship, represented on our table by a green arrow, between medication and blood pressure, and there really is one, then this is a true positive.
On the other hand, when the study concludes that there is a relationship between medication and blood pressure but there really isn’t one, this is a false positive, also called a type I error.
And lastly when the study concludes that there isn’t a relationship between medication and blood pressure, but there really is - this is a false negative and is also called a type II error.
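The four cells of the 2 by 2 table can be sketched as a small lookup. This is only an illustration of the table described above; the function and parameter names are hypothetical.

```python
def classify_conclusion(study_finds_relationship: bool, relationship_exists: bool) -> str:
    """Return the cell of the 2 by 2 table for one study conclusion."""
    if study_finds_relationship and relationship_exists:
        return "true positive"
    if study_finds_relationship and not relationship_exists:
        return "false positive (type I error)"
    if not study_finds_relationship and relationship_exists:
        return "false negative (type II error)"
    return "true negative"


# Walk through all four combinations of study conclusion and truth.
for finds in (True, False):
    for exists in (True, False):
        cell = classify_conclusion(finds, exists)
        print(f"study finds relationship: {finds!s:5}  truth: {exists!s:5}  -> {cell}")
```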
Ideally, a study would have all true positives and true negatives.
But this isn’t always the case, because there’s a chance that some other variable - besides the type of medication a person uses - could change their blood pressure.
For example, let’s say that in reality, Medication A doesn’t lower blood pressure better than Medication B. If our study finds that Medication A does seem to lower blood pressure better than Medication B, then that would be a type I error.
Maybe this happened because we accidentally chose people for the Medication A group who all started to exercise regularly halfway through the study, so their blood pressure decreased over the 6 months, but not necessarily because of Medication A. In that situation, we ended up with a type I error because the two groups had different characteristics, simply by chance.
Now, in statistics there’s a threshold for how much risk of a type I error we’re willing to accept in a study. This is called the alpha level or significance level, and usually it’s set at 0.05, which means that researchers are willing to accept a 5% chance of a type I error when the null hypothesis is actually true.
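One way to see what an alpha level of 0.05 means is to simulate many studies in which the null hypothesis really is true and count how often a test still finds a difference. The sketch below is a rough illustration under stated assumptions: the blood pressure mean and standard deviation are made up, and to stay within the Python standard library it uses a two-sample z-test with a known standard deviation rather than a t-test.

```python
import random
from statistics import NormalDist, mean


def two_sample_z_p_value(group_a, group_b, sd):
    """Two-sided p-value for a difference in means, assuming a known common sd."""
    standard_error = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_type_i_error_rate(alpha=0.05, n_studies=2000, n_per_group=50, seed=0):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    true_mean, sd = 140.0, 15.0  # hypothetical values, identical for both groups
    false_positives = 0
    for _ in range(n_studies):
        group_a = [rng.gauss(true_mean, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_mean, sd) for _ in range(n_per_group)]
        if two_sample_z_p_value(group_a, group_b, sd) < alpha:
            false_positives += 1  # the study "found" a difference that isn't there
    return false_positives / n_studies


rate = simulate_type_i_error_rate()
print(f"Share of simulated studies with a type I error: {rate:.3f}")
```

Because both groups are drawn from the same distribution, every rejection here is a false positive, and the share of rejections should land close to the chosen alpha level.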
Once the alpha level has been set, we can use a statistical test to calculate a p-value for our specific data.
For example, let’s say that we use a t-test to see if there’s a difference in the mean blood pressure levels for people that take Medication A or Medication B, and we get a mean difference of 10 points, and a p-value of 0.02.
So going back to our two hypotheses, what does this mean? It means that, if the null hypothesis is true, then the probability of getting a mean difference in blood pressure of 10 points - or higher than 10 points - simply by chance, is about 2%.
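The idea of a p-value as "how often chance alone would produce a difference this large" can be sketched with a permutation test. This is not the t-test used in the example above; it's a stand-in that makes the logic visible. The blood pressure readings are hypothetical, and the function name is made up for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often shuffled group labels give a mean difference
    at least as large (in absolute value) as the one actually observed."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def abs_mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = abs_mean_difference(pooled)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # If the null hypothesis is true, the group labels are interchangeable.
        rng.shuffle(pooled)
        if abs_mean_difference(pooled) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical systolic readings (mmHg) after 6 months, 10 people per group.
medication_a = [128, 135, 122, 140, 131, 126, 138, 129, 133, 124]
medication_b = [142, 137, 150, 133, 145, 139, 148, 136, 144, 141]

p_value = permutation_p_value(medication_a, medication_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small estimated p-value means that, if medication type really made no difference, a gap this large between the two group means would rarely show up by chance.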
In other words, if the null hypothesis is true, there’s only a small probability, about 2%, that we would’ve seen a difference this large simply by chance. And because that probability is less than our alpha level of 5%, we can reject the null hypothesis in favor of the alternate hypothesis.
And the alternate hypothesis is that there really is a difference in the mean blood pressure for those who took Medication A and those who took Medication B.
So, the alpha level sets the cutoff that the p-value has to fall below before we can reject the null hypothesis.
And the lower the alpha level, the harder it is to get a false positive result, or type I error.
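The decision rule itself is a single comparison. Here's a minimal sketch using the p-value of 0.02 from the example above; the function name is hypothetical, and it assumes the common convention of rejecting when the p-value is strictly below alpha (some texts use "less than or equal to").

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the p-value falls below the alpha level."""
    return p_value < alpha


p_value = 0.02  # from the Medication A versus Medication B example
for alpha in (0.05, 0.01):
    decision = "reject" if reject_null(p_value, alpha) else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

The same p-value leads to a rejection at an alpha level of 0.05 but not at 0.01, which is what it means for a lower alpha level to make a false positive harder to get.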
Two types of errors can occur in hypothesis testing: type I and type II errors. A type I error, also known as a false positive, occurs when a researcher rejects a null hypothesis that is actually true. In other words, the researcher concludes that there is a significant effect or relationship when there really isn't. On the other hand, a type II error, also known as a false negative, occurs when a researcher fails to reject a null hypothesis that is actually false. In other words, the researcher concludes that there is no significant effect or relationship when there really is.