Type I and type II errors
USMLE® Step 1 style questions
A student researcher has gathered data from a study to evaluate treatment options for weight loss in children who are overweight or obese. The study was a non-randomized, non-blinded, family-based intervention. The main outcome of interest is improvement in quality of life scores as measured by a subjective scoring system completed by both the parent and child. The data from the study are presented below including the average score and the standard deviation (SD). When considering statistical testing, an alpha level of 0.05 is chosen. Which of the following best explains the implications of this alpha level?
Content Reviewers: Rishi Desai, MD, MPH
Let’s say that you’re trying to figure out if a certain medication, Medication A, lowers blood pressure better than the currently prescribed medication, Medication B. So you find 100 people with high blood pressure and give 50 of them Medication A and 50 of them Medication B, and after 6 months see which group has lower mean or average blood pressure.
For this study we would make two hypotheses.
The first hypothesis is called the null hypothesis, and it basically says there’s no difference between two variables that you care about.
For example, our null hypothesis would state that there’s no difference between the mean blood pressure after the 6 month study period, for the group that takes Medication A compared to the mean blood pressure for the group that takes Medication B.
In other words, that there’s no relationship between medication type and blood pressure.
On the other hand, the alternative hypothesis would state that there is a difference between the mean blood pressure for the group that takes Medication A compared to the mean blood pressure for the group that takes Medication B.
Again, in other words, that there is a relationship between medication type and blood pressure.
In theory, there are four possible conclusions that can come from this study, and we can organize them in a 2 by 2 table, where the true relationship between medication and blood pressure is on top, and the study conclusions are on the side.
When a study doesn’t see a relationship between medication and blood pressure, represented here as an arrow with a red cross, and there really isn’t one, then this is called a true negative.
When the study finds that there is a relationship, represented on our table by a green arrow, between medication and blood pressure, and there really is one, then this is a true positive.
Similarly, when the study concludes that there is a relationship between medication and blood pressure but there really is no difference - this is a false positive, also called a type I error.
And lastly when the study concludes that there isn’t a relationship between medication and blood pressure, but there really is - this is a false negative and is also called a type II error.
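As a rough sketch, the four cells of the 2 by 2 table can be written as a small Python function. The function name `classify_outcome` and its two arguments are made up for this illustration; they just stand for the study conclusion and the true relationship.

```python
def classify_outcome(study_finds_relationship: bool, relationship_exists: bool) -> str:
    """Return the cell of the 2 by 2 table for one study conclusion."""
    if study_finds_relationship and relationship_exists:
        return "true positive"
    if study_finds_relationship and not relationship_exists:
        return "false positive (type I error)"
    if not study_finds_relationship and relationship_exists:
        return "false negative (type II error)"
    return "true negative"


# Walk through all four combinations of study conclusion and truth.
for finds in (True, False):
    for exists in (True, False):
        print(f"study finds={finds!s:5}  truth={exists!s:5}  ->  {classify_outcome(finds, exists)}")
```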
Ideally, the study’s conclusion would always match the true relationship. But this isn’t always the case, because there’s a chance that some other variable - besides the type of medication a person uses - could change their blood pressure.
For example, let’s say that in reality, Medication A doesn’t lower blood pressure better than Medication B. If our study nevertheless finds that Medication A seems to lower blood pressure better than Medication B, that would be a type I error.
Maybe this happened because we accidentally chose people in the Medication A group that all started to exercise regularly halfway through the study, so their blood pressure decreased over the 6 months, but not necessarily because of Medication A. In that situation, we ended up having a type I error, because the two groups had different characteristics, simply by chance.
Now, in statistics there’s a threshold for how much risk of a type I error we’re willing to accept in a study. This is called the alpha level or significance level, and usually it’s set at 0.05, which means that researchers are willing to accept a 5% chance of a type I error when the null hypothesis is actually true.
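To see what that 5% means in practice, here is a minimal simulation sketch. It assumes a made-up world where the null hypothesis is true: both groups of 50 people are drawn from the same blood pressure distribution (mean 140 and SD 15 are hypothetical numbers, not from any real study). It also uses a normal approximation for the p-value so that it runs with only the Python standard library, so it stands in for a t-test rather than reproducing one exactly.

```python
import random
from statistics import NormalDist, mean, variance


def two_sided_p_value(group_a, group_b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    standard_error = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha, n_studies=2000, n_per_group=50, seed=0):
    """Share of simulated studies that reject the null even though it is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Hypothetical values: both medications have the same true effect.
        group_a = [rng.gauss(140, 15) for _ in range(n_per_group)]
        group_b = [rng.gauss(140, 15) for _ in range(n_per_group)]
        if two_sided_p_value(group_a, group_b) < alpha:
            false_positives += 1  # the study "found" a difference that isn't there
    return false_positives / n_studies


rate = type_i_error_rate(alpha=0.05)
print(f"Share of simulated studies with a type I error: {rate:.3f}")
```

With an alpha level of 0.05, the printed share should land somewhere near 5%, give or take some simulation noise.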
Once the alpha level has been set, we can use a statistical test to calculate a p-value for our specific data.
For example, let’s say that we use a t-test to see if there’s a difference in the mean blood pressure levels for people that take Medication A or Medication B, and we get a mean difference of 10 points, and a p-value of 0.02.
So going back to our two hypotheses, what does this mean? It means that, if the null hypothesis is true, then the probability of getting a mean difference in blood pressure of 10 points - or higher than 10 points - simply by chance, is about 2%.
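One way to make "simply by chance" concrete is a permutation test, sketched below. This is a swapped-in technique, not the t-test from the example, so its p-value won't match 0.02. The blood pressure readings are hypothetical values chosen so that the mean difference is 10 points, and the function name `permutation_p_value` is made up for this illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of random relabelings with a mean difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # If the null hypothesis is true, the group labels are interchangeable.
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical systolic blood pressure readings after 6 months (not real data).
medication_a = [128, 135, 122, 140, 131, 126, 138, 129, 133, 124]
medication_b = [142, 136, 147, 139, 133, 150, 138, 144, 141, 136]

p_value = permutation_p_value(medication_a, medication_b)
print(f"Observed mean difference: {mean(medication_b) - mean(medication_a):.1f} points")
print(f"Permutation p-value: {p_value:.4f}")
```

The p-value here is the share of shuffles in which chance alone produced a difference of 10 points or more, in either direction.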
In other words, if the null hypothesis is true, there’s only a small probability - about 2% - of seeing a difference this large simply by chance. And because that probability is less than our alpha level of 5%, we reject the null hypothesis and conclude that the data support the alternative hypothesis.
And the alternative hypothesis is that there really is a significant difference in the mean blood pressure for those who took Medication A and those who took Medication B.
So, the alpha level sets the cutoff that the p-value has to fall below before we can reject the null hypothesis, and with it, the probability of a type I error that we’re willing to accept.
And the lower the alpha level, the harder it is to get a false positive result, or type I error.
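As a small sketch of that last point, here’s the decision rule applied to the p-value of 0.02 from our example at two alpha levels. The stricter alpha of 0.01 is an assumption added for comparison, and `rejects_null` is a made-up helper name.

```python
def rejects_null(p_value: float, alpha: float) -> bool:
    """Decision rule: reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha


p_value = 0.02  # p-value from the t-test example above
for alpha in (0.05, 0.01):  # 0.01 is a hypothetical stricter threshold
    decision = "reject" if rejects_null(p_value, alpha) else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

The same result that counts as significant at an alpha level of 0.05 no longer does at 0.01, which is why a lower alpha level makes a false positive less likely.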