AssessmentsTest precision and accuracy
Test precision and accuracy
The statistic measures the extent by which two observers agree on a test result exceeding that of chance alone.
Content Reviewers:Rishi Desai, MD, MPH
Let’s say you want to figure out if eating more daily servings of vegetables will decrease a person’s body mass index (BMI), which is a number calculated by dividing a person’s weight in kilograms by their height in meters squared.
The first step to figuring this out is to collect data about each person in the study, and this is typically done using some type of measurement tool.
For example, we might use a scale to measure a person’s weight, a measuring rod to measure a person’s height, and design a survey to find out how many daily servings of vegetables a person eats.
Now, it’s important to collect high quality data in a study, which means the information collected in the study should accurately reflect what’s really happening.
For example, if a person eats 5 servings of vegetables per day, the data should reflect that they eat 5 servings, instead of 2 servings.
Data quality is determined by the tools used to collect the information, and ideally, these tools have high validity - or accuracy - and high reliability - or repeatability.
A tool with high validity will provide a measurement that’s very close to the true or known value for the thing being measured.
Let’s say we’re going to measure a woman’s weight using two different scales.
One scale is a family heirloom that was passed down over multiple generations - so it’s pretty old - and the other scale was a gift from your friend who’s a doctor - so it’s really modern and sophisticated.
The old scale provides a measurement of 80 kilograms, and the modern scale provides a very different measurement of 66 kilograms.
In reality, this woman weighs 65 kilograms, so, since the modern scale provides a measurement that is closer to the woman’s true weight, the modern scale has higher validity.
Using tools with high validity is important for getting correct results in descriptive or inferential statistics.
For example, if we used the old scale for all the people in the group with hypertension, but used the new scale for the people in the group without hypertension, then we would think the group with hypertension has a much higher mean body mass index than they really do.
This would lead to an overestimation of the association between body mass index and hypertension.
On the other hand, a tool with high reliability will consistently get the same results, no matter how many times the measurement is repeated.
So, let’s say you measure each person’s weight 3 times in a row on each scale.
On the old scale, the 3 measurements are 80 kilograms, 81 kilograms, and 80 kilograms, and on the modern scale, the 3 measurements are 66 kilograms, 75 kilograms, and 60 kilograms.
Now, even though the modern scale has higher validity, it actually has lower reliability, because the results of the 3 tests were not consistent with each other.