Content Reviewers: Rishi Desai, MD, MPH
Selection bias is a type of bias or error that can occur when researchers choose who will be included in a study.
Studies with selection bias might end up having results that can’t be applied to the population outside the study, meaning they lack external validity.
They may also misrepresent the relationship between an exposure and an outcome, meaning they lack internal validity.
Typically, the goal of a study is to figure out if an exposure is associated with an outcome in a target population.
So ideally a study should be done on a sample population of individuals that is similar to that target population in every meaningful way, which would give the study high external validity.
For example, if you want to figure out how smoking impacts the risk of lung cancer in Portland, Oregon, then people living in Portland are your target population.
Ideally, your sample population would include individuals from Portland.
And in addition, the sample population should include people of ages, races, and socioeconomic statuses that reflect the target population as well, because these are all factors that are likely to affect the risk of lung cancer.
If your study only recruits students from one of the local high schools, then your sample population probably won’t represent your target population, since the average age in your study will be much lower than the average age in Portland, which is 36 years.
Now, to make the sample population represent the target population, one tool that can be used is randomization (in this context, random sampling), meaning that individuals get selected to enter the study through a process of chance.
To show how that works, let’s say you put the name of every person in Portland into a brown paper bag, which would have to be pretty big, since there would be over 600,000 names in that bag - probably with a number of repeats.
Then let’s say that you choose a thousand names out of the bag to include in the study - either by simply picking them or by using a computer program to make sure that it’s truly by chance.
That’s randomization. Using randomization, there’s a pretty high chance that the sample population and the target population will be similar, and that the study has high external validity, meaning that any conclusions made about the sample population can be applied to the target population.
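To make the paper bag idea concrete, here is a minimal Python sketch. It is only an illustration under made-up assumptions: the ages are not real Portland data, they are drawn from a triangular distribution picked simply because its mean works out to 36, and the function name compare_samples is hypothetical. It compares the average age of the whole population with the average age of a random draw of 1,000 people and of a sample limited to high school ages.

```python
import random
import statistics


def compare_samples(population_size=600_000, sample_size=1_000, seed=0):
    """Return mean age of the population, a random sample, and a high-school-only sample."""
    rng = random.Random(seed)

    # Made-up ages: triangular distribution on 0-95 with its peak at 13,
    # chosen only because its mean is (0 + 95 + 13) / 3 = 36. Not census data.
    ages = [rng.triangular(0, 95, 13) for _ in range(population_size)]

    # Random sampling: every person has the same chance of being drawn.
    random_sample = rng.sample(ages, sample_size)

    # Non-random sample: only people of high school age (assumed 14 to 18).
    high_schoolers = [age for age in ages if 14 <= age <= 18]
    school_sample = rng.sample(high_schoolers, sample_size)

    return (
        statistics.mean(ages),
        statistics.mean(random_sample),
        statistics.mean(school_sample),
    )


population_mean, random_mean, school_mean = compare_samples()
print(f"Population mean age:         {population_mean:.1f}")
print(f"Random sample mean age:      {random_mean:.1f}")
print(f"High school sample mean age: {school_mean:.1f}")
```

Under these assumptions, the random sample’s average age should land close to the population’s, while the high school sample’s average stays stuck in the teens no matter how many students are recruited.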
Sometimes, even when the sample population is randomly selected, selection bias can still decrease a study’s external validity.
For example, perhaps you decide to randomly choose your sample population from a list of all the house addresses in Portland, or from a list of all the phone numbers in Portland.
In this situation, there’s a high chance of sampling bias, which is a type of selection bias.
Sampling bias occurs when some individuals in the target population have a lower chance, or no chance, of being selected to join the sample population. Here, that happens because some people living in the city don’t have a fixed address or phone number.
Now, if the people who don’t have a fixed address or phone number have a lower socioeconomic status than those who do, then the average income in your study will be higher than the average income of everyone in Portland.
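The effect of an incomplete list can be sketched with a small simulation. Everything here is an assumption for illustration only: the 10% share of people without a fixed address, the income figures, and the function name simulate_frame_bias are all hypothetical. The sample is drawn completely at random, but only from people who appear on the address list.

```python
import random
import statistics


def simulate_frame_bias(population_size=100_000, sample_size=1_000,
                        share_without_address=0.10, seed=0):
    """Return (population mean income, mean income of a sample drawn from the address list)."""
    rng = random.Random(seed)

    population = []
    for _ in range(population_size):
        has_address = rng.random() >= share_without_address
        # Made-up incomes: people without a fixed address are assumed to earn less.
        if has_address:
            income = max(0.0, rng.gauss(60_000, 15_000))
        else:
            income = max(0.0, rng.gauss(15_000, 5_000))
        population.append((has_address, income))

    # The address list only contains people who have a fixed address.
    address_list = [person for person in population if person[0]]

    # The draw itself is random, but people missing from the list can never be chosen.
    sample = rng.sample(address_list, sample_size)

    population_mean = statistics.mean(income for _, income in population)
    sample_mean = statistics.mean(income for _, income in sample)
    return population_mean, sample_mean


population_mean, sample_mean = simulate_frame_bias()
print(f"Mean income, whole population:   {population_mean:,.0f}")
print(f"Mean income, address-list sample: {sample_mean:,.0f}")
```

With these made-up numbers, the sample’s average income comes out higher than the population’s, even though the draw from the list was random.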
Another common example of sampling bias happens when researchers have a person’s correct address or phone number, but simply can’t reach them.
For example, let’s say researchers have a list of names of people in Portland with lung cancer and a list of names of people in Portland without lung cancer, and they want to ask each person about their smoking status in the past ten years.
The researchers decide to call each person on the list between 5pm and 9pm on Wednesdays, since most people are home from work at that time.
But this approach excludes people who work during the evenings, such as night-shift workers like nurses and police officers, or people who have to work multiple jobs to make ends meet.
To avoid sampling bias, researchers can make phone calls at different times during the day and on different days of the week.
In addition, researchers can try to use multiple modes of contact, like emailing or texting the individual or going to their home to see them in person.
Another type of selection bias is called non-response bias, which happens when the people who don’t respond differ in meaningful ways from the people who do. It’s particularly problematic in studies that take time or effort on the part of the participant, like a survey.
In general, younger people, women, white people, and people with higher education and higher socioeconomic status are most likely to respond to a survey.
Oftentimes, people choose to not complete a survey because they think it will take up too much time or simply because they don’t like answering questions.
So for these types of studies, researchers sometimes use incentives like money or free food to motivate people to participate, in an effort to ensure that the sample population accurately reflects the target population.
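Here is a rough sketch of how unequal response rates can skew who ends up in a survey. The numbers are invented for illustration: 40% of invited people are assumed to have higher education, and the response rates passed in are hypothetical, as is the function name educated_share_among_respondents. The second call simply assumes that response rates have been evened out, for example through an incentive; it does not model how well incentives actually work.

```python
import random


def educated_share_among_respondents(rate_educated, rate_other,
                                     invited=10_000, educated_share=0.4, seed=0):
    """Return the fraction of survey respondents who have higher education."""
    rng = random.Random(seed)
    respondents = []
    for _ in range(invited):
        # Each invited person is randomly assigned to a group (assumed 40% educated).
        is_educated = rng.random() < educated_share
        response_rate = rate_educated if is_educated else rate_other
        # The person responds with a probability that depends on their group.
        if rng.random() < response_rate:
            respondents.append(is_educated)
    return sum(respondents) / len(respondents)


# Hypothetical response rates: 60% with higher education versus 30% without.
unequal = educated_share_among_respondents(0.6, 0.3)
# Assumed scenario in which both groups respond at the same rate of 80%.
equal = educated_share_among_respondents(0.8, 0.8)

print(f"Share with higher education, unequal response rates: {unequal:.2f}")
print(f"Share with higher education, equal response rates:   {equal:.2f}")
```

When one group responds twice as often as the other, it ends up overrepresented among respondents even though everyone was invited at random. When the rates are equal, the respondents look much more like the people who were invited.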
Now, let’s switch gears and talk about how selection bias can influence a study’s internal validity, or how accurately the study captures the relationship between the exposure and the outcome.
Ultimately, to draw conclusions from a study, the key is to make sure that the two groups - for example, the individuals with lung cancer and the individuals without lung cancer - have similar baseline characteristics to one another.
That way, the only key difference between the groups is the exposure that we’re trying to study, which in this case is smoking.