Selection bias is a type of bias or error that can occur when researchers choose who will be included in a study.
Studies with selection bias might produce results that can’t be applied to the population outside the study, meaning they lack external validity.
They may also misrepresent the relationship between an exposure and an outcome, meaning they lack internal validity.
Typically, the goal of a study is to figure out if an exposure is associated with an outcome in a target population.
So ideally a study should be done on a sample population of individuals that is similar to that target population in every meaningful way, which would give the study high external validity.
For example, if you want to figure out how smoking impacts the risk of lung cancer in Portland, Oregon, then people living in Portland are your target population.
Ideally, your sample population would include individuals from Portland.
In addition, the sample population should include people whose ages, races, and socioeconomic statuses reflect the target population, because these are all factors that are likely to affect the risk of lung cancer.
If your study only recruits students from one of the local high schools, then your sample population probably won’t represent your target population, since the average age in your study will be lower than the average age in Portland, which is 36 years.
Now, to make the sample population represent the target population, one tool that can be used is randomization, which here means random sampling: individuals get selected to enter the study through a process of chance.
To show how that works, let’s say you put the names of every person in Portland into a brown paper bag, which would have to be pretty big, since there would be over 600,000 names in that bag, probably with a number of repeats.
Then let’s say that you choose a thousand names out of the bag to include in the study, either by simply picking them or by using a computer program to make sure that it’s truly by chance.
That’s randomization. Using randomization, there’s a pretty high chance that the sample population and the target population will be similar, so the study is likely to have high external validity, meaning that any conclusions made about the sample population can be applied to the target population.
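To make the paper bag idea concrete, here’s a minimal Python sketch of drawing a thousand people by chance. The ages are made up for illustration and are not real Portland data, and the helper name `draw_random_sample` is hypothetical; the only library call doing the work is `random.sample` from the standard library, which picks items without replacement so that every individual is equally likely to be chosen.

```python
import random
import statistics


def draw_random_sample(population, sample_size, seed=0):
    """Simple random sample without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, sample_size)


# Hypothetical target population: 600,000 invented ages, one per person.
# These are NOT real census figures.
population_rng = random.Random(42)
target_ages = [population_rng.randint(0, 90) for _ in range(600_000)]

# Pull a thousand "names" out of the bag.
sample_ages = draw_random_sample(target_ages, 1_000)

target_mean = statistics.mean(target_ages)
sample_mean = statistics.mean(sample_ages)

print(f"target mean age: {target_mean:.1f}")
print(f"sample mean age: {sample_mean:.1f}")
```

With a thousand people chosen this way, the sample’s average age should usually land close to the target population’s average, which is the sense in which the two populations end up similar.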
Sometimes, even when a sample population is randomly selected, selection bias can still decrease a study’s external validity.
For example, perhaps you decide to randomly choose your sample population from a list of all the house addresses in Portland, or from a list of all the phone numbers in Portland.
In this situation, there’s a high chance of sampling bias, which is a type of selection bias.
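Here’s a rough Python sketch of how choosing randomly from a list can still go wrong. Everything in it is an assumption made for illustration: the population is invented, and it assumes that younger people are less likely to appear on the list. The point is only that anyone who isn’t on the list can never be picked, no matter how random the draw is.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of (age, on_list) pairs; all numbers are invented.
# Assumption for illustration: people under 35 are less likely to be on the list.
population = []
for _ in range(100_000):
    age = rng.randint(18, 90)
    on_list = rng.random() < (0.3 if age < 35 else 0.9)
    population.append((age, on_list))

# The list you sample from contains only the people who appear on it.
frame = [person for person in population if person[1]]

frame_sample = rng.sample(frame, 1_000)      # random, but from the list only
full_sample = rng.sample(population, 1_000)  # random from everyone

population_mean = statistics.mean(age for age, _ in population)
frame_mean = statistics.mean(age for age, _ in frame_sample)
full_mean = statistics.mean(age for age, _ in full_sample)

print(f"whole population mean age: {population_mean:.1f}")
print(f"sample from everyone:      {full_mean:.1f}")
print(f"sample from the list only: {frame_mean:.1f}")
```

Under these made-up assumptions, the sample drawn from the list comes out older than the population as a whole, even though every name on the list had an equal chance of being chosen.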