# Introduction to biostatistics

### Video information

#### Last updated

03-25-2020

#### Citation

Osmosis: Introduction to biostatistics. (2020, October 27). Retrieved from (https://www.osmosis.org/learn/Introduction_to_biostatistics).

Evan Debevec-McKenney

Summary

Sampling error occurs when the variation in a sample does not reflect the variation in the general population. Selection bias occurs when the sample is not selected randomly. Validity is the ability of a test to accurately measure what it is supposed to measure, while reliability is its ability to produce similar results when repeated. Independent variables are variables thought to induce change in other variables, called dependent variables. The Gaussian curve is bell-shaped, with the mean, median, and mode at the same value. Non-Gaussian curves can be positively or negatively skewed.

Let’s say you want to figure out if people with high body mass index, or BMI, are at a higher risk of hypertension - or high blood pressure.

Let’s say that you decide to go out and find 100 people with hypertension and 100 people without hypertension and find out the BMI of each person in each group.

You might also collect other information about the individuals in each group, like how old they are, if they smoke cigarettes, or if they drink alcohol, since all of these factors can influence a person’s risk of hypertension.

All of these different pieces of information - called variables - can be put together into a single document or file, called a data set.

A data set usually includes independent variables, which are thought to influence or change dependent variables.

In our example, the body mass index would be the independent variable and hypertension would be the dependent variable.
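As a rough sketch of what such a data set could look like in code, each person can be stored as one record with one field per variable. The `Participant` class, its field names, and the three rows are made up for illustration; they are not data from any real study.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    bmi: float            # independent variable in the example
    age: int              # additional variable that may influence risk
    smokes: bool          # additional variable
    drinks_alcohol: bool  # additional variable
    hypertension: bool    # dependent variable in the example


# Hypothetical rows, for illustration only.
data_set = [
    Participant(bmi=29.1, age=54, smokes=True, drinks_alcohol=True, hypertension=True),
    Participant(bmi=22.4, age=37, smokes=False, drinks_alcohol=True, hypertension=False),
    Participant(bmi=31.0, age=61, smokes=False, drinks_alcohol=False, hypertension=True),
]

# Pull one variable out of the data set as a column of values.
independent = [p.bmi for p in data_set]
dependent = [p.hypertension for p in data_set]
```

In the full example, the data set would hold 200 such records, 100 with `hypertension=True` and 100 with `hypertension=False`.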

The process of collecting, organizing, and analyzing variables in a data set is called statistics, and when the data were collected from living things - like humans, aardvarks, algae, or bacteria - it’s called biostatistics, bio meaning life.

Now, there are two main types of biostatistics.

The first type is descriptive statistics, which is used to describe or summarize information about each individual variable in the data set.

Descriptive statistics can be used to find the mean - the average of all the values of a particular variable, the median - the middle value when the values of a variable are sorted from lowest to highest, and the mode - the value that occurs most often in the variable.

The descriptive statistics of each variable can be calculated for the whole sample - all 200 people - or in each group separately - the 100 people in the group with hypertension or the other 100 people in the group without hypertension.

For example, we might find that the mean body mass index of all people in the study is 24.5, or that the mean body mass index is 28 for the group with hypertension and 21 for the group without hypertension.
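These calculations can be sketched with Python's standard `statistics` module. The BMI values below are hypothetical and use only four people per group; they were chosen so that the group means come out to 28 and 21 and the overall mean to 24.5, as in the example.

```python
from statistics import mean, median, mode

# Hypothetical BMI values, chosen only to illustrate the calculations.
bmi_hypertension = [26, 28, 28, 30]
bmi_no_hypertension = [19, 21, 21, 23]
bmi_all = bmi_hypertension + bmi_no_hypertension

# Descriptive statistics for each group separately...
group_means = (mean(bmi_hypertension), mean(bmi_no_hypertension))
mode_hypertension = mode(bmi_hypertension)  # most frequent value in the group

# ...and for the whole sample.
overall_mean = mean(bmi_all)
overall_median = median(bmi_all)  # average of the two middle values here

print(group_means, overall_mean, overall_median, mode_hypertension)
```

Because the two groups are the same size, the overall mean is simply the midpoint of the two group means; with unequal group sizes it would be pulled toward the larger group.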

We can also use descriptive statistics to find the range, variance, or standard deviation, all of which are ways of understanding how the data are spread out or distributed for a given variable.

For example, we might find that the lowest measured body mass index in the group with hypertension is 23, and the highest is 33, so the range for body mass index in this group is 23 to 33.
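A minimal sketch of these measures of spread, again using the standard `statistics` module. The five BMI values are hypothetical; only the lowest (23) and highest (33) come from the example.

```python
from statistics import stdev, variance

# Hypothetical BMI values for the group with hypertension.
bmi_hypertension = [23, 26, 28, 30, 33]

# Range: the lowest and highest values, and the distance between them.
lowest, highest = min(bmi_hypertension), max(bmi_hypertension)
range_width = highest - lowest

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_variance = variance(bmi_hypertension)

# Sample standard deviation: the square root of the sample variance.
sample_sd = stdev(bmi_hypertension)

print(lowest, highest, range_width, sample_variance, round(sample_sd, 2))
```

The sample versions (`variance`, `stdev`) are used here on the assumption that the 100 people are a sample from a larger population; `pvariance` and `pstdev` would apply if the values were the entire population.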

Typically, descriptive statistics are reported in a graph or a table.

The second type of biostatistics is inferential, which is different from descriptive statistics in two ways.

First, inferential statistics looks at relationships between two or more variables, instead of looking at each individual variable.

For example, we could use inferential statistics to explore the relationship between body mass index and hypertension.

We could categorize body mass index into two groups - above 25, or high, and below 25, or low - and we might find that people with high body mass indices have 3 times the odds of hypertension compared to people with low body mass indices.
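One way to see where a figure like "3 times the odds" comes from is to compute an odds ratio from a 2x2 table. The counts below are hypothetical, picked so the result is exactly 3, and the `odds_ratio` helper is an illustrative name rather than a library function.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 table."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts for the 200 people in the example:
#                  high BMI   low BMI
# hypertension         75        25
# no hypertension      50        50
result = odds_ratio(
    exposed_cases=75,
    unexposed_cases=25,
    exposed_controls=50,
    unexposed_controls=50,
)
print(result)
```

With these counts, the odds of high BMI are 3 to 1 among people with hypertension and 1 to 1 among people without, which gives the ratio of 3.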

Typically, inferential statistics are reported as relative risks, attributable risks, odds ratios, or hazard ratios.

Second, the two types differ in their goals. The goal of descriptive statistics is to describe how similar or different the study groups in a particular sample population are to one another.

For example, let’s say we use descriptive statistics to find that 72% of people in the group with hypertension are male, but only 16% of people in the group without hypertension are male.

This is an important finding because men tend to have slightly lower body mass indices than women.

As a result, having more men in the group with hypertension means that the average body mass index in that group will be lower than it would be if the two groups had the same proportion of men.

Ultimately, if the descriptive statistics find that the study groups are not very similar, we say that the study has low internal validity, and that the results found by inferential statistics may reflect differences between the two study groups rather than a true relationship between the variables.
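The group comparison described above can be sketched as a simple check on how balanced the two study groups are. The counts follow the example (72 of 100 and 16 of 100 are male); the `percent_male` helper and the list layout are assumptions made for illustration.

```python
# One True/False entry per participant: True means the participant is male.
is_male_hypertension = [True] * 72 + [False] * 28
is_male_no_hypertension = [True] * 16 + [False] * 84


def percent_male(flags):
    """Percentage of participants in a group who are male."""
    return 100 * sum(flags) / len(flags)


pct_hypertension = percent_male(is_male_hypertension)
pct_no_hypertension = percent_male(is_male_no_hypertension)

# A large gap suggests the groups differ in ways that could distort the comparison.
difference = pct_hypertension - pct_no_hypertension
print(pct_hypertension, pct_no_hypertension, difference)
```

The same check can be repeated for age, smoking, or alcohol use to see how comparable the two groups are on each variable.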

On the other hand, the goal of inferential statistics is to apply the results of the sample population to a target population - which is usually just the general population.

So, inferential statistics is concerned with whether or not the two study groups are similar, as well as whether or not the sample population represents the target population.

Ideally, a study should be done on a sample population of individuals that is similar to that target population in every meaningful way.

For example, if your target population is people from Lagos, Nigeria, then ideally your sample population would include people of ages, races, and socioeconomic statuses that reflect the characteristics of people in Lagos.