Logistic regression

Questions

A researcher is studying the effects of alcohol use on the risk of developing esophageal varices. As part of the study, 100 random participants are asked to report the number of alcoholic beverages they consume per day and whether they have been diagnosed with esophageal varices. Study participants consumed anywhere from 0 to 20 alcoholic beverages daily. A logistic regression is applied and the best fit trend curve is defined by the equation below. Which of the following conclusions can be most appropriately drawn from the study results?  

Transcript

Logistic regression is a type of statistical method that’s used to describe the relationship between an outcome variable and one or more exposure variables.

In logistic regression, the outcome variable is always categorical, and the exposure variables can be either categorical or quantitative.

For example, let’s say you want to figure out if smoking more cigarettes increases the chance of having a heart attack. In this case, the number of cigarettes is a quantitative exposure and whether or not a person has a heart attack is a categorical outcome.

Now, to figure this out, you might ask 200 people how many cigarettes they smoke in a day, and then follow that group of people for five years and see who has a heart attack and who doesn’t.

You could organize your data in a table where the first column, or variable, is the number of cigarettes a person smokes, the second column is whether or not they had a heart attack, and the rest of the columns are other characteristics, or variables, that you collected about each person, like their age, sex, and body mass index, or BMI.

Usually, for binary variables, like yes or no, we use the numbers zero and 1 to represent the two possible answers.

So, for the heart attack variable, we might say that zero represents “no” and 1 represents “yes”. We could do the same thing for sex, where zero represents females and 1 represents males.
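As a rough illustration of that coding step, here is a minimal Python sketch. The three records, the field names, and the `encode` helper are all made up for this example; they are not from the study described above.

```python
# Hypothetical records; the values are invented for illustration only.
raw_records = [
    {"cigarettes_per_day": 0, "heart_attack": "no", "sex": "female"},
    {"cigarettes_per_day": 12, "heart_attack": "yes", "sex": "male"},
    {"cigarettes_per_day": 5, "heart_attack": "no", "sex": "male"},
]

# Coding scheme from the text: zero for "no" and females, 1 for "yes" and males.
HEART_ATTACK_CODES = {"no": 0, "yes": 1}
SEX_CODES = {"female": 0, "male": 1}


def encode(record):
    """Return a copy of the record with the binary answers replaced by 0 or 1."""
    return {
        "cigarettes_per_day": record["cigarettes_per_day"],
        "heart_attack": HEART_ATTACK_CODES[record["heart_attack"]],
        "sex": SEX_CODES[record["sex"]],
    }


encoded_records = [encode(record) for record in raw_records]
```

Which answer gets the zero and which gets the 1 is a convention, so it helps to keep the mapping written down next to the data, as the two dictionaries do here.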

Now, let’s just look at the first two variables: how many cigarettes each person smokes and whether or not they had a heart attack. You could plot these measurements, or data points, on a scatterplot, with the number of cigarettes on the x-axis and heart attack on the y-axis, where each data point represents one individual.

This scatterplot might seem a little funny looking, and that’s because all of the data points sit at one of two values on the y-axis: they’re either on the zero, which represents no, or the 1, which represents yes.

This scatterplot can help us figure out how the odds of having a heart attack change as people smoke more and more cigarettes.

And that’s the goal of logistic regression.
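To make that goal a bit more concrete, here is a minimal sketch of fitting a logistic regression with one exposure variable in plain Python. It assumes the standard logistic model, in which the log of the odds equals `b0 + b1 * x`, and it fits the two coefficients with Newton's method; the transcript does not say how the curve is fitted, so treat this as one common approach rather than the method used in the video. The 200 data points are invented for illustration (10 hypothetical people at each level from 0 to 19 cigarettes per day).

```python
import math


def sigmoid(z):
    """Convert a log-odds value into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Fit log-odds = b0 + b1 * x by Newton's method; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the 2x2 information matrix
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 linear system for the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data: number of heart attacks among 10 people at each
# level from 0 to 19 cigarettes per day (200 people in total).
events_per_level = [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9, 10]
xs, ys = [], []
for cigarettes, events in enumerate(events_per_level):
    for person in range(10):
        xs.append(cigarettes)
        ys.append(1 if person < events else 0)

b0, b1 = fit_logistic(xs, ys)
# exp(b1) is the factor by which the odds change per additional cigarette.
odds_ratio_per_cigarette = math.exp(b1)
```

A positive `b1` means the odds of a heart attack go up as the number of cigarettes goes up. This sketch does not guard against data where the two outcomes are perfectly separated, in which case the fit would not settle on finite coefficients.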

Now, in statistics, probability and odds are often confused with one another, so let’s break down the difference.

The probability is the number of times an outcome happened divided by the number of times the outcome could have happened, and it’s often represented by a capital P.

So, using our data, we could figure out the probability of having a heart attack for each number of cigarettes smoked per day.

Let’s say the number of cigarettes smoked per day ranges from zero to 19, so we can break the scatterplot up into 20 different sections. It’s 20 sections rather than 19 because zero is also a section.

Now, to find the probability of having a heart attack in a specific section, we count up the number of people who had heart attacks in that section and divide it by the total number of people in that section.
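The counting described above can be sketched in a few lines of Python. The function names and the ten data points are hypothetical. The `odds_from_probability` helper is an addition: this excerpt stops before defining odds, so the helper uses the standard definition, P divided by 1 minus P.

```python
import math
from collections import defaultdict


def probability_by_section(cigarettes, heart_attacks):
    """Map each cigarettes-per-day value to the fraction of people with a heart attack."""
    totals = defaultdict(int)  # people in each section
    events = defaultdict(int)  # heart attacks in each section
    for x, y in zip(cigarettes, heart_attacks):
        totals[x] += 1
        events[x] += y  # y is 1 for a heart attack, 0 otherwise
    return {x: events[x] / totals[x] for x in sorted(totals)}


def odds_from_probability(p):
    """Standard definition of odds: P / (1 - P); infinite when P equals 1."""
    return math.inf if p == 1 else p / (1 - p)


# Made-up data for three of the sections (0, 10, and 19 cigarettes per day).
cigarettes = [0, 0, 0, 0, 10, 10, 10, 10, 19, 19]
heart_attacks = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

probabilities = probability_by_section(cigarettes, heart_attacks)
# With this data: 1 of 4 people at 0 cigarettes, 2 of 4 at 10, and 2 of 2 at 19.
```

In this toy data, a probability of 0.5 corresponds to odds of 1, and a probability of 0.25 corresponds to odds of about 0.33, which shows why the two numbers should not be used interchangeably.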

Summary

Logistic regression is a statistical method used to describe the relationship between an outcome variable and one or more exposure variables. It can be used to estimate the effect of an exposure variable (e.g., the number of cigarettes smoked per day) on a categorical outcome variable (e.g., having a heart attack). Note that the outcome variable is always categorical, but the exposure variables can be either categorical or quantitative.

Copyright © 2024 Elsevier, its licensors, and contributors. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
