Methods of regression analysis

Transcript

There are four basic types of statistical analyses commonly used in epidemiological research, and the analysis you pick depends on two main criteria.

The first criterion is the type of data you have, which can be either individual data or binned data, which is also called group data.

So, for example, let’s say we want to know how many people out of 100 people developed lung cancer in the past 5 years.

With individual data, we have information about each person, so we can tell whether or not each of the 100 people developed lung cancer.

So let’s say that 6 people developed lung cancer. If we have individual data, we can look at the individual characteristics for each of those 6 people, like their sex, age, race, or past history of migraines, and we can compare them to the people that didn’t develop lung cancer.

On the other hand, if we have group data, we don’t actually know which specific individuals out of the 100 people developed lung cancer.

So even though we know that 6 people developed lung cancer, we don’t know which 6 people they were or any of their individual characteristics.
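
The difference between the two data types can be sketched in Python. This is only an illustration: the field names and the three example records are hypothetical and are not taken from any real study.

```python
from collections import Counter

# Hypothetical individual data: one record per person (only 3 people shown).
# Each record carries that person's own characteristics and outcome.
individual_data = [
    {"id": 1, "sex": "F", "age": 61, "lung_cancer": True},
    {"id": 2, "sex": "M", "age": 58, "lung_cancer": False},
    {"id": 3, "sex": "F", "age": 64, "lung_cancer": False},
]

# Group (binned) data: only totals are known, not who the cases were.
group_data = {"n_people": 100, "n_lung_cancer": 6}


def summarize(records):
    """Collapse individual records into group-level counts."""
    counts = Counter(r["lung_cancer"] for r in records)
    return {"n_people": len(records), "n_lung_cancer": counts[True]}


print(summarize(individual_data))
```

Individual data can always be collapsed into group data like this, but the reverse isn’t possible: the totals alone can’t tell us who the cases were.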

The second criterion is the type of outcome or y-variable you’re measuring, which can be quantitative, categorical, or time to event.

Quantitative variables have a numeric value, like a person’s forced expiratory volume, or FEV, which is the amount of air, in liters, that a person can forcibly exhale in a set amount of time, usually the first second of a forced breath.

A very fit person might have an FEV of 5 liters, while a less fit person might have an FEV of 3 liters.

On the other hand, categorical variables have distinct levels.

For example, we could use a categorical variable to characterize if a person was diagnosed with lung cancer in the past five years or if they were not.

And finally, time to event variables describe how long a person was followed before the event or outcome occurred.

For example, if we started following a person at age 50 and they developed lung cancer at age 53, then their time to event would be 3 years.

Now, one of the simplest and most widely used types of analysis is linear regression.

Linear regression uses individual data, and the outcome variable is always quantitative, while the exposure variable can be either categorical or quantitative.

For example, let’s say we want to figure out if there’s an association between the number of cigarettes smoked and FEV, so we ask 100 people how many cigarettes they smoke in a day and then measure each person’s FEV. In this study, the exposure is the number of cigarettes, so it’s quantitative, and the outcome is FEV, which is also quantitative.

Typically, we use statistical software to calculate the linear equation, and the software will provide b0 and b1, which are two numbers we can then plug into the equation y-hat = b0 + b1x1.

Y-hat is the estimated value for the outcome variable, which in this case is FEV, and x1 is the value of the exposure variable, so in this case that’s the number of cigarettes a person smokes.

So let’s say the software gives us a b0 of 4 and a b1 of negative 0.1, so the equation is y-hat equals 4 minus 0.1 times x1.

Now, b1 is the most important number for interpretation because it tells us the effect size, or how much the outcome variable changes for every one-unit increase in the exposure variable.

For example, a b1 of negative 0.1 means that, on average, the FEV will decrease by 0.1 liters for every one additional cigarette smoked per day.
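
As a minimal sketch, the example equation can be written as a small Python function. The coefficients are the ones from the example above; the function name `predict_fev` is just an illustrative choice.

```python
B0 = 4.0   # intercept: estimated FEV for someone who smokes 0 cigarettes per day
B1 = -0.1  # slope: change in estimated FEV per additional cigarette per day


def predict_fev(cigarettes_per_day: float) -> float:
    """Estimated FEV (y-hat) from the linear equation y-hat = b0 + b1 * x1."""
    return B0 + B1 * cigarettes_per_day


# Each extra cigarette per day shifts the estimate by B1.
for x in (0, 10, 20):
    print(x, round(predict_fev(x), 2))
```

So a person who smokes 10 cigarettes per day has an estimated FEV of 4 minus 1, or 3, and the difference between any two people who are one cigarette apart is always b1.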

One important thing to know is that linear regression can be used in any type of study design as long as the two criteria of individual data and quantitative outcome variable are met.

The next type of statistical analysis is logistic regression. Logistic regression uses individual data, and the outcome variable is always categorical while the exposure variables can be either categorical or quantitative.

For example, let’s say we want to figure out if smoking cigarettes increases the chance of lung cancer between the ages of 55 and 64. So, we follow a hundred 55-year-olds that smoke and a hundred 55-year-olds that don’t smoke for 10 years, and compare how many of them develop lung cancer.

In this example, the exposure variable is whether or not a person smokes cigarettes, so it’s categorical; and the outcome variable is whether or not the person develops lung cancer, so it’s also categorical.

And more specifically, because there are only two levels for each variable, they’re called binary categorical variables.

Now, as with linear regression, the statistical software will give us b0 and b1, and we can plug them into the same equation of y-hat = b0 + b1x1, but here y-hat is the log-odds of the outcome, so the interpretation of the beta-coefficients is different.

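In logistic regression, the beta-coefficients are on the log-odds scale: b0 is the log-odds of the outcome occurring when x1 is zero, and b1 is the change in the log-odds for a one-unit increase in x1.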

For example, let’s say the software gives us a b0 of 0.05 and a b1 of 1.9, so the equation for the line would be y-hat equals 0.05 plus 1.9 times x1.

If we only look at b1, the effect size, it tells us how much the log-odds of the outcome differ between the exposed group, or the smokers, and the unexposed group, or the non-smokers.

So, a b1 of 1.9 means that, on average, the log-odds of developing lung cancer for smokers are 1.9 higher than the log-odds of developing lung cancer for non-smokers.

Since the log-odds can be confusing to interpret, we can also convert these numbers to regular odds by exponentiating them, which means raising e to the power of the coefficient.

For example, e to the 1.9 equals about 6.7, so the odds of developing lung cancer for smokers are 6.7 times the odds of developing lung cancer for non-smokers.
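
The conversion from log-odds to odds can be sketched in Python using only the standard library. The coefficients are the ones from the example; the function names and the 0/1 coding of smoking status are assumptions made for illustration.

```python
import math

B0 = 0.05  # log-odds of lung cancer for non-smokers (x1 = 0)
B1 = 1.9   # difference in log-odds between smokers and non-smokers


def log_odds(smoker: int) -> float:
    """y-hat on the log-odds scale; smoker is coded 1 (yes) or 0 (no)."""
    return B0 + B1 * smoker


def odds(smoker: int) -> float:
    """Exponentiate the log-odds to get regular odds."""
    return math.exp(log_odds(smoker))


def probability(smoker: int) -> float:
    """Convert odds to a probability: p = odds / (1 + odds)."""
    o = odds(smoker)
    return o / (1 + o)


# The odds ratio compares smokers with non-smokers; b0 cancels out,
# so it equals e raised to b1.
odds_ratio = odds(1) / odds(0)
print(round(odds_ratio, 1))
```

Because the two groups differ by b1 on the log-odds scale, the ratio of their odds is e to the b1, which is why b1 is exponentiated on its own.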

Logistic regression can be used for any type of study, but the interpretation changes slightly depending on the study design.

Our example was a longitudinal cohort study, because we had a group of exposed individuals—those are the ones that smoked—and a group of unexposed individuals—those are the ones that didn’t smoke—and followed them over time.

This type of study design allows you to measure the incidence or the risk, which is the number of new cases that occur over a certain period of time.

Using logistic regression, we then calculate what’s called the risk odds ratio.

On the other hand, logistic regression can also be used in case-control studies, which is where you compare the history of two groups of people—those that have a certain outcome, called cases, and those that don’t have a certain outcome, called controls—to see if they’ve been exposed to different things.

So, for example, we could’ve looked at 100 people that had lung cancer, which would be the cases, and 100 people that don’t have lung cancer, which would be the controls, and then compare how many people in each group smoked cigarettes in the past ten years.

Now, in case-control studies, we can’t measure the incidence, since we’re selecting people that already have the outcome.

Instead, we’re measuring the prevalence of the exposure, or the number of people in each group that already smoked cigarettes before we started measuring them.

In case-control studies, we can use logistic regression to then calculate the prevalence odds ratio.
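
As a rough sketch, the odds ratio for a case-control study with one binary exposure can be computed directly from a two-by-two table. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from any study.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


# Hypothetical counts: 100 cases (80 smoked, 20 did not)
# and 100 controls (40 smoked, 60 did not).
or_smoking = odds_ratio(80, 20, 40, 60)

# With a single binary exposure, the b1 from logistic regression is the
# natural log of this odds ratio.
b1 = math.log(or_smoking)
print(round(or_smoking, 2), round(b1, 2))
```

With these made-up counts, the odds of having smoked are 4 among cases and about 0.67 among controls, giving an odds ratio of 6.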

Key Takeaways

There are a variety of methods of regression analysis, each with its own strengths and weaknesses. The most commonly used methods are linear regression, logistic regression, and Poisson regression.

Linear regression is used when the outcome is quantitative and its relationship with the exposure is assumed to be linear. Logistic regression is used when the outcome is categorical, most often binary (e.g., success/failure, yes/no), while Poisson regression is used for modeling count data, where the outcome is assumed to follow a Poisson distribution.