Types of data

A researcher is currently studying the health characteristics of a small Alaskan town, including baseline health metrics, vital signs, and risk factors for common diseases. During this study, the researcher notes that the average systolic blood pressure for the population is 137 mmHg. Which of the following best describes this type of data?  

Transcript

Data are like a set of facts that are measured and recorded, and then summarized to help us make conclusions.

Data can either be quantitative, meaning numerical, like a person’s age measured in the number of years they’ve been alive, or qualitative, meaning non-numerical, like someone’s blood type (A, B, AB, or O).

So, data are classified into two main groups - quantitative or numeric data and qualitative or categorical data.

Let’s start with categorical data, which involves assigning subjects to a category, e.g. “red” vs. “blue” or “high” vs. “low”.

Categorical data can be further broken down into nominal and ordinal data.

Nominal data are based on categories that cannot be logically ordered.

For example, blood types - A, B, AB and O - are nominal data; there is no logical order or magnitude in blood type.

A is not higher than AB, and O is not less than B - they are just different, like apples and oranges.

Now you could say that type AB blood has more antigens than type O blood or that apples are firmer than oranges, but then we’re looking at different data - antigen number and firmness, and not simply the blood type or fruit type.

Other attributes like sex, type of religion, or ethnic background are all examples of nominal data.

These attributes are measured in categories, instead of numbers.

Therefore, they don’t have any magnitude; that’s why when you summarize nominal data, you have to use proportions.

For example, take a group of 20 classmates: 10 have blood type A, 5 have blood type B, and 5 have blood type O.

You can say that 50% are type A, 25% are type B, and 25% are type O.

And while you can’t calculate a mean or median, you can identify the “mode”, which is the most frequently appearing value of this data: Blood Type A.
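
To make the classroom example concrete, here is a minimal Python sketch that summarizes the 20 blood types as proportions and picks out the mode. The variable names are just illustrative choices, and the list is built directly from the counts given above.

```python
from collections import Counter

# The 20 classmates from the example: 10 type A, 5 type B, 5 type O.
blood_types = ["A"] * 10 + ["B"] * 5 + ["O"] * 5

# Count how many classmates fall into each category.
counts = Counter(blood_types)
total = len(blood_types)

# Nominal data are summarized as proportions per category.
proportions = {blood_type: n / total for blood_type, n in counts.items()}

# The mode is the most frequently appearing category.
mode = counts.most_common(1)[0][0]

print(proportions)
print(mode)
```

Notice that nothing here adds or averages the categories themselves; only the counts are treated as numbers.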

Ordinal data are also measured in categories, but unlike nominal data, ordinal data come with a logical order attached.

For example, let’s say you want to measure happiness, and you send 100 people a survey that asks: “How happy are you?”

They can pick 1 of 4 answer choices: “1. Sad”, “2. Not happy”, “3. Okay”, and “4. Great!”.

Unlike the blood type example, there is a clear logical order here.

In ordinal data, categories are ranked as being higher or lower than one another.

As we go from category 1 to category 4, happiness increases.

And because there is an order to the data, you can calculate the “median”, which is the middlemost value when the dataset is arranged in order, as well as the “mode”. You can’t calculate the “mean”, though, since we can’t clearly state whether the difference between category 1 and category 2 is quantitatively the same as the difference between category 3 and category 4.
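
As a rough illustration of the median and mode for ordinal data, here is a small Python sketch using the four happiness categories. The seven responses are made up for the example (they are not from any real survey), and `median_low` from the standard library is used so that the median is always one of the actual categories.

```python
import statistics

# The four answer choices, listed from lowest to highest happiness.
levels = ["Sad", "Not happy", "Okay", "Great!"]
rank = {label: position for position, label in enumerate(levels)}

# Hypothetical survey responses, invented for illustration only.
responses = ["Okay", "Great!", "Sad", "Okay", "Not happy", "Great!", "Okay"]

# Convert each response to its rank so the responses can be ordered.
ranks = sorted(rank[response] for response in responses)

# median_low returns a value that actually occurs in the data,
# so it always maps back to a real category.
median_label = levels[statistics.median_low(ranks)]

# The mode is the most frequently chosen category.
mode_label = statistics.mode(responses)

print(median_label)
print(mode_label)
```

The ranks are used only to put the responses in order; averaging them would assume the gaps between categories are equal, which is exactly what ordinal data can’t guarantee.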

One potential problem with ordinal data is that it can sometimes oversimplify relationships between categories.

Summary

Categorical data include nominal data, in which the order does not matter, such as hair color in a certain population, and ordinal data, in which the order is important, such as rating the degree of pain on a scale from one to ten. Numeric data include interval data, such as temperature, and ratio data, such as the height of students in a particular group. Numeric data can be continuous, i.e., with intermediate values, or discrete, i.e., without intermediate values.