# Types of data




Data are a set of facts that are measured and recorded, and then summarized to help us draw conclusions.

Data can be quantitative, meaning numerical, like a person’s age measured in the number of years they’ve been alive, or qualitative, meaning non-numerical, like someone’s blood type {A, B, AB, or O}.

So, data are classified into two main groups - quantitative or numeric data and qualitative or categorical data.

Let’s start with categorical data, which involves assigning subjects to a category, e.g. “red” vs. “blue” or “high” vs. “low”.

Categorical data can be further broken down into nominal and ordinal data.

Nominal data are based on categories that cannot be logically ordered.

For example, blood types A, B, AB, and O are nominal data; there is no logical order or magnitude in blood type.

A is not higher than AB, and O is not less than B - they are just different, like apples and oranges.

Now you could say that type AB blood has more antigens than type O blood or that apples are firmer than oranges, but then we’re looking at different data - antigen number and firmness, and not simply the blood type or fruit type.

Other attributes like sex, type of religion, or ethnic background are all examples of nominal data.

These attributes are measured in categories, instead of numbers.

Therefore, they don’t have any magnitude; that’s why when you summarize nominal data, you have to use counts or proportions.

For example, take a group of 20 classmates: 10 have blood type A, 5 have blood type B, and 5 have blood type O.

You can say that 50% are type A, 25% are type B, and 25% are type O.

And while you can’t calculate a mean or median, you can identify the “mode”, which is the most frequently appearing value in these data: blood type A.
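The classmates example can be worked through in a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and the list simply restates the 20 classmates described above.

```python
from collections import Counter

# The 20 classmates from the example: 10 type A, 5 type B, 5 type O.
blood_types = ["A"] * 10 + ["B"] * 5 + ["O"] * 5

counts = Counter(blood_types)
total = len(blood_types)

# Nominal data are summarized as the proportion falling in each category.
proportions = {blood_type: n / total for blood_type, n in counts.items()}

# The mode is the most frequently appearing category.
mode = counts.most_common(1)[0][0]

print(proportions)  # {'A': 0.5, 'B': 0.25, 'O': 0.25}
print(mode)         # A
```

Notice that nothing here adds or averages the categories themselves; only the counts are numbers.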

Ordinal data are also measured in categories, but unlike nominal data, ordinal data come with a logical order attached.

For example, let’s say you want to measure happiness, and you send 100 people a survey that asks: “How happy are you?”

They can pick 1 of 4 answer choices: “1. Sad”, “2. Not happy”, “3. Okay”, and “4. Great!”.

Unlike the blood type example, there is a clear logical order here.

In ordinal data, categories are ranked as being higher or lower than one another.

As we go from category 1 to category 4, happiness increases.

And because there is an order to the data, you can calculate the “median”, which is the middlemost value in a dataset arranged in order from the highest to the lowest values, and the “mode”. But you can’t calculate the “mean”, since we can’t clearly state whether the difference between category 1 and category 2 is quantitatively the same as the difference between category 3 and category 4.

One potential problem with ordinal data is that they can sometimes oversimplify relationships between categories.
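Here is a rough sketch of finding the median and mode of survey answers like these. The seven responses are invented purely for illustration, and `median_low` from Python's standard `statistics` module is used as one reasonable choice because it always returns an actual category rather than averaging two of them.

```python
from collections import Counter
from statistics import median_low

# Answer choices in their logical order, from lowest to highest happiness.
LEVELS = ["Sad", "Not happy", "Okay", "Great!"]
rank = {label: position for position, label in enumerate(LEVELS, start=1)}

# Hypothetical responses, made up for this example.
responses = ["Okay", "Sad", "Great!", "Okay", "Not happy", "Great!", "Okay"]

# Convert each answer to its rank so the answers can be put in order.
ranks = [rank[answer] for answer in responses]

# Median: the middle answer once the ranks are sorted.
median_label = LEVELS[median_low(ranks) - 1]

# Mode: the most frequently chosen answer.
mode_label = Counter(responses).most_common(1)[0][0]

print(median_label)  # Okay
print(mode_label)    # Okay
```

The ranks are used only for sorting; they are never added together, which is why a mean is left out.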

The jump from category 1 to category 2 may be very small, whereas the jump from category 3 to category 4 may be quite large.

Ordinal data are blind to this nuance, and treat the differences in categories as if they were all the same.
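To see why a mean is risky here, consider a small sketch with two made-up ways of scoring the same answers. Both scorings keep the categories in the same order, but the second assumes a much bigger jump up to “Great!”. The responses and both scorings are hypothetical and chosen only to illustrate the point.

```python
from statistics import mean, median_low

# Hypothetical responses, made up for this example.
responses = ["Sad", "Not happy", "Okay", "Okay", "Great!"]

# Two scorings that preserve the same order but assume different gaps.
equal_spacing = {"Sad": 1, "Not happy": 2, "Okay": 3, "Great!": 4}
uneven_spacing = {"Sad": 1, "Not happy": 2, "Okay": 3, "Great!": 10}

def summarize(scoring):
    """Return (mean, median) of the responses under the given scoring."""
    scores = [scoring[answer] for answer in responses]
    return mean(scores), median_low(scores)

mean_equal, median_equal = summarize(equal_spacing)
mean_uneven, median_uneven = summarize(uneven_spacing)

print(median_equal, median_uneven)  # 3 3
print(mean_equal, mean_uneven)      # 2.6 3.8
```

The median lands on “Okay” either way, while the mean shifts with the assumed spacing, and the survey itself gives no information about which spacing is right.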

In medicine, disease severity is often recorded as ordinal data.

For example, chronic kidney disease is classified into 5 stages {Stage 1, Stage 2, Stage 3, Stage 4, and Stage 5}.