Percentiles, Quartiles, IQR, & The Five-Number Summary


The Five-Number Summary

The Five-Number Summary = (Minimum, Q1, Median, Q3, Maximum)
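As a rough sketch of how this summary might be computed, the Python snippet below uses `statistics.quantiles` from the standard library (Python 3.8 or later). The function name `five_number_summary` is made up for illustration, and the data set is the one used in the quartile example further down. Software packages follow several different quartile conventions, so the Q1 and Q3 they report can differ slightly for other data sets.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
print(five_number_summary(data))  # (2, 8.0, 13.0, 22.0, 49)
```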

 

Quartiles

Three quartiles split the ordered data into four equal parts. The first quartile (Q1) corresponds to the 25th percentile. The second quartile (Q2) corresponds to the 50th percentile and is therefore also known as the median. The third quartile (Q3) corresponds to the 75th percentile.

Example: Given the data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}, identify the first, second, and third quartiles.

Solution:

Step 1: Find the median → 13 is the median because five data points lie on each side of it, so it splits the data 50/50.

Step 2: Group the data left and right of the median, leaving the median itself out → {2, 6, 8, 9, 12}, 13, {18, 20, 22, 23, 49}

Step 3: Repeat the process of step 1 for each group → 

Group 1: {2, 6, 8, 9, 12} → 8 is the first quartile because there are 2 data points on either side of 8, and thus it divides the lower half of the data in half.

Group 2: {18, 20, 22, 23, 49} → 22 is the third quartile because there are 2 data points on either side of 22, and thus it divides the upper half of the data in half.

Final Answer: Q1 = 8, Q2 = 13, Q3 = 22
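The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `quartiles` is made up here, it assumes the data set has at least two values, and it assumes that when the number of values is odd the median is left out of both halves, as in the example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-split steps from the example."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    # Step 2: split around the median. With an odd count, the middle
    # value is skipped so it belongs to neither half (an assumption).
    lower = values[:half]
    upper = values[half + n % 2:]
    # Steps 1 and 3: take the median of the whole set and of each half.
    return median(lower), median(values), median(upper)

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
print(quartiles(data))  # (8, 13, 22)
```

Other quartile conventions, such as those that interpolate between neighbouring values, can give slightly different results for the same data.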

 

Applications of Quartiles

A) The Interquartile Range (IQR)

IQR = Q3 - Q1

The IQR is a resistant measure of spread. This means that, unlike the range (the maximum minus the minimum), it is not easily influenced by outliers.
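To see what "resistant" means in practice, the Python sketch below takes the data set from the quartile example, swaps its largest value for a far more extreme one, and compares the two measures of spread. The helper name `spread` and the replacement value 490 are made up for illustration; the quartiles come from the standard library's `statistics.quantiles` (Python 3.8 or later).

```python
from statistics import quantiles

def spread(data):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)
    return max(data) - min(data), q3 - q1

original = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
extreme = original[:-1] + [490]  # replace 49 with a much larger value

print(spread(original))  # (47, 14.0)
print(spread(extreme))   # (488, 14.0)
```

The range jumps from 47 to 488, while the IQR stays at 14 because Q1 and Q3 do not depend on the largest value.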

B) Outliers

The IQR can be used to find the upper and lower thresholds of a data set, above and below which numbers are considered outliers.

lower threshold = Q1 - (1.5)(IQR)
upper threshold = Q3 + (1.5)(IQR)

 

Let's continue with the prior example to find the IQR of the data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49} and check it for outliers.

Step 1: Find Q1 and Q3 → We already found these values above. We know that Q1 = 8 and Q3 = 22. 

Step 2: Find the IQR → IQR = Q3 - Q1 = 22 - 8 = 14

Step 3: Find the thresholds  

Lower: Q1 - (1.5)(IQR) = 8 - (1.5)(14) = -13

Upper: Q3 + (1.5)(IQR) = 22 + (1.5)(14) = 43

Step 4: Evaluate → There are no numbers in the dataset below -13, but there is one number greater than 43. Hence, we can conclude that 49 is an outlier because it is beyond the upper threshold.
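The four steps above can be sketched as a short Python function. The name `find_outliers` and the parameter `k` (the 1.5 multiplier) are made up for illustration, and Q1 and Q3 are passed in as already known, as in Step 1.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return (lower threshold, upper threshold, list of outliers)."""
    iqr = q3 - q1                # Step 2: interquartile range
    lower = q1 - k * iqr         # Step 3: lower threshold
    upper = q3 + k * iqr         #         upper threshold
    # Step 4: keep only the values that fall outside the thresholds.
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
print(find_outliers(data, q1=8, q3=22))  # (-13.0, 43.0, [49])
```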