Chi-Squared Goodness of Fit

Introduction

The Chi-Squared (χ²) Goodness of Fit test compares an observed sample distribution with an expected probability distribution. It is used when there is a single population. Chi-Squared has only a hypothesis test (no confidence interval), and the test is always right-sided.

χ² Distribution

• It is a continuous density curve.
• There is one parameter (degrees of freedom).
• All χ² values are greater than or equal to zero.
• For small degrees of freedom (df < 20), the χ² distribution is clearly skewed right; the skew lessens as the degrees of freedom increase.
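
To make the last point concrete, here is a minimal sketch that uses the standard result that a χ² distribution with df degrees of freedom has skewness equal to the square root of 8/df. That formula and the function name `chi2_skewness` are not part of the original notes; they are included only as an illustration.

```python
import math

def chi2_skewness(df):
    """Skewness of a chi-squared distribution (standard result: sqrt(8 / df))."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8 / df)

# The skewness stays positive (right skew) but shrinks as df grows.
for df in (2, 5, 10, 20, 50):
    print(df, round(chi2_skewness(df), 3))
```

The printed values decrease steadily, which is why the right skew is most visible at small degrees of freedom.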

Example

A state university claims the following about its student body:

| Instate | Out of state | International |
| --- | --- | --- |
| 81% | 11% | 8% |

Using a simple random sample, you gather your own data:

| Instate | Out of state | International |
| --- | --- | --- |
| 125 | 34 | 41 |

(Total = 200)

Based on the data, is there enough evidence to disprove the university's claims?

Step 1: Name Test: Chi-Squared Goodness of Fit

Step 2: Define Test:

$$H_0$$: The actual population proportions are equal to the stated proportions.

$$H_A$$: The actual population proportions differ from the stated proportions.

Step 3: Assume $$H_0$$ is true and define its sampling distribution, which for this test is a χ² distribution. Then check the conditions, which vary depending on the type of hypothesis test.

1. The data is drawn from a random sample.

2. The population is more than 10 times the sample size (N > 10n).

3. All expected counts are at least 5.
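
The expected-count condition can be checked directly from the example's numbers. The following is a minimal sketch, assuming the claimed proportions and the sample size of 200 from the example; the function names are illustrative and not from the original. The N > 10n condition needs the population size, which the example does not give, so it is not checked here.

```python
# Claimed proportions and sample size taken from the example.
claimed = {"Instate": 0.81, "Out of state": 0.11, "International": 0.08}
sample_size = 200

def expected_counts(proportions, n):
    """Expected count for each category = n * claimed proportion."""
    return {category: n * p for category, p in proportions.items()}

def all_expected_at_least(counts, minimum=5):
    """True when every expected count meets the minimum."""
    return all(count >= minimum for count in counts.values())

expected = expected_counts(claimed, sample_size)
condition_met = all_expected_at_least(expected)
print(condition_met)
```

The smallest expected count here is about 16 (International), so the condition holds.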

Step 4: Using the χ² distribution, calculate the test statistic and p-value.

Test Statistic: $$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Expected = (total) $$\times$$ (stated %)

| | Instate | Out of state | International |
| --- | --- | --- | --- |
| University Claim | 81% | 11% | 8% |
| Data from Sample | 125 | 34 | 41 |
| Expected Count | 162 | 22 | 16 |

Solving the test statistic by hand:

$$\chi^2 = \frac{(125 - 162)^2}{162} + \frac{(34 - 22)^2}{22} + \frac{(41 - 16)^2}{16} = 54.059$$
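
The same arithmetic can be reproduced in a few lines of code. This is a minimal sketch using only the observed counts and claimed proportions from the example; the function name `chi_squared_statistic` is illustrative.

```python
observed = [125, 34, 41]        # Instate, Out of state, International
claimed = [0.81, 0.11, 0.08]    # proportions stated by the university

def chi_squared_statistic(observed_counts, proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed_counts)
    expected = [total * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))

statistic = chi_squared_statistic(observed, claimed)
print(round(statistic, 3))  # 54.059
```

Most of the statistic comes from the International category, whose observed count (41) is far from its expected count (16).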

Solving the test statistic with calculator:

L1 = observed, L2 = expected, L3 = (L1 - L2)² / L2

Run 1-Var Stats on L3 ⇒ χ² = Σx

Finding the p-value

In your calculator, go to 2nd VARS and scroll to the χ²cdf option. You will enter the following:

• The χ² value as the lower limit
• 999 as the upper limit (a large number standing in for infinity)
• The degrees of freedom
• Note: the degrees of freedom = the number of cells (categories) $$-$$ 1
• All together, it looks like this: (χ², 999, df) ⇒ p-value

In this case, you will enter (54.059, 999, 2) and get a p-value of 1.82 $$\times 10^{-12}$$.
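
The p-value can also be checked without a calculator. The sketch below is not the calculator's method: it uses a closed form for the right-tail area of a χ² distribution that holds only when the degrees of freedom are even, which covers this example (df = 2). The function name `chi2_upper_tail_even_df` is illustrative.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Right-tail area P(X >= x) for a chi-squared distribution.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only for positive even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term            # add (x/2)^k / k!
        term *= half / (k + 1)   # advance to the next term
    return math.exp(-half) * total

p_value = chi2_upper_tail_even_df(54.059, 2)
print(f"{p_value:.2e}")  # 1.82e-12
```

For odd degrees of freedom a statistics library would be needed instead of this shortcut.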

Step 5: Analyze your results and determine if they are statistically significant.

The p-value is approximately zero, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis. The data provide evidence that the actual residency distribution differs from the residency distribution stated by the university.