Chi Squared Goodness of Fit 


Introduction

The Chi-Squared (\(\chi^2\)) Goodness of Fit test compares the observed distribution of a sample with an expected (claimed) probability distribution. It applies when there is a single population and a single categorical variable. There is only a hypothesis test (no confidence interval) for Chi-Squared, and the test is always right-tailed.

 

\(\chi^2\) Distribution

  • It is a continuous density curve.
  • There is one parameter (degrees of freedom).
  • All \(\chi^2\) values are nonnegative (zero or greater).
  • For small degrees of freedom (df < 20), the \(\chi^2\) distribution is skewed right.
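The properties above can be checked numerically. The following is a minimal sketch, not part of the original lesson: it uses the standard density formula for the \(\chi^2\) distribution, and the function name `chi2_pdf` is chosen here for illustration. For df ≥ 2 the curve peaks at df − 2 while its mean is df, which is one way to see the right skew.

```python
import math

def chi2_pdf(x: float, df: int) -> float:
    """Density of the chi-squared distribution with df degrees of freedom.

    This sketch only evaluates the curve for x > 0 and returns 0.0 otherwise.
    """
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

# Compare the height of the curve at the peak (df - 2) and at the mean (df).
df = 5
peak, mean = df - 2, df
print(f"df={df}: density at peak x={peak} is {chi2_pdf(peak, df):.4f}")
print(f"df={df}: density at mean x={mean} is {chi2_pdf(mean, df):.4f}")
```

The density is higher at x = df − 2 than at x = df, so the bulk of the curve sits to the left of the mean and the long tail extends to the right.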

 

Example  

A state university claims the following about its student body:

Instate   Out of state   International
81%       11%            8%

 

Using a simple random sample, you gather your own data:

Instate   Out of state   International
125       34             41

(Total = 200)

Based on the data, is there enough evidence to disprove the university's claims?

 

Step 1: Name Test: Chi-Squared Goodness of Fit

Step 2: Define Test:

\(H_0\): The actual population proportions are equal to the stated proportions.

\(H_A\): The actual population proportions differ from the stated proportions.

Step 3: Assume \(H_0\) is true and define the sampling distribution of the test statistic, which here is a \(\chi^2\) distribution with (number of categories \(-\) 1) degrees of freedom. Then check the conditions for this test:

1. Data is drawn from a random sample

2. N > 10n (the population size N is more than 10 times the sample size n)

3. All expected counts are at least 5
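The third condition can be checked directly from the claimed percentages and the sample size. The sketch below uses the numbers from the example; the variable names are illustrative. The first two conditions depend on how the sample was collected and on the size of the student body, so they are not something code can verify from these numbers alone.

```python
# Claimed proportions from the university and the size of our sample.
claimed = {"Instate": 0.81, "Out of state": 0.11, "International": 0.08}
sample_size = 200

# Expected count for each group = sample size * claimed proportion.
expected = {group: sample_size * p for group, p in claimed.items()}

# Condition 3: every expected count must be at least 5.
all_large_enough = all(count >= 5 for count in expected.values())

for group, count in expected.items():
    print(f"{group}: expected count {count:.0f}")
print("All expected counts at least 5:", all_large_enough)
```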

Step 4: Using the \(\chi^2\) distribution, calculate the test statistic and p-value.

Test Statistic: \(\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}\)

Expected = (total) \(\times\) (stated %)

                    Instate   Out of state   International
University Claim    81%       11%            8%
Data from Sample    125       34             41
Expected Count      162       22             16

 

Solving the test statistic by hand:

\(\chi^2 = \frac{(125 - 162)^2}{162} + \frac{(34 - 22)^2}{22} + \frac{(41 - 16)^2}{16} = 54.059\)
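The same arithmetic can be reproduced in a few lines of Python. This is a small sketch using only the example's numbers; the variable names are illustrative.

```python
observed = [125, 34, 41]          # Instate, Out of state, International
claimed = [0.81, 0.11, 0.08]      # proportions stated by the university

total = sum(observed)             # 200 students in the sample
expected = [total * p for p in claimed]

# Each category contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_stat = sum(contributions)

print([round(c, 3) for c in contributions])
print(round(chi2_stat, 3))
```

Printing the individual contributions shows that the International category accounts for most of the statistic, so that is where the sample departs furthest from the claim.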

 

Solving the test statistic with calculator:

L1 = observed, L2 = expected, L3 = (L1 \(-\) L2)\(^2\) / L2

Run 1-Var Stats on L3; the reported sum Σx is the \(\chi^2\) statistic.

 

Finding the p-value

In your calculator, go to 2nd VARS and scroll to the \(\chi^2\)cdf option. You will enter the following:

  • The \(\chi^2\) value as the lower limit
  • 999 as the upper limit
  • the degrees of freedom
    • Note: the degrees of freedom = the number of cells \(-\) 1
  • All together, it looks like this: (\(\chi^2\), 999, df) ⇒ p-value

In this case, you will enter (54.059, 999, 2) and get a p-value of \(1.82 \times 10^{-12}\).
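Without a calculator, the right-tail area can be sketched in Python. The helper below is an assumption of this sketch rather than the lesson's method: it relies on the closed form that holds when the degrees of freedom are even, \(P(\chi^2 \ge x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} \frac{(x/2)^k}{k!}\), which for df = 2 reduces to \(e^{-x/2}\). It integrates to infinity rather than to 999, a difference that is negligible here.

```python
import math

def chi2_right_tail_even_df(x: float, df: int) -> float:
    """Right-tail probability P(chi-squared >= x) for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half_x = x / 2
    term = 1.0    # the k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half_x / k    # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half_x) * total

# Three categories in the example, so df = 3 - 1 = 2.
p_value = chi2_right_tail_even_df(54.059, 2)
print(f"{p_value:.3e}")
```

For odd degrees of freedom this helper raises an error; a statistics library or the calculator's cdf function would be the usual choice in that case.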

Step 5: Analyze your results and determine if they are statistically significant.

The p-value is approximately zero, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis. The data support the claim that the actual residency distribution differs from the residency distribution stated by the university.