
R: Chi-Square - Hypothesis Testing


1. Conduct Goodness-of-Fit Test
A local firewood company, Long-Burn Firewood, sells firewood in bulk. In its advertising, the company claims that deliveries will contain the following percentages of tree species.
 
Rock Elm    Oak    Beech    Ash
20%         35%    25%      20%

A local customer ordered from Long-Burn Firewood and felt that the percentages were off, based on the count of each firewood type delivered. The customer sorted the delivery and counted the pieces of firewood by tree species. The counts are listed in the table below:

Rock Elm    Oak    Beech    Ash
134         256    169      183
 
SOLUTION:
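The null hypothesis is that the delivered firewood matches the advertised percentages. The p-value for this problem can be calculated either automatically, using the chisq.test() function, or manually, using a step-by-step process. Both approaches are shown below, starting with chisq.test().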
> # ENTER THE DATA INTO R
> obs = c(134, 256, 169, 183)
> null.probs = c(0.20,0.35,0.25,0.20)
> 
> # CALCULATE CHI-SQUARE AND P-VALUE
> chisq.test(x = obs, p=null.probs)

	Chi-squared test for given probabilities

data:  obs 
X-squared = 10.9848, df = 3, p-value = 0.01181
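
The object returned by chisq.test() can also be stored and inspected, which is handy for checking the expected counts behind the test. The sketch below assumes only base R; the name gof.result is an illustrative choice and not part of the original example.

```r
# Observed counts and advertised (null) proportions from the example
obs = c(134, 256, 169, 183)
null.probs = c(0.20, 0.35, 0.25, 0.20)

# Store the test result instead of only printing it
# (gof.result is a hypothetical name chosen for this sketch)
gof.result = chisq.test(x = obs, p = null.probs)

gof.result$expected   # expected counts under the null hypothesis
gof.result$statistic  # chi-square test statistic
gof.result$parameter  # degrees of freedom
gof.result$p.value    # p-value
```

Inspecting the expected counts is a quick way to confirm that none of them is very small, since the chi-square approximation is less reliable in that case.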

The manual, step-by-step calculation gives the same chi-square value and p-value:

> # ENTER THE DATA INTO R
> obs = c(134, 256, 169, 183)
> null.probs = c(0.20,0.35,0.25,0.20)
> 
> # CALCULATE SUM OF OBSERVED
> sum.obs = sum(obs)
> sum.obs
[1] 742
> 
> # CALCULATE THE EXPECTED COUNTS
> exp = sum.obs*null.probs
> exp
[1] 148.4 259.7 185.5 148.4
> 
> # CALCULATE INDIVIDUAL COMPONENTS OF CHI-SQUARE VALUE
> chi.sq.values = ((obs - exp)^2/exp)
> chi.sq.values
[1] 1.39730458 0.05271467 1.46765499 8.06711590
> 
> # CALCULATE THE CHI-SQUARE VALUE
> chi.sq.val = sum(chi.sq.values)
> chi.sq.val
[1] 10.98479
> 
> # CALCULATE P-VALUE 
> pchisq(q = chi.sq.val, df = 3, lower.tail = FALSE)
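[1] 0.01180841

Both methods give a p-value of about 0.0118. At the conventional 0.05 significance level, this is small enough to reject the null hypothesis that the delivery matches the advertised percentages.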
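
The manual steps above can be collected into a small helper. This is a minimal sketch: the function name gof.manual is hypothetical, and it assumes the degrees of freedom equal the number of categories minus one, as in this example where no parameters are estimated from the data.

```r
# Hypothetical helper that repeats the manual steps for any counts/proportions
gof.manual = function(obs, null.probs) {
  expected = sum(obs) * null.probs                  # expected counts
  chi.sq.val = sum((obs - expected)^2 / expected)   # chi-square statistic
  df = length(obs) - 1                              # categories minus one
  p.value = pchisq(q = chi.sq.val, df = df, lower.tail = FALSE)
  list(statistic = chi.sq.val, df = df, p.value = p.value)
}

# Apply the helper to the firewood data
manual = gof.manual(obs = c(134, 256, 169, 183),
                    null.probs = c(0.20, 0.35, 0.25, 0.20))
manual$statistic
manual$df
manual$p.value
```

Computing df from length(obs) avoids hard-coding the value 3 when the number of categories changes.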

2. Conduct Test of Independence/Homogeneity

A survey was conducted among adults in a large metropolitan area. Adults were randomly chosen and asked about their exercise and coffee consumption habits. Using R and the survey data shown below, perform a chi-squared hypothesis test to determine if there is a relationship between exercise and coffee consumption habits.

                           Frequently Exercise   Moderate Exercise   Never Exercise
Heavy Coffee Drinker       23                    31                  35
Moderate Coffee Drinker    37                    36                  24
Never Drink Coffee         48                    24                  29

 

Let's begin by entering the data into R row by row, as shown below.

# ENTER THE DATA BY ROW
row1 = c(23, 31, 35)
row2 = c(37, 36, 24)
row3 = c(48, 24, 29)

Next, let's use the rbind() function to combine the rows to create a single matrix. 

> # USE THE rbind FUNCTION TO BIND THE ROWS
> data.table = rbind(row1, row2, row3)
> 
> data.table
     [,1] [,2] [,3]
row1   23   31   35
row2   37   36   24
row3   48   24   29
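
The default labels (row1, row2, row3 and [,1], [,2], [,3]) make the matrix hard to read. As an optional step, labels can be attached with dimnames(). The label strings below are shortened from the survey table and are an illustrative choice; the test result does not depend on them.

```r
# Rebuild the matrix from the survey data
row1 = c(23, 31, 35)
row2 = c(37, 36, 24)
row3 = c(48, 24, 29)
data.table = rbind(row1, row2, row3)

# Attach row (coffee) and column (exercise) labels
dimnames(data.table) = list(
  Coffee = c("Heavy", "Moderate", "Never"),
  Exercise = c("Frequently", "Moderate", "Never")
)

data.table
```

With labels in place, individual cells can be read by name, for example data.table["Heavy", "Never"].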

Now that the data are in a single matrix, let's use the chisq.test() function to perform a chi-square hypothesis test.

> chisq.test(data.table)

	Pearson's Chi-squared test

data:  data.table 
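X-squared = 12.5119, df = 4, p-value = 0.01392

The p-value of 0.01392 is below the conventional 0.05 significance level, so we reject the null hypothesis of independence and conclude that exercise and coffee consumption habits are related.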
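
To see where the statistic comes from, the expected counts under independence can be computed as (row total * column total) / grand total, mirroring the manual approach used in the goodness-of-fit section. This is a sketch in base R, and the variable names are illustrative.

```r
# Survey counts: rows are coffee habits, columns are exercise habits
data.table = rbind(c(23, 31, 35), c(37, 36, 24), c(48, 24, 29))

# Expected counts under independence
expected = outer(rowSums(data.table), colSums(data.table)) / sum(data.table)

# Chi-square statistic and degrees of freedom: (rows - 1) * (columns - 1)
chi.sq.val = sum((data.table - expected)^2 / expected)
df = (nrow(data.table) - 1) * (ncol(data.table) - 1)

# Upper-tail p-value from the chi-square distribution
p.value = pchisq(q = chi.sq.val, df = df, lower.tail = FALSE)

chi.sq.val
df
p.value
```

The same expected counts are available from chisq.test(data.table)$expected, which offers a convenient cross-check.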

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.