Confidence Intervals: Two Proportions (aka Difference of Proportions)


Introduction

Two Proportions confidence intervals are used when... 

  • You are comparing two different populations
  • You have TWO proportions from TWO INDEPENDENT random samples

For 2 Proportions/Difference of Proportions confidence intervals, the point estimate will be the difference between the two sample proportions (\(\hat{p_1} - \hat{p_2}\)).

For example, let's say you want to compare the proportion of boys in your school who have Facebook accounts to the proportion of girls in your school who have Facebook accounts. The boys make up one population, and the girls make up a second. After drawing independent random samples from each population, you will get a proportion for boys (\(\hat{p_B}\)) and a proportion for girls (\(\hat{p_G}\)). You can then create a confidence interval to estimate the difference between the two proportions (\(\hat{p_B} - \hat{p_G}\)). 

 

Example 

A toy company had a history of producing defective toys. When several toy stores complained, the manager of the toy company claimed that he had found the source of the problem and corrected it. However, the toy stores began noticing even more defective toys than before. In an attempt to discover whether the manager of the toy company was lying, the owner of a toy store decided to sample toys from two deliveries: one made before the manager’s claim and one made after it. He randomly sampled 1000 toys from a delivery made before the manager said that the problem was corrected and found that 20 were defective. He then randomly sampled 1000 toys from a delivery made after the manager claimed that the problem was corrected and found that 42 were defective. Construct and interpret a 95% confidence interval for the difference in the proportion of defective toys before and after the manager’s claim.

 

Step 1: Name the Confidence Interval: Two Proportions / Difference of Proportions 

Step 2: Check the Conditions

1.  The data is drawn from TWO independent random samples.

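2a.  From Sample 1:  \(N_1 > 10n_1\)  (the population is more than 10 times the size of the sample)

2b.  From Sample 2:  \(N_2 > 10n_2\)

3a.  From Sample 1: \(n_1 \hat{p_1} \geq 10\)  and  \(n_1 \hat{q_1} \geq 10\),  where \(\hat{q_1} = 1 - \hat{p_1}\)

3b.  From Sample 2: \(n_2 \hat{p_2} \geq 10\)  and  \(n_2 \hat{q_2} \geq 10\),  where \(\hat{q_2} = 1 - \hat{p_2}\)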
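The conditions above can be checked for the toy example with a short sketch like the one below. The function names are made up for illustration. The problem does not state how many toys are in each delivery, so the population size used for condition 2 is a hypothetical value and not a figure from the example.

```python
def ten_percent_ok(population_size, sample_size):
    """Condition 2: the population is more than 10 times the sample size."""
    return population_size > 10 * sample_size


def large_counts_ok(sample_size, successes, threshold=10):
    """Condition 3: n * p-hat and n * q-hat are both at least `threshold`.

    n * p-hat is just the number of successes (here, defective toys),
    and n * q-hat is the number of failures (non-defective toys).
    """
    failures = sample_size - successes
    return successes >= threshold and failures >= threshold


# Sample 1: before the manager's claim. Sample 2: after the claim.
n1, defective1 = 1000, 20
n2, defective2 = 1000, 42

print(large_counts_ok(n1, defective1))  # 20 and 980 are both >= 10
print(large_counts_ok(n2, defective2))  # 42 and 958 are both >= 10

# HYPOTHETICAL delivery size, assumed only to show how condition 2 is checked.
assumed_delivery_size = 20_000
print(ten_percent_ok(assumed_delivery_size, n1))
```

Condition 1 (two independent random samples) comes from how the data were collected, so it is checked by reading the problem rather than by computation.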

 

Step 3: Construct the Interval (Apply the Formula) 

2 Proportions Confidence Interval Formula: \((\widehat{p}_1-\widehat{p}_2) \pm z^{*} {\sqrt{\frac{\widehat{p}_1 (1-\widehat{p}_1)}{n_1}+\frac{\widehat{p}_2 (1-\widehat{p}_2)}{n_2}}}\)


→  \((0.02 - 0.042) \pm 1.96 {\sqrt{\frac{(0.02) (0.98)}{1000}+\frac{(0.042)(0.958)}{1000}}}\)

→  \(-0.022 \pm 1.96(0.00774)\)

→  \(-0.022 \pm 0.0152\)

Interval: (-0.0372, -0.0068)
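As a check on the arithmetic, the formula can be evaluated with a minimal sketch such as the following. The function name `two_proportion_ci` is an illustrative choice, and the counts are the ones given in the example (sample 1 is the delivery before the manager's claim, sample 2 the delivery after).

```python
from math import sqrt


def two_proportion_ci(x1, n1, x2, n2, z_star):
    """Confidence interval for p1 - p2 from two independent samples.

    x1, x2 are the counts of successes; n1, n2 are the sample sizes;
    z_star is the critical value for the chosen confidence level.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard error of the difference between the two sample proportions.
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * standard_error
    difference = p1 - p2
    return difference - margin, difference + margin


low, high = two_proportion_ci(20, 1000, 42, 1000, z_star=1.96)
print(f"({low:.4f}, {high:.4f})")  # (-0.0372, -0.0068)
```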

 

Note: The critical value z* was found using a z-table. A small portion of the table is listed below; the 95% row is the one needed for our problem:

Confidence Level    z* Value

95%                 1.960
99%                 2.576
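If a z-table is not at hand, the same critical values can be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_star` is an illustrative choice. For a confidence level C, z* is the point on the standard normal curve that leaves an area of (1 - C)/2 in each tail.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value z* for a two-sided confidence level such as 0.95."""
    tail_area = (1 - confidence_level) / 2
    # inv_cdf returns the z-value with the given area to its left.
    return NormalDist().inv_cdf(1 - tail_area)


print(round(z_star(0.95), 3))  # 1.96
print(round(z_star(0.99), 3))  # 2.576
```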

 

Step 4: State the Conclusion

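Based on the data, I am 95% confident that the difference in the proportions of defective toys (before minus after the manager’s claim) is between -0.0372 and -0.0068. Because the entire interval is below 0, the data suggest that the proportion of defective toys was higher after the manager’s claim than before it.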