Hypothesis Testing: Two Sample Mean (aka Difference of Means)


Two Sample Mean hypothesis tests are used when...

  • You take two independent random samples and compare their sample means.  
    • For example, you could compare average hours of sleep at night between students in high school and students in college. 
  • You want to test whether the two population means differ, using the sample means as evidence. 

A few symbols need to be defined before we dive in: 

  • \(\mu_1\) and \(\mu_2\) refer to the separate means for each population. 
  • \(\bar{x}_1\) and \(\bar{x}_2\) refer to the means derived from your samples, which you will use to test the null hypothesis. 
  • \(s_1\) and \(s_2\) refer to the standard deviations from each sample.
  • \(n_1\)  and \(n_2\) refer to the sample sizes. 



Two separate tutoring programs offer SAT prep courses: "SAT Ready" and "SAT Prepared." They both claim that their students get the same average SAT score after completing the course, but you disagree. You think that the "SAT Ready" course will give you a higher score, but before you choose, you decide to test your theory. You gather two random samples: the first sample of 32 "SAT Ready" students produces an average SAT score of 1410 and standard deviation of 440; the second sample of 37 "SAT Prepared" students produces an average SAT score of 1380 and standard deviation of 490. Based on the data, conduct a hypothesis test at the significance level of 0.05 to determine whether "SAT Ready" students have a higher average SAT score. 


Step 1: Name Test: 2 Sample Mean / Difference of Means

Step 2: Define Test: 

The null hypothesis assumes that the two population means are equal to each other (\(\mu_1 = \mu_2\)).

With this null hypothesis, the options for the alternative hypothesis are as follows: 

  • Left-Sided Test: \(H_0 : \mu_1 = \mu_2\) and \(H_A : \mu_1 < \mu_2\)
  • Two-Sided Test: \(H_0 : \mu_1 = \mu_2\) and \(H_A : \mu_1 \neq \mu_2\)
  • Right-Sided Test: \(H_0 : \mu_1 = \mu_2\) and \(H_A : \mu_1 > \mu_2\)

In this case, let's define the "SAT Ready" mean as \(\mu_R\) and the "SAT Prepared" mean as \(\mu_P\). Because we suspect "SAT Ready" scores are higher, this is a right-sided test:

\(H_0 : \mu_R = \mu_P\)

\(H_A : \mu_R > \mu_P\)


Step 3: Assume \(H_0\) is true and define its normal distribution. Then check the conditions.

1.  The data is from two independent random samples.

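2a. From Sample 1:  \(N_1 \geq 10n_1\), where \(N_1\) is the size of the population the sample was drawn from.

2b. From Sample 2:  \(N_2 \geq 10n_2\), where \(N_2\) is defined the same way.

3. Both sampling distributions are approximately normal. A common rule of thumb is that each sample size is at least 30, which holds here (\(n_1 = 32\), \(n_2 = 37\)).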


Step 4: Using the t-distribution, calculate the test statistic and p-value.

Although the full formula is \(t=\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\), it can be simplified. Recall that the null is \(H_0 : \mu_1 = \mu_2\). Thus, \(\mu_1 - \mu_2 = 0\). This leaves us with the formula below:

Test Statistic (2 Sample Mean):   \(t=\frac{\bar{x}_1-\bar{x}_2}{ \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\)
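
To see the formula in action, here is a minimal Python sketch that plugs in the numbers from the SAT example. The function name `two_sample_t_statistic` is made up for illustration; only the formula itself comes from the text above.

```python
import math

def two_sample_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic, assuming H0: mu1 = mu2."""
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Sample 1 is "SAT Ready", sample 2 is "SAT Prepared".
t_stat = two_sample_t_statistic(1410, 440, 32, 1380, 490, 37)
print(round(t_stat, 3))  # 0.268
```

The numerator is just the 30-point gap between the sample means; the denominator is roughly 112 points, which is why the statistic comes out so small.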

Although it is important to know the formula and feel comfortable using it, typically you will use your calculator to find the test statistic and the p-value. Because the two samples have their own sizes and standard deviations, the degrees of freedom (which are necessary for using the tcdf function to find the p-value) are tedious to calculate by hand. The formula is \(df=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1} \left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1} \left(\frac{s_2^2}{n_2}\right)^2}\), but you will rarely need to work through it yourself. 
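
If you are curious what the degrees of freedom formula produces for the SAT example, the short Python sketch below evaluates it directly. This formula is commonly called the Welch-Satterthwaite approximation; the function name `welch_degrees_of_freedom` is a hypothetical label chosen for this example.

```python
def welch_degrees_of_freedom(sd1, n1, sd2, n2):
    """Degrees of freedom for the unpooled two-sample t test."""
    v1 = sd1**2 / n1  # variance of the first sample mean
    v2 = sd2**2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Sample 1 is "SAT Ready", sample 2 is "SAT Prepared".
df = welch_degrees_of_freedom(440, 32, 490, 37)
print(round(df, 2))
```

Note that the result is usually not a whole number, which is one reason the calculation is normally left to a calculator or software.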

Luckily, there is a calculator function that will give you the degrees of freedom, the test statistic, and the p-value! 

Calculator Steps

1. Hit STAT 

2. Scroll right to TESTS

3. Scroll down to 4: 2-SampTTest

4. Input \(\bar{x}_1\), \(s_1\), \(n_1\), \(\bar{x}_2\), \(s_2\), \(n_2\), and the alternative hypothesis (either \(\mu_1 \neq \mu_2\) , \(\mu_1 < \mu_2\) , or \(\mu_1 > \mu_2\)). Hit No for Pooled.

5. Hit Calculate, and voila! 


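After going through those steps with \(\bar{x}_1 = 1410\), \(s_1 = 440\), \(n_1 = 32\), \(\bar{x}_2 = 1380\), \(s_2 = 490\), \(n_2 = 37\), and \(\mu_1 > \mu_2\), you should get 66.89 for the degrees of freedom, 0.268 for the test statistic, and 0.395 for the p-value. 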
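
If you do not have the calculator handy, the same three numbers can be reproduced with a rough Python sketch that uses only the standard library. The helper names (`t_pdf`, `upper_tail_p_value`) are invented for this example, and the p-value is approximated by numerically integrating the t-distribution's density with Simpson's rule, which is a stand-in for the calculator's tcdf function rather than a description of how the calculator works internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_p_value(t_stat, df, steps=1000):
    """Approximate P(T > t_stat) with Simpson's rule (steps must be even)."""
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_from_zero = total * h / 3  # P(0 < T < t_stat)
    return 0.5 - area_from_zero     # the t density is symmetric about 0

# Summary statistics from the SAT example (sample 1 = "SAT Ready").
mean1, sd1, n1 = 1410, 440, 32
mean2, sd2, n2 = 1380, 490, 37

v1, v2 = sd1**2 / n1, sd2**2 / n2
t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p_value = upper_tail_p_value(t_stat, df)  # right-sided test: mu_1 > mu_2

alpha = 0.05
reject_null = p_value < alpha
print(round(df, 2), round(t_stat, 3), round(p_value, 3), reject_null)
```

The printed values should agree with the calculator output to the precision shown. If SciPy is installed, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` runs the same unpooled test from summary statistics; recent versions also accept an `alternative` argument for one-sided tests.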


Step 5: Analyze your results and determine if they are statistically significant. 

We calculated a p-value of 0.395. This p-value is greater than the significance level of 0.05. Therefore, we FAIL to reject the null hypothesis. The data does NOT support the claim that the "SAT Ready" course will produce a higher SAT score than the "SAT Prepared" course.