Home Picture

Statistics Summary Part 3: T-Test

29 Jun 2020 |

Categories: Math

Statistics 101

T-Test

The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis.

A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data or it has the small samples ($n < 30$), the test statistics (under certain conditions) follow a Student’s t distribution. The t-test can be used, for example, to determine if the means of two sets of data are significantly different from each other.

Student t distribution table; Source

The t-test can be categorized into two:

  1. One-Sample
  2. Two-Sample
    • Paired
    • Unpaired

We are going to discuss each of cases in detail.

(1). One-Sample:

The One-Sample Test tests whether the mean of a normally distributed population is different from a specified value.

The workflow of the Test is as below:
Null Hypothesis ($H_0$): states that the population mean is equal to some value ($\mu_0$)
Alternative Hypothesis ($H_a$): states that the mean does not equal/is greater than/is less than $\mu_0$
t-statistic: standardizes the difference between $\bar{x}$ and $\mu_0$

$ t = \frac{\bar{x} - \mu_0}{ \frac{s}{\sqrt{n}} } $

Degree of freedom(df)=n-1

Formula for continuous data

Read the table of t-distribution critical values for the p-value, probability that the sample mean was obtained by chance given $\mu_0$ is the popluation mean, using the calculated t-statistic and degrees of freedom.

If the p-value is less than the predetermined value for significance, called $\alpha$ and is usually 0.05, reject the null hypothsis and accept the alternative hypothesis.

Example:
You are experiencing hair loss and skin discoloration and think it might be because of selenuim toxicity. You decide to measure the selenium levles in your tap water once a day for one week. Your results are given below. The EPA maximum contaminant level for safe drinking water is 0.05 mg/L. Does the selenium level in your tap water exceed the legal limit (assuming $\alpha = 0.05$)?

Day Selenium mg/L
1 0.051
2 0.0505
3 0.049
4 0.0516
5 0.052
6 0.0508
7 0.0506

Hypothesis:

$H_0: \mu \leq 0.05$

$H_a: \mu > 0.05$

Calculate the mean and standard deviation of the sample:

$\bar{x}=0.0508$

$s^2 = \frac{\Sigma{(x-\bar{x})^2}}{n-1} = \frac{(0.051-0.0508)^2 + (0.0505-0.0508)^2 + etc}{6} = 9.15 \times 10^{-7}$

$s=\sqrt{s^2}=9.56 \times 10^{-4}$

$n=7$

The t-statistics is:

$t = \frac{\bar{x} - \mu_0}{ \frac{s}{\sqrt{n}} } = \frac{0.0508 - 0.05}{ \frac{9.56 \times 10^{-4}}{\sqrt{7}} } = 2.17$

And the degrees of freedom are:

$n-1 = 7-1 = 6$

By looking at the t-distribution table above, 2.17 with 6 degrees of freedom is between $p=0.05$ and $p=0.025$. This means that the p-vale is less than 0.05, so, we can reject $H_0$ and coclude that the selenium level in the tap water exceeds the legal limit.


(2). Two-Sample:

The Two-Sample Test tests whether the means of two population are significantly different from one another. A common application is to test if a new process or treatment is superior to a current process or treatment.

Paired

Each value of one group corresponds directly to a value in the other group. ie: before and after values after drug treatment for each individual patient

Subtract the two values for each individual to get one set of values (the differences) and use $\mu_0=0$ to perform a one-sample t-test.

Unpaired

Assume the two population are independent.

$H_0: \text{means of the two population are equal}$ $H_a: \text{means of the two population are unequal or one is greater than the other}$ Test
$\mu_1 = \mu_2$ $\mu_1 \neq \mu_2$ Two-tail test
$\mu_1 \leq \mu_2$ $\mu_1 > \mu_2$ One-tail test: right-tail
$\mu_1 \geq \mu_2$ $\mu_1 < \mu_2$ One-tail test: left-tail

t-statistic:

$ t = \frac{\bar{x_1}-\bar{x_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} }} $

Degree of freedom(df)= $ (n_1-1) + (n_2-1) = n_1 + n_2 -2$

Formula for continuous data

Read the table of t-distribution critical values for the p-value, probability that the sample mean was obtained by chance given $\mu_0$ is the popluation mean, using the calculated t-statistic and degrees of freedom.

Example:
Consider the lifespan of 18 rats. 12 were fed a restricted calorie diet and lived an average of 700 days (standard deviation = 21 days). The other 6 had unsretricted access to food and lived an average of 668 days (standard deviation = 30 days). Does a restricted calorie diet increase the lifespan of rats (assume $\alpha = 0.05$)?

Control Group 1 Treatment Group
$n_1=12$
$\bar{x_1}=700$
$s_1=21$
$n_2=6$
$\bar{x_2}=668$
$s_2=30$

Hypothesis:

$H_0: \mu_1 \leq \mu_2$

$H_a: \mu_1 > \mu_2$

(Because we are only asking if a restricted calorie diet increases lifespan)

The t-statistics is:

$t = \frac{\bar{x_1}-\bar{x_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} }} = \frac{700-668}{\sqrt{\frac{21^2}{12} + \frac{30^2}{6} }} = 2.342$

And the degrees of freedom are:

$(n_1-1) + (n_2-1) = n_1 + n_2 -2 = 12+6-2 = 16$

From the t-distribution table, the p-value (t-statistic = 2.342 with 16 degrees of freedom) falls between 0.01 and 0.02, so, we reject $H_0$. The restricted calorie diet does increase the lifespan of rats.

Top