Statistics Summary Part 3: T-Test

Statistics 101

T-Test

The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis.

A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data or it has the small samples ($n < 30$), the test statistics (under certain conditions) follow a Student’s t distribution. The t-test can be used, for example, to determine if the means of two sets of data are significantly different from each other.

Student t distribution table; Source

The t-test can be categorized into two:

One-Sample
Two-Sample
- Paired
- Unpaired

We are going to discuss each of cases in detail.

(1). One-Sample:

The One-Sample Test tests whether the mean of a normally distributed population is different from a specified value.

The workflow of the Test is as below:
Null Hypothesis ($H_0$): states that the population mean is equal to some value ($\mu_0$)
Alternative Hypothesis ($H_a$): states that the mean does not equal/is greater than/is less than $\mu_0$
t-statistic: standardizes the difference between $\bar{x}$ and $\mu_0$

$ t = \frac{\bar{x} - \mu_0}{ \frac{s}{\sqrt{n}} } $

Degree of freedom(df)=n-1

Formula for continuous data

Read the table of t-distribution critical values for the p-value, probability that the sample mean was obtained by chance given $\mu_0$ is the popluation mean, using the calculated t-statistic and degrees of freedom.

$H_0: \mu \leq \mu_0 \\ H_a: \mu > \mu_0 $
$\rightarrow\ \\ \text{one-tail test on right-tail; p = area right of t}$

$H_0: \mu \geq \mu_0 \\ H_a: \mu < \mu_0 $
$\rightarrow\ \text{one-tail test on left-tail; p = area left of t}$

$H_0: \mu = \mu_0 \\ H_a: \mu \neq \mu_0 $
$\rightarrow\ \text{two-tail test; p = 2 * area past t}$

If the p-value is less than the predetermined value for significance, called $\alpha$ and is usually 0.05, reject the null hypothsis and accept the alternative hypothesis.

Example:
You are experiencing hair loss and skin discoloration and think it might be because of selenuim toxicity. You decide to measure the selenium levles in your tap water once a day for one week. Your results are given below. The EPA maximum contaminant level for safe drinking water is 0.05 mg/L. Does the selenium level in your tap water exceed the legal limit (assuming $\alpha = 0.05$)?

Day	Selenium mg/L
1	0.051
2	0.0505
3	0.049
4	0.0516
5	0.052
6	0.0508
7	0.0506

Hypothesis:

$H_0: \mu \leq 0.05$

$H_a: \mu > 0.05$

Calculate the mean and standard deviation of the sample:

$\bar{x}=0.0508$

$s^2 = \frac{\Sigma{(x-\bar{x})^2}}{n-1} = \frac{(0.051-0.0508)^2 + (0.0505-0.0508)^2 + etc}{6} = 9.15 \times 10^{-7}$

$s=\sqrt{s^2}=9.56 \times 10^{-4}$

$n=7$

The t-statistics is:

$t = \frac{\bar{x} - \mu_0}{ \frac{s}{\sqrt{n}} } = \frac{0.0508 - 0.05}{ \frac{9.56 \times 10^{-4}}{\sqrt{7}} } = 2.17$

And the degrees of freedom are:

$n-1 = 7-1 = 6$

By looking at the t-distribution table above, 2.17 with 6 degrees of freedom is between $p=0.05$ and $p=0.025$. This means that the p-vale is less than 0.05, so, we can reject $H_0$ and coclude that the selenium level in the tap water exceeds the legal limit.

(2). Two-Sample:

The Two-Sample Test tests whether the means of two population are significantly different from one another. A common application is to test if a new process or treatment is superior to a current process or treatment.

Paired

Each value of one group corresponds directly to a value in the other group. ie: before and after values after drug treatment for each individual patient

Subtract the two values for each individual to get one set of values (the differences) and use $\mu_0=0$ to perform a one-sample t-test.

Unpaired

Assume the two population are independent.

$H_0: \text{means of the two population are equal}$	$H_a: \text{means of the two population are unequal or one is greater than the other}$	Test
$\mu_1 = \mu_2$	$\mu_1 \neq \mu_2$	Two-tail test
$\mu_1 \leq \mu_2$	$\mu_1 > \mu_2$	One-tail test: right-tail
$\mu_1 \geq \mu_2$	$\mu_1 < \mu_2$	One-tail test: left-tail

t-statistic:

$ t = \frac{\bar{x_1}-\bar{x_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} }} $

Degree of freedom(df)= $ (n_1-1) + (n_2-1) = n_1 + n_2 -2$

Formula for continuous data

Example:
Consider the lifespan of 18 rats. 12 were fed a restricted calorie diet and lived an average of 700 days (standard deviation = 21 days). The other 6 had unsretricted access to food and lived an average of 668 days (standard deviation = 30 days). Does a restricted calorie diet increase the lifespan of rats (assume $\alpha = 0.05$)?

Control Group 1	Treatment Group
$n_1=12$ $\bar{x_1}=700$ $s_1=21$	$n_2=6$ $\bar{x_2}=668$ $s_2=30$

Hypothesis:

$H_0: \mu_1 \leq \mu_2$

$H_a: \mu_1 > \mu_2$

(Because we are only asking if a restricted calorie diet increases lifespan)

The t-statistics is:

$t = \frac{\bar{x_1}-\bar{x_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} }} = \frac{700-668}{\sqrt{\frac{21^2}{12} + \frac{30^2}{6} }} = 2.342$

And the degrees of freedom are:

$(n_1-1) + (n_2-1) = n_1 + n_2 -2 = 12+6-2 = 16$

From the t-distribution table, the p-value (t-statistic = 2.342 with 16 degrees of freedom) falls between 0.01 and 0.02, so, we reject $H_0$. The restricted calorie diet does increase the lifespan of rats.

Top