TD_ZTest performs a Z-test, a test for which the distribution of the test statistic under the Null hypothesis can be approximated by a normal distribution.
TD_ZTest tests the equality of two means under the assumption that the population variances are known (rarely true). For large samples, sample variances approximate population variances, so TD_ZTest uses sample variances instead of population variances in the test statistic.
Assumptions
- Sample distribution is normal.
- Data is numeric, not categorical.
Test Type
- One-tailed or two-tailed (your choice)
- One-sample or two-sample (your choice)
Use a one-sample test to determine whether the mean of a population is greater than, less than, or not equal to a specific value. TD_ZTest makes this decision by comparing the Z-test statistic to the critical values of the normal distribution at the chosen level of significance (alpha = 0.01, 0.05, or 0.10).
- Unpaired
Computational Method
- A Null hypothesis H0 and an alternative hypothesis H1
- A random sample x1, x2, ..., xn in the case of a one sample test
- Two random samples x1, x2, ..., xn1 and y1, y2, ..., yn2 in the case of a two sample test
- A test statistic Zstat
- A level of significance α ϵ {0.10, 0.05, 0.01}
- Compare the sample-based Zstat with the percentage point of the normal distribution, ᴢα (one-tailed test) or ᴢα/2 (two-tailed test)
- Compute the p-value
- Conclusion
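The comparison and p-value steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the TD_ZTest implementation: the function names `critical_value` and `p_value` and the labels "lower", "upper", and "two-sided" are hypothetical, chosen here for readability.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def critical_value(alpha, two_sided=False):
    """Return z_alpha (one-tailed) or z_{alpha/2} (two-tailed).

    The returned value has (1 - tail_area) of the normal area to its left.
    """
    tail_area = alpha / 2 if two_sided else alpha
    return STD_NORMAL.inv_cdf(1 - tail_area)


def p_value(z_stat, alternative):
    """Return the p-value of z_stat for the given alternative hypothesis.

    alternative is "lower", "upper", or "two-sided" (hypothetical labels).
    """
    if alternative == "lower":
        return STD_NORMAL.cdf(z_stat)
    if alternative == "upper":
        return 1 - STD_NORMAL.cdf(z_stat)
    if alternative == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z_stat)))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Critical values at the three levels of significance named above.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, critical_value(alpha), critical_value(alpha, two_sided=True))
```

The conclusion step then reduces to one comparison: reject the Null hypothesis when the p-value is below α, which is equivalent to Zstat falling beyond the corresponding critical value.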
One Sample Z-Tests
Let x1, x2, ..., xn be a random sample drawn from a population with mean µ and variance σ². Also, assume that the data follows a normal distribution Ɲ (µ, σ²).
H0: µ ≤ µ0
versus
H1: µ > µ0
or
H0: µ ≥ µ0
versus
H1: µ < µ0
or
H0: µ = µ0
versus
H1: µ ≠ µ0
The test statistic for testing the above hypotheses is the Z-stat. The validity of the Z-stat is predicated on the assumption that the population variance σ² is known.
The assumption of known variance is rarely practical: in most settings where the population variance is known, the mean µ is known as well, and then the test is not required.
However, for large sample sizes (which are common in big data applications), the sample variance s² is approximately equal to the unknown variance σ². Therefore, a large sample size justifies the use of the Z-statistic.
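The z-statistic is calculated as:
Zstat = (x̄ − µ0) / (σ / √n)
where x̄ is the sample mean and n is the sample size. As n → ∞ (the sample size is very large), the unknown standard deviation σ is replaced by the sample standard deviation s. Therefore, the z-statistic is rewritten as:
Zstat = (x̄ − µ0) / (s / √n)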
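In code, the large-sample statistic Zstat = (x̄ − µ0) / (s / √n) can be sketched as follows. This is an illustration under stated assumptions rather than the function's actual implementation: `one_sample_z` is a hypothetical name, the data are made up, and the sample standard deviation uses the n − 1 denominator (the source does not say which denominator TD_ZTest uses; for large n the difference is negligible).

```python
import random
from math import sqrt
from statistics import mean, stdev


def one_sample_z(sample, mu0):
    """Return (x_bar - mu0) / (s / sqrt(n)) for a list of numbers."""
    n = len(sample)
    x_bar = mean(sample)   # sample mean
    s = stdev(sample)      # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / sqrt(n))


# Made-up data: 200 simulated measurements, tested against mu0 = 50.
random.seed(0)
sample = [random.gauss(50.5, 2.0) for _ in range(200)]
print(one_sample_z(sample, mu0=50.0))
```

The resulting value is then compared with the critical value of the standard normal distribution for the chosen hypothesis and level of significance.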
Two Sample Z-Tests
The two sample z-test is used for testing equality of means of two populations. Let x1, x2, ..., xn1 ~ Ɲ (µ1, σ1²) and y1, y2, ..., yn2 ~ Ɲ (µ2, σ2²) be random samples from two independent populations. The Null hypothesis H0 and the alternative hypothesis H1 for a one-sided lower-tailed test are given as:
H0: µ1 ≥ µ2
versus
H1: µ1 < µ2
The Null hypothesis is rejected if Zstat < −ᴢα, where α ϵ {0.10, 0.05, 0.01}. Here −ᴢα is the percentile of the normal distribution with 100 × α percent of the area to its left.
The hypotheses for a one-sided upper-tailed test are given as:
H0: µ1 ≤ µ2
versus
H1: µ1 > µ2
The Null hypothesis is rejected if Zstat > ᴢα, with α ϵ {0.10, 0.05, 0.01}. Here ᴢα is the percentile of the normal distribution with (1 − α) × 100 percent of the area to its left, so −ᴢα has 100 × α percent of the area to its left.
The hypotheses for a two-sided test are given as:
H0: µ1 = µ2
versus
H1: µ1 ≠ µ2
The Null hypothesis is rejected if Zstat > ᴢα/2 or Zstat < −ᴢα/2, with α ϵ {0.10, 0.05, 0.01}. Here ᴢα/2 is the percentile of the normal distribution with (1 − α/2) × 100 percent of the area to its left, so −ᴢα/2 has 100 × α/2 percent of the area to its left. Under the Null hypothesis, Zstat ~ Ɲ (0,1).
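The three rejection rules for the two-sample case can be sketched in Python as below. The text above does not state the two-sample test statistic, so the form used here, (x̄ − ȳ) / √(s1²/n1 + s2²/n2), is an assumption based on the usual large-sample two-sample z-test; the names `two_sample_z` and `rejects_h0` and the labels "lower", "upper", and "two-sided" are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(x, y):
    """Assumed statistic: (mean(x) - mean(y)) / sqrt(s1^2/n1 + s2^2/n2)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def rejects_h0(z_stat, alpha, alternative):
    """Apply the rejection rule for the given alternative hypothesis."""
    percentile = NormalDist().inv_cdf  # inverse CDF of N(0, 1)
    if alternative == "lower":        # H1: mu1 < mu2
        return z_stat < -percentile(1 - alpha)
    if alternative == "upper":        # H1: mu1 > mu2
        return z_stat > percentile(1 - alpha)
    if alternative == "two-sided":    # H1: mu1 != mu2
        return abs(z_stat) > percentile(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Tiny made-up samples, used only to show the call pattern; a real
# Z-test needs large samples for the normal approximation to hold.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
z_stat = two_sample_z(x, y)
for alpha in (0.10, 0.05, 0.01):
    print(alpha, rejects_h0(z_stat, alpha, "two-sided"))
```

Writing the two-sided rule as a single comparison on |Zstat| is equivalent to checking the upper and lower tails separately, because the standard normal distribution is symmetric about zero.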