Two-sample tests

This module provides a number of classical statistics and tests for comparing two univariate samples, given in two columns. The two groups can also be specified using a single column of values together with an additional Group column. Missing data are disregarded. For mathematical details, see the Past manual.

t test and related tests for equal means

Sample statistics

Means and variances are estimated as described above under Univariate statistics. The 95% confidence interval for the mean is based on the standard error for the estimate of the mean, and the t distribution. Normal distribution is assumed.

The 95% confidence interval for the difference between the means accepts unequal sample sizes. The confidence interval is computed for the larger mean minus the smaller, i.e. the center of the CI should always be positive. The confidence interval for the difference in means is also estimated by bootstrapping (simple bootstrap), with the given number of replicates (default 9999).
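The simple bootstrap for this confidence interval can be sketched in a few lines of NumPy. The function below is illustrative only; its name, defaults and seed handling are assumptions of this sketch, not Past's internals:

```python
import numpy as np

def bootstrap_diff_ci(x, y, n_boot=9999, alpha=0.05, seed=0):
    """Simple-bootstrap CI for the difference between two sample means
    (illustrative sketch, not Past's actual implementation)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    if y.mean() > x.mean():          # orient as larger mean minus smaller
        x, y = y, x
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # resample each group independently, with replacement
        diffs[i] = rng.choice(x, x.size).mean() - rng.choice(y, y.size).mean()
    return np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

With the default alpha=0.05 this returns the 2.5 and 97.5 percentiles of the bootstrapped differences.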

t test

The t test has null hypothesis

H0: The two samples are taken from populations with equal means.

The t test assumes normal distributions and equal variances.

Unequal variance t test

The unequal variance t test is also known as the Welch test. It can be used as an alternative to the basic t test when variances are very different, although it can be argued that testing for difference in the means in this case is questionable.
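For readers who want to reproduce these two tests outside Past, both variants are available in SciPy (the data below are illustrative):

```python
from scipy import stats

x = [14.2, 15.1, 13.8, 14.9, 15.4]
y = [13.1, 12.8, 13.5, 12.9, 13.7]

t, p = stats.ttest_ind(x, y)                       # classical t test, equal variances
t_w, p_w = stats.ttest_ind(x, y, equal_var=False)  # Welch's unequal variance t test
```

With equal_var=False the degrees of freedom are adjusted by the Welch-Satterthwaite approximation.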

Monte Carlo permutation test

The permutation test for equality of means uses the absolute difference in means as test statistic. This is equivalent to using the t statistic. The permutation test is non-parametric with few assumptions, but the two samples are assumed to be equal in distribution if the null hypothesis is true. The number of permutations can be set by the user. The power of the test is limited by the sample size – significance at the p<0.05 level can only be achieved for n>3 in each sample.
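The procedure can be sketched as follows (an illustrative implementation, not Past's code; recent SciPy versions also offer scipy.stats.permutation_test for the same purpose):

```python
import numpy as np

def perm_test_means(x, y, n_perm=9999, seed=0):
    """Monte Carlo permutation test for equal means, using the absolute
    difference in means as test statistic (sketch of the approach above)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    n1 = len(x)
    obs = abs(np.mean(x) - np.mean(y))
    hits = 1                                  # count the observed labelling itself
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # random reassignment to the two groups
        if abs(pooled[:n1].mean() - pooled[n1:].mean()) >= obs:
            hits += 1
    return hits / (n_perm + 1)
```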

Exact permutation test

As for the Monte Carlo permutation test, but all possible permutations are computed. Only available if the sum of the two sample sizes is less than 27.
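Enumerating every group assignment can be sketched with itertools (illustrative only; the size limit keeps the number of combinations manageable):

```python
from itertools import combinations
import numpy as np

def exact_perm_test_means(x, y):
    """Exact permutation test for equal means: enumerate every assignment of
    the pooled values into groups of the original sizes (illustrative)."""
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    n1, total = len(x), pooled.sum()
    obs = abs(np.mean(x) - np.mean(y))
    hits = count = 0
    for idx in combinations(range(len(pooled)), n1):
        s = pooled[list(idx)].sum()
        diff = abs(s / n1 - (total - s) / (len(pooled) - n1))
        count += 1
        if diff >= obs - 1e-12:   # small tolerance for floating-point ties
            hits += 1
    return hits / count
```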

F test for equal variances

The F test has null hypothesis

H0: The two samples are taken from populations with equal variance.

Normal distribution is assumed. The F statistic is the ratio of the larger variance to the smaller. The significance is two-tailed, with n1-1 and n2-1 degrees of freedom. Monte Carlo and exact permutation tests on the F statistic are computed as for the t test above.
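The F statistic and its two-tailed significance can be sketched as follows (illustrative, not Past's code):

```python
import numpy as np
from scipy import stats

def f_test_variances(x, y):
    """Two-tailed F test for equal variances: larger sample variance over
    the smaller, with n-1 degrees of freedom per sample (sketch)."""
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    if v2 > v1:                        # make v1 the larger variance
        x, y, v1, v2 = y, x, v2, v1
    F = v1 / v2
    p = 2.0 * stats.f.sf(F, len(x) - 1, len(y) - 1)
    return F, min(p, 1.0)
```

Because the larger variance always goes in the numerator, the result does not depend on the order of the two samples.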

Mann-Whitney test for equal medians

The two-tailed (Wilcoxon) Mann-Whitney U test can be used to test whether the medians of two independent samples are different. It is a non-parametric test and does not assume normal distribution, but does assume equal-shaped distribution in both groups. The null hypothesis is

H0: The two samples are taken from populations with equal medians.

For each value in sample 1, count the number of values in sample 2 that are smaller than it (ties count 0.5). The total of these counts is the test statistic U (sometimes called T). If the value of U is smaller when the order of the samples is reversed, that value is chosen instead (it can be shown that U1+U2=n1n2).
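The counting definition of U translates directly into code (an illustrative sketch; scipy.stats.mannwhitneyu provides the full test including p values):

```python
import numpy as np

def mann_whitney_u(x, y):
    """U by direct counting: for each value in sample 1, count the values in
    sample 2 that are smaller (ties count 0.5); take the smaller of U1, U2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    u1 = sum((y < xi).sum() + 0.5 * (y == xi).sum() for xi in x)
    u2 = len(x) * len(y) - u1          # U1 + U2 = n1*n2
    return min(u1, u2)
```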

The program computes an asymptotic approximation to p based on the normal distribution (two-tailed), which is only valid for large n. It includes a continuity correction and a correction for ties.

A Monte Carlo value based on the given number of random permutations (default 9999) is also given – the purpose of this is mainly as a control on the asymptotic value.

For n1+n2<=30 (e.g. 15 values in each group), an exact p value is given, based on all possible group assignments. If available, always use this exact value. For larger samples, the asymptotic approximation is quite accurate.

Mood’s median test for equal medians

The median test is an alternative to the Mann-Whitney test for equal medians. The median test has low power, and the Mann-Whitney test is therefore usually preferable. However, there may be cases with strong outliers where the Mood’s test may perform better.

The test simply counts the number of values in each sample that are above or below the pooled median, producing a 2x2 contingency table that is tested with a standard chi-squared test with one degree of freedom, without Yates' correction.
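A minimal sketch of this procedure (values exactly equal to the pooled median are simply dropped here; implementations differ in how they treat them, and SciPy also offers a ready-made scipy.stats.median_test):

```python
import numpy as np
from scipy import stats

def mood_median_test(x, y):
    """Mood's median test sketch: count values above/below the pooled median
    in each sample; chi-squared on the 2x2 table, no Yates correction."""
    med = np.median(np.concatenate([x, y]))
    table = [[(np.asarray(g) > med).sum(), (np.asarray(g) < med).sum()]
             for g in (x, y)]
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    return chi2, p
```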

Kolmogorov-Smirnov test for equal distributions

The Kolmogorov-Smirnov test is a nonparametric test for overall equal distribution of two univariate samples. In other words, it does not test specifically for equality of mean, variance or any other parameter. The null hypothesis is H0: The two samples are taken from populations with equal distribution.

In the version of the test provided by Past, both columns must represent samples. You cannot test a sample against a theoretical distribution (one-sample test).

The algorithm is based on Press et al. (1992), with significance estimated after Stephens (1970).
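The equivalent two-sample test in SciPy (illustrative data; SciPy's algorithm differs in detail from the Press et al. implementation):

```python
from scipy import stats

x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5]
y = [5.0, 6.1, 4.8, 5.5, 6.3, 5.9, 5.2]

D, p = stats.ks_2samp(x, y)   # D is the largest distance between the two ECDFs
```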

The permutation test uses 10,000 permutations. The permutation p value should be preferred for N<30, and it is a safe choice in general.


Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. 1992. Numerical Recipes in C. 2nd edition. Cambridge University Press.

Stephens, M.A. 1970. Use of the Kolmogorov-Smirnov, Cramer-von Mises and related statistics without extensive tables. Journal of the Royal Statistical Society, Series B 32:115-122.

Anderson-Darling test for equal distributions

The Anderson-Darling test is a nonparametric test for overall equal distribution of two univariate samples. It is an alternative to the Kolmogorov-Smirnov test.

The test statistic A2N is computed according to Pettitt (1976). This statistic is transformed to a statistic called Z according to Scholz & Stephens (1987). The p value is computed by interpolation and extrapolation in Table 1 (m=1) of Scholz & Stephens (1987), using a curve fit to the von Bertalanffy model. The approximation is fairly accurate for p<0.25. For p>0.25, the p values are estimated using a polynomial fit to values obtained by permutation. A p value based on Monte Carlo permutation with N=999 is also provided.
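SciPy implements the Scholz & Stephens (1987) k-sample version of this test, which can be used for comparison (illustrative data; SciPy clips its approximate p value rather than switching to a permutation estimate as described above):

```python
from scipy import stats

x = [1.1, 1.4, 0.9, 1.2, 1.3, 1.0, 1.5, 1.2]
y = [3.2, 3.5, 2.9, 3.1, 3.4, 3.0, 3.3, 3.6]

res = stats.anderson_ksamp([x, y])
# res.statistic is the normalized statistic (the Z of Scholz & Stephens);
# res.significance_level approximates p, clipped by SciPy to [0.001, 0.25]
```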


Pettitt, A.N. 1976. A two-sample Anderson-Darling rank statistic. Biometrika 63:161-168.

Scholz, F.W. & Stephens, M.A. 1987. K-sample Anderson–Darling tests. Journal of the American Statistical Association 82:918–924.

Epps-Singleton test for equal distributions

The Epps-Singleton test (Epps & Singleton 1986; Goerg & Kaiser 2009) is a nonparametric test for overall equal distribution of two univariate samples. It is typically more powerful than the Kolmogorov-Smirnov test, and unlike the Kolmogorov-Smirnov it can be used also for non-continuous (i.e. ordinal) data. The null hypothesis is H0: The two samples are taken from populations with equal distribution.

The mathematics behind the Epps-Singleton test are complicated. The test is based on the Fourier transform of the empirical distribution function, called the empirical characteristic function (ECF). The ECF is generated for each sample and sampled at two points (t1=0.4 and t2=0.8, standardized for the pooled semi-interquartile range). The test statistic W2 is based on the difference between the two sampled ECFs, standardized by their covariance matrices. A small-sample correction to W2 is applied if both sample sizes are less than 25. The p value is based on the chi-squared distribution. For details, see Epps & Singleton (1986) and Goerg & Kaiser (2009).
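SciPy provides this test with the same default sample points (illustrative data, including the small-sample correction since both n<25):

```python
from scipy import stats

x = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]         # ordinal data are acceptable
y = [11, 12, 12, 13, 13, 13, 14, 14, 15, 16]

w2, p = stats.epps_singleton_2samp(x, y)   # default sample points t=(0.4, 0.8)
```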


Epps, T.W. & Singleton, K.J. 1986. An omnibus test for the two-sample problem using the empirical characteristic function. Journal of Statistical Computation and Simulation 26:177–203.

Goerg, S.J. & Kaiser, J. 2009. Nonparametric testing of distributions – the Epps-Singleton two-sample test using the empirical characteristic function. The Stata Journal 9:454-465.

Coefficient of variation (Fligner-Killeen test)

This module tests for equal coefficient of variation in two samples. The coefficient of variation (or relative variation) is defined as the ratio of standard deviation to the mean in percent.

The 95% confidence intervals are estimated by bootstrapping (simple bootstrap), with the given number of replicates (default 9999).
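The CV and its simple-bootstrap confidence interval can be sketched as follows (illustrative functions, not Past's internals):

```python
import numpy as np

def cv_percent(a):
    """Coefficient of variation: sample standard deviation over mean, in %."""
    a = np.asarray(a, float)
    return 100.0 * a.std(ddof=1) / a.mean()

def cv_boot_ci(a, n_boot=9999, seed=0):
    """Simple-bootstrap 95% confidence interval for the CV (sketch)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, float)
    cvs = [cv_percent(rng.choice(a, a.size)) for _ in range(n_boot)]
    return np.percentile(cvs, [2.5, 97.5])
```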

The null hypothesis of the statistical test is:

H0: The samples were taken from populations with the same coefficient of variation.

If the given p(normal) is less than 0.05, the hypothesis of equal coefficients of variation can be rejected. Donnelly & Kramer (1999) describe the coefficient of variation and review a number of statistical tests for the comparison of two samples. They recommend the Fligner-Killeen test (Fligner & Killeen 1976), as implemented in Past. This test is both powerful and relatively insensitive to distribution.

The following statistics are reported:


T: The Fligner-Killeen test statistic, which is a sum of transformed ranked positions of the smaller sample within the pooled sample (see Donnelly & Kramer 1999 for details).

E(T): The expected value for T.

z: The z statistic, based on T, Var(T) and E(T). Note that this is a large-sample approximation.

p: The p(H0) value. Both the one-tailed and two-tailed values are given. For the alternative hypothesis of a difference in either direction, the two-tailed value should be used. However, the Fligner-Killeen test has been used to compare variation within a sample of fossils with variation within a closely related modern species, to test for multiple fossil species (Donnelly & Kramer 1999). In this case the alternative hypothesis might be that the CV is larger in the fossil population, in which case a one-tailed test can be used for increased power.


Donnelly, S.M. & Kramer, A. 1999. Testing for multiple species in fossil samples: An evaluation and comparison of tests for equal relative variation. American Journal of Physical Anthropology 108:507-529.

Fligner, M.A. & Killeen, T.J. 1976. Distribution-free two sample tests for scale. Journal of the American Statistical Association 71:210-213.

Published Aug. 31, 2020 9:56 PM - Last modified Aug. 31, 2020 9:56 PM