t-tests, ANOVA, chi-squared, Fisher exact, Mann-Whitney, Wilcoxon, normality tests, and variance homogeneity tests in kstats-hypothesis.
kstats-hypothesis provides statistical tests organized by the question they answer. Most functions return a TestResult; oneWayAnova() returns an AnovaResult with the full ANOVA table.
Every test produces a result with a consistent shape. The statistic is the computed test value, pValue is the probability of observing a result at least as extreme under the null hypothesis, and isSignificant() compares the p-value to a threshold.
isSignificant() defaults to α=0.05. Pass a different threshold explicitly: result.isSignificant(alpha = 0.01).
Set alternative explicitly for one-sided tests. The default is Alternative.TWO_SIDED. Using Alternative.GREATER tests whether the sample mean exceeds the reference value; Alternative.LESS tests whether it falls below.
Is my sample mean different from a reference value?
The one-sample t-test compares the mean of a single sample against a known or hypothesized value.
```kotlin
val sample = doubleArrayOf(5.1, 4.9, 5.3, 5.0, 4.8)

// Two-sided: is the mean different from 5.0?
val two = tTest(sample, mu = 5.0)
two.statistic // t value
two.pValue    // two-sided p-value

// One-sided: is the mean greater than 5.0?
val one = tTest(sample, mu = 5.0, alternative = Alternative.GREATER)
one.pValue // one-sided p-value
```
Use Alternative.GREATER or Alternative.LESS for directional hypotheses instead of halving a two-sided p-value. The confidence interval adjusts accordingly.
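The t statistic the library reports is straightforward to verify by hand. A stdlib-only sketch (no kstats required), using the same sample as above:

```kotlin
import kotlin.math.sqrt

// One-sample t statistic: t = (x̄ − μ) / (s / √n),
// where s is the sample standard deviation (n − 1 denominator).
fun oneSampleT(sample: DoubleArray, mu: Double): Double {
    val n = sample.size
    val mean = sample.average()
    val variance = sample.sumOf { (it - mean) * (it - mean) } / (n - 1)
    return (mean - mu) / (sqrt(variance) / sqrt(n.toDouble()))
}

fun main() {
    val sample = doubleArrayOf(5.1, 4.9, 5.3, 5.0, 4.8)
    println(oneSampleT(sample, mu = 5.0)) // ≈ 0.2325: mean 5.02 barely above 5.0
}
```

A small t like this corresponds to a large p-value, so the direction of the alternative matters little here; with a larger t the one-sided and two-sided p-values diverge.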
For the two-sample t-test, equalVariances is false by default, meaning Welch’s t-test is used. Set equalVariances = true only after confirming equal variances with leveneTest() or bartlettTest().
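Welch’s statistic and its Satterthwaite degrees of freedom can likewise be sketched with the stdlib alone (hypothetical data; the point is the formulas, not the library):

```kotlin
import kotlin.math.sqrt

// Welch's t: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂),
// with Welch–Satterthwaite degrees of freedom (not assumed equal variances).
fun welch(a: DoubleArray, b: DoubleArray): Pair<Double, Double> {
    fun variance(x: DoubleArray): Double {
        val m = x.average()
        return x.sumOf { (it - m) * (it - m) } / (x.size - 1)
    }
    val va = variance(a) / a.size
    val vb = variance(b) / b.size
    val t = (a.average() - b.average()) / sqrt(va + vb)
    val df = (va + vb) * (va + vb) /
        (va * va / (a.size - 1) + vb * vb / (b.size - 1))
    return t to df
}
```

The non-integer df is the hallmark of Welch’s correction; with equalVariances = true the pooled-variance formula and df = n₁ + n₂ − 2 would apply instead.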
The paired t-test compares two related measurements (before/after, left/right) taken on the same subjects.
```kotlin
val before = doubleArrayOf(200.0, 190.0, 210.0, 180.0, 195.0)
val after = doubleArrayOf(190.0, 180.0, 195.0, 170.0, 185.0)
val result = pairedTTest(before, after)
result.statistic       // positive t (before > after)
result.pValue          // p-value for the difference
result.isSignificant() // true if the change is significant
```
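Under the hood, a paired t-test is a one-sample t-test on the per-subject differences with a hypothesized mean of zero. A stdlib-only sketch against the example data:

```kotlin
import kotlin.math.sqrt

// Paired t = one-sample t on the differences d_i = before_i − after_i,
// testing H₀: mean difference = 0.
fun pairedT(before: DoubleArray, after: DoubleArray): Double {
    require(before.size == after.size) { "paired samples must have equal length" }
    val d = DoubleArray(before.size) { before[it] - after[it] }
    val mean = d.average()
    val s = sqrt(d.sumOf { (it - mean) * (it - mean) } / (d.size - 1))
    return mean / (s / sqrt(d.size.toDouble()))
}

fun main() {
    val before = doubleArrayOf(200.0, 190.0, 210.0, 180.0, 195.0)
    val after = doubleArrayOf(190.0, 180.0, 195.0, 170.0, 185.0)
    println(pairedT(before, after)) // 11.0: differences are (10, 10, 15, 10, 10)
}
```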
ANOVA assumes normality within each group and equal variances across groups. Check normality with shapiroWilkTest() and equal variances with leveneTest() or bartlettTest() before running ANOVA.
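When the assumptions hold, the F statistic that oneWayAnova() reports is the ratio of between-group to within-group mean squares. A stdlib-only sketch of that computation (hypothetical data), useful for sanity-checking the library’s output:

```kotlin
// One-way ANOVA F = MSB / MSW, where
// MSB = between-group sum of squares / (k − 1)
// MSW = within-group sum of squares / (N − k)
fun oneWayF(vararg groups: DoubleArray): Double {
    val n = groups.sumOf { it.size }
    val grand = groups.flatMap { it.toList() }.average()
    val ssb = groups.sumOf { g -> g.size * (g.average() - grand).let { d -> d * d } }
    val ssw = groups.sumOf { g ->
        val m = g.average()
        g.sumOf { (it - m) * (it - m) }
    }
    return (ssb / (groups.size - 1)) / (ssw / (n - groups.size))
}

fun main() {
    println(oneWayF(
        doubleArrayOf(1.0, 2.0, 3.0),
        doubleArrayOf(2.0, 3.0, 4.0),
        doubleArrayOf(6.0, 7.0, 8.0),
    )) // 21.0: the third group's mean sits far from the other two
}
```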
Four normality tests are available. Each returns a TestResult — a significant result (low p-value) indicates evidence against normality.
```kotlin
val sample = doubleArrayOf(
    -1.2, -0.5, 0.0, 0.5, 1.2, 0.3, -0.1, 0.8, -0.4, 0.6,
    -0.8, 0.2, 0.9, -0.3, 0.4, -0.6, 1.0, -0.9, 0.1, 0.7
)
shapiroWilkTest(sample).pValue      // high p → consistent with normality
andersonDarlingTest(sample).pValue  // high p → consistent with normality
dagostinoPearsonTest(sample).pValue // combines skewness and kurtosis tests
jarqueBeraTest(sample).pValue       // asymptotic test based on skewness and kurtosis
```
Shapiro-Wilk is the most common choice and generally the most powerful for small-to-moderate samples (n < 5000). Anderson-Darling is more sensitive to deviations in the tails. D’Agostino-Pearson and Jarque-Bera are faster for large samples.
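The Jarque–Bera statistic is simple enough to sketch with the stdlib: JB = n/6 · (S² + (K − 3)²/4), where S is the moment-based sample skewness and K the kurtosis. This is the plain asymptotic version; library implementations may apply small-sample corrections:

```kotlin
import kotlin.math.pow

// Jarque–Bera: JB = n/6 * (S² + (K − 3)²/4).
// S = 0 and K = 3 for a normal distribution, so JB near 0 is consistent with normality.
fun jarqueBera(x: DoubleArray): Double {
    val n = x.size
    val mean = x.average()
    fun moment(k: Int) = x.sumOf { (it - mean).pow(k) } / n
    val m2 = moment(2)
    val s = moment(3) / m2.pow(1.5) // skewness
    val k = moment(4) / (m2 * m2)   // kurtosis
    return n / 6.0 * (s * s + (k - 3) * (k - 3) / 4)
}
```

JB is compared against a χ² distribution with 2 degrees of freedom to obtain the p-value, which is why the test is only reliable for large samples.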
The binomial test checks whether the observed proportion of successes matches a hypothesized probability.
```kotlin
val result = binomialTest(successes = 60, trials = 100, probability = 0.5)
result.pValue          // p-value for H₀: p = 0.5
result.isSignificant() // true if 60/100 is significantly different from 0.5
```
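The exact two-sided p-value has more than one accepted definition; a common convention sums the probabilities of all outcomes no more likely than the observed count. A stdlib-only sketch of that convention (kstats may use a different one):

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// Exact binomial pmf via log-factorials (numerically stable for moderate n).
fun binomPmf(k: Int, n: Int, p: Double): Double {
    val logFact = DoubleArray(n + 1)
    for (i in 2..n) logFact[i] = logFact[i - 1] + ln(i.toDouble())
    return exp(logFact[n] - logFact[k] - logFact[n - k] +
        k * ln(p) + (n - k) * ln(1 - p))
}

// Two-sided p-value: sum the probabilities of every outcome whose
// probability does not exceed that of the observed count.
fun binomialTwoSidedP(successes: Int, trials: Int, p: Double): Double {
    val observed = binomPmf(successes, trials, p)
    return (0..trials)
        .map { binomPmf(it, trials, p) }
        .filter { it <= observed * (1 + 1e-7) } // tolerance for floating-point ties
        .sum()
}

fun main() {
    println(binomialTwoSidedP(8, 10, 0.5)) // ≈ 0.109375 = 2 × P(X ≥ 8) by symmetry
}
```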
Variance homogeneity is an assumption of ANOVA and some t-test variants. Three tests are available:
```kotlin
val g1 = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val g2 = doubleArrayOf(2.0, 4.0, 6.0, 8.0, 10.0)
val g3 = doubleArrayOf(1.0, 3.0, 5.0, 7.0, 9.0)
leveneTest(g1, g2, g3).pValue         // robust to non-normality
bartlettTest(g1, g2, g3).pValue       // most powerful when normality holds
flignerKilleenTest(g1, g2, g3).pValue // rank-based, robust to outliers
```
Levene’s test is the safest default — it is robust to non-normality. Use Bartlett’s test only when the data within each group is approximately normal (Bartlett is more powerful in that case). Fligner-Killeen is the most robust to outliers.
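Levene’s statistic is, in essence, a one-way ANOVA F computed on the absolute deviations from each group’s center (the mean in the original test; the median in the Brown–Forsythe variant). A stdlib-only sketch using the mean, on the same three groups as the example above:

```kotlin
import kotlin.math.abs

// Levene's W: one-way ANOVA F on z_ij = |x_ij − x̄_j| (mean-centered variant).
fun leveneW(vararg groups: DoubleArray): Double {
    val z = groups.map { g ->
        val m = g.average()
        g.map { abs(it - m) }
    }
    val n = z.sumOf { it.size }
    val grand = z.flatten().average()
    val ssb = z.sumOf { it.size * (it.average() - grand).let { d -> d * d } }
    val ssw = z.sumOf { g ->
        val m = g.average()
        g.sumOf { (it - m) * (it - m) }
    }
    return (ssb / (z.size - 1)) / (ssw / (n - z.size))
}

fun main() {
    val g1 = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
    val g2 = doubleArrayOf(2.0, 4.0, 6.0, 8.0, 10.0)
    val g3 = doubleArrayOf(1.0, 3.0, 5.0, 7.0, 9.0)
    println(leveneW(g1, g2, g3)) // ≈ 1.1429 (8/7): the spreads differ only mildly
}
```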
Grubbs’ test (the extreme studentized deviate test) formally checks whether the observation farthest from the mean is an outlier, assuming the remaining data is approximately normal. The test statistic is

G = max_i |x_i − x̄| / s,

which is converted to a Student-t statistic on N − 2 degrees of freedom and Bonferroni-corrected for having tested every observation.
```kotlin
// Response times (ms) with a suspected outlier
val latencies = doubleArrayOf(12.0, 14.0, 11.0, 13.0, 15.0, 98.0, 12.0)
val result = grubbsTest(latencies)
result.statistic                      // G statistic
result.pValue                         // Bonferroni-corrected p-value
result.additionalInfo["outlierIndex"] // index of the suspected outlier
result.additionalInfo["outlierValue"] // the suspected outlier's value
result.isSignificant()                // true if outlier is significant at α = 0.05
```
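G itself can be checked by hand from its definition, the largest absolute deviation over the sample standard deviation. A stdlib-only sketch using the same latency data:

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Grubbs' G = max_i |x_i − x̄| / s, with s the sample standard deviation.
fun grubbsG(x: DoubleArray): Double {
    val mean = x.average()
    val s = sqrt(x.sumOf { (it - mean) * (it - mean) } / (x.size - 1))
    return x.maxOf { abs(it - mean) } / s
}

fun main() {
    val latencies = doubleArrayOf(12.0, 14.0, 11.0, 13.0, 15.0, 98.0, 12.0)
    println(grubbsG(latencies)) // ≈ 2.266, driven by the 98 ms observation
}
```

Note that the outlier inflates both the mean and s, which bounds how large G can get for a fixed sample size; this is why very extreme values in small samples still produce moderate-looking statistics.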
Use Alternative.GREATER or Alternative.LESS to test a single tail when you only care about a suspiciously large or small value:
```kotlin
// Only test for a suspiciously large value (upper tail)
val data = doubleArrayOf(2.1, 2.5, 2.3, 2.8, 10.0, 2.4, 2.2)
val upper = grubbsTest(data, alternative = Alternative.GREATER)
upper.additionalInfo["outlierValue"] // 10.0 — the maximum

// Only test for a suspiciously small value (lower tail)
val dataLow = doubleArrayOf(2.1, 2.5, 2.3, 2.8, -5.0, 2.4, 2.2)
val lower = grubbsTest(dataLow, alternative = Alternative.LESS)
lower.additionalInfo["outlierValue"] // -5.0 — the minimum
```
For multiple outliers, grubbsTestIterative() reapplies the test and removes one significant outlier at a time until none remain or the sample shrinks below three observations.
```kotlin
// Remove multiple outliers by repeatedly applying the test
val data = doubleArrayOf(10.0, 11.0, 12.0, 13.0, 14.0, 80.0, 90.0)
val cleaned = grubbsTestIterative(data, alpha = 0.05)
cleaned.outlierIndices // indices (in the original array) that were removed
cleaned.cleanedData    // observations after removing all detected outliers
cleaned.iterations     // TestResult from each round (last one is non-significant)
```
Grubbs’ test assumes the data is approximately normal apart from the outlier. Validate with shapiroWilkTest() on the data with the suspected extreme removed before reporting a significant result.
The iterative procedure can mask outliers when several extremes cluster together — each test may be diluted by its peers. For large clusters prefer a dedicated multiple-outlier test (e.g. generalized ESD) or a robust estimator.