A/B Testing — Compare Variants with Statistical Tests
Run a complete A/B test workflow in Kotlin: summarize groups, check assumptions, choose the right test, measure effect size, and correct for multiple comparisons with kstats.
Kotlin Notebook: try this guide as a Kotlin Notebook with Kandy visualizations. Run the cells to see the charts and explore the data interactively.
This guide walks through an A/B test comparing two checkout flow variants in a mobile app. The primary metric is session duration (seconds); the secondary metric is number of completed steps.
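The snippets below refer to `controlDurationSec`, `treatmentDurationSec`, and `treatmentSteps`. If you are following along outside the notebook, a hypothetical setup might look like the following; the values are made up for illustration, and `DoubleArray` is an assumption here, so match whatever collection type kstats actually expects.

```kotlin
// Illustrative sample data (hypothetical values, not from a real experiment).
val controlDurationSec = doubleArrayOf(48.0, 52.0, 50.0, 47.0, 53.0, 49.0)
val treatmentDurationSec = doubleArrayOf(41.0, 43.0, 40.0, 44.0, 42.0, 39.0)
val treatmentSteps = doubleArrayOf(5.0, 4.0, 6.0, 3.0, 4.0, 6.0)
```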
```kotlin
// Welch's t-test (default: equalVariances = false)
val result = tTest(controlDurationSec, treatmentDurationSec)
result.statistic
result.pValue
result.confidenceInterval // 95% CI for the difference in means
result.isSignificant() // true if p < 0.05
```
If the Levene test confirmed equal variances:
```kotlin
val equalVar = tTest(
    controlDurationSec,
    treatmentDurationSec,
    equalVariances = true
)
equalVar.pValue
```
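For intuition, the Welch statistic can be sketched from first principles. This is a minimal standalone version of the textbook formula, not the actual kstats internals (which are an assumption here):

```kotlin
import kotlin.math.sqrt

// Welch's t-statistic: difference in means divided by the standard error,
// using each group's own sample variance (no pooling).
fun welchT(a: DoubleArray, b: DoubleArray): Double {
    val ma = a.average()
    val mb = b.average()
    val va = a.sumOf { (it - ma) * (it - ma) } / (a.size - 1)
    val vb = b.sumOf { (it - mb) * (it - mb) } / (b.size - 1)
    return (ma - mb) / sqrt(va / a.size + vb / b.size)
}

fun main() {
    val control = doubleArrayOf(48.0, 52.0, 50.0, 47.0, 53.0)
    val treatment = doubleArrayOf(41.0, 43.0, 40.0, 44.0, 42.0)
    println(welchT(control, treatment)) // ≈ 5.96: a large positive t
}
```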
If the normality assumption does not hold, fall back to the nonparametric Mann-Whitney U test:

```kotlin
val result = mannWhitneyUTest(controlDurationSec, treatmentDurationSec)
result.statistic
result.pValue
result.isSignificant()
```
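The U statistic itself is simple to state: count, over all cross-group pairs, how often a value from one group exceeds a value from the other (ties count as one half). A brute-force sketch, for illustration only:

```kotlin
// Mann-Whitney U by direct pair counting. Quadratic in input size, so
// real implementations use ranking instead, but the result is the same.
fun mannWhitneyU(a: DoubleArray, b: DoubleArray): Double {
    var u = 0.0
    for (x in a) for (y in b) {
        u += when {
            x > y -> 1.0
            x == y -> 0.5
            else -> 0.0
        }
    }
    return u
}

fun main() {
    println(mannWhitneyU(doubleArrayOf(3.0, 5.0, 7.0), doubleArrayOf(1.0, 2.0, 6.0))) // 7.0
}
```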
A p-value tells you whether a difference exists; effect size tells you how large it is. Cohen's d expresses the difference in standard-deviation units: |d| < 0.2 is negligible, around 0.2 small, around 0.5 medium, and 0.8 or above large.
```kotlin
// Cohen's d: how large is the difference in standard-deviation units?
val d = cohensD(controlDurationSec, treatmentDurationSec)
d // ~2.9 → large effect (|d| ≥ 0.8)
```
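The classic definition divides the mean difference by a pooled standard deviation. A standalone sketch of that formula (whether kstats uses exactly this pooled variant is an assumption):

```kotlin
import kotlin.math.sqrt

// Cohen's d with pooled standard deviation:
// d = (mean(a) - mean(b)) / s_pooled
fun cohensDPooled(a: DoubleArray, b: DoubleArray): Double {
    val ma = a.average()
    val mb = b.average()
    val va = a.sumOf { (it - ma) * (it - ma) } / (a.size - 1)
    val vb = b.sumOf { (it - mb) * (it - mb) } / (b.size - 1)
    val pooled = sqrt(((a.size - 1) * va + (b.size - 1) * vb) / (a.size + b.size - 2))
    return (ma - mb) / pooled
}

fun main() {
    val control = doubleArrayOf(48.0, 52.0, 50.0, 47.0, 53.0)
    val treatment = doubleArrayOf(41.0, 43.0, 40.0, 44.0, 42.0)
    println(cohensDPooled(control, treatment)) // ≈ 3.77: well past the 0.8 "large" threshold
}
```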
Check whether the two metrics move together within each group.
```kotlin
// Within the treatment group: do faster sessions correlate with more completed steps?
val correlation = spearmanCorrelation(treatmentDurationSec, treatmentSteps)
correlation.coefficient // negative means shorter sessions correlate with more steps
correlation.pValue
```
Spearman correlation is preferred here because one metric (steps) is ordinal.
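Conceptually, Spearman's coefficient is just Pearson correlation applied to ranks, which is why it only cares about monotonic relationships. A minimal sketch (with average ranks for ties):

```kotlin
import kotlin.math.sqrt

// Assign average ranks: equal values share the mean of their rank positions.
fun ranks(v: DoubleArray): DoubleArray {
    val r = DoubleArray(v.size)
    for (i in v.indices) {
        var less = 0
        var equal = 0
        for (x in v) if (x < v[i]) less++ else if (x == v[i]) equal++
        r[i] = less + (equal + 1) / 2.0
    }
    return r
}

// Plain Pearson correlation coefficient.
fun pearson(x: DoubleArray, y: DoubleArray): Double {
    val mx = x.average()
    val my = y.average()
    var sxy = 0.0; var sxx = 0.0; var syy = 0.0
    for (i in x.indices) {
        sxy += (x[i] - mx) * (y[i] - my)
        sxx += (x[i] - mx) * (x[i] - mx)
        syy += (y[i] - my) * (y[i] - my)
    }
    return sxy / sqrt(sxx * syy)
}

// Spearman = Pearson on the ranks.
fun spearman(x: DoubleArray, y: DoubleArray) = pearson(ranks(x), ranks(y))

fun main() {
    // Perfectly monotone decreasing relationship → coefficient of -1.
    println(spearman(doubleArrayOf(1.0, 2.0, 3.0, 4.0), doubleArrayOf(8.0, 6.0, 4.0, 2.0)))
}
```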