Match your data to the right probability distribution using domain-driven examples and verification.
Every distribution encodes assumptions about what values are possible and how likely they are. This guide groups distributions by the kind of data they model and shows how to verify the fit.
    // API response times: often right-skewed with a long tail
    val responseTime = LogNormalDistribution(mu = 4.5, sigma = 0.8)

    responseTime.mean            // expected average in ms
    responseTime.quantile(0.95)  // P95 latency
    responseTime.quantile(0.99)  // P99 latency
    responseTime.cdf(200.0)      // probability of responding under 200 ms
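To make the log-normal numbers concrete, the quantities above can be worked out by hand. This is a sketch that assumes `mu` and `sigma` are the mean and standard deviation of the logarithm of the response time (the usual convention, not confirmed by this guide), with $z_{0.95} \approx 1.645$ the standard normal quantile.

```latex
\begin{aligned}
\text{median} &= e^{\mu} = e^{4.5} \approx 90.0\ \text{ms} \\
\mathbb{E}[X] &= e^{\mu + \sigma^2/2} = e^{4.5 + 0.32} = e^{4.82} \approx 124.0\ \text{ms} \\
Q(0.95) &= e^{\mu + \sigma z_{0.95}} \approx e^{4.5 + 0.8 \cdot 1.645} \approx 336\ \text{ms}
\end{aligned}
```

The mean sits well above the median, which is the long right tail the comment in the example refers to.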
    // Time between server errors: a memoryless waiting process
    val timeBetweenErrors = ExponentialDistribution(rate = 0.5)

    timeBetweenErrors.mean           // average time between errors
    timeBetweenErrors.sf(10.0)       // probability of waiting longer than 10 units
    timeBetweenErrors.quantile(0.5)  // median waiting time
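The exponential quantities have simple closed forms, so they can be cross-checked without any library. The following is a minimal sketch using only the Kotlin standard library; the helper names `expMean`, `expSf`, and `expQuantile` are hypothetical and not part of the library used in this guide. It also checks the memoryless property numerically.

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// Closed-form exponential quantities; `rate` is events per unit time.
// These helper names are illustrative, not a library API.
fun expMean(rate: Double): Double = 1.0 / rate
fun expSf(rate: Double, t: Double): Double = exp(-rate * t)            // P(T > t)
fun expQuantile(rate: Double, p: Double): Double = -ln(1.0 - p) / rate // inverse CDF

fun main() {
    val rate = 0.5
    println(expMean(rate))           // 2.0
    println(expSf(rate, 10.0))       // exp(-5), about 0.00674
    println(expQuantile(rate, 0.5))  // ln(2) / 0.5, about 1.386

    // Memorylessness: P(T > s + t | T > s) equals P(T > t).
    val s = 4.0
    val t = 10.0
    val conditional = expSf(rate, s + t) / expSf(rate, s)
    println(conditional - expSf(rate, t))  // zero up to floating-point rounding
}
```

If waiting times in your data get visibly shorter or longer the more time has already passed, the memoryless assumption does not hold and a Weibull model may be a better candidate.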
    // Hardware component lifetime: models wear-out (shape > 1) or burn-in (shape < 1)
    val componentLifetime = WeibullDistribution(shape = 2.0, scale = 5000.0)

    componentLifetime.quantile(0.1)  // 10% of components fail before this time
    componentLifetime.cdf(3000.0)    // probability of failure before 3000 hours
    componentLifetime.sf(4000.0)     // probability of surviving past 4000 hours
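The Weibull values can be checked by hand. This worked example assumes the standard shape/scale parameterization with shape $k = 2$ and scale $\lambda = 5000$ hours; the guide does not state which parameterization the library uses.

```latex
\begin{aligned}
F(t) &= 1 - e^{-(t/\lambda)^k}, \qquad S(t) = e^{-(t/\lambda)^k}, \qquad
Q(p) = \lambda \left(-\ln(1 - p)\right)^{1/k} \\
F(3000) &= 1 - e^{-0.6^2} = 1 - e^{-0.36} \approx 0.302 \\
S(4000) &= e^{-0.8^2} = e^{-0.64} \approx 0.527 \\
Q(0.1) &= 5000 \sqrt{-\ln 0.9} \approx 5000 \cdot 0.3246 \approx 1623\ \text{hours}
\end{aligned}
```

The hazard rate $h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}$ increases with $t$ when $k > 1$ and decreases when $k < 1$, which is why the shape parameter separates wear-out from burn-in behavior.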
    // Errors per hour on a production server
    val errorsPerHour = PoissonDistribution(rate = 3.2)

    errorsPerHour.pmf(0)             // probability of zero errors
    errorsPerHour.cdf(5)             // probability of at most 5 errors
    errorsPerHour.quantileInt(0.99)  // error count exceeded only 1% of the time
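To see what these Poisson calls compute, here is a minimal standard-library sketch. The function names are hypothetical, and the quantile is taken to be the smallest count whose cumulative probability reaches `p`, which is the usual convention but is an assumption about the library's behavior.

```kotlin
import kotlin.math.exp

// Poisson pmf via the recurrence P(k) = P(k - 1) * rate / k, starting from P(0) = exp(-rate).
// Suitable for moderate rates; exp(-rate) underflows to 0 for very large rates.
fun poissonPmf(rate: Double, k: Int): Double {
    require(k >= 0) { "k must be non-negative" }
    var p = exp(-rate)
    for (i in 1..k) p *= rate / i
    return p
}

// P(X <= k): sum of the pmf from 0 to k.
fun poissonCdf(rate: Double, k: Int): Double = (0..k).sumOf { poissonPmf(rate, it) }

// Smallest k with P(X <= k) >= p (assumed convention for an integer quantile).
fun poissonQuantile(rate: Double, p: Double): Int {
    require(p > 0.0 && p < 1.0) { "p must be in (0, 1)" }
    var k = 0
    var term = exp(-rate)
    var cum = term
    // Stop once the target is reached, or once terms underflow to zero.
    while (cum < p && term > 0.0) {
        k++
        term *= rate / k
        cum += term
    }
    return k
}

fun main() {
    val rate = 3.2
    println(poissonPmf(rate, 0))          // exp(-3.2), about 0.0408
    println(poissonCdf(rate, 5))          // about 0.8946
    println(poissonQuantile(rate, 0.99))  // 8
}
```

With a rate of 3.2 errors per hour, seeing more than 8 errors in an hour should happen in under 1% of hours, which makes the 0.99 quantile a natural alerting threshold.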
    // Conversions out of 1000 page views with a 3.5% conversion rate
    val conversions = BinomialDistribution(trials = 1000, probability = 0.035)

    conversions.mean     // expected number of conversions
    conversions.pmf(40)  // probability of exactly 40 conversions
    conversions.sf(50)   // probability of more than 50 conversions
    // Support tickets until 5th resolution (variance > mean)
    val tickets = NegativeBinomialDistribution(successes = 5, probability = 0.1)

    tickets.mean      // expected ticket count
    tickets.variance  // larger than mean: overdispersion
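The overdispersion can be verified by hand for $r = 5$ successes and success probability $p = 0.1$. Libraries differ on whether the negative binomial counts failures before the $r$-th success or total trials, and this guide does not say which convention applies, so both are shown; the variance is the same in either case.

```latex
\begin{aligned}
\text{failures before the } r\text{-th success:} \quad
\mathbb{E}[X] &= \frac{r(1 - p)}{p} = \frac{5 \cdot 0.9}{0.1} = 45 \\
\text{total trials until the } r\text{-th success:} \quad
\mathbb{E}[N] &= \frac{r}{p} = \frac{5}{0.1} = 50 \\
\operatorname{Var}[X] = \operatorname{Var}[N] &= \frac{r(1 - p)}{p^2} = \frac{5 \cdot 0.9}{0.01} = 450
\end{aligned}
```

A Poisson model forces the variance to equal the mean, so count data whose variance is this far above its mean points toward a negative binomial instead.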
    // User session duration in minutes (roughly symmetric)
    val sessionDuration = NormalDistribution(mu = 12.5, sigma = 3.2)

    sessionDuration.cdf(15.0)       // probability of a session under 15 min
    sessionDuration.quantile(0.95)  // 95th percentile

    // Small-sample estimate: heavier tails give more conservative intervals
    val smallSampleEstimate = StudentTDistribution(degreesOfFreedom = 8.0)

    smallSampleEstimate.quantile(0.975)  // critical value for a 95% CI
After choosing a distribution, compare it against observed data using the Kolmogorov-Smirnov test.
    val processingTimesMs = doubleArrayOf(
        45.2, 51.8, 48.1, 52.3, 47.6, 49.9, 53.1, 46.5, 50.7, 48.8,
        51.2, 47.3, 49.1, 52.8, 46.9, 50.3, 48.5, 51.6, 47.8, 49.4
    )

    // Fit a Normal from sample statistics
    val fitted = NormalDistribution(
        mu = processingTimesMs.mean(),
        sigma = processingTimesMs.standardDeviation()
    )

    val ks = kolmogorovSmirnovTest(processingTimesMs, fitted)

    ks.statistic  // KS statistic: smaller means better fit
    ks.pValue     // high p-value means the data does not contradict the distribution
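The KS statistic itself is the largest vertical gap between the empirical CDF of the sample and the CDF of the candidate distribution. The sketch below computes it with the standard library only; `ksStatistic`, `erfApprox`, and `normalCdf` are hypothetical helpers, the normal CDF uses the Abramowitz-Stegun approximation of erf rather than an exact value, and the n - 1 denominator for the standard deviation is an assumption about what `standardDeviation()` does.

```kotlin
import kotlin.math.abs
import kotlin.math.exp
import kotlin.math.max
import kotlin.math.sqrt

// One-sample KS statistic: largest gap between the empirical CDF and a reference CDF.
fun ksStatistic(sample: DoubleArray, cdf: (Double) -> Double): Double {
    require(sample.isNotEmpty()) { "sample must not be empty" }
    val sorted = sample.sortedArray()
    val n = sorted.size
    var d = 0.0
    for (i in sorted.indices) {
        val f = cdf(sorted[i])
        // The empirical CDF jumps from i/n to (i + 1)/n at sorted[i]; check both sides.
        d = max(d, max((i + 1.0) / n - f, f - i.toDouble() / n))
    }
    return d
}

// Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
fun erfApprox(x: Double): Double {
    val t = 1.0 / (1.0 + 0.3275911 * abs(x))
    val poly = t * (0.254829592 + t * (-0.284496736 +
        t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))))
    val y = 1.0 - poly * exp(-x * x)
    return if (x >= 0) y else -y
}

fun normalCdf(x: Double, mu: Double, sigma: Double): Double =
    0.5 * (1.0 + erfApprox((x - mu) / (sigma * sqrt(2.0))))

fun main() {
    val processingTimesMs = doubleArrayOf(
        45.2, 51.8, 48.1, 52.3, 47.6, 49.9, 53.1, 46.5, 50.7, 48.8,
        51.2, 47.3, 49.1, 52.8, 46.9, 50.3, 48.5, 51.6, 47.8, 49.4
    )
    val mu = processingTimesMs.average()
    // Sample standard deviation with an n - 1 denominator (assumed convention).
    val sigma = sqrt(processingTimesMs.sumOf { (it - mu) * (it - mu) } / (processingTimesMs.size - 1))
    println(ksStatistic(processingTimesMs) { x -> normalCdf(x, mu, sigma) })
}
```

Because `ksStatistic` takes the CDF as a function, the same helper works for any candidate distribution with a computable CDF. Turning the statistic into a p-value requires the Kolmogorov distribution, which this sketch leaves to the library.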
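A non-significant KS test does not prove the distribution is correct; it only means the data does not strongly contradict that choice. When the distribution's parameters are estimated from the same sample being tested, as in the Normal fit above, the standard KS p-value tends to be too large, so treat a high value as an optimistic check rather than confirmation.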