Choosing a Distribution

Kotlin Notebook

Try this guide as a Kotlin Notebook with Kandy visualizations — run the cells to see charts and explore the data interactively.

Every distribution encodes assumptions about what values are possible and how likely they are. This guide groups distributions by the kind of data they model and shows how to verify the fit.

Decision Table

Data type	Example domain	Distribution	Constructor
Symmetric measurements around a center	User session duration	`NormalDistribution`	`NormalDistribution(mu, sigma)`
Heavier tails than Normal, small samples	Estimated means with few observations	`StudentTDistribution`	`StudentTDistribution(degreesOfFreedom)`
Positive waiting times or durations	Time between server errors	`ExponentialDistribution`	`ExponentialDistribution(rate)`
Positive durations with wear-out pattern	Hardware component lifetime	`WeibullDistribution`	`WeibullDistribution(shape, scale)`
Right-skewed positive values	API response times	`LogNormalDistribution`	`LogNormalDistribution(mu, sigma)`
Proportions or rates in [0, 1]	Click-through rate	`BetaDistribution`	`BetaDistribution(alpha, beta)`
Event counts per interval	Errors per hour	`PoissonDistribution`	`PoissonDistribution(rate)`
Successes in fixed trials	Conversions out of page views	`BinomialDistribution`	`BinomialDistribution(trials, probability)`
Overdispersed counts	Support tickets per day	`NegativeBinomialDistribution`	`NegativeBinomialDistribution(successes, probability)`
Heavy right tail on positive data	Income distribution, file sizes	`ParetoDistribution`	`ParetoDistribution(shape, scale)`

Durations and Waiting Times

The examples below cover response times, time between events, and component lifetime.

// API response times — often right-skewed with a long tail
val responseTime = LogNormalDistribution(mu = 4.5, sigma = 0.8)

responseTime.mean              // expected average in ms
responseTime.quantile(0.95)    // P95 latency
responseTime.quantile(0.99)    // P99 latency
responseTime.cdf(200.0)        // probability of responding under 200ms
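The log-normal summaries above have closed forms, which can help sanity-check a choice of `mu` and `sigma`. The sketch below uses only the Kotlin standard library. The helper names are illustrative and not part of the library shown in this guide, and the z-score for the 95th percentile is hard-coded rather than computed.

```kotlin
import kotlin.math.exp

// Closed-form log-normal summaries; mu and sigma are on the log scale.
fun logNormalMean(mu: Double, sigma: Double): Double = exp(mu + sigma * sigma / 2.0)

fun logNormalMedian(mu: Double): Double = exp(mu)

// Quantile for a standard normal z-score supplied by the caller.
fun logNormalQuantile(mu: Double, sigma: Double, z: Double): Double = exp(mu + sigma * z)

fun main() {
    val mu = 4.5
    val sigma = 0.8
    val z95 = 1.6448536269514722 // standard normal 95th percentile

    println("mean   = ${logNormalMean(mu, sigma)}")  // about 124 ms
    println("median = ${logNormalMedian(mu)}")       // about 90 ms
    println("P95    = ${logNormalQuantile(mu, sigma, z95)}")
}
```

The mean sits well above the median here, which is the right-skew the log-normal is meant to capture.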

// Time between server errors — memoryless waiting process
val timeBetweenErrors = ExponentialDistribution(rate = 0.5)

timeBetweenErrors.mean          // average time between errors
timeBetweenErrors.sf(10.0)      // probability of waiting longer than 10 units
timeBetweenErrors.quantile(0.5) // median waiting time
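The "memoryless" property mentioned above can be checked directly from the exponential survival function. This is a standalone sketch with hypothetical helper names, assuming the usual rate parameterization where the mean is `1 / rate`.

```kotlin
import kotlin.math.abs
import kotlin.math.exp
import kotlin.math.ln

// Survival function P(T > t) for an exponential waiting time.
fun expSurvival(rate: Double, t: Double): Double = exp(-rate * t)

fun expMean(rate: Double): Double = 1.0 / rate

fun expMedian(rate: Double): Double = ln(2.0) / rate

fun main() {
    val rate = 0.5
    println(expMean(rate))            // 1 / rate
    println(expMedian(rate))          // ln(2) / rate, about 1.386
    println(expSurvival(rate, 10.0))  // exp(-5), about 0.0067

    // Memorylessness: P(T > s + t | T > s) equals P(T > t)
    val s = 3.0
    val t = 10.0
    val conditional = expSurvival(rate, s + t) / expSurvival(rate, s)
    println(abs(conditional - expSurvival(rate, t)) < 1e-12)
}
```

Having already waited `s` units does not change the probability of waiting `t` more, which is why this distribution suits events that arrive at a constant rate.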

// Hardware component lifetime — models wear-out (shape > 1) or burn-in (shape < 1)
val componentLifetime = WeibullDistribution(shape = 2.0, scale = 5000.0)

componentLifetime.quantile(0.1)  // 10% of components fail before this time
componentLifetime.cdf(3000.0)    // probability of failure before 3000 hours
componentLifetime.sf(4000.0)     // probability of surviving past 4000 hours
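To see why `shape > 1` corresponds to wear-out, it helps to look at the hazard rate, which grows with age in that case. The following is a minimal sketch using the textbook shape/scale formulas; it assumes the library uses the same parameterization, and the function names are made up for illustration.

```kotlin
import kotlin.math.exp
import kotlin.math.ln
import kotlin.math.pow

// CDF: probability of failure before time x.
fun weibullCdf(shape: Double, scale: Double, x: Double): Double =
    if (x <= 0.0) 0.0 else 1.0 - exp(-(x / scale).pow(shape))

// Quantile: time by which a fraction p of components has failed.
fun weibullQuantile(shape: Double, scale: Double, p: Double): Double =
    scale * (-ln(1.0 - p)).pow(1.0 / shape)

// Hazard rate: instantaneous failure rate at age x, given survival so far.
fun weibullHazard(shape: Double, scale: Double, x: Double): Double =
    (shape / scale) * (x / scale).pow(shape - 1.0)

fun main() {
    val shape = 2.0
    val scale = 5000.0
    println(weibullQuantile(shape, scale, 0.1)) // about 1623 hours
    println(weibullCdf(shape, scale, 3000.0))   // about 0.302
    // With shape > 1 the hazard increases with age (wear-out)
    println(weibullHazard(shape, scale, 1000.0) < weibullHazard(shape, scale, 4000.0))
}
```

With `shape = 1.0` the hazard is constant and the Weibull reduces to an exponential with rate `1 / scale`.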

Counts and Events

The examples below cover event counts, fixed trials, and overdispersed counts.

// Errors per hour on a production server
val errorsPerHour = PoissonDistribution(rate = 3.2)

errorsPerHour.pmf(0)            // probability of zero errors
errorsPerHour.cdf(5)            // probability of at most 5 errors
errorsPerHour.quantileInt(0.99) // smallest count exceeded at most 1% of the time
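For intuition about what these calls return, the Poisson pmf, cdf, and quantile can be computed by hand. This is a standalone sketch with hypothetical function names; it is not how the library necessarily implements them.

```kotlin
import kotlin.math.exp

// P(X = k), built up with the recurrence P(k) = P(k - 1) * rate / k.
fun poissonPmf(rate: Double, k: Int): Double {
    require(k >= 0) { "k must be non-negative" }
    var p = exp(-rate)
    for (i in 1..k) p *= rate / i
    return p
}

// P(X <= k) as a direct sum over the pmf.
fun poissonCdf(rate: Double, k: Int): Double = (0..k).sumOf { poissonPmf(rate, it) }

// Smallest k such that P(X <= k) >= p.
fun poissonQuantile(rate: Double, p: Double): Int {
    require(p > 0.0 && p < 1.0) { "p must be in (0, 1)" }
    var k = 0
    while (poissonCdf(rate, k) < p) k++
    return k
}

fun main() {
    val rate = 3.2
    println(poissonPmf(rate, 0))         // exp(-3.2), about 0.041
    println(poissonCdf(rate, 5))         // probability of at most 5 errors
    println(poissonQuantile(rate, 0.99)) // 8
}
```

Because the distribution is discrete, the quantile is the smallest count whose cumulative probability reaches the target, so the tail beyond it is at most 1% rather than exactly 1%.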

// Conversions out of 1000 page views with 3.5% conversion rate
val conversions = BinomialDistribution(trials = 1000, probability = 0.035)

conversions.mean    // expected number of conversions
conversions.pmf(40) // probability of exactly 40 conversions
conversions.sf(50)  // probability of more than 50 conversions
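The binomial quantities above follow from `n * p` for the mean and a pmf that is easiest to evaluate in log space for large `n`. The sketch below is one way to do that with the standard library only; `logChoose` and `binomialPmf` are illustrative names, not library functions.

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// ln of the binomial coefficient C(n, k), summed in log space to avoid overflow.
fun logChoose(n: Int, k: Int): Double =
    (1..k).sumOf { i -> ln((n - k + i).toDouble() / i) }

// P(X = k) for X ~ Binomial(n, p), with 0 < p < 1.
fun binomialPmf(n: Int, p: Double, k: Int): Double =
    exp(logChoose(n, k) + k * ln(p) + (n - k) * ln(1.0 - p))

fun main() {
    val n = 1000
    val p = 0.035
    println("mean     = ${n * p}")            // n * p, about 35
    println("variance = ${n * p * (1 - p)}")  // n * p * (1 - p), about 33.8
    println("P(X=40)  = ${binomialPmf(n, p, 40)}")
    // Tail probability P(X > 50) as a direct sum over the pmf
    println("P(X>50)  = ${(51..n).sumOf { binomialPmf(n, p, it) }}")
}
```

Note that the binomial variance is always below the mean, so counts with variance above the mean call for a different model.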

// Support tickets until 5th resolution (variance > mean)
val tickets = NegativeBinomialDistribution(successes = 5, probability = 0.1)

tickets.mean     // expected ticket count
tickets.variance // larger than mean — overdispersion
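Negative binomial parameterizations differ between libraries, so it is worth checking the moments by hand. The sketch below assumes the "failures before the r-th success" convention; if the library instead counts total trials, the mean shifts up by `r` (to `r / p`) while the variance stays the same. The function names are hypothetical.

```kotlin
// Moments for the "failures before the r-th success" convention (an assumption here).
fun nbMean(r: Int, p: Double): Double = r * (1.0 - p) / p

fun nbVariance(r: Int, p: Double): Double = r * (1.0 - p) / (p * p)

fun main() {
    val r = 5
    val p = 0.1
    val mean = nbMean(r, p)         // about 45
    val variance = nbVariance(r, p) // about 450
    println("mean = $mean, variance = $variance")
    // The ratio equals 1 / p, so it exceeds 1 whenever p < 1
    println("variance / mean = ${variance / mean}")
}
```

A variance-to-mean ratio well above 1 in observed counts is the usual signal to prefer this distribution over a Poisson, whose ratio is exactly 1.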

Proportions and Rates

// Click-through rate estimated from 120 clicks in 4000 impressions
val ctr = BetaDistribution(alpha = 120.0, beta = 3880.0)

ctr.mean            // point estimate of CTR
ctr.quantile(0.025) // lower bound of 95% credible interval
ctr.quantile(0.975) // upper bound
ctr.cdf(0.035)      // probability that true CTR is below 3.5%
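The Beta parameters above come straight from the counts: clicks become `alpha` and non-clicks become `beta`. A minimal sketch of that bookkeeping, including the variant that starts from a uniform Beta(1, 1) prior, might look like the following. `BetaParams` and `update` are hypothetical names introduced for illustration.

```kotlin
// Beta distribution parameters with closed-form mean and variance.
data class BetaParams(val alpha: Double, val beta: Double) {
    val mean: Double
        get() = alpha / (alpha + beta)
    val variance: Double
        get() = alpha * beta / ((alpha + beta) * (alpha + beta) * (alpha + beta + 1.0))
}

// Conjugate update: add observed successes to alpha and failures to beta.
fun update(prior: BetaParams, successes: Int, failures: Int): BetaParams =
    BetaParams(prior.alpha + successes, prior.beta + failures)

fun main() {
    val fromCounts = BetaParams(120.0, 3880.0)
    println(fromCounts.mean) // 120 / 4000, about 0.03

    // Starting from a uniform prior gives Beta(121, 3881) instead
    val posterior = update(BetaParams(1.0, 1.0), successes = 120, failures = 3880)
    println(posterior)
    println(posterior.mean)  // 121 / 4002, slightly above 0.03
}
```

With this much data the two choices barely differ; the prior matters more when impressions are in the tens rather than the thousands.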

General-Purpose Symmetric

// User session duration in minutes (roughly symmetric)
val sessionDuration = NormalDistribution(mu = 12.5, sigma = 3.2)

sessionDuration.cdf(15.0)      // probability of session under 15 min
sessionDuration.quantile(0.95) // 95th percentile

// Small-sample estimate — heavier tails give more conservative intervals
val smallSampleEstimate = StudentTDistribution(degreesOfFreedom = 8.0)
smallSampleEstimate.quantile(0.975) // critical value for 95% CI
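The critical value is typically plugged into a confidence interval for a mean. The sketch below shows that step with the standard library only. The sample values are made up, the helper name is hypothetical, and the critical values are hard-coded approximations (about 2.306 for 8 degrees of freedom, 1.96 for the Normal).

```kotlin
import kotlin.math.sqrt

// Two-sided confidence interval for the mean: mean ± critical * s / sqrt(n).
fun meanConfidenceInterval(sample: DoubleArray, critical: Double): Pair<Double, Double> {
    val n = sample.size
    require(n >= 2) { "need at least two observations" }
    val mean = sample.average()
    // Unbiased sample variance (divides by n - 1)
    val variance = sample.sumOf { (it - mean) * (it - mean) } / (n - 1)
    val halfWidth = critical * sqrt(variance / n)
    return (mean - halfWidth) to (mean + halfWidth)
}

fun main() {
    // Nine hypothetical session durations in minutes, so 8 degrees of freedom
    val sample = doubleArrayOf(11.2, 13.5, 12.1, 14.8, 10.9, 12.7, 13.3, 11.8, 12.4)
    val tInterval = meanConfidenceInterval(sample, critical = 2.306)
    val zInterval = meanConfidenceInterval(sample, critical = 1.96)
    println("t-based 95% CI: $tInterval")
    println("z-based 95% CI: $zInterval") // narrower, understates uncertainty for small n
}
```

The t-based interval is wider because the standard deviation itself is estimated from only a few observations.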

Verifying the Fit

After choosing a distribution, compare it against observed data using the Kolmogorov-Smirnov test.

val processingTimesMs = doubleArrayOf(
    45.2, 51.8, 48.1, 52.3, 47.6, 49.9, 53.1, 46.5, 50.7, 48.8,
    51.2, 47.3, 49.1, 52.8, 46.9, 50.3, 48.5, 51.6, 47.8, 49.4
)

// Fit a Normal from sample statistics
val fitted = NormalDistribution(
    mu = processingTimesMs.mean(),
    sigma = processingTimesMs.standardDeviation()
)

val ks = kolmogorovSmirnovTest(processingTimesMs, fitted)
ks.statistic // KS statistic — smaller means better fit
ks.pValue    // high p-value means data does not contradict the distribution

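A non-significant KS test does not prove the distribution is correct; it only means the data does not strongly contradict that choice. Because `mu` and `sigma` in the example were estimated from the same sample being tested, the standard KS p-value is biased upward, so treat it as a rough check rather than a formal test.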
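The KS statistic itself is simple to compute: it is the largest vertical gap between the empirical CDF of the sample and the candidate CDF. The following standalone sketch shows the calculation for any CDF passed as a function. `ksStatistic` is a hypothetical name, and a uniform CDF is used in the demo only because its values are easy to verify by hand.

```kotlin
import kotlin.math.max

// Largest gap between the empirical CDF of the sample and the reference CDF.
fun ksStatistic(sample: DoubleArray, cdf: (Double) -> Double): Double {
    require(sample.isNotEmpty()) { "sample must not be empty" }
    val sorted = sample.sortedArray()
    val n = sorted.size
    var d = 0.0
    for (i in sorted.indices) {
        val f = cdf(sorted[i])
        // The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted point
        val above = (i + 1).toDouble() / n - f
        val below = f - i.toDouble() / n
        d = max(d, max(above, below))
    }
    return d
}

fun main() {
    // Uniform(0, 1) reference CDF
    val uniformCdf = { x: Double -> x.coerceIn(0.0, 1.0) }
    // Gaps are largest at x = 0.7, where the empirical CDF reaches 1.0
    println(ksStatistic(doubleArrayOf(0.7, 0.1, 0.4), uniformCdf)) // about 0.3
}
```

Checking both sides of each jump matters, since the maximum gap can occur just before or just after a data point.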
