kstats-core provides descriptive statistics as extension functions on DoubleArray and Iterable<Double>. It also includes Int and Long overloads for the most common operations.

Summary Snapshot

describe() is the fastest way to get a complete picture of a sample. It returns a DescriptiveStatistics object combining count, center, spread, quartiles, shape, and standard error.
val data = doubleArrayOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
val stats = data.describe()

stats.count              // 8
stats.mean               // 5.0
stats.median             // 4.5
stats.standardDeviation  // 2.1380
stats.variance           // 4.5714
stats.q1                 // 4.0
stats.q3                 // 5.5
stats.interquartileRange // 1.5
stats.skewness           // 0.6563
stats.kurtosis           // -0.1640
stats.range              // 7.0
stats.standardError      // 0.7559
variance() and standardDeviation() use the sample formula (dividing by n − 1) by default. Pass PopulationKind.POPULATION to use the population formula (dividing by n).
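The two divisors are easy to check by hand. The following is a minimal Python sketch of both formulas, independent of the kstats-core API (the function name is illustrative):

```python
def variance(xs, population=False):
    """Sample variance (divisor n - 1) by default; population (divisor n) on request."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return ss / (n if population else n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(variance(data), 4))                        # 4.5714 (sample)
print(round(variance(data, population=True), 4))       # 4.0 (population)
print(round((variance(data) / len(data)) ** 0.5, 4))   # 0.7559 (standard error)
```

These match the count, variance, and standardError values in the snapshot above.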

Central Tendency

Different notions of “center” suit different data shapes. mean() is the arithmetic average, median() is the positional center, mode() returns the most frequent values, and the trimmed and weighted variants handle outliers and importance weights.
val data = doubleArrayOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

data.mean()                    // 5.0
data.median()                  // 4.5
data.toList().mode()           // {4.0}

data.trimmedMean(0.1)          // 4.8333 — trims 10% from each tail

val positive = doubleArrayOf(1.0, 2.0, 4.0, 8.0)
positive.geometricMean()       // 2.8284
positive.harmonicMean()        // 2.1333

val values = doubleArrayOf(1.0, 2.0, 3.0)
val weights = doubleArrayOf(3.0, 1.0, 1.0)
values.weightedMean(weights)   // 1.6
  • Geometric mean — use for growth rates and ratios. Requires all positive values.
  • Harmonic mean — use for rates and speed averages. Requires all positive values.
  • Trimmed mean — drops a proportion from each tail before averaging. Robust to outliers.
  • Weighted mean — each observation contributes proportionally to its weight.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{x}_{\text{geo}} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} \qquad \bar{x}_{\text{harm}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \qquad \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

The trimmed mean drops $\lfloor p \cdot n \rfloor$ observations from each end of the sorted sample, then computes the arithmetic mean of the remaining values.
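The mean variants can be reproduced with a few lines of standalone Python (these helpers are illustrative, not part of the library):

```python
import math

def geometric_mean(xs):
    # exp of the mean log; requires strictly positive values
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # reciprocal of the mean reciprocal; requires strictly positive values
    return len(xs) / sum(1.0 / x for x in xs)

def weighted_mean(xs, ws):
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

print(round(geometric_mean([1.0, 2.0, 4.0, 8.0]), 4))   # 2.8284
print(round(harmonic_mean([1.0, 2.0, 4.0, 8.0]), 4))    # 2.1333
print(weighted_mean([1.0, 2.0, 3.0], [3.0, 1.0, 1.0]))  # 1.6
```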

Dispersion

Spread measures describe how far values deviate from the center.
val data = doubleArrayOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

data.variance()                                    // 4.5714 (sample)
data.variance(PopulationKind.POPULATION)            // 4.0
data.standardDeviation()                            // 2.1380
data.range()                                        // 7.0
data.interquartileRange()                           // 1.5
data.meanAbsoluteDeviation()                        // 1.5

data.trimmedVariance(0.1)                           // trimmed sample variance

data.semiVariance(5.0, SemiVarianceDirection.DOWNSIDE) // 1.7143
data.semiVariance(5.0, SemiVarianceDirection.UPSIDE)   // 2.8571
semiVariance measures variability on one side of a threshold. DOWNSIDE captures risk below the threshold; UPSIDE captures variability above it.
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}, \qquad SE = \frac{s}{\sqrt{n}}$$
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert \qquad \text{SemiVar}_{\text{down}}(t) = \frac{1}{n-1}\sum_{x_i < t}(x_i - t)^2$$
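The mean absolute deviation and one-sided semi-variance are short enough to verify directly. A Python sketch of the two formulas above (independent of the Kotlin API):

```python
def mean_abs_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def semi_variance(xs, threshold, downside=True):
    # one-sided squared deviations from the threshold, divisor n - 1
    side = [x for x in xs if (x < threshold if downside else x > threshold)]
    return sum((x - threshold) ** 2 for x in side) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean_abs_deviation(data))                           # 1.5
print(round(semi_variance(data, 5.0), 4))                 # 1.7143 (downside)
print(round(semi_variance(data, 5.0, downside=False), 4)) # 2.8571 (upside)
```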

Quantiles and Position

Quantiles split the data at specified probability thresholds. percentile() takes a value on the 0–100 scale; quantile() takes a value on the 0–1 scale.
val data = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)

data.quantile(0.5)         // 5.5 (median)
data.quantile(0.25)        // 3.25 (Q1)
data.percentile(90.0)      // 9.1 (90th percentile)

val (q1, median, q3) = data.quartiles()
// q1 = 3.25, median = 5.5, q3 = 7.75
quantile() supports a QuantileInterpolation parameter with options: LINEAR (default), LOWER, HIGHER, NEAREST, and MIDPOINT. These control how the value is interpolated when the quantile falls between two data points.
With linear interpolation (the default), the quantile at probability $p$ for sorted data $x_{(1)}, \ldots, x_{(n)}$ is computed as

$$Q(p) = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\left(x_{(\lceil h \rceil)} - x_{(\lfloor h \rfloor)}\right)$$

where $h = p(n - 1) + 1$.
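The linear-interpolation rule can be sketched in a few lines of Python; this is an independent reimplementation of the formula, not the library itself:

```python
import math

def quantile(xs, p):
    """Linear-interpolation quantile: h = p(n - 1) + 1 over the sorted sample (1-based)."""
    s = sorted(xs)
    h = p * (len(s) - 1) + 1
    lo, hi = math.floor(h), math.ceil(h)
    return s[lo - 1] + (h - lo) * (s[hi - 1] - s[lo - 1])

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(quantile(data, 0.5))            # 5.5 (median)
print(quantile(data, 0.25))           # 3.25 (Q1)
print(round(quantile(data, 0.90), 4)) # 9.1 (90th percentile)
```

LOWER, HIGHER, NEAREST, and MIDPOINT replace the interpolation step with the corresponding choice between the two bracketing order statistics.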

Shape and Moments

Shape measures describe asymmetry and tail weight beyond what the mean and variance capture.
val data = doubleArrayOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

data.skewness()            // 0.6563 — positive: right tail is longer
data.kurtosis()            // -0.1640 — negative excess: lighter tails than normal
data.kurtosis(excess = false) // raw kurtosis (not centered at 0)

data.centralMoment(2)      // 4.0 (equals population variance)
data.centralMoment(3)      // 5.25
data.centralMoment(4)      // 44.5

data.kStatistic(1)          // 5.0 (equals mean)
data.kStatistic(2)          // 4.5714 (equals sample variance)
Skewness near zero suggests symmetry. Positive skewness indicates a longer right tail; negative indicates a longer left tail. Excess kurtosis near zero is similar to a normal distribution; positive values indicate heavier tails.
kurtosis() returns excess kurtosis by default (excess = true), meaning the value is shifted so that a normal distribution has excess kurtosis of 0. Pass excess = false for the raw fourth standardized moment.
$$g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3 \qquad g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

The $r$-th central moment is $\mu_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$. K-statistics ($k_r$) are unbiased estimators of the corresponding cumulants.
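The central moments quoted above can be checked directly from the definition. A standalone Python sketch (illustrative, not the library):

```python
def central_moment(xs, r):
    # r-th central moment: mean of (x - x-bar)^r
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(central_moment(data, 2))   # 4.0 (population variance)
print(central_moment(data, 3))   # 5.25
print(central_moment(data, 4))   # 44.5
```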

Frequency Analysis

The Frequency class counts occurrences, proportions, and cumulative frequencies for any Comparable type. frequencyTable() groups numeric data into equal-width bins.
val freq = listOf("a", "a", "b", "b", "b", "c").toFrequency()
freq.totalCount            // 6
freq.count("b")            // 3
freq.proportion("b")       // 0.5
freq.cumulativeCount("b")  // 5 (a=2 + b=3)
freq.mode                  // {b}
val data = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
val bins = data.asIterable().frequencyTable(3)
// Each FrequencyBin has: range, count, relativeFrequency, cumulativeFrequency
bins[0].count              // number of values in the first bin
bins[0].relativeFrequency  // proportion of total
Frequency works with any Comparable type — strings, enums, integers, or custom classes. For numeric binning, use frequencyTable() with either a bin count or a bin width.
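The same bookkeeping can be sketched with Python's standard library; this mirrors the counts, proportions, and cumulative counts shown above (assuming cumulative order follows the sorted keys), but it is not the kstats-core Frequency class:

```python
from collections import Counter
from itertools import accumulate

items = ["a", "a", "b", "b", "b", "c"]
counts = Counter(items)
total = sum(counts.values())

# cumulative counts in sorted key order, mirroring cumulativeCount()
keys = sorted(counts)
cumulative = dict(zip(keys, accumulate(counts[k] for k in keys)))

print(counts["b"])           # 3
print(counts["b"] / total)   # 0.5
print(cumulative["b"])       # 5  (a=2 + b=3)
```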

Streaming Statistics

OnlineStatistics computes mean, variance, skewness, and kurtosis incrementally without storing the full sample. Values can be added one at a time or in batches.
val online = OnlineStatistics()
online.addAll(doubleArrayOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0))

online.count               // 8
online.mean                // 5.0
online.sum                 // 40.0
online.min                 // 2.0
online.max                 // 9.0
online.variance()          // 4.5714 (sample)
online.standardDeviation() // 2.1380
online.skewness()          // 0.6563

// Add more data later
online.add(3.0)
online.count               // 9
online.mean                // 4.7778
Use OnlineStatistics when data arrives incrementally (streaming, event-driven systems) or when buffering the full sample in memory is impractical.
OnlineStatistics uses Welford’s algorithm with the Terriberry extension for numerically stable single-pass computation of higher-order moments. The algorithm updates running sums of powers of deviations from the current mean, avoiding the catastrophic cancellation that affects naive two-pass formulas on large datasets.
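Welford's core update is compact. A Python sketch of the second-order version (mean and variance only; the Terriberry extension for skewness and kurtosis is omitted here), independent of the Kotlin class:

```python
class Welford:
    """Single-pass running mean and variance via Welford's update."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # second factor uses the updated mean

    def variance(self):  # sample variance, divisor n - 1
        return self.m2 / (self.n - 1)

w = Welford()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    w.add(x)
print(round(w.mean, 4))        # 5.0
print(round(w.variance(), 4))  # 4.5714

w.add(3.0)
print(round(w.mean, 4))        # 4.7778
```

Accumulating `m2` as a sum of products of deviations, rather than subtracting two large running sums, is what avoids the catastrophic cancellation described above.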

Process Capability

Process capability indices quantify how well a process fits within its specification limits. processCapability(lsl, usl) returns four Statistical Process Control (SPC) indices in one call:
  • Cp, Cpk — potential and actual capability using the sample standard deviation (short-term, within-subgroup spread).
  • Pp, Ppk — the same formulas using the population standard deviation (long-term, overall spread).
The -k variants penalize a process that is off-center relative to the midpoint of the spec, so Cpk is never larger than Cp.
// Ten parts measured against a spec window of [48, 52]
val measurements = doubleArrayOf(
    50.0, 50.5, 49.5, 50.2, 49.8, 50.1, 49.9, 50.3, 49.7, 50.0
)
val capability = measurements.processCapability(lsl = 48.0, usl = 52.0)

capability.cp   // 2.2646 — potential capability (spread vs tolerance)
capability.cpk  // 2.2646 — actual capability (penalizes off-centering)
capability.pp   // 2.3870 — overall (population σ) counterpart of Cp
capability.ppk  // 2.3870 — overall counterpart of Cpk
Use these indices only on a process that is already in statistical control (stable over time — see Shewhart Control Charts below). For an unstable process the measured spread is not a fixed property of the process.
Values ≥ 1.33 are usually considered capable; ≥ 1.67 highly capable. When Cpk ≪ Cp, re-center the process before trying to reduce variance.
$$\mathrm{Cp} = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma_s}, \qquad \mathrm{Cpk} = \min\!\left(\frac{\mathrm{USL} - \bar{x}}{3\sigma_s},\ \frac{\bar{x} - \mathrm{LSL}}{3\sigma_s}\right)$$

Pp and Ppk use the population standard deviation $\sigma_p$ (divisor $n$) in place of the sample standard deviation $\sigma_s$ (divisor $n - 1$). processCapability computes both in a single numerically stable Welford pass.
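The four indices reduce to a handful of arithmetic steps. A standalone Python sketch of the formulas, reproducing the example values above (not the library implementation):

```python
def capability(xs, lsl, usl):
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    s_sample = (ss / (n - 1)) ** 0.5   # divisor n - 1 -> Cp, Cpk
    s_pop = (ss / n) ** 0.5            # divisor n     -> Pp, Ppk
    cp = (usl - lsl) / (6 * s_sample)
    cpk = min((usl - mean) / (3 * s_sample), (mean - lsl) / (3 * s_sample))
    pp = (usl - lsl) / (6 * s_pop)
    ppk = min((usl - mean) / (3 * s_pop), (mean - lsl) / (3 * s_pop))
    return cp, cpk, pp, ppk

xs = [50.0, 50.5, 49.5, 50.2, 49.8, 50.1, 49.9, 50.3, 49.7, 50.0]
cp, cpk, pp, ppk = capability(xs, 48.0, 52.0)
print(round(cp, 4), round(cpk, 4))   # 2.2646 2.2646
print(round(pp, 4), round(ppk, 4))   # 2.387 2.387
```

Cp equals Cpk here because the sample mean sits exactly at the spec midpoint of 50.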

Shewhart Control Charts

Shewhart charts plot subgroup statistics over time with three-sigma control limits. xBarRChart() monitors the process mean together with the range within each subgroup; xBarSChart() uses the sample standard deviation instead — more efficient for subgroup sizes above 10. Both require equal-sized subgroups of 2–25 observations.
// Five subgroups of four parts; bracket width monitored per batch
val subgroups = listOf(
    doubleArrayOf(72.0, 84.0, 79.0, 49.0),
    doubleArrayOf(56.0, 87.0, 33.0, 42.0),
    doubleArrayOf(55.0, 73.0, 22.0, 60.0),
    doubleArrayOf(44.0, 80.0, 54.0, 74.0),
    doubleArrayOf(97.0, 26.0, 48.0, 58.0),
)
val chart = xBarRChart(subgroups)

chart.centerLine     // 59.65 — grand mean (x-double-bar)
chart.ucl            // 95.6626 — upper control limit for the mean
chart.lcl            // 23.6374 — lower control limit for the mean
chart.rChart.centerLine // 49.4 — average range (R-bar)
chart.rChart.ucl     // 112.7308 — upper limit for within-subgroup range
chart.rChart.lcl     // 0.0 — lower limit (D₃ = 0 for n ≤ 6)
Control limits use the standard SPC constants $A_2, A_3, D_3, D_4, B_3, B_4, c_4$ (Montgomery, Introduction to Statistical Quality Control, Appendix VI), available for subgroup sizes 2–25 through spcConstants(n).
For $k$ subgroups of size $n$ with subgroup means $\bar{x}_i$, ranges $R_i$, and standard deviations $s_i$:

$$\text{x̄-R:} \quad \mathrm{UCL}/\mathrm{LCL} = \bar{\bar{x}} \pm A_2 \bar{R}, \qquad R\text{-chart: } [D_3 \bar{R},\ D_4 \bar{R}]$$
$$\text{x̄-S:} \quad \mathrm{UCL}/\mathrm{LCL} = \bar{\bar{x}} \pm A_3 \bar{s}, \qquad S\text{-chart: } [B_3 \bar{s},\ B_4 \bar{s}]$$
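The x̄-R limits from the example can be reproduced by hand with the published constants for n = 4 (A₂ = 0.729, D₃ = 0, D₄ = 2.282). A Python sketch, independent of xBarRChart():

```python
# Standard SPC constants for subgroup size n = 4 (Montgomery, Appendix VI)
A2, D3, D4 = 0.729, 0.0, 2.282

subgroups = [
    [72.0, 84.0, 79.0, 49.0],
    [56.0, 87.0, 33.0, 42.0],
    [55.0, 73.0, 22.0, 60.0],
    [44.0, 80.0, 54.0, 74.0],
    [97.0, 26.0, 48.0, 58.0],
]
grand_mean = sum(sum(g) / len(g) for g in subgroups) / len(subgroups)  # x-double-bar
r_bar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)       # R-bar

print(round(grand_mean, 2))               # 59.65
print(round(grand_mean + A2 * r_bar, 4))  # 95.6626 (UCL for the mean)
print(round(grand_mean - A2 * r_bar, 4))  # 23.6374 (LCL for the mean)
print(round(D4 * r_bar, 4))               # 112.7308 (R-chart UCL)
print(round(D3 * r_bar, 4))               # 0.0 (R-chart LCL)
```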

CUSUM Chart

A Shewhart chart reacts slowly to drifts of less than 2σ because every point is judged in isolation. cusum() accumulates deviations from target over time, so a 0.5σ–1σ drift is detected within a few observations. The two-sided tabular form tracks an upper sum C⁺ that catches upward shifts and a lower sum C⁻ for downward shifts. An alarm fires at the first index where either sum exceeds the decision interval H.
// Individual measurements from a process with target 10, drifting upward
val observations = doubleArrayOf(10.2, 10.4, 10.6, 10.9, 11.2, 11.5, 11.8, 12.0)
val result = cusum(observations, target = 10.0, k = 0.5, h = 3.0)

result.sPlus      // [0.0, 0.0, 0.1, 0.5, 1.2, 2.2, 3.5, 5.0]
result.sMinus     // all zero — no downward drift
result.alarmIndex // 6 — first index where C⁺ > H
Tune k to half the shift size you want to detect, in units of σ — a common default is $K \approx 0.5\sigma$, targeting a 1σ drift. Set h to 4σ–5σ to match the in-control run length of a 3σ Shewhart chart while reacting much faster to small shifts.
$$C^+_i = \max\left(0,\; C^+_{i-1} + (x_i - \mu_0 - K)\right), \qquad C^-_i = \max\left(0,\; C^-_{i-1} + (\mu_0 - K - x_i)\right)$$

starting from $C^\pm_0 = 0$, with an alarm the first time $C^+_i > H$ or $C^-_i > H$.
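The tabular recursion is a one-line update per side. A standalone Python sketch reproducing the example run (illustrative, not the library):

```python
def cusum(xs, target, k, h):
    """Two-sided tabular CUSUM; returns both running sums and the first alarm index."""
    c_plus = c_minus = 0.0
    s_plus, s_minus, alarm = [], [], None
    for i, x in enumerate(xs):
        c_plus = max(0.0, c_plus + (x - target - k))
        c_minus = max(0.0, c_minus + (target - k - x))
        s_plus.append(round(c_plus, 10))    # rounded only for display
        s_minus.append(round(c_minus, 10))
        if alarm is None and (c_plus > h or c_minus > h):
            alarm = i
    return s_plus, s_minus, alarm

obs = [10.2, 10.4, 10.6, 10.9, 11.2, 11.5, 11.8, 12.0]
s_plus, s_minus, alarm = cusum(obs, target=10.0, k=0.5, h=3.0)
print(s_plus)   # [0.0, 0.0, 0.1, 0.5, 1.2, 2.2, 3.5, 5.0]
print(alarm)    # 6
```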

EWMA Chart

EWMA (Roberts, 1959) is the other classic small-shift detector. Instead of an unbounded cumulative sum, ewma() maintains a weighted moving average that gives recent observations more influence while retaining memory of the past. Control limits widen with time until they reach a steady state, so the chart is most sensitive early — useful for catching an initial shift.
// EWMA chart: target = 25, σ = 1, λ = 0.2, L = 3
val observations = doubleArrayOf(25.0, 24.5, 25.2, 26.1, 25.8, 27.0, 26.5, 28.0)
val result = ewma(
    observations,
    target = 25.0,
    sigma = 1.0,
    lambda = 0.2,
    controlLimitWidth = 3.0
)

result.smoothedValues[0] // 25.0 — Z₁ = λ·x₁ + (1−λ)·target
result.smoothedValues[7] // 26.2549 — smoothed statistic at t = 7
result.ucl[0]            // 25.6 — narrow at first, widens with t
result.ucl[7]            // 25.9858 — approaching steady state
result.outOfControl      // [7] — Z₇ exceeds UCL₇
$\lambda = 0.2$ with $L \approx 2.7$–$3.0$ is a common default. Smaller $\lambda$ emphasizes memory and detects smaller shifts; $\lambda = 1$ collapses EWMA into a Shewhart individuals chart.
$$Z_t = \lambda x_t + (1 - \lambda) Z_{t-1}, \qquad Z_0 = \mu_0$$
$$\mathrm{UCL}_t/\mathrm{LCL}_t = \mu_0 \pm L\sigma\sqrt{\frac{\lambda}{2 - \lambda}\left(1 - (1 - \lambda)^{2t}\right)}$$
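Both the smoothing recursion and the time-varying limits are short enough to verify against the example. A Python sketch (independent of the Kotlin ewma()):

```python
import math

def ewma(xs, target, sigma, lam, L):
    z = target  # Z_0 = mu_0
    zs, ucl = [], []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        zs.append(z)
        width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        ucl.append(target + width)  # LCL is target - width
    return zs, ucl

obs = [25.0, 24.5, 25.2, 26.1, 25.8, 27.0, 26.5, 28.0]
zs, ucl = ewma(obs, target=25.0, sigma=1.0, lam=0.2, L=3.0)
print(round(zs[0], 4), round(ucl[0], 4))  # 25.0 25.6
print(round(zs[7], 4), round(ucl[7], 4))  # 26.2549 25.9858
print([t for t, (z, u) in enumerate(zip(zs, ucl)) if z > u])  # [7]
```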

Western Electric Rules

westernElectricRules() extends a Shewhart chart beyond the basic ±3σ check with four run-length heuristics that catch trends, clusters, and prolonged one-sided runs.
Rule | Pattern | Detects
1 | 1 point beyond ±3σ | extreme single excursion
2 | 2 of last 3 points beyond ±2σ, same side | strong shift
3 | 4 of last 5 points beyond ±1σ, same side | moderate shift
4 | 8 consecutive points on the same side of the center line | sustained shift, any magnitude
Each rule’s array contains the trigger indices — the observation whose arrival completes the offending pattern.
// Process drifting upward in the last four observations
val observations = doubleArrayOf(
    0.1, 0.2, -0.3, 0.0, 1.4, 1.2, 2.4, 2.6, 3.5, 2.2
)
val violations = westernElectricRules(observations, center = 0.0, sigma = 1.0)

violations.rule1 // indices of points beyond ±3σ
violations.rule2 // indices where 2 of last 3 points are beyond ±2σ (same side)
violations.rule3 // indices where 4 of last 5 points are beyond ±1σ (same side)
violations.rule4 // indices where 8 consecutive points fall on the same side
Combine a Shewhart chart (large shifts) with CUSUM or EWMA (small shifts) and Western Electric Rules (patterns) — the three views together catch the widest range of out-of-control conditions.
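Each rule is a small window scan over the observations. A Python sketch of rules 1 and 4 under the reading above (the trigger index is the point that completes the pattern); rules 2 and 3 follow the same sliding-window shape. This is an independent illustration, not the library's implementation:

```python
def rule1(xs, center, sigma):
    # Rule 1: any single point beyond +/- 3 sigma
    return [i for i, x in enumerate(xs) if abs(x - center) > 3 * sigma]

def rule4(xs, center):
    # Rule 4: index that completes a run of 8 consecutive points on one side
    hits, run, side = [], 0, 0
    for i, x in enumerate(xs):
        s = 1 if x > center else (-1 if x < center else 0)
        run = run + 1 if (s == side and s != 0) else (1 if s != 0 else 0)
        side = s
        if run >= 8:
            hits.append(i)
    return hits

obs = [0.1, 0.2, -0.3, 0.0, 1.4, 1.2, 2.4, 2.6, 3.5, 2.2]
print(rule1(obs, center=0.0, sigma=1.0))  # [8]  (only 3.5 exceeds 3 sigma)
print(rule4(obs, center=0.0))             # []   (longest one-sided run is 6)
```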

Error Handling

Empty arrays throw InsufficientDataException. Functions that require at least two observations (variance, standard deviation, skewness) throw InsufficientDataException for single-element input. Functions requiring positive data (geometricMean, harmonicMean) throw InvalidParameterException for non-positive values.

API Reference

Full API Reference

Browse all public types, functions, parameter overloads, and return types in the Dokka-generated reference.
Last modified on April 23, 2026