Pearson, Spearman, Kendall tau, partial correlation, point-biserial, matrices, covariance, and simple linear regression in kstats-correlation.
kstats-correlation covers two related tasks: measuring the strength of association between variables, and modeling a linear relationship. The module is split into two sections reflecting this distinction.
Pearson correlation measures the strength and direction of the linear association between two numeric variables. The coefficient ranges from -1 (perfect negative) to +1 (perfect positive).
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val y = doubleArrayOf(2.1, 3.9, 6.2, 7.8, 10.1)
val r = pearsonCorrelation(x, y)
r.coefficient // 0.9987
r.pValue // 0.0001
r.n // 5
Use Pearson when both variables are continuous and the relationship is approximately linear. It is sensitive to outliers.
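To make the definition concrete, here is a minimal sketch that computes Pearson's r directly from deviations about the means. The name `pearsonSketch` is hypothetical and this is not the library's implementation; it omits the p-value and returns NaN when either variable is constant.

```kotlin
import kotlin.math.sqrt

// Illustrative helper (hypothetical name, not part of kstats-correlation):
// computes Pearson's r directly from its definition.
fun pearsonSketch(x: DoubleArray, y: DoubleArray): Double {
    require(x.size == y.size && x.size >= 2) { "need two arrays of equal length >= 2" }
    val meanX = x.average()
    val meanY = y.average()
    var sxy = 0.0 // sum of cross-deviations
    var sxx = 0.0 // sum of squared deviations of x
    var syy = 0.0 // sum of squared deviations of y
    for (i in x.indices) {
        val dx = x[i] - meanX
        val dy = y[i] - meanY
        sxy += dx * dy
        sxx += dx * dx
        syy += dy * dy
    }
    // NaN if either variable has zero variance
    return sxy / sqrt(sxx * syy)
}
```

Called on the `x` and `y` arrays from the example above, `pearsonSketch(x, y)` gives approximately 0.9987.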
Spearman rank correlation applies Pearson correlation to the ranks of the data. It measures monotonic association: whether the variables tend to increase together, regardless of linearity.
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
val y = doubleArrayOf(2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0)
val r = spearmanCorrelation(x, y)
r.coefficient // 1.0 (perfect monotonic relationship)
r.pValue // 0.0
Use Spearman when the relationship is monotonic but not necessarily linear, or when the data contains outliers.
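The ranking step is what separates Spearman from Pearson. The sketch below shows one common way to rank values, assigning tied values the average of the positions they occupy. The name `averageRanks` is hypothetical, and whether the library handles ties the same way is an assumption.

```kotlin
// Illustrative helper (hypothetical name): assigns 1-based ranks,
// giving tied values the average of the positions they occupy.
fun averageRanks(values: DoubleArray): DoubleArray {
    // indices of the values in ascending order of value
    val order = values.indices.sortedBy { values[it] }
    val ranks = DoubleArray(values.size)
    var start = 0
    while (start < order.size) {
        var end = start
        // extend the run while the next value ties with the current one
        while (end + 1 < order.size && values[order[end + 1]] == values[order[start]]) end++
        // mean of the 1-based positions start+1 .. end+1
        val avgRank = (start + end) / 2.0 + 1.0
        for (k in start..end) ranks[order[k]] = avgRank
        start = end + 1
    }
    return ranks
}
```

For the exponential `y` in the example above, the ranks are 1 through 8, identical to the ranks of `x`, which is why the coefficient is exactly 1.0 even though the relationship is far from linear.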
Kendall tau counts concordant and discordant pairs to measure ordinal association. It is often more robust than Spearman for small samples and heavy ties.
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val y = doubleArrayOf(1.0, 3.0, 2.0, 5.0, 4.0)
val tau = kendallTau(x, y)
tau.coefficient // 0.6
tau.pValue // p-value for tau
The Kendall tau implementation runs in O(n log n) time using a merge-sort-based algorithm, making it efficient for large datasets.
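To show what is being counted, the following sketch computes tau-b with a naive O(n²) loop over all pairs. It is a reference for the definition only, not the merge-sort approach described above, and the name `kendallTauBNaive` is hypothetical.

```kotlin
import kotlin.math.sqrt

// Illustrative O(n^2) reference (hypothetical name); it is NOT the
// merge-sort-based algorithm, only a direct count of pair types.
fun kendallTauBNaive(x: DoubleArray, y: DoubleArray): Double {
    require(x.size == y.size) { "arrays must have equal length" }
    var concordant = 0L
    var discordant = 0L
    var tiedOnlyX = 0L
    var tiedOnlyY = 0L
    for (i in x.indices) {
        for (j in i + 1 until x.size) {
            val dx = x[i] - x[j]
            val dy = y[i] - y[j]
            when {
                dx == 0.0 && dy == 0.0 -> Unit      // tied on both: counted nowhere
                dx == 0.0 -> tiedOnlyX++            // tied only on x
                dy == 0.0 -> tiedOnlyY++            // tied only on y
                (dx > 0) == (dy > 0) -> concordant++ // same ordering in x and y
                else -> discordant++                 // opposite ordering
            }
        }
    }
    val denominator = sqrt(
        (concordant + discordant + tiedOnlyX).toDouble() *
            (concordant + discordant + tiedOnlyY).toDouble()
    )
    return (concordant - discordant) / denominator
}
```

For the example arrays above there are 10 pairs, of which 8 are concordant and 2 are discordant, so the result is (8 - 2) / 10 = 0.6.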
Point-biserial correlation measures the association between a binary variable (coded as 0/1 integers) and a continuous variable. It is equivalent to Pearson correlation when one variable is dichotomous.
val binary = intArrayOf(0, 0, 0, 1, 1, 1, 1)
val continuous = doubleArrayOf(1.0, 2.0, 1.5, 4.0, 5.0, 4.5, 3.5)
val r = pointBiserialCorrelation(binary, continuous)
r.coefficient // positive: group 1 has higher values
r.pValue // p-value
Use when one variable is naturally binary: treatment/control, pass/fail, male/female.
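The coefficient can also be written in terms of the two group means, which makes its interpretation clearer. The sketch below uses the textbook form (M1 - M0) / s · sqrt(n1·n0 / n²), where s is the population standard deviation of the continuous variable. The name `pointBiserialSketch` is hypothetical and this is not necessarily how the library computes it.

```kotlin
import kotlin.math.sqrt

// Illustrative helper (hypothetical name): point-biserial r from group means.
fun pointBiserialSketch(binary: IntArray, continuous: DoubleArray): Double {
    require(binary.size == continuous.size) { "arrays must have equal length" }
    require(binary.all { it == 0 || it == 1 }) { "binary values must be 0 or 1" }
    val group1 = continuous.filterIndexed { i, _ -> binary[i] == 1 }
    val group0 = continuous.filterIndexed { i, _ -> binary[i] == 0 }
    require(group0.isNotEmpty() && group1.isNotEmpty()) { "both groups must be non-empty" }
    val n = continuous.size.toDouble()
    val mean = continuous.average()
    // population standard deviation (divisor n) of the continuous variable
    val sdPopulation = sqrt(continuous.sumOf { (it - mean) * (it - mean) } / n)
    return (group1.average() - group0.average()) / sdPopulation *
        sqrt(group1.size * group0.size / (n * n))
}
```

For the example data above, the group means are 1.5 and 4.25 and the sketch returns approximately 0.9387, the same value Pearson's formula gives when the 0/1 codes are treated as numbers.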
Partial correlation measures the association between two variables after controlling for a third variable. It removes the linear effect of that potential confounder from both variables.
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val y = doubleArrayOf(2.0, 4.0, 5.0, 4.0, 5.0)
val z = doubleArrayOf(1.0, 1.0, 2.0, 3.0, 3.0)
val r = partialCorrelation(x, y, z)
r.coefficient // correlation between x and y, controlling for z
r.pValue // p-value
Use when a third variable might explain the apparent relationship between the first two.
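With a single control variable, the partial correlation can be obtained from the three pairwise Pearson coefficients. A minimal sketch of that first-order formula follows. The function name is hypothetical, and the result is undefined (NaN or infinite) when z is perfectly correlated with x or y.

```kotlin
import kotlin.math.sqrt

// Illustrative helper (hypothetical name): first-order partial correlation
// of x and y given z, computed from the three pairwise Pearson coefficients.
fun partialFromPairwise(rXY: Double, rXZ: Double, rYZ: Double): Double =
    (rXY - rXZ * rYZ) / sqrt((1 - rXZ * rXZ) * (1 - rYZ * rYZ))
```

For the arrays in the example above, the pairwise coefficients are approximately 0.7746 (x, y), 0.9487 (x, z), and 0.6124 (y, z), and `partialFromPairwise(0.7746, 0.9487, 0.6124)` evaluates to roughly 0.77.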
For multi-variable analysis, build pairwise matrices. Each cell (i,j) contains the correlation (or covariance) between variables i and j.
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val y = doubleArrayOf(2.1, 3.9, 6.2, 7.8, 10.1)
val z = doubleArrayOf(5.0, 4.0, 3.0, 2.0, 1.0)
val corr = correlationMatrix(x, y, z)
corr[0][1] // Pearson r between x and y ≈ 0.9987
corr[0][2] // Pearson r between x and z = -1.0
val cov = covarianceMatrix(x, y, z)
cov[0][0] // variance of x = 2.5
cov[0][1] // covariance of x and y
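As an illustration of how such a matrix is filled, the sketch below builds a sample covariance matrix cell by cell. The name `covarianceMatrixSketch` is hypothetical, and the n - 1 divisor is an assumption chosen because it reproduces the variance of 2.5 shown for x above.

```kotlin
// Illustrative sketch (hypothetical name): sample covariance matrix
// using an n - 1 divisor (an assumption, see the note above).
fun covarianceMatrixSketch(vararg variables: DoubleArray): Array<DoubleArray> {
    require(variables.isNotEmpty()) { "need at least one variable" }
    val n = variables[0].size
    require(n >= 2 && variables.all { it.size == n }) { "all variables need the same length >= 2" }
    val means = variables.map { it.average() }
    return Array(variables.size) { i ->
        DoubleArray(variables.size) { j ->
            // sum of cross-deviations between variable i and variable j
            var sum = 0.0
            for (k in 0 until n) {
                sum += (variables[i][k] - means[i]) * (variables[j][k] - means[j])
            }
            sum / (n - 1)
        }
    }
}
```

The result is symmetric, the diagonal holds each variable's variance, and dividing cell (i, j) by the square root of the product of diagonal cells i and j yields the corresponding Pearson correlation.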
Association after removing the effect of a third variable: partialCorrelation()
Pairwise summaries for many variables: correlationMatrix(), covarianceMatrix()
Math details
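Pearson:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Spearman: Pearson correlation applied to ranks.

Kendall tau-b:

$$\tau_b = \frac{C - D}{\sqrt{(C + D + T_X)(C + D + T_Y)}}$$

where $C$ = concordant pairs, $D$ = discordant pairs, $T_X$ = pairs tied only on X, $T_Y$ = pairs tied only on Y.

Partial correlation:

$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} \, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$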
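As a worked check of the Pearson formula, the arithmetic for the first example on this page (x = 1, 2, 3, 4, 5 and y = 2.1, 3.9, 6.2, 7.8, 10.1) can be laid out as follows:

```latex
\begin{aligned}
\bar{x} &= 3, \qquad \bar{y} = 6.02 \\
\sum_i (x_i - \bar{x})(y_i - \bar{y}) &= 7.84 + 2.12 + 0 + 1.78 + 8.16 = 19.9 \\
\sum_i (x_i - \bar{x})^2 &= 10, \qquad \sum_i (y_i - \bar{y})^2 = 39.708 \\
r &= \frac{19.9}{\sqrt{10 \times 39.708}} \approx 0.9987
\end{aligned}
```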
Simple linear regression fits the line $\hat{y} = \beta_0 + \beta_1 x$ to the data using ordinary least squares. The result includes the slope, intercept, goodness of fit ($R^2$), standard errors, residuals, and a prediction function.
val x = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
val y = doubleArrayOf(2.1, 3.9, 6.2, 7.8, 10.1)
val model = simpleLinearRegression(x, y)
model.slope // 1.99
model.intercept // 0.05
model.rSquared // 0.9973
model.standardErrorSlope // standard error of the slope estimate
model.standardErrorIntercept // standard error of the intercept estimate
model.n // 5
model.residuals // [0.06, -0.13, 0.18, -0.21, 0.10]
// Prediction
model.predict(6.0) // 11.99
model.predict(doubleArrayOf(6.0, 7.0, 8.0)) // batch prediction
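For readers who want to see where the slope, intercept, and R² come from, here is a minimal closed-form least-squares sketch. `LineFit` and `fitLineSketch` are hypothetical names, not the library's types, and standard errors are left out for brevity.

```kotlin
// Illustrative sketch (hypothetical names): closed-form OLS for one predictor.
class LineFit(val slope: Double, val intercept: Double, val rSquared: Double) {
    fun predict(x: Double): Double = intercept + slope * x
}

fun fitLineSketch(x: DoubleArray, y: DoubleArray): LineFit {
    require(x.size == y.size && x.size >= 2) { "need two arrays of equal length >= 2" }
    val meanX = x.average()
    val meanY = y.average()
    var sxy = 0.0 // sum of cross-deviations
    var sxx = 0.0 // sum of squared deviations of x
    var syy = 0.0 // total sum of squares of y
    for (i in x.indices) {
        sxy += (x[i] - meanX) * (y[i] - meanY)
        sxx += (x[i] - meanX) * (x[i] - meanX)
        syy += (y[i] - meanY) * (y[i] - meanY)
    }
    val slope = sxy / sxx
    val intercept = meanY - slope * meanX
    // residual sum of squares: squared gaps between observed and fitted y
    var ssResidual = 0.0
    for (i in x.indices) {
        val residual = y[i] - (intercept + slope * x[i])
        ssResidual += residual * residual
    }
    return LineFit(slope, intercept, 1.0 - ssResidual / syy)
}
```

On the example data this gives a slope of 19.9 / 10 = 1.99, an intercept of 6.02 - 1.99 × 3 = 0.05, and an R² of about 0.9973, which equals the square of the Pearson coefficient for the same data.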