GPU-accelerated statistical computing for Python
Validated against R to machine precision. Free forever.
PyStatistics is a comprehensive statistical computing library for Python that maintains two parallel computational paths: a CPU path validated against R, and a GPU path validated against the CPU path.
The library covers the full spectrum of classical statistics: regression, survival analysis, ANOVA, mixed models, bootstrap methods, hypothesis testing, descriptive statistics, and multivariate normal MLE with missing data.
1. Correctness > Fidelity > Performance > Convenience
2. Fail fast, fail loud — no silent fallbacks
3. Explicit over implicit — require parameters, don't assume intent
4. Two-tier validation — CPU vs R, then GPU vs CPU
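The two-tier idea in principle 4 can be illustrated without the library itself. The sketch below is not PyStatistics' test harness. It assumes a float64 NumPy solve stands in for the CPU reference and a float32 solve stands in for a lower-precision accelerated path. The function names and tolerances are illustrative choices, and since R output is not available here, tier 1 is approximated by an independent normal-equations solve.

```python
import numpy as np

def ols_reference(X, y):
    """Float64 least-squares solve; stands in for the CPU reference path."""
    coef, *_ = np.linalg.lstsq(X.astype(np.float64), y.astype(np.float64), rcond=None)
    return coef

def ols_fast(X, y):
    """Float32 solve; a stand-in for a lower-precision accelerated path."""
    coef, *_ = np.linalg.lstsq(X.astype(np.float32), y.astype(np.float32), rcond=None)
    return coef.astype(np.float64)

def check_agreement(candidate, reference, rtol, atol, label):
    """Raise instead of warning when two paths disagree beyond tolerance."""
    if not np.allclose(candidate, reference, rtol=rtol, atol=atol):
        worst = np.max(np.abs(candidate - reference))
        raise AssertionError(f"{label}: max abs difference {worst:.3e}")

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
beta = np.array([1.0, 2.0, 3.0, -1.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(1000)

ref = ols_reference(X, y)

# Tier 1 (assumption: an independent float64 solve replaces the R fixture).
independent = np.linalg.solve(X.T @ X, X.T @ y)
check_agreement(ref, independent, rtol=1e-10, atol=1e-12, label="tier 1")

# Tier 2: the lower-precision path is held to a looser tolerance.
fast = ols_fast(X, y)
check_agreement(fast, ref, rtol=1e-3, atol=1e-4, label="tier 2")
```

Holding the second tier to a looser tolerance than the first reflects that single-precision arithmetic cannot reproduce double-precision results bit for bit.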
pip install pystatistics
With GPU support (requires PyTorch):
pip install "pystatistics[gpu]"
from pystatistics.regression import fit
import numpy as np
X = np.random.randn(1000, 5)
y = X @ [1, 2, 3, -1, 0.5] + np.random.randn(1000) * 0.1
# Linear regression
result = fit(X, y)
print(result.summary())
# Logistic regression
y_binary = (X @ [1, -1, 0.5, 0, 0] + np.random.randn(1000) > 0).astype(float)
result = fit(X, y_binary, family='binomial')
# GPU acceleration (any model)
result = fit(X, y, backend='gpu')
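For readers who want to see what a binomial fit involves, here is a minimal NumPy sketch of iteratively reweighted least squares (IRLS) for logistic regression. It is an illustration under stated assumptions, not the library's solver: the function name `logistic_irls`, the convergence tolerance, and the weight floor are all hypothetical, and no intercept column is added.

```python
import numpy as np

def logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Logistic regression coefficients via IRLS (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
        # IRLS weights, floored to avoid division by zero in z.
        w = np.clip(mu * (1.0 - mu), 1e-12, None)
        z = eta + (y - mu) / w              # working response
        # Weighted least squares step: solve (X' W X) beta = X' W z.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    # Raise rather than return a non-converged estimate.
    raise RuntimeError("IRLS did not converge")

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = (X @ true_beta + rng.standard_normal(1000) > 0).astype(float)

beta_hat = logistic_irls(X, y)
print(beta_hat)
```

Because the simulated outcome uses Gaussian noise, the fitted logistic coefficients share the signs of `true_beta` but are not expected to equal them in magnitude.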
from pystatistics.hypothesis import t_test, p_adjust
result = t_test([1,2,3,4,5], [3,4,5,6,7])
print(result.statistic, result.p_value, result.conf_int)
print(result.summary()) # R-style output
# Multiple testing correction
p_adjusted = p_adjust([0.01, 0.04, 0.03, 0.005], method='BH')
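To make the `method='BH'` correction concrete, the following sketch reimplements the Benjamini-Hochberg adjustment in plain NumPy, following the same steps as R's `p.adjust`. The helper name `bh_adjust` is hypothetical and is not part of PyStatistics.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]          # indices from largest to smallest p
    ranks = np.arange(n, 0, -1)          # matching ranks n, n-1, ..., 1
    scaled = p[order] * n / ranks        # p * n / rank
    # Running minimum from the largest p downward, capped at 1.
    adjusted_sorted = np.minimum(1.0, np.minimum.accumulate(scaled))
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted    # restore the original ordering
    return adjusted

adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```

The running minimum is what keeps the adjusted values monotone in the raw p-values, which is why 0.03 and 0.04 end up with the same adjusted value.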
from pystatistics.survival import kaplan_meier, coxph
import numpy as np
time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
event = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
km = kaplan_meier(time, event)
print(km.survival, km.se, km.ci_lower, km.ci_upper)
# One covariate as a (10, 1) design matrix for the Cox model
X = np.column_stack([np.random.randn(10)])
cox = coxph(time, event, X)
print(cox.coefficients, cox.hazard_ratios)
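The Kaplan-Meier estimate for the small dataset above can be reproduced by hand, which is a useful sanity check. The sketch below is a plain NumPy product-limit estimator; `km_estimate` is a hypothetical helper, and it reports survival only at observed event times, which may differ from how `km.survival` is laid out.

```python
import numpy as np

def km_estimate(time, event):
    """Product-limit survival estimate at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                  # still under observation at t
        deaths = np.sum((time == t) & (event == 1))  # events exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
event = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

event_times, surv = km_estimate(time, event)
for t, s in zip(event_times, surv):
    print(f"t={t:.0f}  S(t)={s:.4f}")
```

Censored observations (event = 0) never trigger a drop in the curve, but they do shrink the risk set for later event times.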
Every module follows the same architecture: DataSource → Design → fit() → Backend.solve() → Result
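One way to picture that flow is the toy pipeline below. Only the stage names come from the diagram above; every class body, field, and signature here is a hypothetical sketch and should not be read as the library's actual classes. It also shows one possible reading of the "no silent fallbacks" principle: an unavailable backend raises instead of quietly switching to another one.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class DataSource:
    """Raw arrays as supplied by the caller (hypothetical)."""
    X: np.ndarray
    y: np.ndarray

@dataclass(frozen=True)
class Design:
    """Validated model inputs ready for a solver (hypothetical)."""
    X: np.ndarray
    y: np.ndarray

    @classmethod
    def from_source(cls, source):
        X = np.asarray(source.X, dtype=float)
        y = np.asarray(source.y, dtype=float)
        if X.ndim != 2 or y.ndim != 1 or X.shape[0] != y.shape[0]:
            raise ValueError(f"incompatible shapes: X {X.shape}, y {y.shape}")
        return cls(X, y)

@dataclass(frozen=True)
class Result:
    coefficients: np.ndarray
    backend: str

class CpuBackend:
    name = "cpu"

    def solve(self, design):
        coef, *_ = np.linalg.lstsq(design.X, design.y, rcond=None)
        return Result(coefficients=coef, backend=self.name)

# Only a CPU backend is registered in this sketch.
BACKENDS = {"cpu": CpuBackend()}

def fit(X, y, backend="cpu"):
    if backend not in BACKENDS:
        # Fail loudly rather than falling back to another backend.
        raise ValueError(f"unknown or unavailable backend: {backend!r}")
    design = Design.from_source(DataSource(X, y))
    return BACKENDS[backend].solve(design)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
result = fit(X, y)
print(result.backend, result.coefficients)
```

Keeping validation in the `Design` step means each backend can assume well-formed inputs and focus on the numerical solve.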
Linear & generalized linear models. OLS, logistic, Poisson, Gamma, negative binomial via IRLS.
Mean, SD, correlation, covariance, quantiles (all 9 R types), skewness, kurtosis.
t-test, chi-squared, Fisher exact, Wilcoxon, KS, proportions, F-test, p.adjust.
Bootstrap (ordinary, balanced, parametric), permutation tests, 5 CI methods, batched GPU solver.
One-way, factorial, ANCOVA, repeated measures. Type I/II/III SS. Tukey, Bonferroni, Dunnett.
LMM & GLMM. Random intercepts/slopes, nested/crossed, REML/ML, Satterthwaite df.
Multivariate normal MLE with missing data. Direct & EM algorithms. Little's MCAR test.
Proportional odds models. Logistic, probit, cloglog links. Matches R MASS::polr().
Multinomial logit (softmax) regression. Multi-class outcomes. Matches R nnet::multinom().
PCA via SVD, maximum likelihood factor analysis. Varimax & promax rotations.
ACF/PACF, ADF/KPSS tests, ETS, ARIMA/SARIMA, auto.arima, decomposition, STL.
Generalized additive models. Penalized splines, GCV/REML, P-IRLS. Matches R mgcv::gam().