Professional Distribution Fitting for Python
A comprehensive, production-ready library for statistical distribution fitting that surpasses EasyFit and R's fitdistrplus with modern statistical methods, exceptional user experience, and robust software engineering.
English | فارسی | Deutsch | 📋 CHANGELOG
✅ 30 Statistical Distributions (25 continuous + 5 discrete)
✅ Goodness-of-Fit Tests (KS, AD, Chi-Square, Cramér-von Mises)
✅ Bootstrap Confidence Intervals (Parametric & Non-parametric with BCa)
✅ Enhanced Diagnostics (Residuals, Influence, Outlier Detection)
✅ Weighted Data Support (Survey data, stratified sampling, frequency counts)
✅ Multiple Estimation Methods (MLE, Moments, Quantile matching)
✅ Multilingual (English, فارسی, Deutsch)
✅ Comprehensive Documentation (9 tutorials + API reference)
✅ 20+ Complete Examples (8,500+ lines across 7 folders) 🆕
📚 New: Comprehensive Examples
Explore 20+ production-ready examples covering:
- Basics & common distributions
- Advanced fitting methods (MLE, MoM)
- Model selection (AIC, BIC, Cross-validation)
- Goodness-of-fit testing
- Beautiful visualizations (PDF, CDF, Q-Q plots, interactive)
- Real-world applications (Finance, Reliability, Quality Control)
- Advanced topics (Mixture models, Bootstrap, Custom distributions)
- ✅ Free and open source (MIT license)
- ✅ Python ecosystem integration (NumPy, SciPy, pandas)
- ✅ Advanced GOF tests (not just visual assessment)
- ✅ Bootstrap CI (uncertainty quantification)
- ✅ Weighted data support
- ✅ Automated model selection (AIC/BIC)
- ✅ Simpler, cleaner API
- ✅ Better performance (parallel processing built-in)
- ✅ Modern visualizations (matplotlib + plotly)
- ✅ Self-documenting code and outputs
- ✅ Multilingual support
- ✅ More distributions (30 vs 23)
- ✅ Production-ready code
- ✅ Comprehensive test suite
- ✅ Full documentation (9 tutorials + 20+ examples)
- ✅ Type hints throughout
- ✅ Clean, maintainable architecture
```bash
pip install distfit-pro
```

Development Installation:

```bash
git clone https://github.com/alisadeghiaghili/py-distfit-pro.git
cd py-distfit-pro
pip install -e ".[dev]"
```

Requirements:
- Python >= 3.8
- NumPy >= 1.20
- SciPy >= 1.7
- Matplotlib >= 3.3
- Plotly >= 5.0
- joblib >= 1.0
- tqdm >= 4.60
```python
from distfit_pro import get_distribution
import numpy as np

# Generate data
np.random.seed(42)
data = np.random.normal(loc=10, scale=2, size=1000)

# Fit distribution
dist = get_distribution('normal')
dist.fit(data, method='mle')

# View results
print(dist.summary())  # Complete statistical summary
print(dist.explain())  # Conceptual explanation
```

```python
from distfit_pro.core.gof_tests import GOFTests

# Run all GOF tests
results = GOFTests.run_all_tests(data, dist)
print(GOFTests.summary_table(results))
```

```python
from distfit_pro.core.bootstrap import Bootstrap

# Parametric bootstrap (1000 samples, parallel)
ci_results = Bootstrap.parametric(data, dist, n_bootstrap=1000, n_jobs=-1)
for param, result in ci_results.items():
    print(result)
```

```python
from distfit_pro.core.diagnostics import Diagnostics

# Residual analysis
residuals = Diagnostics.residual_analysis(data, dist)
print(residuals.summary())

# Detect outliers
outliers = Diagnostics.detect_outliers(data, dist, method='zscore')
print(outliers.summary())
```

```python
from distfit_pro.core.weighted import WeightedFitting

# Data with weights (e.g., survey sampling weights)
weights = np.random.uniform(0.5, 1.5, 1000)

# Weighted fit
params = WeightedFitting.fit_weighted_mle(data, weights, dist)
dist.params = params
dist.fitted = True
print(dist.summary())
```

| Distribution | Use Cases | Key Features |
|---|---|---|
| Normal | Heights, test scores, errors | Symmetric, bell curve |
| Lognormal | Income, stock prices | Right-skewed, positive |
| Weibull | Reliability, lifetimes | Flexible hazard rate |
| Gamma | Waiting times, rainfall | Sum of exponentials |
| Exponential | Time between events | Memoryless property |
| Beta | Probabilities, rates | Bounded [0,1] |
| Student's t | Small samples | Heavy tails |
| Pareto | Wealth, power law | 80-20 rule |
| Gumbel | Extreme maxima | Flood analysis |
| Laplace | Differences, errors | Double exponential |
And 15 more: Uniform, Triangular, Logistic, Fréchet, Cauchy, Chi-Square, F, Rayleigh, Inverse Gamma, Log-Logistic, and others.
- Poisson - Count of rare events
- Binomial - Success/failure trials
- Negative Binomial - Overdispersed counts
- Geometric - Trials to first success
- Hypergeometric - Sampling without replacement
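For the discrete families above, MLE often has a closed form. As a quick illustration using NumPy alone (not the library's API, which wraps this behind `get_distribution('poisson').fit(data)`), the Poisson rate MLE is simply the sample mean:

```python
import numpy as np

# Sketch only: the MLE of the Poisson rate is the sample mean.
rng = np.random.default_rng(42)
counts = rng.poisson(lam=3.5, size=10_000)

lambda_hat = counts.mean()  # closed-form MLE for the rate parameter
print(f"Estimated lambda: {lambda_hat:.3f}")  # close to the true 3.5
```
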
```python
# Maximum Likelihood (most accurate)
dist.fit(data, method='mle')

# Method of Moments (fast, robust)
dist.fit(data, method='moments')

# Quantile Matching (robust to outliers)
dist.fit(data, method='quantile', quantiles=[0.25, 0.5, 0.75])
```

- Kolmogorov-Smirnov - General purpose
- Anderson-Darling - Sensitive to tails
- Chi-Square - Frequency-based
- Cramér-von Mises - Middle-focused
All tests include p-values, critical values, and interpretations.
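The same statistics can be cross-checked against SciPy. A minimal sketch (SciPy only, not this library's API) of a KS test against a normal fitted by MLE:

```python
import numpy as np
from scipy import stats

# Sketch: KS test of normality against MLE-fitted parameters.
rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=1000)

mu, sigma = data.mean(), data.std(ddof=0)  # normal MLE estimates
stat, p_value = stats.kstest(data, 'norm', args=(mu, sigma))
print(f"KS statistic: {stat:.4f}, p-value: {p_value:.4f}")
```

Note the standard caveat: estimating the parameters from the same data biases the plain KS p-value upward (the Lilliefors problem), which is one reason bootstrap-based calibration is useful.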
```python
# Parametric bootstrap
Bootstrap.parametric(data, dist, n_bootstrap=1000)

# Non-parametric bootstrap (more conservative)
Bootstrap.nonparametric(data, dist, n_bootstrap=1000)

# BCa method (most accurate)
Bootstrap.bca_ci(boot_samples, estimate, data, estimator_func)
```

Features:
- Parallel processing (uses all CPU cores)
- Progress bars (tqdm integration)
- Multiple confidence levels (90%, 95%, 99%)
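To see what the non-parametric variant does under the hood, here is a minimal percentile-interval sketch in plain NumPy (the library's `Bootstrap` class adds parallelism, progress bars, and the BCa correction on top of this idea):

```python
import numpy as np

# Minimal non-parametric bootstrap: resample with replacement,
# recompute the statistic, take percentiles of the replicates.
rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=1000)

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(1000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile CI
print(f"95% CI for the mean: [{lo:.3f}, {hi:.3f}]")
```
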
Residual Analysis:
- Quantile residuals
- Pearson residuals
- Deviance residuals
- Standardized residuals
Influence Diagnostics:
- Cook's distance
- Leverage values
- DFFITS
- Automatic identification of influential observations
Outlier Detection (4 methods):
- Z-score
- IQR (Interquartile Range)
- Likelihood-based
- Mahalanobis distance
Diagnostic Plots:
- Q-Q plot data
- P-P plot data
- Worm plot (detrended Q-Q)
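The quantile residuals above have a simple definition worth seeing once: r_i = Φ⁻¹(F̂(x_i)), where F̂ is the fitted CDF. Under a correct fit they are approximately standard normal, which is exactly what the Q-Q and worm plots check visually. A SciPy-only sketch (not this library's API):

```python
import numpy as np
from scipy import stats

# Quantile residuals: map data through the fitted CDF, then through
# the standard normal inverse CDF. Good fit => roughly N(0, 1).
rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=1000)

mu, sigma = data.mean(), data.std(ddof=0)       # fitted normal params
u = stats.norm.cdf(data, loc=mu, scale=sigma)   # fitted CDF values
residuals = stats.norm.ppf(u)                   # quantile residuals
# Mean should be near 0 and std near 1 under a correct fit
print(f"mean={residuals.mean():.3f}, std={residuals.std():.3f}")
```
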
```python
# Survey weights
WeightedFitting.fit_weighted_mle(data, sampling_weights, dist)

# Frequency data
WeightedFitting.fit_weighted_mle(values, frequencies, dist)

# Precision weights
weights = 1 / measurement_errors**2
WeightedFitting.fit_weighted_mle(measurements, weights, dist)
```

Utilities:
- Weighted statistics (mean, var, quantiles)
- Effective sample size calculation
- Weighted bootstrap
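The effective sample size mentioned above is typically the Kish formula, ESS = (Σw)² / Σw², which equals n for uniform weights and shrinks as weights become unequal. A NumPy-only sketch (not this library's utilities):

```python
import numpy as np

# Weighted mean and Kish effective sample size.
rng = np.random.default_rng(42)
data = rng.normal(10, 2, 1000)
weights = rng.uniform(0.5, 1.5, 1000)

w_mean = np.average(data, weights=weights)
ess = weights.sum() ** 2 / np.sum(weights ** 2)  # Kish ESS
print(f"weighted mean={w_mean:.3f}, effective n={ess:.1f} of {data.size}")
```
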
```python
from distfit_pro import get_distribution
import numpy as np

# Compare candidate distributions by AIC
candidates = ['normal', 'lognormal', 'gamma', 'weibull']
results = {}
for name in candidates:
    dist = get_distribution(name)
    dist.fit(data)
    # AIC = 2k - 2*log(L)
    k = len(dist.params)
    log_lik = np.sum(dist.logpdf(data))
    aic = 2 * k - 2 * log_lik
    results[name] = {'aic': aic, 'dist': dist}

# Best model = lowest AIC
best = min(results.items(), key=lambda x: x[1]['aic'])
print(f"Best: {best[0]}")
```

DistFit Pro speaks 3 languages!
```python
from distfit_pro import set_language

# 🇬🇧 English
set_language('en')
print(dist.explain())
# Output:
# 📊 Estimated Parameters:
#   • μ (mean): 10.0173
#   • σ (std): 1.9918
# 💡 Practical Applications:
#   • Measurement errors
#   • Heights and weights

# 🇮🇷 فارسی (Persian)
set_language('fa')
print(dist.explain())
# Output (in Persian):
# 📊 پارامترهای برآورد شده:
#   • μ (میانگین): 10.0173
#   • σ (انحراف معیار): 1.9918
# 💡 کاربردهای عملی:
#   • خطاهای اندازه‌گیری
#   • قد و وزن

# 🇩🇪 Deutsch (German)
set_language('de')
print(dist.explain())
# Output (in German):
# 📊 Geschätzte Parameter:
#   • μ (Mittelwert): 10.0173
#   • σ (Standardabweichung): 1.9918
# 💡 Praktische Anwendungen:
#   • Messfehler
#   • Größe und Gewicht
```

- The Basics - Your first distribution fit
- Distributions Guide - All 30 distributions explained
- Fitting Methods - MLE, Moments, Quantile
- GOF Tests - Test goodness-of-fit
- Bootstrap CI - Uncertainty quantification
- Diagnostics - Residuals, outliers, influence
- Weighted Data - Survey weights, frequencies
- Visualization - Beautiful plots
- Advanced Topics - Custom distributions, mixtures
📁 examples/ - 20+ production-ready examples (8,500+ lines)
- 01_basics/ - Introduction to distribution fitting
- 02_advanced_fitting/ - MLE and Method of Moments
- 03_model_selection/ - AIC/BIC, Cross-validation
- 04_goodness_of_fit/ - KS, Chi-square, Anderson-Darling
- 05_visualization/ - PDF/CDF, Q-Q, Interactive plots
- 06_real_world/ - Finance, Reliability, Quality Control
- 07_advanced_topics/ - Mixture models, Bootstrap, Custom
- 📖 Installation Guide
- ⚡ Quick Start
- 📊 API Reference
- 💡 Examples
- 📋 CHANGELOG
- ❓ FAQ
```python
import numpy as np
from distfit_pro import get_distribution
from distfit_pro.core.diagnostics import Diagnostics

# Manufacturing measurements
measurements = np.random.normal(100, 2, 1000)

# Fit distribution
dist = get_distribution('normal')
dist.fit(measurements)

# Detect outliers (defects)
outliers = Diagnostics.detect_outliers(
    measurements,
    dist,
    method='zscore',
    threshold=2.5  # Stricter for QC
)
print(f"Defect rate: {len(outliers.outlier_indices)/len(measurements)*100:.2f}%")
```

👉 See full example: examples/06_real_world/quality_control.py
```python
# Stock returns
returns = load_stock_data('AAPL')['daily_return']

# Fit heavy-tailed distribution
dist = get_distribution('studentt')
dist.fit(returns)

# Value at Risk (99% confidence)
var_99 = dist.ppf(0.01)  # 1st percentile
print(f"VaR(99%): {var_99*100:.2f}%")

# Expected Shortfall
cvar_99 = dist.conditional_var(0.01)
print(f"CVaR(99%): {cvar_99*100:.2f}%")

# Bootstrap CI for VaR
from distfit_pro.core.bootstrap import Bootstrap
ci = Bootstrap.parametric(returns, dist, n_bootstrap=1000)
```

👉 See full example: examples/06_real_world/finance_analysis.py
```python
# Patient survival times
survival_times = np.array([12, 15, 18, 24, 30, 36, 48, 60])

# Fit Weibull distribution
dist = get_distribution('weibull')
dist.fit(survival_times)

# Reliability at 24 months
reliability = dist.reliability(24)
print(f"24-month survival: {reliability*100:.1f}%")

# Median survival time
median_survival = dist.ppf(0.5)
print(f"Median survival: {median_survival:.1f} months")
```

👉 See full example: examples/06_real_world/reliability_engineering.py
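The reliability value is the Weibull survival function, R(t) = exp(-(t/scale)^shape). As a SciPy-only cross-check with illustrative shape and scale values (hypothetical parameters, not the fit from the example above):

```python
import numpy as np
from scipy import stats

# Weibull reliability R(t) = exp(-(t/scale)**shape), checked two ways.
# shape/scale below are illustrative, not fitted values.
shape, scale = 1.5, 40.0
t = 24.0

reliability = stats.weibull_min.sf(t, c=shape, scale=scale)  # survival fn
by_hand = np.exp(-((t / scale) ** shape))                    # closed form
print(f"R({t:.0f}) = {reliability:.4f}")  # matches the closed form
```
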
Benchmarks on Intel i7-10700K (8 cores):
| Task | Dataset Size | Time (serial) | Time (parallel) | Speedup |
|---|---|---|---|---|
| Fit single distribution | 10,000 | 15ms | N/A | - |
| Fit single distribution | 1,000,000 | 450ms | N/A | - |
| Bootstrap (1000 samples) | 10,000 | 18s | 3.2s | 5.6x |
| GOF tests (all 4) | 10,000 | 85ms | N/A | - |
| Model selection (10 dists) | 10,000 | 280ms | 95ms | 2.9x |
Memory efficient: handles datasets limited only by available RAM.
See CHANGELOG.md for detailed version history.
First Stable and Complete Release
- ✅ 30 Statistical Distributions (25 continuous + 5 discrete)
- ✅ Multiple Estimation Methods (MLE, Moments, Quantile matching)
- ✅ Goodness-of-Fit Tests (4 tests: KS, AD, Chi-Square, CvM)
- ✅ Bootstrap Confidence Intervals (Parametric & Non-parametric with BCa)
- ✅ Enhanced Diagnostics (4 residual types, influence, outlier detection)
- ✅ Weighted Data Support (MLE + Moments)
- ✅ Multilingual (English, فارسی, Deutsch)
- ✅ Comprehensive Documentation (9 tutorials + API reference)
- ✅ 20+ Complete Examples (8,500+ lines of code)
- ✅ Parallel Processing (joblib with all cores)
- ✅ Progress Bars (tqdm)
Version: 1.0.0 ✅
Release Date: 2026-02-14
Status: Stable and Production-Ready
Contributions welcome! See CONTRIBUTING.md.
Areas we need help:
- Additional distributions
- More GOF tests
- Performance optimizations
- Documentation improvements
- Translations (add your language!)
- More real-world examples
MIT License - see LICENSE.
Free for commercial and personal use.
Inspired by:
- R's `fitdistrplus` package (Delignette-Muller & Dutang)
- MathWave's EasyFit software
- SciPy's statistical distributions
Built with:
- NumPy & SciPy - numerical computing
- joblib - parallel processing
- matplotlib & plotly - visualization
- tqdm - progress bars
Ali Sadeghi Aghili
🦄 Data Unicorn
🌐 zil.ink/thedatascientist
🔗 linktr.ee/aliaghili
💻 @alisadeghiaghili
If you find this project useful, please consider giving it a star! ⭐
It helps others discover the project and motivates continued development.
Made with ❤️, ☕, and rigorous statistical methodology by Ali Sadeghi Aghili
"Better statistics through better software."