1 | | -# MannKS |
2 | | -MannKS (Mann-Kendall Sen slope) - Robust Trend Analysis in Python |
| 1 | +<div align="center"> |
| 2 | + <img src="assets/logo.png" alt="MannKS Logo" width="600"/> |
| 3 | + |
| 4 | + # MannKS |
| 5 | + ### (Mann-Kendall Sen) |
| 6 | + |
| 7 | + **Robust Trend Analysis in Python** |
| 8 | +</div> |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## 📦 Installation |
| 13 | + |
| 14 | +```bash |
| 15 | +pip install -r requirements.txt |
| 16 | +pip install -e . |
| 17 | +``` |
| 18 | + |
| 19 | +**Requirements:** Python 3.7+, NumPy, Pandas, SciPy, Matplotlib |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## ✨ What is MannKS? |
| 24 | + |
| 25 | +**MannKS** (Mann-Kendall Sen) is a Python package for detecting trends in time series data using non-parametric methods. It's specifically designed for environmental monitoring, water quality analysis, and other fields where data is messy, irregular, or contains detection limits. |
| 26 | + |
| 27 | +### When to Use MannKS |
| 28 | + |
| 29 | +Use this package when your data has: |
| 30 | +- **Irregular sampling intervals** (daily → monthly → quarterly) |
| 31 | +- **Censored values** (measurements like `<5` or `>100`) |
| 32 | +- **Seasonal patterns** you need to account for |
| 33 | +- **Non-normal distributions** (non-parametric methods do not assume normality) |
| 34 | +- **Small to moderate sample sizes** (n < 5,000 recommended) |
| 35 | + |
| 36 | +**Don't use** MannKS for highly autocorrelated data (test for autocorrelation first) or for series longer than n = 46,340 observations. |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## 🚀 Quick Start |
| 41 | + |
| 42 | +```python |
| 43 | +import pandas as pd |
| 44 | +from MannKS import prepare_censored_data, trend_test |
| 45 | + |
| 46 | +# 1. Prepare data with censored values |
| 47 | +# Converts strings like '<5' into a structured format |
| 48 | +values = [10, 12, '<5', 14, 15, 18, 20, '<5', 25, 30] |
| 49 | +dates = pd.date_range(start='2020-01-01', periods=len(values), freq='ME')  # 'ME' (month end) needs pandas >= 2.2; use 'M' on older versions |
| 50 | +data = prepare_censored_data(values) |
| 51 | + |
| 52 | +# 2. Run trend test |
| 53 | +# slope_scaling converts slope from "per second" to "per year" |
| 54 | +result = trend_test( |
| 55 | + x=data, |
| 56 | + t=dates, |
| 57 | + slope_scaling='year', |
| 58 | + x_unit='mg/L', |
| 59 | + plot_path='trend.png' |
| 60 | +) |
| 61 | + |
| 62 | +# 3. Interpret results |
| 63 | +print(f"Trend: {result.classification}") |
| 64 | +print(f"Slope: {result.slope:.2f} {result.slope_units}") |
| 65 | +print(f"Confidence: {result.C:.2%}") |
| 66 | +``` |
| 67 | + |
| 68 | +**Output:** |
| 69 | +``` |
| 70 | +Trend: Highly Likely Increasing |
| 71 | +Slope: 24.57 mg/L per year |
| 72 | +Confidence: 98.47% |
| 73 | +``` |
| 74 | + |
| 75 | + |
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +## 🎯 Key Features |
| 80 | + |
| 81 | +### Core Functionality |
| 82 | +- **Mann-Kendall Trend Test**: Detect monotonic trends with statistical significance |
| 83 | +- **Sen's Slope Estimator**: Calculate trend magnitude with confidence intervals |
| 84 | +- **Seasonal Analysis**: Separate seasonal signals from long-term trends |
| 85 | +- **Regional Aggregation**: Combine results across multiple monitoring sites |
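To make the first two bullets concrete, here is a minimal, standard-library sketch of the textbook Mann-Kendall S statistic and Sen's slope. It is not MannKS's implementation: the function names `mann_kendall_s` and `sen_slope` are hypothetical, and ties, censoring, and seasons are ignored.

```python
from itertools import combinations
from statistics import median


def mann_kendall_s(x):
    """Mann-Kendall S: sum of sign(x[j] - x[i]) over all pairs with i < j."""
    s = 0
    for xi, xj in combinations(x, 2):  # pairs are yielded in time order
        s += (xj > xi) - (xj < xi)     # +1, 0, or -1
    return s


def sen_slope(x, t):
    """Sen's slope: median of all pairwise slopes, using actual time gaps."""
    slopes = [
        (xj - xi) / (tj - ti)
        for (ti, xi), (tj, xj) in combinations(zip(t, x), 2)
        if tj != ti  # skip pairs sampled at the same time
    ]
    return median(slopes)


# Illustrative values with unequal spacing in time (made-up data)
x = [1.0, 3.0, 2.0, 5.0, 4.0]
t = [0.0, 1.0, 3.0, 4.0, 7.0]

s_stat = mann_kendall_s(x)
slope = sen_slope(x, t)
print(s_stat, round(slope, 4))
```

A positive S means more increasing pairs than decreasing ones. Because the slope is a median of pairwise slopes, a few outliers move it far less than they would move a least-squares fit.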
| 86 | + |
| 87 | +### Data Handling |
| 88 | +- **Censored Data Support**: Native handling of detection limits (`<5`, `>100`) |
| 89 | + - Three methods: Standard, LWP-compatible, Akritas-Theil-Sen (ATS) |
| 90 | + - Handles left-censored, right-censored, and mixed censoring |
| 91 | +- **Unequal Spacing**: Uses actual time differences (not just rank order) |
| 92 | +- **Missing Data**: Automatically handles NaN values and missing seasons |
| 93 | +- **Temporal Aggregation**: Multiple strategies for high-frequency data |
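As an illustration of what handling detection limits involves, the sketch below splits strings such as `<5` and `>100` into a numeric value and a censoring flag. `parse_censored` is a hypothetical helper written for this example. It is not `prepare_censored_data`, and the tuple layout and the labels `'lt'`, `'gt'`, `'none'` are assumptions rather than the package's actual output format.

```python
def parse_censored(values):
    """Hypothetical helper: return (value, is_censored, kind) per observation.

    kind is 'lt' for left-censored ('<5'), 'gt' for right-censored ('>100'),
    and 'none' for ordinary measurements.
    """
    parsed = []
    for v in values:
        text = str(v).strip()
        if text.startswith('<'):
            parsed.append((float(text[1:]), True, 'lt'))
        elif text.startswith('>'):
            parsed.append((float(text[1:]), True, 'gt'))
        else:
            parsed.append((float(text), False, 'none'))
    return parsed


parsed = parse_censored([10, '<5', 14, '>100'])
for row in parsed:
    print(row)
```

Keeping the limit and the flag separate lets a rank-based test treat `<5` as "somewhere below 5" instead of substituting an arbitrary number.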
| 94 | + |
| 95 | +### Statistical Features |
| 96 | +- **Continuous Confidence**: Reports likelihood ("Highly Likely Increasing") not just p-values |
| 97 | +- **Data Quality Checks**: Automatic warnings for tied values, long runs, insufficient data |
| 98 | +- **Robust Methods**: ATS estimator for heavily censored data |
| 99 | +- **Flexible Testing**: Kendall's Tau-a or Tau-b, custom significance levels |
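One common way to turn a Mann-Kendall result into a continuous confidence in trend direction is sketched below. It assumes the normal approximation with the no-ties variance and a continuity correction, and it assumes the confidence is taken as `1 - p/2` for the two-sided p-value. MannKS's own calculation may differ, for example through tie and censoring corrections. `mk_confidence` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def mk_confidence(s, n):
    """Return (z, two_sided_p, confidence_in_direction) for statistic S and size n.

    Assumes no ties: Var(S) = n(n-1)(2n+5)/18.
    """
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / sqrt(var_s)   # continuity correction toward zero
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    confidence = 1 - p / 2          # assumption: confidence that the sign of S is right
    return z, p, confidence


z, p, conf = mk_confidence(s=6, n=5)
print(f"z={z:.3f}, p={p:.3f}, confidence={conf:.1%}")
```

Under this formulation the confidence runs from 50% (no evidence either way) up to 100%, which is what allows a graded label instead of a single significance cut-off.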
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +## 📊 Example Use Cases |
| 104 | + |
| 105 | +### Seasonal Water Quality Trend |
| 106 | +```python |
| 107 | +from MannKS import seasonal_trend_test, check_seasonality |
| 108 | + |
| 109 | +# Check if seasonality exists (data and dates are assumed to be prepared as in Quick Start) |
| 110 | +seasonality = check_seasonality(x=data, t=dates, period=12, season_type='month') |
| 111 | +print(f"Seasonal pattern detected: {seasonality.is_seasonal}") |
| 112 | + |
| 113 | +# Run seasonal trend test |
| 114 | +result = seasonal_trend_test( |
| 115 | + x=data, |
| 116 | + t=dates, |
| 117 | + period=12, |
| 118 | + season_type='month', |
| 119 | + agg_method='robust_median', # Aggregates multiple samples per month |
| 120 | + slope_scaling='year' |
| 121 | +) |
| 122 | +``` |
| 123 | + |
| 124 | +### Regional Analysis Across Sites |
| 125 | +```python |
| 126 | +import pandas as pd |
| 127 | +from MannKS import regional_test, trend_test |
| 128 | +# Run trend tests for each site (site_data, all_site_data, and dates are assumed to be defined) |
| 129 | +site_results = [] |
| 130 | +for site in ['Site_A', 'Site_B', 'Site_C']: |
| 131 | + result = trend_test(x=site_data[site], t=dates) |
| 132 | + site_results.append({ |
| 133 | + 'site': site, |
| 134 | + 's': result.s, |
| 135 | + 'C': result.C |
| 136 | + }) |
| 137 | + |
| 138 | +# Aggregate regional trend |
| 139 | +regional = regional_test( |
| 140 | + trend_results=pd.DataFrame(site_results), |
| 141 | + time_series_data=all_site_data, |
| 142 | + site_col='site' |
| 143 | +) |
| 144 | +print(f"Regional trend: {regional.DT}, confidence: {regional.CT:.2%}") |
| 145 | +``` |
| 146 | + |
| 147 | +--- |
| 148 | + |
| 149 | +## ⚠️ Important Limitations |
| 150 | + |
| 151 | +### Sample Size |
| 152 | +- **Recommended maximum: n = 5,000** (triggers memory warning) |
| 153 | +- **Hard limit: n = 46,340** (prevents integer overflow) |
| 154 | +- For larger datasets, use `regional_test()` to aggregate multiple smaller sites |
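The hard limit coincides with the largest n whose square still fits in a signed 32-bit integer. The README does not say which quantity overflows, so treat the link to n² as an assumption; the arithmetic itself can be checked directly:

```python
from math import isqrt

INT32_MAX = 2**31 - 1          # 2,147,483,647

largest_n = isqrt(INT32_MAX)   # largest n with n * n <= INT32_MAX
print(largest_n)                     # 46340
print(largest_n**2 <= INT32_MAX)     # True:  2,147,395,600 fits
print((largest_n + 1)**2 <= INT32_MAX)  # False: 2,147,488,281 does not
```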
| 155 | + |
| 156 | +### Statistical Assumptions |
| 157 | +- **Independence**: Data points must be serially independent |
| 158 | + - Autocorrelation violates this and causes spurious significance |
| 159 | + - Pre-test with ACF or use block bootstrap methods if autocorrelated |
| 160 | +- **Monotonic trend**: Cannot detect U-shaped or cyclical patterns |
| 161 | +- **Homogeneous variance**: Most powerful when variance is constant over time |
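A rough way to screen for serial dependence before trusting the test is to compare the lag-1 autocorrelation with the approximate 95% white-noise band of ±1.96/√n. The sketch below uses hypothetical helper names, assumes evenly spaced observations, and is best applied to detrended residuals. It is a quick screen, not a substitute for a full ACF analysis.

```python
from math import sqrt


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of an evenly spaced series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def looks_autocorrelated(x):
    """True if |r1| exceeds the approximate 95% white-noise band 1.96/sqrt(n)."""
    return abs(lag1_autocorr(x)) > 1.96 / sqrt(len(x))


# Made-up series: strictly alternating values are strongly (negatively) autocorrelated
alternating = [1.0, -1.0] * 10
r1 = lag1_autocorr(alternating)
flag = looks_autocorrelated(alternating)
print(round(r1, 2), flag)
```

If the screen flags a series, the p-value and confidence from a standard Mann-Kendall test are likely to be overstated.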
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +## 📚 Documentation |
| 166 | + |
| 167 | +### Detailed Guides |
| 168 | +- **[Trend Test Parameters](./Examples/Detailed_Guides/trend_test_parameters_guide.md)** - Full parameter reference |
| 169 | +- **[Seasonal Analysis](./Examples/Detailed_Guides/seasonal_trend_test_parameters_guide.md)** - Season types and aggregation |
| 170 | +- **[Regional Tests](./Examples/Detailed_Guides/regional_test_guide/README.md)** - Multi-site aggregation |
| 171 | +- **[Analysis Notes](./Examples/Detailed_Guides/analysis_notes_guide.md)** - Interpreting data quality warnings |
| 172 | +- **[Trend Classification](./Examples/Detailed_Guides/trend_classification_guide.md)** - Understanding confidence levels |
| 173 | + |
| 174 | +### Examples |
| 175 | +The [Examples](./Examples/README.md) folder contains step-by-step tutorials from basic to advanced usage. |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## 🔬 Validation |
| 180 | + |
| 181 | +Extensively validated against: |
| 182 | +- **LWP-TRENDS R script** (34 test cases, 99%+ agreement) |
| 183 | +- **NADA2 R package** (censored data methods) |
| 184 | +- Edge cases: missing data, tied values, all-censored data, insufficient samples |
| 185 | + |
| 186 | +See [validation/](./validation/) for detailed comparison reports. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## 🙏 Acknowledgments |
| 191 | + |
| 192 | +This package is heavily inspired by the excellent work of **[LandWaterPeople (LWP)](https://landwaterpeople.co.nz/)**. The robust censored data handling and regional aggregation methods are based on their R scripts and methodologies. |
| 193 | + |
| 194 | +--- |
| 195 | + |
| 196 | +## 📖 References |
| 197 | + |
| 198 | +1. **Helsel, D.R. (2012).** *Statistics for Censored Environmental Data Using Minitab and R* (2nd ed.). Wiley. |
| 199 | +2. **Gilbert, R.O. (1987).** *Statistical Methods for Environmental Pollution Monitoring*. Wiley. |
| 200 | +3. **Hirsch, R.M., Slack, J.R., & Smith, R.A. (1982).** Techniques of trend analysis for monthly water quality data. *Water Resources Research*, 18(1), 107-121. |
| 201 | +4. **Mann, H.B. (1945).** Nonparametric tests against trend. *Econometrica*, 13(3), 245-259. |
| 202 | +5. **Sen, P.K. (1968).** Estimates of the regression coefficient based on a particular kind of rank correlation. *Journal of the American Statistical Association*, 63(324), 1379-1389. |
| 203 | +6. **Fraser, C., & Whitehead, A. L. (2022).** Continuous measures of confidence in direction of environmental trends at site and other spatial scales. *Environmental Challenges*, 9, 100601. |
| 204 | +7. **Fraser, C., Snelder, T., & Matthews, A. (2018).** State and trends of river water quality in the Manawatu-Whanganui region. Report for Horizons Regional Council. |