Add files via upload

lukefullard · web-flow · commit 4b9080926821 · 2026-01-01T13:46:20.000+13:00
diff --git a/README.md b/README.md
@@ -1,2 +1,204 @@
-# MannKS
-MannKS (Mann-Kendall Sen slope) - Robust Trend Analysis in Python
+<div align="center">
+  <img src="assets/logo.png" alt="MannKS Logo" width="600"/>
+
+  # MannKS
+  ### (Mann-Kendall Sen)
+
+  **Robust Trend Analysis in Python**
+</div>
+
+---
+
+## 📦 Installation
+
+```bash
+pip install -r requirements.txt
+pip install -e .
+```
+
+**Requirements:** Python 3.7+, NumPy, Pandas, SciPy, Matplotlib
+
+---
+
+## ✨ What is MannKS?
+
+**MannKS** (Mann-Kendall Sen) is a Python package for detecting trends in time series data using non-parametric methods. It's specifically designed for environmental monitoring, water quality analysis, and other fields where data is messy, irregular, or contains detection limits.
+
+### When to Use MannKS
+
+Use this package when your data has:
+- **Irregular sampling intervals** (daily → monthly → quarterly)
+- **Censored values** (measurements like `<5` or `>100`)
+- **Seasonal patterns** you need to account for
+- **No normal distribution** (non-parametric methods don't require it)
+- **Small to moderate sample sizes** (n < 5,000 recommended)
+
+**Don't use** for highly autocorrelated data (test first) or if you need n > 46,340 observations.
+
+---
+
+## 🚀 Quick Start
+
+```python
+import pandas as pd
+from MannKS import prepare_censored_data, trend_test
+
+# 1. Prepare data with censored values
+# Converts strings like '<5' into a structured format
+values = [10, 12, '<5', 14, 15, 18, 20, '<5', 25, 30]
+dates = pd.date_range(start='2020-01-01', periods=len(values), freq='ME')
+data = prepare_censored_data(values)
+
+# 2. Run trend test
+# slope_scaling converts slope from "per second" to "per year"
+result = trend_test(
+    x=data,
+    t=dates,
+    slope_scaling='year',
+    x_unit='mg/L',
+    plot_path='trend.png'
+)
+
+# 3. Interpret results
+print(f"Trend: {result.classification}")
+print(f"Slope: {result.slope:.2f} {result.slope_units}")
+print(f"Confidence: {result.C:.2%}")
+```
+
+**Output:**
+```
+Trend: Highly Likely Increasing
+Slope: 24.57 mg/L per year
+Confidence: 98.47%
+```
+
+![Trend Analysis Plot](assets/quick_start_trend.png)
+
+---
+
+## 🎯 Key Features
+
+### Core Functionality
+- **Mann-Kendall Trend Test**: Detect monotonic trends with statistical significance
+- **Sen's Slope Estimator**: Calculate trend magnitude with confidence intervals
+- **Seasonal Analysis**: Separate seasonal signals from long-term trends
+- **Regional Aggregation**: Combine results across multiple monitoring sites
+
+### Data Handling
+- **Censored Data Support**: Native handling of detection limits (`<5`, `>100`)
+  - Three methods: Standard, LWP-compatible, Akritas-Theil-Sen (ATS)
+  - Handles left-censored, right-censored, and mixed censoring
+- **Unequal Spacing**: Uses actual time differences (not just rank order)
+- **Missing Data**: Automatically handles NaN values and missing seasons
+- **Temporal Aggregation**: Multiple strategies for high-frequency data
+
+### Statistical Features
+- **Continuous Confidence**: Reports likelihood ("Highly Likely Increasing") not just p-values
+- **Data Quality Checks**: Automatic warnings for tied values, long runs, insufficient data
+- **Robust Methods**: ATS estimator for heavily censored data
+- **Flexible Testing**: Kendall's Tau-a or Tau-b, custom significance levels
+
+---
+
+## 📊 Example Use Cases
+
+### Seasonal Water Quality Trend
+```python
+from MannKS import seasonal_trend_test, check_seasonality
+
+# Check if seasonality exists
+seasonality = check_seasonality(x=data, t=dates, period=12, season_type='month')
+print(f"Seasonal pattern detected: {seasonality.is_seasonal}")
+
+# Run seasonal trend test
+result = seasonal_trend_test(
+    x=data,
+    t=dates,
+    period=12,
+    season_type='month',
+    agg_method='robust_median',  # Aggregates multiple samples per month
+    slope_scaling='year'
+)
+```
+
+### Regional Analysis Across Sites
+```python
+from MannKS import regional_test
+
+# Run trend tests for each site
+site_results = []
+for site in ['Site_A', 'Site_B', 'Site_C']:
+    result = trend_test(x=site_data[site], t=dates)
+    site_results.append({
+        'site': site,
+        's': result.s,
+        'C': result.C
+    })
+
+# Aggregate regional trend
+regional = regional_test(
+    trend_results=pd.DataFrame(site_results),
+    time_series_data=all_site_data,
+    site_col='site'
+)
+print(f"Regional trend: {regional.DT}, confidence: {regional.CT:.2%}")
+```
+
+---
+
+## ⚠️ Important Limitations
+
+### Sample Size
+- **Recommended maximum: n = 5,000** (triggers memory warning)
+- **Hard limit: n = 46,340** (prevents integer overflow)
+- For larger datasets, use `regional_test()` to aggregate multiple smaller sites
+
+### Statistical Assumptions
+- **Independence**: Data points must be serially independent
+  - Autocorrelation violates this and causes spurious significance
+  - Pre-test with ACF or use block bootstrap methods if autocorrelated
+- **Monotonic trend**: Cannot detect U-shaped or cyclical patterns
+- **Homogeneous variance**: Most powerful when variance is constant over time
+
+---
+
+## 📚 Documentation
+
+### Detailed Guides
+- **[Trend Test Parameters](./Examples/Detailed_Guides/trend_test_parameters_guide.md)** - Full parameter reference
+- **[Seasonal Analysis](./Examples/Detailed_Guides/seasonal_trend_test_parameters_guide.md)** - Season types and aggregation
+- **[Regional Tests](./Examples/Detailed_Guides/regional_test_guide/README.md)** - Multi-site aggregation
+- **[Analysis Notes](./Examples/Detailed_Guides/analysis_notes_guide.md)** - Interpreting data quality warnings
+- **[Trend Classification](./Examples/Detailed_Guides/trend_classification_guide.md)** - Understanding confidence levels
+
+### Examples
+The [Examples](./Examples/README.md) folder contains step-by-step tutorials from basic to advanced usage.
+
+---
+
+## 🔬 Validation
+
+Extensively validated against:
+- **LWP-TRENDS R script** (34 test cases, 99%+ agreement)
+- **NADA2 R package** (censored data methods)
+- Edge cases: missing data, tied values, all-censored data, insufficient samples
+
+See [validation/](./validation/) for detailed comparison reports.
+
+---
+
+## 🙏 Acknowledgments
+
+This package is heavily inspired by the excellent work of **[LandWaterPeople (LWP)](https://landwaterpeople.co.nz/)**. The robust censored data handling and regional aggregation methods are based on their R scripts and methodologies.
+
+---
+
+## 📖 References
+
+1.  **Helsel, D.R. (2012).** *Statistics for Censored Environmental Data Using Minitab and R* (2nd ed.). Wiley.
+2.  **Gilbert, R.O. (1987).** *Statistical Methods for Environmental Pollution Monitoring*. Wiley.
+3.  **Hirsch, R.M., Slack, J.R., & Smith, R.A. (1982).** Techniques of trend analysis for monthly water quality data. *Water Resources Research*, 18(1), 107-121.
+4.  **Mann, H.B. (1945).** Nonparametric tests against trend. *Econometrica*, 13(3), 245-259.
+5.  **Sen, P.K. (1968).** Estimates of the regression coefficient based on a particular kind of rank correlation. *Journal of the American Statistical Association*, 63(324), 1379-1389.
+6.  **Fraser, C., & Whitehead, A. L. (2022).** Continuous measures of confidence in direction of environmental trends at site and other spatial scales. *Environmental Challenges*, 9, 100601.
+7.  **Fraser, C., Snelder, T., & Matthews, A. (2018).** State and trends of river water quality in the Manawatu-Whanganui region. Report for Horizons Regional Council.
diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,41 @@
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "MannKS"
+version = "0.1.0"
+description = "Non-parametric trend analysis for unequally spaced time series with censored data"
+authors = [{name = "Luke Fullard"}]
+readme = "README.md"
+requires-python = ">=3.7"
+license = {text = "MIT"}
+keywords = ["mann-kendall", "sen-slope", "trend-analysis", "censored-data", "time-series"]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Science/Research",
+    "Topic :: Scientific/Engineering :: Mathematics",
+    "Programming Language :: Python :: 3.7",
+    "Programming Language :: Python :: 3.8",
+    "Programming Language :: Python :: 3.9",
+    "Programming Language :: Python :: 3.10",
+]
+
+dependencies = [
+    "numpy>=1.19.0",
+    "pandas>=1.1.0",
+    "scipy>=1.5.0",
+    "matplotlib>=3.3.0",
+    "seaborn>=0.11.0",
+]
+
+[project.optional-dependencies]
+dev = ["pytest>=6.0", "pytest-cov"]
+
+[project.urls]
+Homepage = "https://github.com/LukeAFullard/MannKS"
+Documentation = "https://github.com/LukeAFullard/MannKS/tree/main/Examples"
+Repository = "https://github.com/LukeAFullard/MannKS"
+
+[tool.setuptools]
+packages = ["MannKS"]