Commit b7e6867 (1 parent: a2339a0)

v0.2.0 - adding bootstrap method

64 files changed: +3791 additions, −115 deletions

Examples/07_The_High_Censor_Rule/README.md (2 additions, 2 deletions)

```diff
@@ -93,12 +93,12 @@
 Interpretation: FALSE POSITIVE. The test found a decreasing trend because we
 substituted values based on the changing detection limit.

 --- Test 2: Standard Robust Analysis (hicensor=False) ---
-Trend: no trend
+Trend: indeterminate
 Sen's Slope: nan
 Interpretation: CORRECT. The robust method ignores ambiguous pairs, finding no trend.

 --- Test 3: High Censor Rule (hicensor=True) ---
-Trend: no trend
+Trend: indeterminate
 Sen's Slope: nan
 Analysis Notes: ['< 3 unique values', 'Long run of single value']
 Interpretation: CORRECT & CONSERVATIVE. By unifying the censoring level,
```
Examples/15_Regional_Trend_Analysis/README.md (6 additions, 6 deletions)

```diff
@@ -149,12 +149,12 @@
 Generating data for 5 sites...

 --- Running Individual Trend Tests ---
 Individual Results:
-     site     s         C     slope       trend
-0  Site_A  31.0  0.996355  0.463962  increasing
-1  Site_B  31.0  0.996355  0.351809  increasing
-2  Site_C  33.0  0.997896  0.503549  increasing
-3  Site_D   5.0  0.639743  0.064203  increasing
-4  Site_E   1.0  0.500000  0.003851    no trend
+     site     s         C     slope          trend
+0  Site_A  31.0  0.996355  0.463962     increasing
+1  Site_B  31.0  0.996355  0.351809     increasing
+2  Site_C  33.0  0.997896  0.503549     increasing
+3  Site_D   5.0  0.639743  0.064203     increasing
+4  Site_E   1.0  0.500000  0.003851  indeterminate

 --- Running Regional Trend Test ---
 Number of Sites (M): 5
```

Examples/23_High_Frequency_Data/README.md (6 additions, 4 deletions)

````diff
@@ -22,6 +22,8 @@
 import numpy as np
 import pandas as pd
 import MannKS as mk

+np.random.seed(42)
+
 # --- Part 1: High Frequency Data (Minutes/Seconds) ---
 # Goal: Demonstrate that the package can handle very small time increments
 # and use `slope_scaling` to return meaningful results.
@@ -99,15 +101,15 @@
 ```text
 --- Part 1: High-Frequency Sensor Data (Seconds) ---
 Detected Trend: increasing
-Sen's Slope: 0.5001 (degrees per minute)
+Sen's Slope: 0.5003 (degrees per minute)
 Expected Slope: 0.5000
-Confidence Interval: [0.4994, 0.5008]
+Confidence Interval: [0.4997, 0.5008]

 --- Part 2: Hourly Data with Diurnal Seasonality ---
 Detected Trend: increasing
-Sen's Slope: 0.0929 (units per day)
+Sen's Slope: 0.0895 (units per day)
 Expected Slope: 0.1000
-Confidence Interval: [0.0697, 0.1195]
+Confidence Interval: [0.0652, 0.1139]
 ```
````
Examples/23_High_Frequency_Data/run_example.py (2 additions, 0 deletions)

```diff
@@ -12,6 +12,8 @@
 import pandas as pd
 import MannKS as mk

+np.random.seed(42)
+
 # --- Part 1: High Frequency Data (Minutes/Seconds) ---
 # Goal: Demonstrate that the package can handle very small time increments
 # and use `slope_scaling` to return meaningful results.
```
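Both files gain an `np.random.seed(42)` call, which is why the README's slope and confidence-interval numbers changed to one reproducible set: seeding NumPy's global generator makes every subsequent draw deterministic. A minimal sketch of the effect:

```python
import numpy as np

# Seeding the global generator makes subsequent draws reproducible.
np.random.seed(42)
first_run = np.random.normal(0, 1, size=5)

# Re-seeding with the same value replays the identical sequence,
# which is why the example's slope/CI numbers are now stable across runs.
np.random.seed(42)
second_run = np.random.normal(0, 1, size=5)

print(np.allclose(first_run, second_run))  # → True
```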
New file (214 additions, 0 deletions)
# Example 29: Block Bootstrap for Autocorrelation

## The "Why": The Autocorrelation Problem
Standard Mann-Kendall tests assume that data points are **independent** (like rolling a die).
However, environmental data is often **autocorrelated**:
* A hot day is likely to be followed by another hot day.
* A drought year might influence groundwater levels for the next year.

**The Danger:** Autocorrelation reduces the "effective" amount of information in your data. If you ignore it, the standard test underestimates the variance and often reports a **statistically significant trend when none exists (False Positive).**

**The Solution:** Correcting for autocorrelation. We demonstrate three approaches:
1. **Standard Test (Uncorrected):** The naive baseline.
2. **Yue & Wang (2004) Correction:** Adjusts the sample size based on lag-1 autocorrelation.
3. **Block Bootstrap:** A robust, non-parametric resampling method that preserves the correlation structure.
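The size of this effect is easy to quantify before reaching for any package. For an AR(1) process, a common first-order correction (the same idea behind the Yue & Wang adjustment) shrinks the nominal sample size n to an effective size n_eff = n(1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation. A self-contained sketch (illustrative only, not MannKS internals):

```python
import numpy as np

# Simulate the same kind of AR(1) series used in this example:
# phi = 0.8 persistence, no deterministic trend.
np.random.seed(45)
n = 100
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.8 * x[i - 1] + np.random.normal(0, 1)

# Lag-1 sample autocorrelation.
xc = x - x.mean()
r1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

# First-order effective sample size: n_eff = n * (1 - r1) / (1 + r1).
# Strong persistence leaves far fewer independent observations than len(x).
n_eff = n * (1 - r1) / (1 + r1)
print(f"lag-1 autocorrelation: {r1:.2f}, effective n: {n_eff:.0f} of {n}")
```

With phi = 0.8 the effective sample size collapses to roughly a tenth of the nominal one, which is why the uncorrected test's variance estimate is so badly off.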
## The "How": Code Walkthrough

We will simulate data with strong autocorrelation (an AR(1) process) but **no trend**, and compare the results. Note the **Confidence Intervals** in each case: uncorrected tests often have overly narrow intervals that exclude zero (suggesting a trend), while corrected tests widen the interval appropriately.

### Step 1: Python Code
```python
import numpy as np
import pandas as pd
import MannKS as mk
import matplotlib.pyplot as plt

# 1. Generate Autocorrelated Data (AR1 Process)
# Autocorrelation (serial dependence) is common in environmental data.
# A high value today often means a high value tomorrow, even without a long-term trend.
# Standard Mann-Kendall assumes independence and often produces FALSE POSITIVES
# (finding a 'trend' that is just persistence).

np.random.seed(45)
n = 100
t = pd.date_range('2000-01-01', periods=n, freq='ME')

# Generate AR(1) noise: x[t] = 0.8 * x[t-1] + noise
# High correlation (phi=0.8) makes the data "wander" and look like a trend.
x = np.zeros(n)
x[0] = np.random.normal(0, 1)
for i in range(1, n):
    x[i] = 0.8 * x[i-1] + np.random.normal(0, 1)

# Add a very weak trend (or no trend)
trend_signal = 0.0 * np.arange(n)
x = x + trend_signal

print(f"Data generated: {n} points with AR(1) coefficient ~0.8 and NO trend.")

# 2. Run Standard Trend Test (The "Naive" Approach)
print("\n--- Test 1: Standard Mann-Kendall (Assuming Independence) ---")
result_std = mk.trend_test(
    x, t,
    autocorr_method='none',
    slope_scaling='year',
    plot_path='plot_std.png'
)

print(f"Trend: {result_std.trend}")
print(f"p-value: {result_std.p:.4f}")
print(f"Significant? {result_std.h}")
print(f"Slope: {result_std.slope:.4f} {result_std.slope_units}")
print(f"Confidence Interval (95%): [{result_std.lower_ci:.4f}, {result_std.upper_ci:.4f}]")
print("Note: If p < 0.05, this is a False Positive caused by autocorrelation.")


# 3. Run Yue & Wang (2004) Correction
# This method adjusts the variance of the Mann-Kendall statistic based on the
# effective sample size (ESS), which is reduced by autocorrelation.
# It is a parametric correction assuming an AR(1) process.
print("\n--- Test 2: Yue & Wang (2004) Correction ---")
result_yw = mk.trend_test(
    x, t,
    autocorr_method='yue_wang',
    slope_scaling='year'
)

print(f"Trend: {result_yw.trend}")
print(f"p-value: {result_yw.p:.4f}")
print(f"Significant? {result_yw.h}")
print(f"Slope: {result_yw.slope:.4f} {result_yw.slope_units}")
print(f"Confidence Interval (95%): [{result_yw.lower_ci:.4f}, {result_yw.upper_ci:.4f}]")
print(f"Effective Sample Size: {result_yw.n_effective:.1f} (vs {n} original)")


# 4. Run Block Bootstrap Test (The "Robust" Approach)
# The block bootstrap method preserves the autocorrelation structure by resampling
# blocks of data instead of individual points. This correctly estimates the
# significance even when data is dependent.
print("\n--- Test 3: Block Bootstrap Mann-Kendall ---")
result_boot = mk.trend_test(
    x, t,
    autocorr_method='block_bootstrap',
    n_bootstrap=1000,
    slope_scaling='year',
    plot_path='plot_boot.png'
)

print(f"Trend: {result_boot.trend}")
print(f"p-value: {result_boot.p:.4f}")
print(f"Significant? {result_boot.h}")
print(f"Slope: {result_boot.slope:.4f} {result_boot.slope_units}")
print(f"Confidence Interval (95%): [{result_boot.lower_ci:.4f}, {result_boot.upper_ci:.4f}]")
print(f"Autocorrelation (ACF1): {result_boot.acf1:.3f}")
print(f"Block Size Used: {result_boot.block_size_used}")


# 5. Seasonal Block Bootstrap
# For seasonal data, we must bootstrap 'cycles' (e.g., years) to preserve seasonality.
print("\n--- Test 4: Seasonal Block Bootstrap ---")

# Generate seasonal data with autocorrelation between years
n_years = 20
dates_seas = pd.date_range('2000-01-01', periods=n_years*12, freq='ME')
seasonal_pattern = np.tile(np.sin(np.linspace(0, 2*np.pi, 12)), n_years)

# Autocorrelated noise (inter-annual persistence)
noise_seas = np.zeros(n_years*12)
current_noise = 0
for i in range(n_years*12):
    current_noise = 0.7 * current_noise + np.random.normal(0, 1)
    noise_seas[i] = current_noise

x_seas = seasonal_pattern + noise_seas  # No trend

result_seas = mk.seasonal_trend_test(
    x_seas, dates_seas,
    period=12,
    autocorr_method='block_bootstrap',
    slope_scaling='year',
    plot_path='plot_seas_boot.png'
)

print(f"Seasonal Trend: {result_seas.trend}")
print(f"p-value: {result_seas.p:.4f}")
print(f"Significant? {result_seas.h}")
print(f"Slope: {result_seas.slope:.4f} {result_seas.slope_units}")
print(f"Confidence Interval (95%): [{result_seas.lower_ci:.4f}, {result_seas.upper_ci:.4f}]")
print(f"Block Size Used: {result_seas.block_size_used} (cycles)")
```
### Step 2: Text Output
```text
Data generated: 100 points with AR(1) coefficient ~0.8 and NO trend.

--- Test 1: Standard Mann-Kendall (Assuming Independence) ---
Trend: increasing
p-value: 0.0013
Significant? True
Slope: 0.2300 units per year
Confidence Interval (95%): [0.0917, 0.3704]
Note: If p < 0.05, this is a False Positive caused by autocorrelation.

--- Test 2: Yue & Wang (2004) Correction ---
Trend: increasing
p-value: 0.2447
Significant? False
Slope: 0.2300 units per year
Confidence Interval (95%): [-0.1672, 0.6631]
Effective Sample Size: 13.1 (vs 100 original)

--- Test 3: Block Bootstrap Mann-Kendall ---
Trend: increasing
p-value: 0.1550
Significant? False
Slope: 0.2300 units per year
Confidence Interval (95%): [-0.0598, 0.8128]
Autocorrelation (ACF1): 0.809
Block Size Used: 10

--- Test 4: Seasonal Block Bootstrap ---
Seasonal Trend: increasing
p-value: 0.4400
Significant? False
Slope: 0.0267 units per year
Confidence Interval (95%): [-0.0339, 0.0828]
Block Size Used: 3 (cycles)
```
## Interpreting the Results

### 1. The False Positive (Standard Test)
* In Test 1, the standard Mann-Kendall test reported a low p-value (p = 0.0013) and claimed a significant increasing trend.
* The **Confidence Interval** for the slope ([0.0917, 0.3704]) excludes zero, falsely confirming a trend.
* **Why?** The data "wandered" due to persistence, and the test mistook this for a deterministic trend.

### 2. Yue & Wang Correction (Parametric)
* In Test 2, `autocorr_method='yue_wang'` was used.
* It reduced the **Effective Sample Size** from 100 down to about 13, reflecting that highly correlated data contains far less independent information.
* The p-value rises to 0.2447 and the confidence interval widens to include zero, correctly indicating no significant trend.

### 3. Block Bootstrap (Non-Parametric)
* In Test 3, `autocorr_method='block_bootstrap'` was used.
* This method builds the null distribution by resampling blocks of consecutive data points.
* The **Confidence Interval** is estimated directly from the bootstrap distribution; it is wide enough to include zero ([-0.0598, 0.8128]), again correctly indicating no significant trend.
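The resampling machinery can be sketched in a few lines. The following is a hypothetical, simplified moving-block bootstrap of Kendall's S statistic, not MannKS's actual implementation: blocks of consecutive points are drawn with replacement, so short-range correlation survives inside each block, and the observed S is compared against the resampled null distribution.

```python
import numpy as np

def kendall_s(x):
    # Kendall's S: concordant minus discordant pairs, in time order.
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    return s

def block_bootstrap_pvalue(x, block_size=10, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    s_obs = kendall_s(x)
    n_starts = n - block_size + 1          # admissible block start positions
    n_blocks = int(np.ceil(n / block_size))
    s_null = np.empty(n_boot)
    for b in range(n_boot):
        # Stitch random blocks together until the series is full length.
        # Reordering blocks destroys any overall trend while keeping the
        # within-block correlation structure intact.
        idx = np.concatenate([
            np.arange(start, start + block_size)
            for start in rng.integers(0, n_starts, size=n_blocks)
        ])[:n]
        s_null[b] = kendall_s(x[idx])
    # Two-sided p-value against the resampled null distribution.
    return np.mean(np.abs(s_null) >= abs(s_obs))
```

Applied to strongly persistent, trend-free data like the AR(1) series above, this toy version generally returns a much larger p-value than a test that assumes independent observations.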
### 4. Seasonal Application
* Test 4 shows how to apply this to seasonal data using `seasonal_trend_test`.
* Here, the bootstrap resamples entire **cycles** (e.g., years) to preserve the seasonal pattern while accounting for inter-annual correlation.
## Visual Results

### Standard (Naive) Test
![Standard Plot](plot_std.png)
*Notice how the red trend line looks convincing, yet the data points cluster above or below the line for long stretches. That clustering is autocorrelation.*

### Bootstrap (Robust) Test
![Bootstrap Plot](plot_boot.png)
*The trend line and data are the same, but the statistical conclusion (the p-value) reported in the results is now correct.*

## Key Takeaway
If you suspect your data has serial correlation (common in daily or monthly environmental data), use `autocorr_method='block_bootstrap'` (or `'auto'`) to avoid false positives.
(plus two binary files: 58.2 KB and 58.4 KB)
