Commit 4e1f636
push v1 from dev to public
1 parent 4b90809 commit 4e1f636

364 files changed, +39925 -0 lines changed

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@

# Example 1: Getting Started - Inspecting Your Data

## The "Why": Verify Before You Analyze
In environmental data analysis, datasets are rarely perfect. They often contain:
* **Missing values (Gaps):** Sensors fail, samples get lost.
* **Censored data:** Concentrations fall below laboratory detection limits (e.g., `< 0.5 mg/L`).
* **Irregular sampling:** Samples might be taken daily in summer but monthly in winter.

Running a trend test blindly on such data can lead to misleading results. The `MannKS.inspect_trend_data` function is your "sanity check."
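Before the walkthrough, it helps to see what "parsing censored strings" means mechanically. The sketch below is our own minimal illustration, not the package's implementation (the `prepare_censored_data` function used later does this properly, including multiple detection limits and messier formats):

```python
import pandas as pd

def parse_censored(raw):
    """Illustrative only: split strings like '<0.5' into a numeric
    value (the detection limit) and a boolean censored flag."""
    s = pd.Series(raw, dtype="string").str.strip()
    censored = s.str.startswith("<")
    value = pd.to_numeric(s.str.lstrip("< "), errors="coerce")
    return pd.DataFrame({"value": value, "censored": censored})

df = parse_censored(["12.4", "< 0.5", "3.1"])
print(df)
```

The key idea is that one string column becomes two analysis-ready columns: the numeric value (for non-detects, the detection limit itself) and a flag saying whether that value is censored.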

## The "How": Code Walkthrough

In this example, we generate a synthetic "messy" dataset and inspect it. We use `return_summary=True` to get a programmatic report on data availability across different potential time increments (monthly, quarterly, etc.).

### Step 1: Python Code
```python
import numpy as np
import pandas as pd
import MannKS as mk

# 1. Generate Synthetic Data
# We create a 5-year monthly dataset (60 points) with some "messy" real-world features:
# - An underlying upward trend.
# - Some random noise.
# - Missing data (NaNs).
# - Censored data (values below detection limit).
np.random.seed(42)
n_years = 5
dates = pd.date_range(start='2020-01-01', periods=n_years*12, freq='ME')
t = np.arange(len(dates))
values = 0.1 * t + np.random.normal(0, 1, len(t)) + 10
censored_mask = values < 10.5
values_str = values.astype(str)
values_str[censored_mask] = '<' + np.round(values[censored_mask] + 0.5, 1).astype(str)
values_str[10:13] = np.nan
values_str[45] = np.nan

# 2. Pre-process the Data
# Raw environmental data often comes as strings (e.g., '< 0.5', '12.4').
# Standard statistical functions fail on these strings.
# The `prepare_censored_data` function is critical because it:
# 1. Parses the strings to identify censored values (detects '<').
# 2. Separates the data into a numeric 'value' column and a boolean 'censored' column.
# 3. Handles multiple detection limits automatically.
df = mk.prepare_censored_data(values_str)
df['date'] = dates

# 3. Inspect the Data
# Before running a trend test, we must verify the data is suitable.
# The `inspect_trend_data` function acts as a diagnostic tool.
# It checks for:
# - Data Availability: Do we have enough data points?
# - Time Structure: Is the data monthly? Quarterly? Irregular?
# - Gaps: Are there long periods with no data?
# - Censoring: What percentage of data is non-detect?
# We request `return_summary=True` to get the statistical table back.
print("Running Data Inspection...")
result = mk.inspect_trend_data(
    data=df,
    time_col='date',
    return_summary=True,
    plot=True,
    plot_path='inspection_plots.png'
)

# Print the availability summary
print("\nData Availability Summary:")
print(result.summary.to_markdown(index=False))
```

### Step 2: Text Output
The function returns a summary DataFrame, which we printed:

```text
Running Data Inspection...

Data Availability Summary:
| increment   |   n_obs |   n_year |   prop_year |   n_incr_year |   prop_incr_year | data_ok   |
|:------------|--------:|---------:|------------:|--------------:|-----------------:|:----------|
| daily       |      56 |        5 |           1 |            56 |        0.0306849 | False     |
| weekly      |      56 |        5 |           1 |            56 |         0.215385 | False     |
| monthly     |      56 |        5 |           1 |            56 |         0.933333 | True      |
| bi-monthly  |      56 |        5 |           1 |            29 |         0.966667 | True      |
| quarterly   |      56 |        5 |           1 |            20 |                1 | True      |
| bi-annually |      56 |        5 |           1 |            10 |                1 | True      |
| annually    |      56 |        5 |           1 |             5 |                1 | True      |

```
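Reading the numbers, `prop_incr_year` appears to be the count of distinct time periods containing data divided by the number of expected periods. That is our inference from the table, not a documented formula, but the arithmetic checks out for the 56 surviving observations over 5 years:

```python
# 56 observations remain after the 4 NaNs are dropped
n_obs = 56

# observed periods / expected periods over 5 years, per increment
print(round(n_obs / (365 * 5), 7))  # daily      -> 0.0306849
print(round(n_obs / (52 * 5), 6))   # weekly     -> 0.215385
print(round(n_obs / (12 * 5), 6))   # monthly    -> 0.933333
print(round(29 / 30, 6))            # bi-monthly -> 0.966667 (29 of 30 periods have data)
```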

## Interpreting the Results

### 1. Statistical Summary (Text Output)
The table above evaluates different time increments (monthly, quarterly, etc.) to see if they are suitable for analysis:
* **`increment`**: The time unit being tested.
* **`prop_year`**: Fraction of years that have at least one sample. High values (near 1.0) are good.
* **`prop_incr_year`**: Fraction of expected periods (e.g., 12 months/year) that have data.
* **`data_ok`**: A boolean flag suggesting if this increment is viable for seasonal analysis.
* For **monthly** analysis, we see high coverage, confirming our data is suitable despite the gaps.

### 2. Visual Diagnostics (Plots)
The function generated `inspection_plots.png`:

![Inspection Plots](inspection_plots.png)

* **Top-Left (Time Series):** Visualizes the trend, gaps, and censored values (red dots).
* **Top-Right (Value Matrix):** Heatmap of values (Row=Year, Col=Month). Useful for spotting seasonal blocks.
* **Bottom-Left (Censoring Matrix):** Heatmap of censored data locations.
* **Bottom-Right (Sample Count Matrix):** Heatmap of sampling frequency.

## Conclusion
We have confirmed our data is messy but sufficient for a **monthly** trend analysis. We are ready to proceed!
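Because the summary comes back as a DataFrame, the `data_ok` flag can also drive the choice of increment in code. A sketch, using a hand-built stand-in for `result.summary` (column names taken from the table above) rather than rerunning the inspection:

```python
import pandas as pd

# Stand-in for result.summary, using rows from the table above.
# Rows are ordered finest to coarsest increment.
summary = pd.DataFrame({
    "increment": ["daily", "weekly", "monthly", "bi-monthly", "quarterly"],
    "data_ok": [False, False, True, True, True],
})

# Pick the finest increment flagged as viable
viable = summary[summary["data_ok"]]
finest = viable.iloc[0]["increment"]
print(f"Finest viable increment: {finest}")  # -> monthly
```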
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
1+
import os
2+
import io
3+
import contextlib
4+
import numpy as np
5+
import pandas as pd
6+
import MannKS as mk
7+
import matplotlib.pyplot as plt
8+
9+
# --- 1. Define the Example Code as a String ---
10+
example_code = """
11+
import numpy as np
12+
import pandas as pd
13+
import MannKS as mk
14+
15+
# 1. Generate Synthetic Data
16+
# We create a 5-year monthly dataset (60 points) with some "messy" real-world features:
17+
# - An underlying upward trend.
18+
# - Some random noise.
19+
# - Missing data (NaNs).
20+
# - Censored data (values below detection limit).
21+
np.random.seed(42)
22+
n_years = 5
23+
dates = pd.date_range(start='2020-01-01', periods=n_years*12, freq='ME')
24+
t = np.arange(len(dates))
25+
values = 0.1 * t + np.random.normal(0, 1, len(t)) + 10
26+
censored_mask = values < 10.5
27+
values_str = values.astype(str)
28+
values_str[censored_mask] = '<' + np.round(values[censored_mask] + 0.5, 1).astype(str)
29+
values_str[10:13] = np.nan
30+
values_str[45] = np.nan
31+
32+
# 2. Pre-process the Data
33+
# Raw environmental data often comes as strings (e.g., '< 0.5', '12.4').
34+
# Standard statistical functions fail on these strings.
35+
# The `prepare_censored_data` function is critical because it:
36+
# 1. Parses the strings to identify censored values (detects '<').
37+
# 2. Separates the data into a numeric 'value' column and a boolean 'censored' column.
38+
# 3. Handles multiple detection limits automatically.
39+
df = mk.prepare_censored_data(values_str)
40+
df['date'] = dates
41+
42+
# 3. Inspect the Data
43+
# Before running a trend test, we must verify the data is suitable.
44+
# The `inspect_trend_data` function acts as a diagnostic tool.
45+
# It checks for:
46+
# - Data Availability: Do we have enough data points?
47+
# - Time Structure: Is the data monthly? Quarterly? Irregular?
48+
# - Gaps: Are there long periods with no data?
49+
# - Censoring: What percentage of data is non-detect?
50+
# We request `return_summary=True` to get the statistical table back.
51+
print("Running Data Inspection...")
52+
result = mk.inspect_trend_data(
53+
data=df,
54+
time_col='date',
55+
return_summary=True,
56+
plot=True,
57+
plot_path='inspection_plots.png'
58+
)
59+
60+
# Print the availability summary
61+
print("\\nData Availability Summary:")
62+
print(result.summary.to_markdown(index=False))
63+
"""
64+
65+
# --- 2. Execute the Code and Capture Output ---
66+
output_buffer = io.StringIO()
67+
68+
with contextlib.redirect_stdout(output_buffer):
69+
local_scope = {}
70+
exec(example_code, globals(), local_scope)
71+
72+
captured_output = output_buffer.getvalue()
73+
74+
# --- 3. Generate the README.md ---
75+
readme_content = f"""
76+
# Example 1: Getting Started - Inspecting Your Data
77+
78+
## The "Why": Verify Before You Analyze
79+
In environmental data analysis, datasets are rarely perfect. They often contain:
80+
* **Missing values (Gaps):** Sensors fail, samples get lost.
81+
* **Censored data:** Concentrations fall below laboratory detection limits (e.g., `< 0.5 mg/L`).
82+
* **Irregular sampling:** Samples might be taken daily in summer but monthly in winter.
83+
84+
Running a trend test blindly on such data can lead to misleading results. The `MannKS.inspect_trend_data` function is your "sanity check."
85+
86+
## The "How": Code Walkthrough
87+
88+
In this example, we generate a synthetic "messy" dataset and inspect it. We use `return_summary=True` to get a programmatic report on data availability across different potential time increments (monthly, quarterly, etc.).
89+
90+
### Step 1: Python Code
91+
```python
92+
{example_code.strip()}
93+
```
94+
95+
### Step 2: Text Output
96+
The function returns a summary DataFrame, which we printed:
97+
98+
```text
99+
{captured_output}
100+
```
101+
102+
## Interpreting the Results
103+
104+
### 1. Statistical Summary (Text Output)
105+
The table above evaluates different time increments (monthly, quarterly, etc.) to see if they are suitable for analysis:
106+
* **`increment`**: The time unit being tested.
107+
* **`prop_year`**: Fraction of years that have at least one sample. High values (near 1.0) are good.
108+
* **`prop_incr_year`**: Fraction of expected periods (e.g., 12 months/year) that have data.
109+
* **`data_ok`**: A boolean flag suggesting if this increment is viable for seasonal analysis.
110+
* For **monthly** analysis, we see high coverage, confirming our data is suitable despite the gaps.
111+
112+
### 2. Visual Diagnostics (Plots)
113+
The function generated `inspection_plots.png`:
114+
115+
![Inspection Plots](inspection_plots.png)
116+
117+
* **Top-Left (Time Series):** Visualizes the trend, gaps, and censored values (red dots).
118+
* **Top-Right (Value Matrix):** Heatmap of values (Row=Year, Col=Month). Useful for spotting seasonal blocks.
119+
* **Bottom-Left (Censoring Matrix):** Heatmap of censored data locations.
120+
* **Bottom-Right (Sample Count Matrix):** Heatmap of sampling frequency.
121+
122+
## Conclusion
123+
We have confirmed our data is messy but sufficient for a **monthly** trend analysis. We are ready to proceed!
124+
"""
125+
126+
with open(os.path.join(os.path.dirname(__file__), 'README.md'), 'w') as f:
127+
f.write(readme_content)
128+
129+
print("Example 1 generated successfully.")
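The capture-and-render pattern this script is built on (execute a code string, harvest whatever it printed) reduces to a small standalone sketch:

```python
import io
import contextlib

code = 'print("hello from the example")'

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    # Run the snippet in a fresh namespace so it cannot clobber ours;
    # anything it prints lands in the buffer instead of the console.
    exec(code, {})

captured = buffer.getvalue()
print(repr(captured))
```

The captured string can then be dropped into any template, which is exactly what the script above does with its README f-string.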
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@

# Example 2: Basic Non-Seasonal Trend Test (Numeric Time)

## The "Why": The Fundamental Trend Test
This example demonstrates the core function of the package: `mk.trend_test`.
It answers the most basic question: **"Is there a statistically significant upward or downward trend in my data?"**

We use the **Mann-Kendall test** for significance and **Sen's Slope** for magnitude because they are "non-parametric." This means:
1. They don't assume your data follows a bell curve (normal distribution).
2. They are robust to outliers (one crazy high value won't ruin the result).
3. They handle missing data gracefully.

## The "How": Code Walkthrough

We analyze a simple dataset where time is represented by numeric years (integers).

### Step 1: Python Code
```python
import numpy as np
import pandas as pd
import MannKS as mk

# 1. Generate Synthetic Data
# We create a simple dataset with 11 yearly observations.
# The time vector 't' is just a sequence of integers (years).
# The value vector 'x' has a clear upward trend.
t = np.arange(2000, 2011)  # Years 2000 to 2010
x = np.array([5.1, 5.5, 5.9, 6.2, 6.8, 7.1, 7.5, 7.9, 8.2, 8.5, 9.0])

print(f"Time (t): {t}")
print(f"Values (x): {x}")

# 2. Run the Trend Test
# The `trend_test` function is the core of the package. It performs two key statistical tasks:
#   A. Mann-Kendall Test: Checks *if* there is a trend (Significance).
#      - It compares every pair of data points to see if they increase or decrease.
#   B. Sen's Slope Estimator: Calculates *how strong* the trend is (Magnitude).
#      - It finds the median of all pairwise slopes.
# We pass 'plot_path' to automatically generate a visualization of these results.
print("\nRunning Mann-Kendall Trend Test...")
result = mk.trend_test(x, t, plot_path='trend_plot.png')

# 3. Inspect the Results
# The function returns a namedtuple with all statistical metrics.
# Key fields include:
# - result.trend: A basic description ('increasing', 'decreasing', 'no trend').
# - result.classification: A more nuanced category (e.g., 'Likely Increasing') based on confidence.
# - result.p: The p-value (significance).
# - result.slope: The magnitude of change per unit time.
print("\n--- Trend Test Results ---")
print(f"Basic Trend: {result.trend} (Confidence: {result.C:.1%})")
print(f"Classification: {result.classification}")
print(f"Kendall's S: {result.s}")
print(f"p-value: {result.p:.4f}")
print(f"Sen's Slope: {result.slope:.4f}")
print(f"Confidence Interval: [{result.lower_ci:.4f}, {result.upper_ci:.4f}]")
```

### Step 2: Text Output
```text
Time (t): [2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010]
Values (x): [5.1 5.5 5.9 6.2 6.8 7.1 7.5 7.9 8.2 8.5 9. ]

Running Mann-Kendall Trend Test...

--- Trend Test Results ---
Basic Trend: increasing (Confidence: 100.0%)
Classification: Highly Likely Increasing
Kendall's S: 55.0
p-value: 0.0000
Sen's Slope: 0.3889
Confidence Interval: [0.3667, 0.4000]

```

## Interpreting the Results

### 1. Statistical Results
* **Basic Trend (Increasing)**: The test detected an upward trend.
* **Classification (Highly Likely Increasing)**: The package assigns a descriptive category based on the confidence level (`result.C`).
    * **Increasing/Decreasing**: High confidence (≥ 90% or 95%, depending on `alpha`).
    * **Likely Increasing/Decreasing**: Moderate confidence (e.g., 85-90%).
    * **Stable/No Trend**: Low confidence.
* **Confidence (`result.C` = 100.0%)**: Derived from the p-value (`1 - p/2` for increasing trends). We can be very confident this trend isn't just random noise.
* **Kendall's S (55.0)**: The raw score. Comparing all possible pairs of data points, 55 more pairs increased than decreased; with 11 points there are exactly 55 pairs, so here every single pair increased. A positive number indicates growth.
* **p-value (0.0000)**: The probability of seeing a trend this strong by random chance alone is virtually zero. Standard practice considers $p < 0.05$ significant.
* **Sen's Slope (0.3889)**: The median rate of change. Since our time unit is years, the value increases by roughly **0.39 units per year**.

### 2. Visual Results (`trend_plot.png`)
The function automatically generated this plot:

![Trend Plot](trend_plot.png)

* **Blue Dots**: The actual data points.
* **Solid Line**: The Sen's Slope trend line. Notice it passes through the "center of gravity" of the data but doesn't necessarily hit the mean.
* **Shaded Area**: The 90% confidence interval (default `alpha=0.1`). For a significant trend, this band excludes a flat (zero-slope) line.

## Key Takeaway
For simple numeric time series (years, index numbers), `mk.trend_test(x, t)` is all you need. It provides the "Yes/No" (significance), the "How Much" (slope), and a user-friendly classification.
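Both headline numbers are easy to reproduce with plain NumPy, which is a good way to internalise what the test actually computes. This is an independent sketch of the textbook definitions, not the package's code:

```python
import numpy as np

t = np.arange(2000, 2011)
x = np.array([5.1, 5.5, 5.9, 6.2, 6.8, 7.1, 7.5, 7.9, 8.2, 8.5, 9.0])

# All index pairs i < j (55 pairs for 11 points)
i, j = np.triu_indices(len(x), k=1)

# Kendall's S: +1 for each increasing pair, -1 for each decreasing pair
S = int(np.sign(x[j] - x[i]).sum())

# Sen's slope: the median of all pairwise slopes
slopes = (x[j] - x[i]) / (t[j] - t[i])
sen = np.median(slopes)

print(S)              # -> 55 (every pair increases)
print(round(sen, 4))  # -> 0.3889
```

Matching the package output (`S = 55`, slope `0.3889`) confirms the interpretation above: S counts pairwise increases minus decreases, and Sen's slope is the median pairwise rate of change.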
