docs: add LLM demos documentation; feat(examples): add runnable mock-safe demos

GOESTERN-1035771 · GOESTERN-1035771 · commit 0749243f354d · 2025-12-09T15:50:44.000+01:00
diff --git a/docs/LLM_DEMOS.md b/docs/LLM_DEMOS.md
@@ -0,0 +1,92 @@
+LLM Demos and Cache Documentation
+
+Overview
+
+This document describes the lightweight runnable demos added under `examples/`.
+They are intentionally safe to run in minimal environments: each demo tries to
+use the real library classes (OpenAIClassifier, FusionEnsemble, AutoFusion, etc.)
+and falls back to simple mocks when those classes or credentials are not
+available.
+
+Files of interest (new/renamed)
+
+- `examples/cache_usage_demo.py` — Demonstrates writing and discovering simple
+  LLM cache JSON files. Writes a small mock cache under `cache/demo_llm_cache/`.
+
+- `examples/ensemble_cache_interrupt_demo.py` — Creates small val/test sets and
+  writes simple mock cache files under `cache/mock_ensemble/val` and
+  `cache/mock_ensemble/test`.
+
+- `examples/llm_cache_mock.py` — Existing example that exercises the
+  `LLMPredictionCache` implementation. If the real `prediction_cache` module is
+  available it will use it; otherwise it will raise at import-time. This demo
+  is left as an integration-oriented example.
+
+- `examples/minimal_precache_demo.py` — Minimal precache flow that tries to use
+  a real LLM (OpenAI/DeepSeek) and falls back to `MockLLM`. It uses Fusion helper
+  `_save_cached_llm_predictions` to write canonical cache JSON files.
+
+- `examples/test_multilabel_autofusion.py` — Runnable multi-label AutoFusion
+  demo; falls back to `MockAutoFusion` if `AutoFusionClassifier` is unavailable.
+
+- `examples/test_singlelabel_ml.py` — Runnable single-label ML-only demo; uses
+  `RoBERTaClassifier` if available, otherwise `MockML`.
+
+- `examples/test_singlelabel_autofusion.py` — Runnable single-label AutoFusion
+  demo with a safe fallback.
+
+Why these demos exist
+
+- Provide quick examples for contributors to run locally without needing
+  expensive GPU access or API credentials.
+- Demonstrate cache file formats and helper functions for saving and
+  discovering cached LLM predictions.
+- Provide reproducible scripted flows for CI smoke checks (syntax + import).
+
+How to run (quick)
+
+Run a single demo with Python:
+
+```bash
+python examples/test_singlelabel_ml.py
+python examples/cache_usage_demo.py
+python examples/test_multilabel_autofusion.py
+```
+
+Running under a minimal environment will activate mock fallbacks — this
+ensures the demos are useful even without model weights or API keys.
+
+Cache helper summary
+
+- `LLMPredictionCache` (in `textclassify/llm/prediction_cache.py`) provides
+  low-level operations to store, find, and load cached predictions.
+- Fusion helpers (e.g. `_save_cached_llm_predictions`) save canonical JSON files
+  used by the ensemble utilities. The demo scripts call these helpers when
+  available and otherwise write simple JSON files with a `predictions` list.
+
+Notes for maintainers
+
+- Keep the demos import-safe: avoid executing heavy logic at module import time.
+  Use `if __name__ == '__main__'` guards (already applied in the demos).
+- The demos intentionally keep output minimal and use `random` to generate
+  deterministic-like behavior for quick inspection.
+- If you want a CI job that verifies the demos, add a basic step that runs
+  `python -m py_compile examples/*.py` and optionally executes a small subset
+  with `python -c 'import runpy; runpy.run_path("examples/test_singlelabel_ml.py")'`.
+
+FAQ
+
+Q: Do the demos require API keys?
+A: No — they fall back to mock implementations unless you configure real
+   credentials and install optional dependencies.
+
+Q: Where are cache files written?
+A: `cache/` under the repository root. Demo scripts use subfolders like
+   `cache/demo_llm_cache` and `cache/mock_ensemble`.
+
+Q: Should we commit cache files?
+A: No — cache files are runtime artifacts and should remain untracked. Add
+   them to `.gitignore` if they are not already ignored.
+
+If you'd like, I can also add a short CI job (GitHub Actions) that runs the
+syntax checks and executes one demo with mocks. Say the word and I'll add it.
diff --git a/examples/cache_usage_demo.py b/examples/cache_usage_demo.py
@@ -0,0 +1,77 @@
+"""Cache Usage Demo
+
+Lightweight, runnable demo that shows how cached LLM predictions
+can be discovered, inspected, and used by the fusion helpers.
+This demo uses safe mocks when optional dependencies (OpenAI/DeepSeek)
+are not available so it can run in minimal environments.
+"""
+
+import os
+import sys
+from pathlib import Path
+import pandas as pd
+import json
+import random
+
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+# Minimal Mock LLM and Fusion helpers for demo purposes
+class MockLLM:
+    def __init__(self):
+        self.provider = 'mock'
+
+    def predict(self, train_df=None, test_df=None, **kwargs):
+        texts = list(test_df['text']) if test_df is not None else []
+        preds = [[random.choice([0,1])] for _ in texts]
+        # return object with .predictions and .metadata to match real classifiers
+        return type('R', (), {'predictions': preds, 'metadata': {'provider':'mock'}})()
+
+
+def demo_cache_usage():
+    print('Cache usage demo (mock)')
+    # Tiny datasets
+    train_df = pd.DataFrame({'text':['train a', 'train b'], 'label':[1,0]})
+    val_df = pd.DataFrame({'text':['val a','val b'], 'label':[1,0]})
+    test_df = pd.DataFrame({'text':['test a','test b'], 'label':[1,0]})
+
+    # Try to import FusionEnsemble and LLMPredictionCache; fall back to mocks
+    try:
+        from textclassify.ensemble.fusion import FusionEnsemble
+        from textclassify.llm.prediction_cache import LLMPredictionCache
+        print('Loaded real FusionEnsemble and LLMPredictionCache')
+    except Exception:
+        FusionEnsemble = None
+        LLMPredictionCache = None
+        print('Using mock behavior (FusionEnsemble not available)')
+
+    # Use mock LLM to produce predictions and save simple JSON cache files
+    llm = MockLLM()
+    val_res = llm.predict(train_df=train_df, test_df=val_df)
+    test_res = llm.predict(train_df=train_df, test_df=test_df)
+
+    cache_dir = project_root / 'cache' / 'demo_llm_cache'
+    os.makedirs(cache_dir, exist_ok=True)
+    val_file = cache_dir / 'validation_predictions.json'
+    test_file = cache_dir / 'test_predictions.json'
+
+    with open(val_file, 'w') as f:
+        json.dump({'predictions': val_res.predictions, 'provider': llm.provider}, f)
+    with open(test_file, 'w') as f:
+        json.dump({'predictions': test_res.predictions, 'provider': llm.provider}, f)
+
+    print('Wrote demo cache files:')
+    print(' ', val_file)
+    print(' ', test_file)
+
+    # If FusionEnsemble is available we could show how to load these files;
+    # otherwise, just print discovery info.
+    if LLMPredictionCache is not None:
+        cache = LLMPredictionCache(cache_dir=str(cache_dir), verbose=False)
+        print('Cache stats:', cache.get_cache_stats())
+    else:
+        print('Cache discovery (mock):', [str(p) for p in cache_dir.glob('*.json')])
+
+
+if __name__ == '__main__':
+    demo_cache_usage()
diff --git a/examples/ensemble_cache_interrupt_demo.py b/examples/ensemble_cache_interrupt_demo.py
@@ -0,0 +1,61 @@
+"""Ensemble Cache Interrupt Demo
+
+Creates small val/test sets and demonstrates saving LLM predictions to
+cache using Fusion utilities when available. Falls back to safe mocks.
+"""
+
+import os
+import sys
+from pathlib import Path
+import pandas as pd
+import random
+
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+class MockLLM:
+    def __init__(self):
+        self.provider = 'mock'
+
+    def predict(self, train_df=None, test_df=None, **kwargs):
+        texts = list(test_df['text']) if test_df is not None else []
+        preds = [[random.choice([0,1])] for _ in texts]
+        return type('R', (), {'predictions': preds, 'metadata': {}})()
+
+class MockML:
+    def predict(self, df):
+        return [0 for _ in range(len(df))]
+
+
+def main():
+    print('Ensemble cache interrupt demo (mock)')
+    df_val = pd.DataFrame({'text':[f'val {i}' for i in range(5)]})
+    df_test = pd.DataFrame({'text':[f'test {i}' for i in range(5)]})
+
+    try:
+        from textclassify.ensemble.fusion import FusionEnsemble
+        fusion_available = True
+        print('FusionEnsemble available')
+    except Exception:
+        fusion_available = False
+        print('FusionEnsemble not available; using mocks')
+
+    llm = MockLLM()
+    val_res = llm.predict(train_df=None, test_df=df_val)
+    test_res = llm.predict(train_df=None, test_df=df_test)
+
+    cache_dir = project_root / 'cache' / 'mock_ensemble'
+    os.makedirs(cache_dir / 'val', exist_ok=True)
+    os.makedirs(cache_dir / 'test', exist_ok=True)
+
+    # Save simple JSON caches
+    import json
+    with open(cache_dir / 'val' / 'preds.json', 'w') as f:
+        json.dump({'predictions': val_res.predictions, 'provider': 'mock'}, f)
+    with open(cache_dir / 'test' / 'preds.json', 'w') as f:
+        json.dump({'predictions': test_res.predictions, 'provider': 'mock'}, f)
+
+    print('Saved mock cache files under', cache_dir)
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/test_multilabel_autofusion.py b/examples/test_multilabel_autofusion.py
@@ -0,0 +1,54 @@
+"""Runnable demo for multi-label AutoFusion (safe fallback)
+
+This demo attempts to construct a minimal multi-label AutoFusion pipeline.
+If the real `AutoFusionClassifier` is unavailable it simulates the flow with
+simple mocks so the script can run in minimal environments.
+"""
+
+import os
+import sys
+from pathlib import Path
+import pandas as pd
+import random
+
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+class MockAutoFusion:
+    def __init__(self, config):
+        self.config = config
+        self.label_columns = config.get('label_columns', [])
+        self.multi_label = config.get('multi_label', True)
+
+    def fit(self, df):
+        print('MockAutoFusion.fit() called with', len(df), 'rows')
+
+    def predict(self, df):
+        preds = [[random.choice([0,1]) for _ in self.label_columns] for _ in range(len(df))]
+        return type('R', (), {'predictions': preds, 'metadata': {}})()
+
+
+def main():
+    print('Multi-label AutoFusion demo')
+
+    # tiny sample multi-label dataset
+    df = pd.DataFrame({'text':[f'sample {i}' for i in range(10)], 'labelA':[1,0,0,1,0,1,0,0,1,0], 'labelB':[0,1,0,0,1,0,0,1,0,1]})
+    train_df = df.sample(n=6, random_state=42).reset_index(drop=True)
+    test_df = df.drop(train_df.index).reset_index(drop=True)
+
+    config = {'label_columns': ['labelA','labelB'], 'multi_label': True}
+
+    try:
+        from textclassify.ensemble.auto_fusion import AutoFusionClassifier
+        print('Using real AutoFusionClassifier')
+        clf = AutoFusionClassifier(config=config)
+    except Exception:
+        print('AutoFusionClassifier not available; using MockAutoFusion')
+        clf = MockAutoFusion(config)
+
+    clf.fit(train_df)
+    res = clf.predict(test_df)
+    print('Predictions sample:', res.predictions[:3])
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/test_singlelabel_autofusion.py b/examples/test_singlelabel_autofusion.py
@@ -0,0 +1,52 @@
+"""Runnable demo for single-label AutoFusion (safe fallback)
+
+Creates a tiny single-label dataset and demonstrates AutoFusion flow. Falls back
+to MockAutoFusion if the real class is not importable.
+"""
+
+import os
+import sys
+from pathlib import Path
+import pandas as pd
+import random
+
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+class MockAutoFusion:
+    def __init__(self, config):
+        self.config = config
+        self.label_columns = config.get('label_columns', [])
+        self.multi_label = False
+
+    def fit(self, df):
+        print('MockAutoFusion.fit() called with', len(df), 'rows')
+
+    def predict(self, df):
+        preds = [[random.choice([0,1]) for _ in self.label_columns] for _ in range(len(df))]
+        return type('R', (), {'predictions': preds, 'metadata': {}})()
+
+
+def main():
+    print('Single-label AutoFusion demo')
+
+    df = pd.DataFrame({'text':[f'sample {i}' for i in range(12)], 'label':[random.choice([0,1]) for _ in range(12)]})
+    train_df = df.sample(n=8, random_state=42).reset_index(drop=True)
+    test_df = df.drop(train_df.index).reset_index(drop=True)
+
+    config = {'label_columns': ['label'], 'multi_label': False}
+
+    try:
+        from textclassify.ensemble.auto_fusion import AutoFusionClassifier
+        print('Using real AutoFusionClassifier')
+        clf = AutoFusionClassifier(config=config)
+    except Exception:
+        print('AutoFusionClassifier not available; using MockAutoFusion')
+        clf = MockAutoFusion(config)
+
+    clf.fit(train_df)
+    res = clf.predict(test_df)
+    print('Predictions sample:', res.predictions[:5])
+
+if __name__ == '__main__':
+    main()
diff --git a/examples/test_singlelabel_ml.py b/examples/test_singlelabel_ml.py
@@ -0,0 +1,50 @@
+"""Runnable demo for single-label ML-only classifier (safe fallback)
+
+This demo trains a tiny RoBERTa-based ML classifier if available, otherwise
+it uses a MockML to simulate training and prediction.
+"""
+
+import os
+import sys
+from pathlib import Path
+import pandas as pd
+import random
+
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+class MockML:
+    def __init__(self):
+        self.model_name = 'mock'
+        self.label_columns = ['label']
+
+    def fit(self, df):
+        print('MockML.fit() called with', len(df), 'rows')
+
+    def predict(self, df):
+        return type('R', (), {'predictions': [[random.choice([0,1])] for _ in range(len(df))], 'metadata': {}})()
+
+
+def main():
+    print('Single-label ML demo')
+    # tiny dataset
+    df = pd.DataFrame({'text':[f'sample {i}' for i in range(30)], 'label':[random.choice([0,1]) for _ in range(30)]})
+    train_df = df.sample(n=20, random_state=42).reset_index(drop=True)
+    test_df = df.drop(train_df.index).reset_index(drop=True)
+
+    try:
+        from textclassify.ml.roberta_classifier import RoBERTaClassifier
+        from textclassify.core.types import ModelConfig, ModelType
+        print('Using real RoBERTaClassifier')
+        cfg = ModelConfig(model_name='roberta-base', model_type=ModelType.TRADITIONAL_ML, parameters={})
+        clf = RoBERTaClassifier(config=cfg, text_column='text', label_columns=['label'], multi_label=False, auto_save_results=False)
+    except Exception:
+        print('RoBERTaClassifier not available; using MockML')
+        clf = MockML()
+
+    clf.fit(train_df)
+    res = clf.predict(test_df)
+    print('Sample predictions:', res.predictions[:5])
+
+if __name__ == '__main__':
+    main()