Commit 7458687

feat: Add save/load functionality and improved repr for LLM-based metrics (#2320)
## Summary

This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug.

## Changes

### 1. Save/Load Functionality

- Added `save()` and `load()` methods to `SimpleLLMMetric` and its subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`)
- Supports JSON format with optional gzip compression
- Handles all prompt types including `Prompt` and `DynamicFewShotPrompt`
- Smart defaults: `metric.save()` saves to `./metric_name.json`

### 2. Improved `__repr__` Methods

- Clean, informative string representations for both LLM-based and decorator-based metrics
- Removed implementation details (memory addresses, `<locals>`, internal attributes)
- Smart prompt truncation (80 chars max)
- Function signature display for decorator-based metrics

**Before:**

```python
create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...)
```

**After:**

```python
# LLM-based metrics
DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...')

# Decorator-based metrics
summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']]
```

### 3. Response Model Handling

- Added `create_auto_response_model()` factory to mark auto-generated models
- Only warns about custom response models during save, not standard ones

## Usage Examples

```python
# Save metric with default path
metric.save()  # → ./response_quality.json

# Save with custom path
metric.save("custom.json")
metric.save("/path/to/metrics/")  # → /path/to/metrics/response_quality.json
metric.save("compressed.json.gz")  # Compressed

# Load metric
loaded_metric = DiscreteMetric.load("response_quality.json")

# For DynamicFewShotPrompt metrics
loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings)
```

## Testing

- Comprehensive test suite with 8 tests covering all save/load scenarios
- Tests for default paths, directory handling, compression
- Tests for all prompt types and metric subclasses

## Dependencies

**Note:** This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR.

## Checklist

- [x] Tests added
- [x] Documentation in docstrings
- [x] Backwards compatible (new functionality only)
- [x] Follows TDD practices
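
## Illustrative sketch: path defaults and compression

The path rules listed under Save/Load Functionality (default file name, directory targets, `.gz` compression) can be sketched with the standard library alone. This is not the code from the PR: `resolve_save_path`, `save_payload`, and `load_payload` are hypothetical helper names, the payload is a plain dict standing in for a serialized metric, and treating a trailing separator as "this is a directory" is an assumption about how the directory case might be detected.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_save_path(name: str, path: Optional[str] = None) -> Path:
    """Hypothetical helper: turn a user-supplied path into a concrete file path."""
    if path is None:
        # No path given: fall back to ./<name>.json
        return Path(".") / f"{name}.json"
    target = Path(path)
    # Assumption: an existing directory, or a path ending in a separator,
    # means "write <name>.json inside this directory".
    if target.is_dir() or str(path).endswith(("/", "\\")):
        return target / f"{name}.json"
    return target


def save_payload(payload: Dict[str, Any], name: str, path: Optional[str] = None) -> Path:
    """Write the payload as JSON, gzip-compressed when the file name ends in .gz."""
    target = resolve_save_path(name, path)
    target.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(payload, indent=2)
    if target.suffix == ".gz":
        with gzip.open(target, "wt", encoding="utf-8") as fh:
            fh.write(text)
    else:
        target.write_text(text, encoding="utf-8")
    return target


def load_payload(path: "str | Path") -> Dict[str, Any]:
    """Read a payload written by save_payload, handling both plain and gzip files."""
    source = Path(path)
    if source.suffix == ".gz":
        with gzip.open(source, "rt", encoding="utf-8") as fh:
            return json.load(fh)
    return json.loads(source.read_text(encoding="utf-8"))


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    written = save_payload({"name": "response_quality"}, "response_quality", f"{workdir}/")
    print(written.name)
    print(load_payload(written))
```

Keeping path resolution separate from the write step makes the three cases (no path, directory, explicit file) testable without touching the filesystem beyond an `is_dir()` check.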
1 parent 19caa7a commit 7458687

7 files changed: +904 -20 lines changed


0 commit comments
