feat: Metric Future Pack: Introduce metric types #391
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #391 +/- ##
==========================================
+ Coverage 86.16% 86.32% +0.15%
==========================================
Files 75 75
Lines 5864 5880 +16
==========================================
+ Hits 5053 5076 +23
+ Misses 811 804 -7
scorer_fn=response_length_scorer,
scorable_types=[StepType.llm],
aggregatable_types=[StepType.trace],
scorer_fn=my_scorer,
what about the scorable & aggregatable types here?
Commit ba74cf8 addressed this comment by adding comprehensive handling of scorable and aggregatable types. The diff introduces a factory method _create_metric_from_type that properly handles different scorer types (LLM, CODE, Galileo), and improves the extraction and handling of scoreable node types from API responses throughout the codebase.
Attributes
----------
prompt (str | None): Prompt template for the LLM scorer.
model (str | None): Model name to use for scoring.
should we have something here that indicates that the model must align with a model name available in Galileo? Will we have an Integration or Model top-level class in the Python SDK that we could reference here instead of just a string?
yep, we are moving in that direction.
This metric type is for code-based scorers that execute custom code
to evaluate traces/spans.

Note: Full support for creating CodeMetric instances is not yet implemented.
vamaq
left a comment
The only pending action would be to solve some of the comments and we are ready to ![]()
User description
Shortcut:
Description:
Overview
Refactored the metric system to have a clean 4-type class hierarchy with a common base class, as requested.
New Class Hierarchy
Base Class: `Metric` (Abstract)
Common attributes shared by all metric types:
- `id: str | None` - Unique identifier
- `name: str` - Metric name
- `scorer_type: ScorerTypes | None` - Type of scorer
- `description: str` - Metric description
- `tags: list[str]` - Associated tags
- `created_at: datetime | None` - Creation timestamp
- `updated_at: datetime | None` - Update timestamp
- `version: int | None` - Version number
Common methods:
- `get(id=..., name=...)` - Retrieve existing metric (returns appropriate subclass)
- `list(...)` - List metrics (returns appropriate subclasses)
- `delete_by_name(name)` - Delete metric by name
- `delete()` - Delete this metric
- `refresh()` - Refresh from API
- `update()` - Update metric (not implemented)
- `to_legacy_metric()` - Convert to legacy format
1. `LlmMetric` - Custom LLM-based Metrics
For creating custom metrics evaluated by an LLM judge.
Additional Attributes:
- `prompt: str` - Prompt template for the LLM scorer
- `model: str` - Model name (e.g., "gpt-4o-mini")
- `judges: int` - Number of judges for scoring
- `cot_enabled: bool` - Chain-of-thought enabled
- `node_level: StepType` - Node level (e.g., StepType.llm)
- `output_type: OutputTypeEnum` - Output type (percentage, boolean, etc.)
Additional Methods:
- `create()` - Persist to API
Example:
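A minimal sketch of declaring such a metric. The SDK's constructor signature is not shown here, so the `LlmMetric` below is a stand-in dataclass that mirrors only the attributes listed above. The metric name, prompt text, default values, and the use of plain strings for `node_level` and `output_type` (in place of `StepType` and `OutputTypeEnum`) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LlmMetric:
    """Stand-in for the SDK class; mirrors only the attributes listed above."""

    name: str
    prompt: str
    model: str
    judges: int = 1  # assumed default
    cot_enabled: bool = False  # assumed default
    node_level: str = "llm"  # the SDK uses StepType; a plain string here
    output_type: str = "percentage"  # the SDK uses OutputTypeEnum
    description: str = ""
    tags: list[str] = field(default_factory=list)


# Hypothetical metric definition; in the SDK, create() would then persist it.
metric = LlmMetric(
    name="response_quality",
    prompt="Rate the quality of the response on a scale from 0 to 1.",
    model="gpt-4o-mini",
    judges=3,
    cot_enabled=True,
)
print(metric.name, metric.model, metric.judges)
```

In the real class, calling `create()` on the instance would be the step that persists it to the API; the stand-in deliberately omits that call.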
Backward Compatibility:
- `user_prompt`, `model_name`, `num_judges`
2. `LocalMetric` - Function-based Local Metrics
For metrics that use Python functions to score locally without API calls.
Additional Attributes:
- `scorer_fn: Callable` - Scoring function
- `scorable_types: list[StepType]` - Types that can be scored
- `aggregatable_types: list[StepType]` - Types for aggregation
Additional Methods:
- `to_local_metric_config()` - Convert to LocalMetricConfig format
Example:
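A minimal sketch of a function-based metric, using stand-in `StepType` and `LocalMetric` definitions rather than the SDK's own. The name `response_length_scorer` and the `StepType.llm` / `StepType.trace` values come from the review snippet in this PR; the scorer signature (a string in, a float out) and the metric name are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class StepType(str, Enum):
    """Stand-in for the SDK's StepType; only the members used here."""

    llm = "llm"
    trace = "trace"


@dataclass
class LocalMetric:
    """Stand-in mirroring the attributes listed above."""

    name: str
    scorer_fn: Callable[[str], float]  # assumed signature: text in, score out
    scorable_types: list[StepType]
    aggregatable_types: list[StepType]


def response_length_scorer(response: str) -> float:
    """Score a response by its length in characters (illustrative only)."""
    return float(len(response))


metric = LocalMetric(
    name="response_length",
    scorer_fn=response_length_scorer,
    scorable_types=[StepType.llm],
    aggregatable_types=[StepType.trace],
)

# Scoring runs locally, with no API call involved.
score = metric.scorer_fn("Hello, world!")
print(score)
```

Passing `scorable_types` and `aggregatable_types` explicitly, as the reviewer asked for, keeps it clear which step types the function scores and where the scores are rolled up.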
3. `CodeMetric` - Code-based Metrics
For code-based scorers (limited support).
Notes:
- `create()` method raises `NotImplementedError`
- Can be retrieved via `Metric.get()` if they exist
Example:
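A minimal sketch of the limited-support behaviour described in the notes, using a stand-in `CodeMetric` class. The metric name and the error message text are assumptions; only the fact that `create()` raises `NotImplementedError` comes from the description.

```python
class CodeMetric:
    """Stand-in for the SDK class; creation is intentionally unsupported."""

    def __init__(self, name: str) -> None:
        self.name = name

    def create(self) -> "CodeMetric":
        # Hypothetical message; the description only states the exception type.
        raise NotImplementedError("Creating CodeMetric instances is not yet supported.")


metric = CodeMetric(name="my_code_metric")

try:
    metric.create()
    created = True
except NotImplementedError:
    # Callers can still work with code metrics that already exist server-side.
    created = False

print(created)
```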
4. `GalileoMetric` - Built-in Galileo Scorers
For Galileo's built-in scorers (correctness, completeness, toxicity, etc.).
Access via:
- `Metric.scorers.<scorer_name>` - e.g., `Metric.scorers.correctness`
- `Metric.get(name="scorer_name")` - Returns `GalileoMetric` instance
Example:
Key Features
Type-aware Factory Methods
The `Metric.get()` and `Metric.list()` methods automatically return the appropriate subclass based on `scorer_type`:
- `ScorerTypes.LLM` → `LlmMetric`
- `ScorerTypes.CODE` → `CodeMetric`
- `GalileoMetric`
Proper Type Annotations
All methods are properly annotated with type hints and pass mypy strict type checking.
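As a rough illustration of the type-aware factory behaviour, the sketch below dispatches on `scorer_type` with full type annotations. The method name `_create_metric_from_type` appears in this PR, but its signature here is an assumption, as are the `ScorerTypes.PRESET` member and the choice to map every non-LLM, non-CODE type to `GalileoMetric`. All classes are stand-ins, not the SDK's.

```python
from abc import ABC
from enum import Enum
from typing import Optional


class ScorerTypes(str, Enum):
    """Stand-in enum; PRESET is a hypothetical name for built-in scorers."""

    LLM = "llm"
    CODE = "code"
    PRESET = "preset"


class Metric(ABC):
    def __init__(self, name: str, scorer_type: Optional[ScorerTypes] = None) -> None:
        self.name = name
        self.scorer_type = scorer_type

    @classmethod
    def _create_metric_from_type(cls, name: str, scorer_type: ScorerTypes) -> "Metric":
        """Pick the subclass that matches scorer_type."""
        if scorer_type is ScorerTypes.LLM:
            return LlmMetric(name, scorer_type)
        if scorer_type is ScorerTypes.CODE:
            return CodeMetric(name, scorer_type)
        # Assumption: anything else is treated as a built-in Galileo scorer.
        return GalileoMetric(name, scorer_type)


class LlmMetric(Metric):
    pass


class CodeMetric(Metric):
    pass


class GalileoMetric(Metric):
    pass


llm = Metric._create_metric_from_type("tone", ScorerTypes.LLM)
builtin = Metric._create_metric_from_type("correctness", ScorerTypes.PRESET)
print(type(llm).__name__, type(builtin).__name__)
```

Keeping the dispatch in one factory means `get()` and `list()` can share it, which matches the call graph in the generated summary.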
Clean Separation of Concerns
Each metric type has only the attributes and methods relevant to its purpose:
- `LlmMetric` has prompt, model, judges
- `LocalMetric` has scorer_fn, scorable_types
- `CodeMetric` and `GalileoMetric` are minimal
Files Modified
1. `/src/galileo/__future__/metric.py`
- `Metric` to abstract base class (ABC)
- `LlmMetric`, `LocalMetric`, `CodeMetric`, `GalileoMetric`
- `get()` and `list()` to return appropriate subclass instances
- `LlmMetric`
- `LocalMetric`
- `_populate_from_scorer_response()` to handle different types
2. `/src/galileo/__future__/__init__.py`
- `__all__` list
3. `/tests/future/test_metric.py`
- `LlmMetric`, `LocalMetric`, etc.
4. `/tests/future/test_metric_types.py` (NEW)
Breaking Changes
For Users Creating Metrics Directly
Before:
After:
For Users Retrieving Metrics
No Breaking Changes - `Metric.get()` and `Metric.list()` still work the same, but now return properly typed subclass instances.
Migration Guide
Simple Migration
- Replace `Metric(...)` with `LlmMetric(...)`
- Replace `Metric(..., scorer_fn=...)` with `LocalMetric(..., scorer_fn=...)`
- Use `isinstance()` to check types
Type Checking
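A minimal sketch of checking metric types with `isinstance()` after retrieval. The classes are simplified stand-ins for the SDK's hierarchy, and the `describe` helper, metric names, and prompt text are hypothetical.

```python
class Metric:
    """Stand-in base class (the SDK's is abstract and API-backed)."""

    def __init__(self, name: str) -> None:
        self.name = name


class LlmMetric(Metric):
    def __init__(self, name: str, prompt: str) -> None:
        super().__init__(name)
        self.prompt = prompt


class GalileoMetric(Metric):
    pass


def describe(metric: Metric) -> str:
    """Branch on the concrete subclass; isinstance() also narrows the type for mypy."""
    if isinstance(metric, LlmMetric):
        # Inside this branch, metric.prompt is known to exist.
        return f"{metric.name}: LLM metric with prompt {metric.prompt!r}"
    if isinstance(metric, GalileoMetric):
        return f"{metric.name}: built-in Galileo scorer"
    return f"{metric.name}: other metric type"


print(describe(LlmMetric("tone", "Is the tone polite?")))
print(describe(GalileoMetric("correctness")))
```

Because each subclass carries only its own attributes, narrowing with `isinstance()` is what makes access to fields such as `prompt` type-safe.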
Testing
Test Coverage
- `test_metric_types.py`
- `test_metric.py`
Running Tests
Benefits
Tests:
Generated description
Below is a concise technical summary of the changes proposed in this PR:
graph LR
Metric_get_("Metric.get"):::modified
Metric_create_metric_from_type_("Metric._create_metric_from_type"):::added
Metric_list_("Metric.list"):::modified
LlmMetric_create_("LlmMetric.create"):::added
Metric_refresh_("Metric.refresh"):::modified
Metric_get_ -- "Instantiates correct Metric subclass based on scorer_type." --> Metric_create_metric_from_type_
Metric_list_ -- "Creates correct Metric subclass instances per scorer_type." --> Metric_create_metric_from_type_
LlmMetric_create_ -- "Refreshes metric instance to sync full scorer details." --> Metric_refresh_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px
Refactor the `Metric` system by introducing an abstract base class `Metric` and four concrete subclasses: `LlmMetric`, `LocalMetric`, `CodeMetric`, and `GalileoMetric`, providing a type-safe and extensible API for defining and managing various metric types. Update `Metric.get()` and `Metric.list()` to return appropriate subclass instances, enhancing clarity and maintainability.
`Metric` base class and four concrete subclasses (`LlmMetric`, `LocalMetric`, `CodeMetric`, `GalileoMetric`) to provide a type-safe and extensible API for defining and managing different metric types.