Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #378 +/- ##
==========================================
- Coverage 86.33% 86.11% -0.22%
==========================================
Files 73 75 +2
Lines 5612 5864 +252
==========================================
+ Hits 4845 5050 +205
- Misses 767 814 +47 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
src/galileo/__future__/log_stream.py
Outdated
|
|
||
| Args: | ||
| metrics: List of metrics to add. Supports: | ||
| - GalileoScorers enum values (e.g., GalileoScorers.correctness) |
There was a problem hiding this comment.
Commit c187e90 addressed this comment. The documentation for the metrics parameter was updated to promote the newer Metric.scorers approach as recommended while maintaining GalileoScorers for backward compatibility. This change appears to address the consistency issue that was likely being pointed out with "here too".
| metrics = Metric.list() | ||
|
|
||
| # Delete a metric | ||
| metric.delete() |
There was a problem hiding this comment.
can i delete a metric by name without retrieving it first?
There was a problem hiding this comment.
I believe we just added this
There was a problem hiding this comment.
Good point. We should probably add a class methods for that too.
| """ | ||
| Persist this metric to the API. | ||
|
|
||
| Only works for LLM metrics. Local metrics (with scorer_fn) don't need |
There was a problem hiding this comment.
what about galileo-hosted code-based metrics?
There was a problem hiding this comment.
just a note that i dont think we support these from the client today, so we can leave these for a follow on
There was a problem hiding this comment.
we don't have a SDK function for this yet
| current_metrics = log_stream.get_metrics() | ||
| print(f"Currently enabled: {current_metrics}") | ||
| """ | ||
| from galileo.config import GalileoConfig |
There was a problem hiding this comment.
Let's move all these imports to the top of the file.
src/galileo/__future__/log_stream.py
Outdated
| logger.info(f"LogStream.add_metrics: setting {len(combined_metrics)} total metrics") | ||
| return self.set_metrics(combined_metrics) | ||
|
|
||
| def enable_metrics( |
There was a problem hiding this comment.
As much as possible, we won't be addressing backward compatibility in the future package. That will be part of the later process when we move these objects to the base module.
| from galileo.__future__ import Metric | ||
|
|
||
| # Access built-in scorers | ||
| Metric.scorers.correctness |
There was a problem hiding this comment.
I wonder if this should just be Metric.correctness ?
|
|
||
| @classmethod | ||
| def list( | ||
| cls, *, name_filter: str | None = None, scorer_types: list[ScorerTypes] | None = None |
There was a problem hiding this comment.
Idk if we want to expose ScorerTypes publicly
|
|
||
| return result | ||
|
|
||
| def _populate_from_scorer_response(self, scorer_response: Any) -> None: |
There was a problem hiding this comment.
Shouldn't we use typing here? why Any? Additionally, can't we do some Pydantic serialization for this and add field_validators for any of the fields that need custom population
| else: | ||
| self.node_level = None | ||
|
|
||
| def update(self, **kwargs: Any) -> None: |
There was a problem hiding this comment.
why even add this to the SDK? I don't think we have plans to support this ?
| logger.error(f"Metric.delete: id='{self.id}' - failed: {e}") | ||
| raise | ||
|
|
||
| def refresh(self) -> None: |
| return f"Metric(name='{self.name}', type='local', scorer_fn={self.scorer_fn.__name__})" | ||
| if self.scorer_type: | ||
| return ( | ||
| f"Metric(name='{self.name}', id='{self.id}', type='{self.scorer_type.value}', " |
There was a problem hiding this comment.
Not all metrics will have judges etc
|
High level, I think we should have 4 types: CodeMetric, LlmMetric, GalileoMetric, LocalMetric They can inherit the common params from a base Metric class. |
| scorer_type (ScorerTypes | None): The type of scorer (LLM, CODE, LOCAL, etc.). | ||
| description (str): Description of the metric. | ||
| tags (list[str]): Tags associated with the metric. | ||
| prompt (str | None): Prompt template for LLM-based scorers (alias for user_prompt). |
There was a problem hiding this comment.
We should probably rename the properties to reflect what they are. e.g. A prompt, according our definition, should be a prompt object, so here the parameter name here should be prompt_name.
Same for all the other properties.
8ded234 to
c187e90
Compare
src/galileo/__future__/__init__.py
Outdated
| from galileo.search import RecordType | ||
| ======= | ||
| from galileo.schema.metrics import GalileoScorers, LocalMetricConfig | ||
| >>>>>>> 37c9012 (add/update) |
There was a problem hiding this comment.
❌ Failed check: Test / test (ubuntu-latest, 3.12)
I’ve attached the relevant part of the log for your convenience:
Invalid decimal literal [syntax] - merge conflict marker detected (>>>>>>> 37c9012 (add/update))
Finding type: Log Error
| logger.info(f"LogStream.get_metrics: id='{self.id}' - started") | ||
| config = GalileoConfig.get() | ||
|
|
||
| settings = get_settings_projects_project_id_runs_run_id_scorer_settings_get.sync( | ||
| project_id=self.project_id, run_id=self.id, client=config.api_client | ||
| ) |
There was a problem hiding this comment.
Could we restore the guard that raises ValueError when self.id or self.project_id is missing before calling the scorer-settings API? As written, a locally constructed log stream calls get_settings(..., project_id=None, run_id=None, ...), so instead of the documented error we'll send an invalid request to the backend.
| logger.info(f"LogStream.get_metrics: id='{self.id}' - started") | |
| config = GalileoConfig.get() | |
| settings = get_settings_projects_project_id_runs_run_id_scorer_settings_get.sync( | |
| project_id=self.project_id, run_id=self.id, client=config.api_client | |
| ) | |
| if self.id is None or self.project_id is None: | |
| raise ValueError("LogStream must have both id and project_id to get metrics") | |
| logger.info(f"LogStream.get_metrics: id='{self.id}' - started") | |
| config = GalileoConfig.get() | |
| settings = get_settings_projects_project_id_runs_run_id_scorer_settings_get.sync( | |
| project_id=self.project_id, run_id=self.id, client=config.api_client | |
| ) |
Finding type: Logical Bugs
| from galileo.__future__ import Metric, LogStream | ||
|
|
||
| project = Project.get(name="My AI Project") | ||
| log_stream = project.create_log_stream(name="Production Logs") | ||
| log_stream = LogStream.get(name="Production Logs", project_name="My Project") | ||
|
|
||
| # Enable built-in metrics | ||
| local_metrics = log_stream.enable_metrics([ | ||
| GalileoScorers.correctness, | ||
| GalileoScorers.completeness, | ||
| "context_relevance" | ||
| # Set metrics (replaces existing) | ||
| log_stream.set_metrics([ | ||
| Metric.scorers.correctness, | ||
| Metric.scorers.completeness, | ||
| Metric.get(id="metric-from-console-uuid"), # From console | ||
| ]) |
There was a problem hiding this comment.
The new docstring advertises support for galileo.__future__.Metric (e.g. using Metric.scorers.correctness), but this module still imports Metric from galileo.schema.metrics. That legacy class has no scorers attribute or get helper, so following the example now raises AttributeError and set_metrics never receives the new Metric objects. Please import Metric from galileo.__future__.metric (and adjust the union) so the promised type actually works.
Finding type: Type Inconsistency
| instance = cls.__new__(cls) | ||
| StateManagementMixin.__init__(instance) | ||
| instance._populate_from_scorer_response(retrieved_scorer) | ||
| instance._set_state(SyncState.SYNCED) | ||
| result.append(instance) |
There was a problem hiding this comment.
Would it make sense to extract the repeated logic of creating and initializing a Metric instance from a scorer response into a helper method, since the same 3+ lines appear here and at line 427? For example:
def _from_scorer_response(cls, scorer):
instance = cls.new(cls)
StateManagementMixin.init(instance)
instance._populate_from_scorer_response(scorer)
instance._set_state(SyncState.SYNCED)
return instance
Then you could call this helper in both places to avoid duplication.
Prompt for AI Agents:
In `src/galileo/__future__/metric.py` around lines 465-469 and line 427, there is
repeated code for creating and initializing Metric instances from scorer responses.
Refactor by extracting a class method like `_from_scorer_response` that encapsulates the
common instance creation logic. This method should create a new instance, initialize it
with StateManagementMixin, populate from the scorer response, set the sync state, and
return the instance. Replace the duplicate code blocks with calls to this new helper
method to eliminate code duplication and improve maintainability.
Finding type: Code Dedup and Conventions
e11a8d3 to
0624b9d
Compare
User description
Overview
The new
galileo.__future__.Metricclass provides a unified, object-oriented interface for working with all types of Galileo metrics. It's fully backward compatible with existing code while offering a much more intuitive API.Key Features
✅ Three Ways to Use Metrics
Metric.scorers.correctness✅ Complete Backward Compatibility
create_custom_llm_metric,delete_metric,get_metrics) still workLogStream.enable_metrics()andenable_metrics()functions still workto_legacy_metric(),to_local_metric_config()user_prompt,model_name,num_judges) and new cleaner ones (prompt,model,judges)✅ Enhanced Capabilities
Metric.get())Metric.list())metric.refresh()promptinstead ofuser_prompt,modelinstead ofmodel_nameAPI Comparison
Built-in Scorers
NEW API (Preferred):
EXISTING APIs (Still work!):
Custom LLM Metrics
NEW API (Improved):
EXISTING API (Still works!):
Local Function Metrics
NEW API:
EXISTING API (Still works!):
New Capabilities
1. Retrieve Metrics
2. List and Filter Metrics
3. State Management
4. Cleaner Parameter Names
Migration Examples
Example 1: Simple Migration
Before:
After (Optional):
Result: ✅ Both work! No need to migrate unless you prefer the new API.
Example 2: Custom LLM Metric
Before:
After (Optional):
Benefits of migrating:
Metric.get(name="quality")metric.is_synced()Example 3: All Three Metric Types
Before:
After (Simpler):
Implementation Details
Architecture
The new
Metricclass:BusinessObjectMixin- Provides state managementMetrics(),Scorers()under the hoodto_legacy_metric(),to_local_metric_config()enable_metrics(),set_metrics()State Diagram
Type System
Common Use Cases
Use Case 1: Quick Setup with Built-ins
Use Case 2: Custom Quality Scoring
Use Case 3: Domain-Specific Local Metrics
Use Case 4: Metric Management Dashboard
Generated description
Below is a concise technical summary of the changes proposed in this PR:
graph LR LogStream_set_metrics_("LogStream.set_metrics"):::modified Metric_("Metric"):::added LocalMetricConfig_("LocalMetricConfig"):::added Metric_create_("Metric.create"):::added Metrics_("Metrics"):::added Metric_get_("Metric.get"):::added Scorers_("Scorers"):::added Metric_list_("Metric.list"):::added Metric_delete_by_name_("Metric.delete_by_name"):::added Metric_refresh_("Metric.refresh"):::added Metric_to_legacy_metric_("Metric.to_legacy_metric"):::added LogStream_set_metrics_ -- "Supports Metric objects for richer log stream metric configuration" --> Metric_ LogStream_set_metrics_ -- "Adds LocalMetricConfig support for local function-based metrics" --> LocalMetricConfig_ Metric_create_ -- "Creates custom LLM metric via Metrics service API call" --> Metrics_ Metric_get_ -- "Retrieves metric details using Scorers service API" --> Scorers_ Metric_list_ -- "Lists metrics with filters via Scorers service API" --> Scorers_ Metric_delete_by_name_ -- "Deletes metric by name using Metrics service API" --> Metrics_ Metric_refresh_ -- "Refreshes metric state from API via Scorers service" --> Scorers_ Metric_to_legacy_metric_ -- "Converts new Metric to legacy class for backward compatibility" --> Metric_ classDef added stroke:#15AA7A classDef removed stroke:#CD5270 classDef modified stroke:#EDAC4C linkStyle default stroke:#CBD5E1,font-size:13pxIntroduce a new
galileo.future.Metricclass, providing a unified object-oriented interface for managing all types of Galileo metrics, including built-in, custom LLM, and local function metrics. Enhance theLogStreamcomponent to integrate seamlessly with this newMetricclass, ensuring backward compatibility and offering improved metric retrieval and setting capabilities.galileo.__future__.Metricclass, providing a unified object-oriented interface for creating, retrieving, listing, and managing built-in, custom LLM, and local function metrics.Modified files (4)
Latest Contributors(2)
LogStreamto support the newMetricclass, including a newget_metricsmethod and an enhancedset_metricsmethod, while also adding configuration options for default scorer models.Modified files (3)
Latest Contributors(2)