feat: Metric future package #378

Merged
nachiket-galileo merged 10 commits into main from dev/nachiket/metric-futures-pack on Oct 29, 2025
Conversation

@nachiket-galileo (Contributor) commented Oct 22, 2025

User description

Overview

The new galileo.__future__.Metric class provides a unified, object-oriented interface for working with all types of Galileo metrics. It's fully backward compatible with existing code while offering a much more intuitive API.

Key Features

Three Ways to Use Metrics

  1. Built-in Galileo Scorers - Access via Metric.scorers.correctness
  2. Custom LLM Metrics - Create with prompt templates and judge models
  3. Local Function Metrics - Define custom scoring functions

Complete Backward Compatibility

  • All existing functions (create_custom_llm_metric, delete_metric, get_metrics) still work
  • Existing LogStream.enable_metrics() and enable_metrics() functions still work
  • Can convert to legacy types: to_legacy_metric(), to_local_metric_config()
  • Accepts both old parameter names (user_prompt, model_name, num_judges) and new cleaner ones (prompt, model, judges)
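The conversion methods and parameter aliases listed above can be sketched in a self-contained way. The class bodies below are illustrative stand-ins, not galileo's actual implementations; only the method names (to_legacy_metric, to_local_metric_config) come from the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LegacyMetric:
    """Stand-in for galileo.schema.metrics.Metric."""
    name: str
    version: Optional[int] = None


@dataclass
class LocalMetricConfig:
    """Stand-in for galileo.schema.metrics.LocalMetricConfig."""
    name: str
    scorer_fn: Optional[Callable] = None


class Metric:
    def __init__(self, name, scorer_fn=None, version=None):
        self.name, self.scorer_fn, self.version = name, scorer_fn, version

    def to_legacy_metric(self) -> LegacyMetric:
        # Any Metric can be referenced by name/version in the legacy API.
        return LegacyMetric(name=self.name, version=self.version)

    def to_local_metric_config(self) -> LocalMetricConfig:
        # Only local metrics carry a scorer_fn to convert.
        if self.scorer_fn is None:
            raise ValueError("only local metrics convert to LocalMetricConfig")
        return LocalMetricConfig(name=self.name, scorer_fn=self.scorer_fn)
```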

Enhanced Capabilities

  • Retrieve metrics by ID or name (Metric.get())
  • List and filter metrics (Metric.list())
  • State management - Know if metric is synced, local-only, or failed
  • Refresh from API - metric.refresh()
  • Cleaner API - prompt instead of user_prompt, model instead of model_name

API Comparison

Built-in Scorers

NEW API (Preferred):

from galileo.__future__ import Metric, LogStream

log_stream = LogStream.get(name="my-stream", project_name="my-project")
log_stream.set_metrics([
    Metric.scorers.correctness,  # ✨ Most intuitive!
    Metric.scorers.completeness,
    Metric.scorers.toxicity,
])

EXISTING APIs (Still work!):

from galileo.schema.metrics import GalileoScorers

log_stream.set_metrics([
    GalileoScorers.correctness,  # Still works
    "completeness",               # String names still work
])

Custom LLM Metrics

NEW API (Improved):

from galileo.__future__ import Metric, StepType

metric = Metric(
    name="quality_checker",
    prompt="Rate the quality: {output}",  # Cleaner than 'user_prompt'
    model="gpt-4o-mini",                   # Cleaner than 'model_name'
    judges=3,                               # Cleaner than 'num_judges'
    node_level=StepType.llm,
    output_type="percentage",
    cot_enabled=True,
).create()

# Use it
log_stream.set_metrics([metric])

EXISTING API (Still works!):

from galileo.metrics import create_custom_llm_metric

version = create_custom_llm_metric(
    name="quality_checker",
    user_prompt="Rate the quality: {output}",  # Still works
    model_name="gpt-4o-mini",                   # Still works
    num_judges=3,                                # Still works
    node_level=StepType.llm,
)

Local Function Metrics

NEW API:

from galileo.__future__ import Metric, StepType

def my_scorer(trace_or_span):
    if hasattr(trace_or_span, "output"):
        return len(trace_or_span.output) / 100.0
    return 0.0

metric = Metric(
    name="response_length",
    scorer_fn=my_scorer,
    scorable_types=[StepType.llm],
    aggregatable_types=[StepType.trace],
)

# Use it
log_stream.set_metrics([metric])

EXISTING API (Still works!):

from galileo.schema.metrics import LocalMetricConfig

config = LocalMetricConfig(
    name="response_length",
    scorer_fn=my_scorer,
    scorable_types=[StepType.llm],
    aggregatable_types=[StepType.trace],
)

log_stream.set_metrics([config])  # Still works!

New Capabilities

1. Retrieve Metrics

# Get by name
metric = Metric.get(name="quality_checker")

# Get by ID
metric = Metric.get(id="abc-123-def")

# Returns None if not found
if metric is None:
    print("Metric not found")

2. List and Filter Metrics

# List all metrics
all_metrics = Metric.list()

# Filter by name
filtered = Metric.list(name_filter="quality")

# Filter by type
from galileo.resources.models import ScorerTypes
llm_metrics = Metric.list(scorer_types=[ScorerTypes.LLM])

3. State Management

# Create locally
metric = Metric(name="test", prompt="Test prompt", model="gpt-4o-mini")

# Check state
print(metric.is_local_only())  # True
print(metric.is_synced())       # False

# Persist to API
metric.create()

print(metric.is_local_only())  # False
print(metric.is_synced())       # True
print(metric.id)                # UUID assigned by API

# Refresh from API
metric.refresh()

# Delete
metric.delete()
print(metric.is_deleted())      # True

4. Cleaner Parameter Names

# ✅ New (preferred)
Metric(
    prompt="...",
    model="gpt-4o-mini",
    judges=3,
)

# ✅ Old (still works)
Metric(
    user_prompt="...",
    model_name="gpt-4o-mini",
    num_judges=3,
)

# ✅ Mix and match (new overrides old)
Metric(
    prompt="...",              # Takes precedence
    user_prompt="ignored",     # Ignored
    model="gpt-4o-mini",       # Takes precedence
    model_name="ignored",      # Ignored
)
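The "new overrides old" precedence shown above can be captured by a tiny resolver. This is an illustrative sketch of the rule, not the package's actual code:

```python
def resolve_aliases(prompt=None, user_prompt=None,
                    model=None, model_name=None,
                    judges=None, num_judges=None):
    """Return canonical values; the new-style name wins when both are given."""
    return {
        "prompt": prompt if prompt is not None else user_prompt,
        "model": model if model is not None else model_name,
        "judges": judges if judges is not None else num_judges,
    }
```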

Migration Examples

Example 1: Simple Migration

Before:

from galileo.schema.metrics import GalileoScorers
from galileo.log_streams import enable_metrics

enable_metrics(
    log_stream_name="my-stream",
    project_name="my-project",
    metrics=[GalileoScorers.correctness, "completeness"]
)

After (Optional):

from galileo.__future__ import Metric, LogStream

log_stream = LogStream.get(name="my-stream", project_name="my-project")
log_stream.set_metrics([
    Metric.scorers.correctness,
    Metric.scorers.completeness,
])

Result: ✅ Both work! No need to migrate unless you prefer the new API.

Example 2: Custom LLM Metric

Before:

from galileo.metrics import create_custom_llm_metric

version = create_custom_llm_metric(
    name="quality",
    user_prompt="Rate this: {output}",
    model_name="gpt-4.1-mini",
    num_judges=3,
)
scorer_id = version.scorer_id

After (Optional):

from galileo.__future__ import Metric

metric = Metric(
    name="quality",
    prompt="Rate this: {output}",  # Cleaner
    model="gpt-4.1-mini",            # Cleaner
    judges=3,                         # Cleaner
).create()

scorer_id = metric.id

Benefits of migrating:

  • ✅ Cleaner parameter names
  • ✅ Can retrieve later: Metric.get(name="quality")
  • ✅ State management: metric.is_synced()
  • ✅ Object-oriented workflow

Example 3: All Three Metric Types

Before:

from galileo.schema.metrics import GalileoScorers, LocalMetricConfig, Metric

def my_scorer(span):
    return 0.5

metrics = [
    GalileoScorers.correctness,
    Metric(name="custom_metric", version=2),
    LocalMetricConfig(name="local", scorer_fn=my_scorer),
]

enable_metrics(log_stream_name="stream", project_name="proj", metrics=metrics)

After (Simpler):

from galileo.__future__ import Metric

def my_scorer(span):
    return 0.5

log_stream.set_metrics([
    Metric.scorers.correctness,           # Built-in
    Metric.get(name="custom_metric"),     # Existing custom
    Metric(name="local", scorer_fn=my_scorer),  # Local
])

Implementation Details

Architecture

The new Metric class:

  1. Extends BusinessObjectMixin - Provides state management
  2. Wraps existing services - Uses Metrics(), Scorers() under the hood
  3. Provides conversion methods - to_legacy_metric(), to_local_metric_config()
  4. Works with existing infrastructure - Compatible with enable_metrics(), set_metrics()

State Diagram

LOCAL_ONLY ──.create()──> SYNCED ──.delete()──> DELETED
                 (.refresh() re-syncs a SYNCED metric from the API)

SYNCED ──API error──> FAILED_SYNC ──.refresh()──> SYNCED
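The transitions in the diagram can be modeled as a small lookup table. State and event names follow the diagram; the code itself is an illustrative sketch, not galileo's implementation:

```python
from enum import Enum, auto


class SyncState(Enum):
    LOCAL_ONLY = auto()
    SYNCED = auto()
    FAILED_SYNC = auto()
    DELETED = auto()


# Allowed (event, from-state) -> to-state transitions from the diagram.
TRANSITIONS = {
    ("create", SyncState.LOCAL_ONLY): SyncState.SYNCED,
    ("delete", SyncState.SYNCED): SyncState.DELETED,
    ("refresh", SyncState.SYNCED): SyncState.SYNCED,
    ("api_error", SyncState.SYNCED): SyncState.FAILED_SYNC,
    ("refresh", SyncState.FAILED_SYNC): SyncState.SYNCED,
}


def step(state: SyncState, event: str) -> SyncState:
    """Apply one event; reject transitions the diagram does not allow."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed from {state.name}")
```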

Type System

# Three metric creation patterns:

# 1. LLM Metric
Metric(
    name="...",
    prompt="...",
    scorer_type=ScorerTypes.LLM  # Auto-detected
)

# 2. Local Metric  
Metric(
    name="...",
    scorer_fn=callable,
    scorer_type=None  # Local metrics don't have scorer_type
)

# 3. Reference to Existing
Metric(
    name="...",
    version=2,
    scorer_type=None  # Just a reference
)
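The "auto-detected" note in pattern 1 suggests detection logic along these lines. The ordering here is an assumption for illustration, not the SDK's actual code:

```python
def detect_scorer_type(prompt=None, scorer_fn=None, version=None):
    """Sketch of scorer_type auto-detection implied by the three patterns above.

    Returns "llm" for prompt-based metrics, and None both for local metrics
    (which run client-side) and for bare references to existing metrics.
    """
    if scorer_fn is not None:
        return None      # 2. local metric: no scorer_type
    if prompt is not None:
        return "llm"     # 1. custom LLM metric
    return None          # 3. name/version reference to an existing metric
```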

Common Use Cases

Use Case 1: Quick Setup with Built-ins

from galileo.__future__ import Metric, LogStream

log_stream = LogStream.get(name="prod", project_name="my-app")
log_stream.set_metrics([
    Metric.scorers.correctness,
    Metric.scorers.toxicity,
    Metric.scorers.prompt_injection,
])

Use Case 2: Custom Quality Scoring

from galileo.__future__ import Metric

quality_metric = Metric(
    name="response_quality_v2",
    prompt="""
    Rate response quality 1-10:
    - Accuracy
    - Completeness  
    - Clarity
    
    Input: {input}
    Output: {output}
    
    Score: """,
    model="gpt-4o",
    judges=5,
    tags=["quality", "v2"],
).create()

# Use across multiple streams
stream1.set_metrics([quality_metric])
stream2.set_metrics([quality_metric])

Use Case 3: Domain-Specific Local Metrics

from galileo.__future__ import Metric, StepType

def medical_terminology_scorer(span):
    """Check if medical terms are used correctly"""
    output = getattr(span, "output", "")
    medical_terms = ["diagnosis", "treatment", "symptoms"]
    return sum(1 for term in medical_terms if term in output.lower()) / len(medical_terms)

medical_metric = Metric(
    name="medical_terminology_usage",
    scorer_fn=medical_terminology_scorer,
    scorable_types=[StepType.llm],
)

log_stream.set_metrics([
    Metric.scorers.correctness,
    medical_metric,
])

Use Case 4: Metric Management Dashboard

from galileo.__future__ import Metric

# List all metrics
all_metrics = Metric.list()
print(f"Total: {len(all_metrics)}")

# Group by type
from collections import defaultdict
by_type = defaultdict(list)
for m in all_metrics:
    if m.scorer_type:
        by_type[m.scorer_type.value].append(m.name)

for type_name, names in by_type.items():
    print(f"{type_name}: {len(names)} metrics")

# Find unused metrics
active_metrics = {"correctness", "toxicity"}
for m in all_metrics:
    if m.name not in active_metrics:
        print(f"Unused: {m.name} (created {m.created_at})")
        # Optionally delete
        # m.delete()


Generated description

Below is a concise technical summary of the changes proposed in this PR:

graph LR
LogStream_set_metrics_("LogStream.set_metrics"):::modified
Metric_("Metric"):::added
LocalMetricConfig_("LocalMetricConfig"):::added
Metric_create_("Metric.create"):::added
Metrics_("Metrics"):::added
Metric_get_("Metric.get"):::added
Scorers_("Scorers"):::added
Metric_list_("Metric.list"):::added
Metric_delete_by_name_("Metric.delete_by_name"):::added
Metric_refresh_("Metric.refresh"):::added
Metric_to_legacy_metric_("Metric.to_legacy_metric"):::added
LogStream_set_metrics_ -- "Supports Metric objects for richer log stream metric configuration" --> Metric_
LogStream_set_metrics_ -- "Adds LocalMetricConfig support for local function-based metrics" --> LocalMetricConfig_
Metric_create_ -- "Creates custom LLM metric via Metrics service API call" --> Metrics_
Metric_get_ -- "Retrieves metric details using Scorers service API" --> Scorers_
Metric_list_ -- "Lists metrics with filters via Scorers service API" --> Scorers_
Metric_delete_by_name_ -- "Deletes metric by name using Metrics service API" --> Metrics_
Metric_refresh_ -- "Refreshes metric state from API via Scorers service" --> Scorers_
Metric_to_legacy_metric_ -- "Converts new Metric to legacy class for backward compatibility" --> Metric_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Introduce a new galileo.__future__.Metric class, providing a unified object-oriented interface for managing all types of Galileo metrics, including built-in, custom LLM, and local function metrics. Enhance the LogStream component to integrate seamlessly with this new Metric class, ensuring backward compatibility and offering improved metric retrieval and setting capabilities.

Topic: Metric API & Mgmt
Introduces the new galileo.__future__.Metric class, providing a unified object-oriented interface for creating, retrieving, listing, and managing built-in, custom LLM, and local function metrics.
Modified files (4)
  • tests/future/test_metric.py
  • src/galileo/__future__/types.py
  • src/galileo/__future__/metric.py
  • src/galileo/__future__/__init__.py
Latest contributors (2)
  • vamaq@users.noreply.gi... | feat-Add-declarative-f... | October 29, 2025
  • jimbobbennett@mac.com | fix-Fixing-docstrings-369 | October 15, 2025

Topic: LogStream Integration
Updates the LogStream to support the new Metric class, including a new get_metrics method and an enhanced set_metrics method, while also adding configuration options for default scorer models.
Modified files (3)
  • tests/future/test_configuration.py
  • src/galileo/__future__/log_stream.py
  • src/galileo/__future__/configuration.py
Latest contributors (2)
  • vamaq@users.noreply.gi... | feat-Add-declarative-f... | October 29, 2025
  • jimbobbennett@mac.com | fix-Fixing-docstrings-369 | October 15, 2025
This pull request is reviewed by Baz.

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 81.42292% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.11%. Comparing base (6e1b5c4) to head (4fc0aef).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/galileo/__future__/metric.py 86.69% 31 Missing ⚠️
src/galileo/__future__/log_stream.py 15.38% 11 Missing ⚠️
src/galileo/__future__/types.py 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #378      +/-   ##
==========================================
- Coverage   86.33%   86.11%   -0.22%     
==========================================
  Files          73       75       +2     
  Lines        5612     5864     +252     
==========================================
+ Hits         4845     5050     +205     
- Misses        767      814      +47     


Args:
metrics: List of metrics to add. Supports:
- GalileoScorers enum values (e.g., GalileoScorers.correctness)
Contributor: here too

Contributor: Commit c187e90 addressed this comment. The documentation for the metrics parameter was updated to recommend the newer Metric.scorers approach while maintaining GalileoScorers for backward compatibility, which addresses the consistency issue flagged with "here too".

metrics = Metric.list()

# Delete a metric
metric.delete()
Contributor: Can I delete a metric by name without retrieving it first?

Contributor: I believe we just added this.

Contributor: Good point. We should probably add a class method for that too.
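A class method along the lines discussed might look like this. This is a hypothetical sketch; the in-memory dict stands in for the Metrics/Scorers backend API, and only the method name delete_by_name appears in the PR's change graph:

```python
class Metric:
    _store: dict = {}  # stand-in for the backend metric registry

    def __init__(self, name: str):
        self.name = name

    def create(self) -> "Metric":
        Metric._store[self.name] = self
        return self

    def delete(self) -> None:
        # Instance-level delete, as shown in the quoted docs.
        Metric._store.pop(self.name, None)

    @classmethod
    def delete_by_name(cls, name: str) -> bool:
        """Delete without retrieving first; True if something was deleted."""
        return cls._store.pop(name, None) is not None
```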

"""
Persist this metric to the API.

Only works for LLM metrics. Local metrics (with scorer_fn) don't need
Contributor: What about Galileo-hosted code-based metrics?

Contributor: Just a note that I don't think we support these from the client today, so we can leave these for a follow-on.

Contributor: We don't have an SDK function for this yet.

current_metrics = log_stream.get_metrics()
print(f"Currently enabled: {current_metrics}")
"""
from galileo.config import GalileoConfig
Contributor: Let's move all these imports to the top of the file.

logger.info(f"LogStream.add_metrics: setting {len(combined_metrics)} total metrics")
return self.set_metrics(combined_metrics)

def enable_metrics(
@vamaq (Contributor) commented Oct 22, 2025: As much as possible, we won't be addressing backward compatibility in the __future__ package. That will be part of the later process when we move these objects to the base module.

from galileo.__future__ import Metric

# Access built-in scorers
Metric.scorers.correctness
Contributor: I wonder if this should just be Metric.correctness?


@classmethod
def list(
cls, *, name_filter: str | None = None, scorer_types: list[ScorerTypes] | None = None
Contributor: I don't know if we want to expose ScorerTypes publicly.


return result

def _populate_from_scorer_response(self, scorer_response: Any) -> None:
Contributor: Shouldn't we use typing here? Why Any? Additionally, couldn't we use Pydantic serialization for this and add field_validators for any of the fields that need custom population?

else:
self.node_level = None

def update(self, **kwargs: Any) -> None:
Contributor: Why even add this to the SDK? I don't think we have plans to support this?

logger.error(f"Metric.delete: id='{self.id}' - failed: {e}")
raise

def refresh(self) -> None:
Contributor: Cool.

return f"Metric(name='{self.name}', type='local', scorer_fn={self.scorer_fn.__name__})"
if self.scorer_type:
return (
f"Metric(name='{self.name}', id='{self.id}', type='{self.scorer_type.value}', "
Contributor: Not all metrics will have judges, etc.

@john-weiler (Contributor) commented Oct 22, 2025

High level, I think we should have 4 types:

CodeMetric, LlmMetric, GalileoMetric, LocalMetric

They can inherit the common params from a base Metric class.
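A minimal sketch of the proposed hierarchy. The four subclass names come from the comment above; the fields are borrowed from the PR description, and everything else is illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class BaseMetric:
    """Common params shared by all metric kinds."""
    name: str
    description: str = ""
    tags: list = field(default_factory=list)


@dataclass
class GalileoMetric(BaseMetric):
    """Built-in Galileo scorer, referenced by name."""


@dataclass
class LlmMetric(BaseMetric):
    """Custom LLM-judged metric."""
    prompt: str = ""
    model: str = "gpt-4o-mini"
    judges: int = 1


@dataclass
class CodeMetric(BaseMetric):
    """Galileo-hosted code-based metric (not yet supported client-side)."""
    code: str = ""


@dataclass
class LocalMetric(BaseMetric):
    """Client-side function-based metric."""
    scorer_fn: object = None
```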

scorer_type (ScorerTypes | None): The type of scorer (LLM, CODE, LOCAL, etc.).
description (str): Description of the metric.
tags (list[str]): Tags associated with the metric.
prompt (str | None): Prompt template for LLM-based scorers (alias for user_prompt).
Contributor: We should probably rename the properties to reflect what they are. E.g., a prompt, according to our definition, should be a prompt object, so the parameter name here should be prompt_name. Same for all the other properties.

@nachiket-galileo force-pushed the dev/nachiket/metric-futures-pack branch from 8ded234 to c187e90 on October 29, 2025 15:11
from galileo.search import RecordType
=======
from galileo.schema.metrics import GalileoScorers, LocalMetricConfig
>>>>>>> 37c9012 (add/update)
Contributor: ❌ Failed check: Test / test (ubuntu-latest, 3.12)
I’ve attached the relevant part of the log for your convenience:
Invalid decimal literal [syntax] - merge conflict marker detected (>>>>>>> 37c9012 (add/update))


Finding type: Log Error

Contributor: Commit e11a8d3 addressed this comment by removing the merge conflict marker ">>>>>>> 37c9012 (add/update)" that was causing the syntax error. The diff shows clean code without any conflict markers, resolving the test failure.

Comment on lines +371 to +376
logger.info(f"LogStream.get_metrics: id='{self.id}' - started")
config = GalileoConfig.get()

settings = get_settings_projects_project_id_runs_run_id_scorer_settings_get.sync(
project_id=self.project_id, run_id=self.id, client=config.api_client
)
Contributor: Could we restore the guard that raises ValueError when self.id or self.project_id is missing before calling the scorer-settings API? As written, a locally constructed log stream calls get_settings(..., project_id=None, run_id=None, ...), so instead of the documented error we'll send an invalid request to the backend.

Suggested change (add the guard before the API call):

if self.id is None or self.project_id is None:
    raise ValueError("LogStream must have both id and project_id to get metrics")
logger.info(f"LogStream.get_metrics: id='{self.id}' - started")
config = GalileoConfig.get()
settings = get_settings_projects_project_id_runs_run_id_scorer_settings_get.sync(
    project_id=self.project_id, run_id=self.id, client=config.api_client
)

Finding type: Logical Bugs

Comment on lines +415 to 424
from galileo.__future__ import Metric, LogStream

project = Project.get(name="My AI Project")
log_stream = project.create_log_stream(name="Production Logs")
log_stream = LogStream.get(name="Production Logs", project_name="My Project")

# Enable built-in metrics
local_metrics = log_stream.enable_metrics([
GalileoScorers.correctness,
GalileoScorers.completeness,
"context_relevance"
# Set metrics (replaces existing)
log_stream.set_metrics([
Metric.scorers.correctness,
Metric.scorers.completeness,
Metric.get(id="metric-from-console-uuid"), # From console
])
Contributor: The new docstring advertises support for galileo.__future__.Metric (e.g. using Metric.scorers.correctness), but this module still imports Metric from galileo.schema.metrics. That legacy class has no scorers attribute or get helper, so following the example now raises AttributeError and set_metrics never receives the new Metric objects. Please import Metric from galileo.__future__.metric (and adjust the union) so the promised type actually works.


Finding type: Type Inconsistency

Comment on lines +465 to +469
instance = cls.__new__(cls)
StateManagementMixin.__init__(instance)
instance._populate_from_scorer_response(retrieved_scorer)
instance._set_state(SyncState.SYNCED)
result.append(instance)
Contributor: Would it make sense to extract the repeated logic of creating and initializing a Metric instance from a scorer response into a helper method, since the same 3+ lines appear here and at line 427? For example:

@classmethod
def _from_scorer_response(cls, scorer):
    instance = cls.__new__(cls)
    StateManagementMixin.__init__(instance)
    instance._populate_from_scorer_response(scorer)
    instance._set_state(SyncState.SYNCED)
    return instance

Then you could call this helper in both places to avoid duplication.

Prompt for AI Agents:

In `src/galileo/__future__/metric.py` around lines 465-469 and line 427, there is
repeated code for creating and initializing Metric instances from scorer responses.
Refactor by extracting a class method like `_from_scorer_response` that encapsulates the
common instance creation logic. This method should create a new instance, initialize it
with StateManagementMixin, populate from the scorer response, set the sync state, and
return the instance. Replace the duplicate code blocks with calls to this new helper
method to eliminate code duplication and improve maintainability.

Finding type: Code Dedup and Conventions

@nachiket-galileo force-pushed the dev/nachiket/metric-futures-pack branch from e11a8d3 to 0624b9d on October 29, 2025 17:48
@nachiket-galileo enabled auto-merge (squash) on October 29, 2025 17:58
@nachiket-galileo merged commit 16f176d into main on Oct 29, 2025
35 of 36 checks passed
@nachiket-galileo deleted the dev/nachiket/metric-futures-pack branch on October 29, 2025 18:56