---
title: Metric Selection Guide
category: Guides
description: Choose the right evaluation metrics for your optimization use case
order: 10
icon: settings
---

# Metric Selection Guide

This guide helps you choose the right evaluation metric for your use case or determine when to create a custom metric.

## Metric Comparison Matrix

| Metric Type | Use Case | Expected Format | When to Use |
| --- | --- | --- | --- |
| `ExactMatchMetric` | Simple string matching | Plain text strings | When you need exact string matching between prediction and ground truth |
| `StandardJSONMetric` | Structured JSON evaluation | JSON objects or strings | When evaluating structured JSON responses with specific fields to compare |
| Custom Metric | Specialized evaluation needs | Any custom format | When existing metrics don't meet your evaluation needs |

## When to Create a Custom Metric

Create a custom metric when:

1. **Complex Evaluation Logic**: Your evaluation requires complex logic that can't be handled by configuring existing metrics
2. **Domain-Specific Scoring**: You need domain-specific scoring rules or normalization
3. **Multi-Step Evaluation**: Your evaluation process involves multiple steps or comparisons
4. **Custom Parsing**: You need special parsing logic for your predictions or ground truth
5. **Specialized Output Format**: Your model outputs in a format not supported by existing metrics

## Custom Metric Implementation Example

```python
from prompt_ops.core.metrics import MetricBase

class MyCustomMetric(MetricBase):
    def __init__(self, custom_param=None, **kwargs):
        # Initialize any custom parameters
        self.custom_param = custom_param

    def __call__(self, gold, pred, trace=False, **kwargs):
        """
        Evaluate the prediction against the ground truth.

        Args:
            gold: Ground truth example
            pred: Predicted example
            trace: Whether to enable tracing for debugging

        Returns:
            Either a dictionary containing metric scores or a single float score
        """
        # Extract values from gold and pred
        gold_value = self.extract_value(gold, "answer", gold)
        pred_value = self.extract_value(pred, "answer", pred)

        # Your custom evaluation logic here
        score = self._calculate_score(gold_value, pred_value)

        if trace:
            # Return detailed results for debugging
            return {
                "score": score,
                "details": {
                    "gold": gold_value,
                    "pred": pred_value,
                    # Add any other details
                }
            }

        # Return a single score for normal use
        return score

    def _calculate_score(self, gold, pred):
        # Implement your custom scoring logic here
        # Return a float score between 0.0 and 1.0
        pass
```
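For instance, `_calculate_score` could be filled in with a normalized comparison and a token-overlap fallback. The following is a hypothetical sketch (not part of `prompt_ops`) intended as a drop-in replacement for the placeholder method above; it assumes both values can be coerced to strings:

```python
def _calculate_score(self, gold, pred):
    # Hypothetical example: normalize both values, then compare.
    gold_text = str(gold).strip().lower()
    pred_text = str(pred).strip().lower()
    if gold_text == pred_text:
        return 1.0  # exact match after normalization

    # Fall back to token-overlap (Jaccard) similarity, a score in [0.0, 1.0].
    gold_tokens = set(gold_text.split())
    pred_tokens = set(pred_text.split())
    if not gold_tokens or not pred_tokens:
        return 0.0
    return len(gold_tokens & pred_tokens) / len(gold_tokens | pred_tokens)
```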

## Configuration Examples

### ExactMatchMetric Configuration

```yaml
metric:
  class: "prompt_ops.core.metrics.ExactMatchMetric"
  params:
    case_sensitive: false
    strip_whitespace: true
```
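The same metric can also be constructed directly in Python. The snippet below is a sketch based on the `params` above; the exact constructor and call signature (in particular, whether plain strings are accepted for `gold` and `pred`) should be verified against your installed version of `prompt_ops`:

```python
from prompt_ops.core.metrics import ExactMatchMetric

# Constructor arguments mirror the YAML `params` block above.
metric = ExactMatchMetric(case_sensitive=False, strip_whitespace=True)

# Metrics are callables (see the MetricBase.__call__ signature above).
# Assuming plain strings are accepted here, these two values should match
# once case folding and whitespace stripping are applied.
score = metric("Paris", "  paris ")
print(score)
```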

### StandardJSONMetric Configuration

```yaml
metric:
  class: "prompt_ops.core.metrics.StandardJSONMetric"
  params:
    output_fields: ["categories", "sentiment", "urgency"]
    required_fields: ["categories"]
    nested_fields:
      categories: ["cleaning_services", "maintenance", "security"]
    field_weights:
      categories: 0.6
      sentiment: 0.2
      urgency: 0.2
    evaluation_mode: "selected_fields_comparison"
```
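To make the weighting concrete, here is a hypothetical gold/prediction pair that this configuration would score. The field names follow `output_fields` above; the nested structure of `categories` is an assumption about the response format:

```python
gold = {
    "categories": {"cleaning_services": True, "maintenance": False, "security": False},
    "sentiment": "negative",
    "urgency": "high",
}
pred = {
    "categories": {"cleaning_services": True, "maintenance": True, "security": False},
    "sentiment": "negative",
    "urgency": "medium",
}
# With the field_weights above, the categories comparison contributes 60%
# of the final score, while sentiment and urgency contribute 20% each.
```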

### FacilityMetric Configuration

```yaml
metric:
  class: "prompt_ops.core.metrics.FacilityMetric"
  params:
    output_field: "answer"
    strict_json: false
```

## Example Metrics Implementation

For more complex evaluation needs, you can implement specialized metrics. For example:

- The **HotpotQA Metric** shows how to implement specialized evaluation for multi-hop question answering, handling answer correctness and supporting-fact verification.