Conversation

@OneZero-Y (Contributor) commented Sep 5, 2025

What type of PR is this?
feature

What this PR does / why we need it:

This PR adds comprehensive Prometheus metrics for the batch classification API, providing detailed performance monitoring and error-tracking capabilities.

Key Features:

  • 6 Core Metrics: Request counters, processing duration histograms, text count counters, error counters, concurrent goroutine gauges, and batch size distribution
  • Full Configuration Support: Configurable histogram buckets, batch size range labels, sample rate control, and optional high-precision tracking
  • Enhanced Error Handling: Tracks both input-validation and classification errors, each labeled with its error type
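The sample_rate option described below amounts to probabilistic gating of metric collection. A minimal stdlib-only Go sketch of that idea (the `shouldRecord` helper is hypothetical, not the PR's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// shouldRecord reports whether metrics should be collected for this
// request, given a configured sample rate in [0.0, 1.0].
// 1.0 records every request; 0.5 records roughly half of them.
func shouldRecord(sampleRate float64) bool {
	if sampleRate >= 1.0 {
		return true
	}
	if sampleRate <= 0.0 {
		return false
	}
	return rand.Float64() < sampleRate
}

func main() {
	fmt.Println(shouldRecord(1.0)) // always true
	fmt.Println(shouldRecord(0.0)) // always false
}
```

With `sample_rate: 1.0` (the default shown in the configuration below), every request is recorded; lowering it trades metric completeness for reduced overhead on hot paths.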

Configuration Example:

api:
  batch_classification:
    metrics:
      enabled: true                      # Enable comprehensive metrics collection
      detailed_goroutine_tracking: true  # Track individual goroutine lifecycle 
      high_resolution_timing: false      # Use nanosecond precision timing 
      sample_rate: 1.0                   # Collect metrics for all requests (1.0 = 100%, 0.5 = 50%)
      
      # Batch size range labels for metrics (OPTIONAL - uses sensible defaults)
      # Default ranges: "1", "2-5", "6-10", "11-20", "21-50", "50+"
      # Only specify if you need custom ranges:
      # batch_size_ranges:
      #   - {min: 1, max: 1, label: "1"}
      #   - {min: 2, max: 5, label: "2-5"}
      #   - {min: 6, max: 10, label: "6-10"}
      #   - {min: 11, max: 20, label: "11-20"}
      #   - {min: 21, max: 50, label: "21-50"}
      #   - {min: 51, max: -1, label: "50+"}  # -1 means no upper limit
      
      # Histogram buckets for metrics (directly configure what you need)
      duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
      size_buckets: [1, 2, 5, 10, 20, 50, 100, 200]
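The default batch size ranges in the comment above reduce to a simple size-to-label mapping. A stdlib-only Go sketch of that logic (illustrative only; the `sizeRange` type and `batchSizeLabel` function are hypothetical names, not the PR's actual code):

```go
package main

import "fmt"

// sizeRange mirrors one {min, max, label} entry; max == -1 means no upper limit.
type sizeRange struct {
	min, max int
	label    string
}

// defaultRanges matches the documented defaults:
// "1", "2-5", "6-10", "11-20", "21-50", "50+".
var defaultRanges = []sizeRange{
	{1, 1, "1"},
	{2, 5, "2-5"},
	{6, 10, "6-10"},
	{11, 20, "11-20"},
	{21, 50, "21-50"},
	{51, -1, "50+"},
}

// batchSizeLabel returns the metric label for a given batch size.
func batchSizeLabel(size int) string {
	for _, r := range defaultRanges {
		if size >= r.min && (r.max == -1 || size <= r.max) {
			return r.label
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(batchSizeLabel(3))   // "2-5"
	fmt.Println(batchSizeLabel(100)) // "50+"
}
```

Coarse range labels like these keep the label cardinality of the batch-size metric bounded, regardless of how large individual batches get.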

Metrics Query Examples:

# 1. View total requests by processing type
curl -s http://localhost:9190/metrics | grep "batch_classification_requests_total"

# 2. View processing duration distribution
curl -s http://localhost:9190/metrics | grep "batch_classification_duration_seconds"

# 3. View total texts processed
curl -s http://localhost:9190/metrics | grep "batch_classification_texts_total"

# 4. View error statistics by type
curl -s http://localhost:9190/metrics | grep "batch_classification_errors_total"

# 5. View concurrent goroutines (real-time)
curl -s http://localhost:9190/metrics | grep "batch_classification_concurrent_goroutines"

# 6. View batch size distribution
curl -s http://localhost:9190/metrics | grep "batch_classification_batch_size_distribution"
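Beyond raw grep, the histogram and counter metrics above lend themselves to PromQL aggregation. A sketch of Prometheus recording rules, assuming the standard `_bucket` suffix on the duration histogram and an `error_type` label on the error counter (the label name and rule names are assumptions, not confirmed by this PR):

```yaml
groups:
  - name: batch_classification
    rules:
      # p95 processing latency over the last 5 minutes
      - record: batch_classification:duration_seconds:p95
        expr: histogram_quantile(0.95, sum(rate(batch_classification_duration_seconds_bucket[5m])) by (le))
      # share of requests ending in error, by error type
      - record: batch_classification:error_ratio
        expr: sum(rate(batch_classification_errors_total[5m])) by (error_type) / ignoring(error_type) group_left sum(rate(batch_classification_requests_total[5m]))
```

Adjust metric and label names to match the actual output of the `/metrics` endpoint at `:9190`.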

netlify bot commented Sep 5, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit c805974
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68bad03b3198e1000863de3f
😎 Deploy Preview https://deploy-preview-58--vllm-semantic-router.netlify.app

github-actions bot commented Sep 5, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/metrics/metrics_test.go
  • src/semantic-router/pkg/api/server.go
  • src/semantic-router/pkg/api/server_test.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/config_test.go
  • src/semantic-router/pkg/metrics/metrics.go

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/getting-started/configuration.md

This comment was automatically generated based on the OWNER files in the repository.

netlify bot commented Sep 5, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 219ff84
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68baeb4f8d1f3f0008a43158
😎 Deploy Preview https://deploy-preview-58--vllm-semantic-router.netlify.app

Signed-off-by: OneZero-Y <[email protected]>
@rootfs (Collaborator) commented Sep 5, 2025

@OneZero-Y nicely done!

For UX, would it be OK to give the batch size ranges a hardcoded default instead of exposing them in the config?

batch_size_ranges:
  - {min: 1, max: 1, label: "1"}
  - {min: 2, max: 5, label: "2-5"}
  - {min: 6, max: 10, label: "6-10"}
  - {min: 11, max: 20, label: "11-20"}
  - {min: 21, max: 50, label: "21-50"}
  - {min: 51, max: -1, label: "50+"}

@OneZero-Y (Contributor, Author) commented

> @OneZero-Y nicely done!
>
> For UX, would it be OK to give the batch size ranges a hardcoded default instead of exposing them in the config?
>
> batch_size_ranges:
>   - {min: 1, max: 1, label: "1"}
>   - {min: 2, max: 5, label: "2-5"}
>   - {min: 6, max: 10, label: "6-10"}
>   - {min: 11, max: 20, label: "11-20"}
>   - {min: 21, max: 50, label: "21-50"}
>   - {min: 51, max: -1, label: "50+"}

@rootfs Thanks for the review!
Should the configuration stay optional (falling back to hardcoded defaults when batch_size_ranges is not specified), or should we keep only the hardcoded values?

@rootfs (Collaborator) commented Sep 5, 2025

Let's hardcode it first. You can add a comment there to explain this.

@OneZero-Y (Contributor, Author) commented

> Let's hardcode it first. You can add a comment there to explain this.

The instructions remain in configuration.md, and the batch_size_ranges configuration has been removed from config.yaml. Please review:

# Batch size range labels for metrics (OPTIONAL - uses sensible defaults)
# Default ranges: "1", "2-5", "6-10", "11-20", "21-50", "50+"
# Only specify if you need custom ranges:
# batch_size_ranges:
#   - {min: 1, max: 1, label: "1"}
#   - {min: 2, max: 5, label: "2-5"}
#   - {min: 6, max: 10, label: "6-10"}
#   - {min: 11, max: 20, label: "11-20"}
#   - {min: 21, max: 50, label: "21-50"}
#   - {min: 51, max: -1, label: "50+"}  # -1 means no upper limit

@rootfs rootfs merged commit 420f7bc into vllm-project:main Sep 5, 2025
9 checks passed
@OneZero-Y OneZero-Y deleted the feat/add-batch-metrics branch September 5, 2025 14:46