feat: Add histogram metric type #386
/// \param metric The metric object to update.
/// \param value The amount to observe metric's value to.
/// \return a TRITONSERVER_Error indicating success or failure.
TRITONSERVER_DECLSPEC struct TRITONSERVER_Error* TRITONSERVER_MetricObserve(
Reuse TRITONSERVER_MetricSet?
@rmccorm4 Any thoughts? Basically, we would need to merge the two functions below into one Metric::Set(double value). It works but may add confusion.
Lines 338 to 404 in fd5c44b
TRITONSERVER_Error*
Metric::Set(double value)
{
  if (metric_ == nullptr) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INTERNAL,
        "Could not set metric value. Metric has been invalidated.");
  }
  switch (kind_) {
    case TRITONSERVER_METRIC_KIND_COUNTER: {
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "TRITONSERVER_METRIC_KIND_COUNTER does not support Set");
    }
    case TRITONSERVER_METRIC_KIND_GAUGE: {
      auto gauge_ptr = reinterpret_cast<prometheus::Gauge*>(metric_);
      gauge_ptr->Set(value);
      break;
    }
    case TRITONSERVER_METRIC_KIND_HISTOGRAM: {
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "TRITONSERVER_METRIC_KIND_HISTOGRAM does not support Set");
    }
    default:
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "Unsupported TRITONSERVER_MetricKind");
  }
  return nullptr;  // Success
}

TRITONSERVER_Error*
Metric::Observe(double value)
{
  if (metric_ == nullptr) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INTERNAL,
        "Could not set metric value. Metric has been invalidated.");
  }
  switch (kind_) {
    case TRITONSERVER_METRIC_KIND_COUNTER: {
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "TRITONSERVER_METRIC_KIND_COUNTER does not support Observe");
    }
    case TRITONSERVER_METRIC_KIND_GAUGE: {
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "TRITONSERVER_METRIC_KIND_GAUGE does not support Observe");
    }
    case TRITONSERVER_METRIC_KIND_HISTOGRAM: {
      auto histogram_ptr = reinterpret_cast<prometheus::Histogram*>(metric_);
      histogram_ptr->Observe(value);
      break;
    }
    default:
      return TRITONSERVER_ErrorNew(
          TRITONSERVER_ERROR_UNSUPPORTED,
          "Unsupported TRITONSERVER_MetricKind");
  }
  return nullptr;  // Success
}
I would need to take a closer look, but my gut reaction is that Guan is probably right: we can probably just reuse MetricValue and MetricSet, which would call Collect and Observe internally when kind == KIND_HISTOGRAM, provided they are functionally equivalent.
MetricValue may not work if Collect returns multiple values (one per bucket?), but again I will need to take a closer look. Let me know if you already know more details on this from your research.
But similar to the new C API for MetricV2, keep in mind how this would work if we added support for a Summary metric and wanted to get the values for each quantile, which is basically the same as getting the values for each bucket. Ideally the same API would work for both, or for all types.
MetricValue cannot be reused for histogram.
Updated
I would like to revisit this change. The consensus is to keep the C API and the python_backend API matched 1:1. I am inclined to add a new C API, TRITONSERVER_MetricObserve, for histograms instead of reusing TRITONSERVER_MetricSet, for three reasons.
- Both Histogram and Summary types call Observe to record a new value. We can reuse Observe for the Summary type if we add it in the future.
- Histogram also has an ObserveMultiple API, which may be added in the future. I don't like the idea of Histogram.Set and Histogram.ObserveMultiple coexisting.
- Setting a histogram to a value, i.e. Histogram.set(val), is semantically wrong. It is confusing to users familiar with Prometheus APIs. The description of TRITONSERVER_MetricSet would also become verbose in order to describe the different behaviors for counter/gauge and histogram/summary.
Well, the Triton metrics API is not supposed to mirror the "Prometheus API"; Prometheus is just one of the "forms" in which we can expose the metrics. So we should design the API using generic statistical terms: the meaning of gauge/counter/histogram (and summary?) does not depend on whether Prometheus or some other metrics library is used.
With that mindset, my question is whether observe is the generic verb for recording statistics in a histogram. If it is, then I am fine with adding XXXObserve; otherwise, we should use the proper verb.
I think either sample or observe. Voting for Observe for simplicity.
A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
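To make the verb concrete, here is a minimal sketch of what "observe" does to a histogram under the definition quoted above: it increments the bucket the value falls into and adds the value to a running sum. SketchHistogram and its members are hypothetical names for illustration only; this is not the Triton or prometheus-cpp implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a histogram metric (illustration only).
class SketchHistogram {
 public:
  // `upper_bounds` is assumed to be sorted in ascending order.
  // One extra bucket is kept for values above the last bound (+Inf).
  explicit SketchHistogram(std::vector<double> upper_bounds)
      : upper_bounds_(std::move(upper_bounds)),
        counts_(upper_bounds_.size() + 1, 0)
  {
  }

  // Record one observation: increment the first bucket whose upper bound
  // is >= value, and add the value to the running sum.
  void Observe(double value)
  {
    auto it =
        std::lower_bound(upper_bounds_.begin(), upper_bounds_.end(), value);
    counts_[static_cast<std::size_t>(it - upper_bounds_.begin())] += 1;
    sum_ += value;
  }

  // Cumulative counts: entry i counts observations <= upper_bounds_[i];
  // the last entry is the total number of observations.
  std::vector<std::uint64_t> CumulativeCounts() const
  {
    std::vector<std::uint64_t> out(counts_.size());
    std::uint64_t running = 0;
    for (std::size_t i = 0; i < counts_.size(); ++i) {
      running += counts_[i];
      out[i] = running;
    }
    return out;
  }

  double Sum() const { return sum_; }

 private:
  std::vector<double> upper_bounds_;
  std::vector<std::uint64_t> counts_;
  double sum_ = 0.0;
};
```

Under this reading, a single-value getter such as MetricValue has nothing obvious to return for a histogram, since the state is one count per bucket plus a sum.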
d0fed63 to fd5c44b
79566e7 to edb0533
include/triton/core/tritonserver.h (Outdated)
/// Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE and returns
/// Set the current value of metric to value or observe the value to metric.
/// Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE and
/// TRITONSERVER_METRIC_KIND_HISTOGRAM. Returns
Do we want to explain what it does when the metric kind is TRITONSERVER_METRIC_KIND_HISTOGRAM (i.e. increment the counter for the bucket that the value falls into)?
What does "observe" mean? Can we add more details?
That's why I still think we need a new C API TRITONSERVER_MetricObserve.
src/metric_family.h (Outdated)
buckets_.resize(bucket_count);
std::memcpy(buckets_.data(), buckets, sizeof(double) * bucket_count);
- buckets_.resize(bucket_count);
- std::memcpy(buckets_.data(), buckets, sizeof(double) * bucket_count);
+ buckets_ = std::vector<double>(buckets, buckets + bucket_count);
Updated
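For reference, the two forms in the suggestion above can be compared side by side. This is a standalone sketch: the free functions CopyWithMemcpy and CopyWithRange are hypothetical wrappers, not code from the PR, and the zero-length guard in the memcpy version is an addition made here so the sketch stays well-defined for a null pointer with a count of zero.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Original form from the diff: resize, then copy raw bytes.
std::vector<double> CopyWithMemcpy(
    const double* buckets, std::size_t bucket_count)
{
  std::vector<double> out;
  out.resize(bucket_count);
  // Guard added for this sketch: avoid calling memcpy with a null source.
  if (bucket_count > 0) {
    std::memcpy(out.data(), buckets, sizeof(double) * bucket_count);
  }
  return out;
}

// Suggested form: construct the vector directly from the pointer range.
std::vector<double> CopyWithRange(
    const double* buckets, std::size_t bucket_count)
{
  return std::vector<double>(buckets, buckets + bucket_count);
}
```

Both produce the same contents for a valid array; the range constructor states the intent in one step and does not rely on byte-wise copying.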
        
          
src/test/metrics_api_test.cc (Outdated)
std::vector<std::uint64_t> cumulative_counts = {1, 1, 2, 2, 3, 3};
ASSERT_EQ(buckets.size() + 1, cumulative_counts.size());
cumulative_counts depends on the buckets you split the histogram into, so you should initialize cumulative_counts according to the buckets and the data.
Updated
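One way to follow the comment above is to derive the expected counts from the bucket bounds and the observed values rather than hard-coding them. The helper below is a hypothetical sketch, not the code that landed in the PR; it assumes `buckets` holds ascending upper bounds and that a final +Inf bucket is implied, which matches the `buckets.size() + 1` check in the test.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical test helper: expected cumulative bucket counts for `data`.
// Entry i counts the values <= buckets[i]; the last entry (+Inf) counts all.
std::vector<std::uint64_t> ExpectedCumulativeCounts(
    const std::vector<double>& buckets, const std::vector<double>& data)
{
  std::vector<std::uint64_t> counts(buckets.size() + 1, 0);
  for (std::size_t i = 0; i < buckets.size(); ++i) {
    for (double value : data) {
      if (value <= buckets[i]) {
        ++counts[i];
      }
    }
  }
  counts.back() = data.size();
  return counts;
}
```

With illustrative bounds {0.1, 1.0, 2.5, 5.0, 10.0} and samples {0.05, 1.5, 6.0} (values chosen here for the example, not taken from the PR's test), this returns {1, 1, 2, 2, 3, 3}.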
Co-authored-by: Yingge He <[email protected]>
What does the PR do?
Support histogram metric type and add tests.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/vllm_backend#56
triton-inference-server/python_backend#374
triton-inference-server/server#7525
Where should the reviewer start?
n/a
Test plan:
n/a
17487728
Caveats:
n/a
Background
Customer requested histogram metrics from vLLM.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a