Description
What is the URL, file, or UI containing proposed doc change
Where does one find the original content or where would this change go?
Service Level Objectives (SLOs) and Service Level Agreements (SLAs)
What is the current content or situation in question
This is a really useful document that describes recommended SLO targets for different enterprise use cases. Downstream customers often have trouble defining key performance metrics for their application, so this is a great starting point.
What is the proposed change
In addition to SLO targets, it would be helpful to suggest workload metrics that map to real-world applications. For example:
- input and output token lengths
- arrival rates
- concurrency
- prefix hit rate
These metrics could be described in terms of ranges, distributions, or averages. There does not appear to be an established standard for this yet, so it would be very useful to work together on an initial proposal.
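To make the idea concrete, here is a minimal sketch of how a workload profile covering the metrics listed above might be written down as ranges, along with a helper that summarizes observed samples as a small distribution. The names `Range`, `WorkloadProfile`, `summarize`, and `EXAMPLE_PROFILE` are hypothetical, and every number in the example profile is a placeholder rather than a recommended value.

```python
from dataclasses import dataclass
import statistics


@dataclass(frozen=True)
class Range:
    """Inclusive [low, high] bound for one workload metric."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical schema: one Range per workload metric in the proposal."""
    name: str
    input_tokens: Range
    output_tokens: Range
    arrival_rate_rps: Range   # requests per second
    concurrency: Range        # simultaneous in-flight requests
    prefix_hit_rate: Range    # fraction in [0, 1]

    def mismatches(self, observed: dict) -> list:
        """Return the names of observed metrics that fall outside this profile."""
        return [k for k, v in observed.items() if not getattr(self, k).contains(v)]


def summarize(samples: list) -> dict:
    """Describe samples as min / mean / p50 / p95 / max (simple index-based percentiles)."""
    ordered = sorted(samples)

    def pct(p: float) -> float:
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "p50": pct(0.50),
        "p95": pct(0.95),
        "max": ordered[-1],
    }


# Placeholder values for illustration only; not recommended targets.
EXAMPLE_PROFILE = WorkloadProfile(
    name="example-profile",
    input_tokens=Range(128, 2048),
    output_tokens=Range(64, 512),
    arrival_rate_rps=Range(1, 20),
    concurrency=Range(1, 64),
    prefix_hit_rate=Range(0.2, 0.6),
)
```

A profile like this could sit next to each SLO target in the document, so that a reader can check whether their measured traffic resembles the use case the target was written for, for example with `EXAMPLE_PROFILE.mismatches({"input_tokens": 512, "prefix_hit_rate": 0.9})`.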
Additional context
Some documents have already been compiled that attempt to map workload patterns and SLOs to applications. Though incomplete, they can serve as starting points.