
Define workload characteristics for different application/enterprise use cases #437

@jgchn

Description


What is the URL, file, or UI containing the proposed doc change?
Where does one find the original content or where would this change go?

Service Level Objectives (SLOs) and Service Level Agreements (SLAs)

What is the current content or situation in question?
This is a useful document that describes recommended SLO targets for different enterprise use cases. Downstream customers often have trouble defining key performance metrics for their applications, so it is a great starting point.

What is the proposed change?
In addition to SLO targets, it would be helpful to suggest workload metrics that map to real-world applications, for example:

  • input and output token lengths
  • arrival rates
  • concurrency
  • prefix hit rate

These metrics could be described as ranges, distributions, or averages. There does not yet seem to be a standard for this, so working together to propose an initial starting point would be very useful.
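One way to make such a starting point concrete is to express each use case as a machine-readable workload profile that a benchmark harness could replay. The Python sketch below is purely illustrative: the class names, field names, the exponential token-length model, the Poisson arrival assumption, and the per-request independent prefix-hit model are all hypothetical choices, not an existing schema or tool. Concurrency is derived from arrival rate and latency via Little's law rather than stored separately, and the numbers in the usage lines are placeholders, not recommended values.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLengthRange:
    """Hypothetical summary of a token-length distribution: mean plus hard bounds."""

    mean: float
    minimum: int
    maximum: int

    def sample(self, rng: random.Random) -> int:
        # Assumption: lengths are modeled as exponential around the mean and
        # clipped to [minimum, maximum]; real traces may need another shape.
        value = int(rng.expovariate(1.0 / self.mean))
        return max(self.minimum, min(self.maximum, value))


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical per-use-case workload description (not an existing schema)."""

    name: str
    input_tokens: TokenLengthRange
    output_tokens: TokenLengthRange
    arrival_rate_rps: float  # mean requests per second
    prefix_hit_rate: float  # fraction of requests whose prefix is already cached

    def __post_init__(self) -> None:
        if self.arrival_rate_rps <= 0:
            raise ValueError("arrival_rate_rps must be positive")
        if not 0.0 <= self.prefix_hit_rate <= 1.0:
            raise ValueError("prefix_hit_rate must be in [0, 1]")

    def expected_concurrency(self, mean_latency_s: float) -> float:
        # Little's law: mean in-flight requests = arrival rate x mean time in system.
        return self.arrival_rate_rps * mean_latency_s

    def generate(self, n: int, seed: int = 0) -> list[dict]:
        """Synthesize n requests, assuming Poisson arrivals at arrival_rate_rps."""
        rng = random.Random(seed)
        clock = 0.0
        requests = []
        for _ in range(n):
            # Poisson process: exponential inter-arrival gaps with mean 1 / rate.
            clock += rng.expovariate(self.arrival_rate_rps)
            requests.append(
                {
                    "arrival_s": clock,
                    "input_tokens": self.input_tokens.sample(rng),
                    "output_tokens": self.output_tokens.sample(rng),
                    # Assumption: each request hits the prefix cache independently.
                    "prefix_hit": rng.random() < self.prefix_hit_rate,
                }
            )
        return requests


# Placeholder numbers for illustration only, not recommended targets.
chat = WorkloadProfile(
    name="chat-example",
    input_tokens=TokenLengthRange(mean=512, minimum=16, maximum=4096),
    output_tokens=TokenLengthRange(mean=256, minimum=1, maximum=1024),
    arrival_rate_rps=2.0,
    prefix_hit_rate=0.3,
)
print(chat.expected_concurrency(mean_latency_s=5.0))  # 10.0
print(chat.generate(3, seed=1))
```

With an arrival rate of 2 requests per second and an assumed 5-second mean end-to-end latency, Little's law gives an average of 10 in-flight requests, which is one way the arrival-rate and concurrency metrics listed above could be kept mutually consistent within a single profile.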

Additional context
Some existing documents already attempt to map workload patterns and SLOs to applications. Though incomplete, they could serve as starting points.

Metadata

Assignees: No one assigned
Labels: documentation (Improvements or additions to documentation)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
