feat: support "CachedInputToken" type in "llmRequestCosts" (#1315)
**Description**
Many AI providers now support prompt caching on the provider side, and cached tokens are priced significantly lower than regular input tokens. For example, at OpenAI the cached token price is 10x cheaper than the normal token price [1]. Thus, Envoy AI Gateway should take the cached token count into account when calculating `llmRequestCosts` in `AIGatewayRequestCosts`.
Moreover, for self-hosted LLMs, cached tokens can drastically reduce GPU usage, so in that case users may also want to track cached token usage in `llmRequestCosts`.
1: https://openai.com/api/pricing/
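As a rough illustration, a route could meter cached tokens separately from regular input tokens. The following is a minimal sketch, assuming the usual `AIGatewayRoute` shape with `llmRequestCosts` entries keyed by `metadataKey` and `type`; the metadata key names here are placeholders, and the exact fields and API version may differ from the actual CRD.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: example-route
spec:
  # ... routing rules omitted ...
  llmRequestCosts:
    # Regular (non-cached) input tokens.
    - metadataKey: llm_input_token
      type: InputToken
    # Tokens served from the provider-side prompt cache,
    # tracked via the new CachedInputToken type.
    - metadataKey: llm_cached_input_token
      type: CachedInputToken
```

Tracking the two counts under separate metadata keys lets rate-limit or billing policies weight cached tokens at their cheaper price instead of folding them into the regular input-token count.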
---------
Signed-off-by: Shingo Omura <[email protected]>