Account for `retry-after-ms` header

Presently, the module only accounts for the `retry-after` header containing a value in seconds. Azure OpenAI Provisioned Throughput (PTU) deployments also pass back the `retry-after-ms` header containing a value in milliseconds. This value may be preferential to seconds. 

From the [PTU documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/provisioned-get-started#what-should--i-do-when-i-receive-a-429-response):

*A 429 response indicates that **the allocated PTUs are fully consumed** at the time of the call. The response **includes the `retry-after-ms` and `retry-after headers`** that tell you the time to wait before the next call will be accepted. *

As such, there is only a longer delay when not using `retry-after-ms` when there is simply no PTU left to consume (overall and/or tokens-per-minute?). This may be an acceptable situation that may not surface too often.

---

One approach may be to convert everything internal to the module to milliseconds. These resources provide the starting point for the enhancement:

- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/provisioned-get-started#handling-high-utilization
- https://github.com/openai/openai-python/pull/1046

cc @kristapratico

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Account for `retry-after-ms` header #14

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Account for retry-after-ms header #14

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Account for `retry-after-ms` header #14