Skip to content

Account for retry-after-ms header #14

@simonkurtz-MSFT

Description

@simonkurtz-MSFT

Presently, the module only accounts for the retry-after header containing a value in seconds. Azure OpenAI Provisioned Throughput (PTU) deployments also pass back the retry-after-ms header containing a value in milliseconds. This value may be preferential to seconds.

From the PTU documentation:

*A 429 response indicates that the allocated PTUs are fully consumed at the time of the call. The response includes the retry-after-ms and retry-after headers that tell you the time to wait before the next call will be accepted. *

As such, there is only a longer delay when not using retry-after-ms when there is simply no PTU left to consume (overall and/or tokens-per-minute?). This may be an acceptable situation that may not surface too often.


One approach may be to convert everything internal to the module to milliseconds. These resources provide the starting point for the enhancement:

cc @kristapratico

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions