Open
Changes from 25 commits
Commits
29 commits
f1e7419
Remove worst-case additional 50ms latency for non-rate limited requests
timgrein Oct 8, 2025
a326e31
Update docs/changelog/136167.yaml
timgrein Oct 8, 2025
191029f
Merge branch 'main' into es-eis-latency-issue
timgrein Oct 8, 2025
edaa0f0
Do not use forbidden API
timgrein Oct 8, 2025
57c5605
Merge remote-tracking branch 'origin/es-eis-latency-issue' into es-ei…
timgrein Oct 8, 2025
f98b82e
Merge branch 'main' into es-eis-latency-issue
timgrein Oct 8, 2025
f17ec00
Move startRequestQueueTask before start signal
timgrein Oct 8, 2025
90ee1a1
Merge remote-tracking branch 'origin/es-eis-latency-issue' into es-ei…
timgrein Oct 8, 2025
a9e7610
Cleanup in finally block
timgrein Oct 8, 2025
ec513be
Reject request on shutdown
timgrein Oct 8, 2025
174526c
Reuse rateLimitSettingsEnabled check
timgrein Oct 8, 2025
f506cb3
Add NoopTask to wake up queue on shutdown
timgrein Oct 9, 2025
ae349fd
Only add non-rate-limited tasks to fast-path request queue
timgrein Oct 9, 2025
540f49d
Extract rejection logic to common static method
timgrein Oct 9, 2025
0dca88a
Remove unnecessary cast
timgrein Oct 10, 2025
0590561
Use string placeholder in assertion
timgrein Oct 10, 2025
2e65475
Adjust test to check that a throwing task does not terminate the service
timgrein Oct 10, 2025
4fb2372
Adjust error message in general exception handler
timgrein Oct 10, 2025
2930151
Adjust warn to error
timgrein Oct 10, 2025
b2fd85f
Adjust error message when request gets rejected
timgrein Oct 13, 2025
91f387a
Rename id in RateLimitingEndpointHandler to rateLimitGroupingId
timgrein Oct 13, 2025
90e672b
Use Strings.format(...) in assertion
timgrein Oct 13, 2025
1cf24dc
Use thenAnswer instead of suppression
timgrein Oct 13, 2025
a92a7c0
Only reject requests of the respective execution path (rate-limited v…
timgrein Oct 13, 2025
8e52c22
Merge branch 'main' into es-eis-latency-issue
timgrein Oct 14, 2025
a868152
Submit only ingest embeddings requests to rate-limited execution path
timgrein Oct 14, 2025
dd53fcb
Merge remote-tracking branch 'origin/es-eis-latency-issue' into es-ei…
timgrein Oct 14, 2025
c575eba
Add rate limiting check
timgrein Oct 14, 2025
69db0e1
Make NoopTask all caps
timgrein Oct 14, 2025
6 changes: 6 additions & 0 deletions docs/changelog/136167.yaml
@@ -0,0 +1,6 @@
pr: 136167
summary: "[Inference API] Remove worst-case additional 50ms latency for non-rate limited\
\ requests"
area: Machine Learning
type: bug
issues: []