Description
Summary
The fix from elastic/elasticsearch-js#2027 addresses a hang in the bulk helper's semaphore logic when server responses arrive more slowly than the configured flushInterval. The hang can deadlock the bulk helper so that it stops processing new requests under some timing conditions, especially in high-latency or intermittently slow environments.
Problem
Currently, opensearch-js appears to have the same semaphore implementation as in the pre-fix version of elasticsearch-js, where only a single resolveSemaphore reference is managed. This can result in unresolved promises and a hang when multiple requests are queued but not released due to slow responses. The semaphore logic needs to be updated to match the upstream fix.
- See Helpers.js for the current implementation.
- Upstream fix for reference: elastic/elasticsearch-js#2027 ("Fix hang in bulk helper semaphore when server responses are slower than flushInterval")
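To make the failure mode concrete, here is a minimal, self-contained sketch of a semaphore that remembers only one waiting resolver. It is a simplified model, not the actual Helpers.js code: `buildSingleSlotSemaphore`, `acquire`, `release`, and `demoSingleSlot` are hypothetical names used only for illustration, and the assumption is that two callers can end up waiting at the same time (for example, the main loop and a flushInterval-triggered flush).

```javascript
// Simplified model of a semaphore that tracks only ONE waiter.
// Hypothetical names; this is not the code from Helpers.js.
function buildSingleSlotSemaphore (concurrency) {
  let running = 0
  let resolveSemaphore = null // only the most recent waiter is remembered

  function acquire () {
    if (running < concurrency) {
      running += 1
      return Promise.resolve()
    }
    // If a waiter is already stored, its resolver is overwritten here.
    return new Promise(resolve => { resolveSemaphore = resolve })
  }

  function release () {
    if (resolveSemaphore !== null) {
      const resolve = resolveSemaphore
      resolveSemaphore = null
      resolve() // hand the slot directly to the remembered waiter
    } else {
      running -= 1
    }
  }

  return { acquire, release }
}

async function demoSingleSlot () {
  const sem = buildSingleSlotSemaphore(1)
  const granted = []
  await sem.acquire()                         // slot held by a slow request
  sem.acquire().then(() => granted.push('A')) // waiter A is stored
  sem.acquire().then(() => granted.push('B')) // waiter B overwrites A
  sem.release()                               // wakes B
  sem.release()                               // nobody remembered; A is lost
  await new Promise(resolve => setTimeout(resolve, 10))
  return granted // only 'B' was ever granted a slot
}
```

In this model, waiter A's promise never settles, which corresponds to the kind of hang described above.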
Steps to Reproduce
- Use the bulk helper with a slow server (slower than the configured flushInterval).
- Observe that the bulk helper may hang and stop processing.
Proposed Solution
- Backport the semaphore fix from elastic/elasticsearch-js#2027 ("Fix hang in bulk helper semaphore when server responses are slower than flushInterval"), which rewrites the semaphore logic to queue multiple waiting promises and release them properly.
- Add a regression test to ensure this hang does not occur in the future.
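The queueing approach could look roughly like the sketch below. This is an illustration of the general technique (a FIFO queue of resolvers), not the upstream patch itself; `buildQueuedSemaphore`, `acquire`, `release`, and `demoQueued` are hypothetical names, and the real helper couples the semaphore to its send logic in ways this sketch leaves out.

```javascript
// Sketch of a semaphore that queues every waiter instead of keeping one.
// Hypothetical names; not the code from the upstream fix.
function buildQueuedSemaphore (concurrency) {
  let running = 0
  const waiters = [] // FIFO queue of pending resolvers

  function acquire () {
    if (running < concurrency) {
      running += 1
      return Promise.resolve()
    }
    // Every waiter gets its own queue entry, so none is overwritten.
    return new Promise(resolve => { waiters.push(resolve) })
  }

  function release () {
    const next = waiters.shift()
    if (next !== undefined) {
      next() // slot transfers to the oldest waiter; running is unchanged
    } else {
      running -= 1
    }
  }

  return { acquire, release }
}

async function demoQueued () {
  const sem = buildQueuedSemaphore(1)
  const granted = []
  await sem.acquire()                         // slot held by a slow request
  sem.acquire().then(() => granted.push('A')) // waiter A is queued
  sem.acquire().then(() => granted.push('B')) // waiter B is queued behind A
  sem.release()                               // wakes A
  sem.release()                               // wakes B
  await new Promise(resolve => setTimeout(resolve, 10))
  return granted // both waiters are granted, in arrival order
}
```

A regression test for the real helper could follow the same shape: keep responses slower than flushInterval, let more than one flush wait on the semaphore, and assert that the helper's promise eventually resolves.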
References
- elasticsearch-js PR 2027
- Current opensearch-js Helpers.js semaphore code
- Bulk helper test with flushInterval
Labels
🐛 bug, Roadmap:Stability/Availability/Resiliency