[Draft] perf: batch-drain buffered events in memqueue runLoop #49939
strawgate wants to merge 1 commit into elastic:main
Conversation
This pull request does not have a backport label.
Force-pushed from 842a5c7 to 5d76081
Force-pushed from 5d76081 to 58e534e
The memqueue runLoop processes one event per select loop iteration, paying full goroutine scheduling overhead for each event. When multiple producers are sending concurrently, the pushChan buffer fills up but is still drained one at a time. When the pipeline is CPU-constrained, this has a significant impact on e2e performance. When the pipeline is I/O-constrained (network to ES or from the data source), it has no real impact.
After handling the first event from the select, perform a non-blocking drain of up to 64 additional already-buffered events before returning to the main select. This amortizes scheduling overhead across batches while the cap prevents starvation of Get, Ack, and Close operations.
Benchmark results (Apple M4, null output pipeline, batch_size=2048):
- BenchmarkFullPipeline/batch_2048: +19-24% throughput (p=0.010)
- BenchmarkProducerThroughput (10 producers): neutral (p=0.442)
- ES e2e (real Elasticsearch output): neutral, no regression (all p>0.1)
Includes 8 behavioral equivalence tests that pass on both the old and new code paths, covering backpressure, shutdown, multi-producer delivery, ack correctness, and rapid close scenarios.