Description
Summary:
MySQL Backend Monitoring showed artificially high ping times, especially when the number of servers exceeded 200. The issue was caused by how pings were dispatched and how logs were written during the monitoring cycle.
Problem Details
Previously, the monitoring logic dispatched all MySQL ping tasks at once and only then called poll() to check for results. While this worked functionally, it led to inaccurate ping timings: no polling occurred while requests were still being sent, so responses that had already arrived were not processed until dispatch finished. This issue became particularly noticeable when the number of monitored servers was high (200+).
Additionally, the logging mechanism slowed things down further. Every time 50% of the ping tasks were completed, the code would write all results to the mysql_server_ping_log table in the same thread that was also handling non-blocking I/O. This blocking write operation delayed subsequent poll() calls, worsening timing accuracy.
Changes Made
1. Introduced batching for task dispatching
- Tasks are now dispatched in batches of 30 instead of all at once.
- After each batch dispatch, we perform a poll() call with a 0 timeout to process any ready sockets immediately.
- This continues until all tasks are dispatched.
- Once all tasks are sent, we switch back to the regular polling loop with the configured timeout.
- Result: Much smoother and more accurate ping timings, even with large numbers of servers.
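The batching pattern above can be sketched in a few lines of Python. This is an illustration only, not the ProxySQL implementation: `run_ping_cycle` is a hypothetical name, local socket pairs stand in for MySQL backends, the "server" side is simulated by echoing the request immediately, and `select.poll()` is assumed to be available (Unix-like systems).

```python
import select
import socket
import time

BATCH_SIZE = 30  # batch size mentioned in this PR


def run_ping_cycle(conns, batch_size=BATCH_SIZE, timeout_ms=1000):
    """conns: list of (client_sock, fake_server_sock) pairs (hypothetical stand-ins).

    Returns {client_fd: round_trip_seconds}, or None for a socket that errored.
    """
    poller = select.poll()
    by_fd, sent_at, rtts = {}, {}, {}

    def drain(timeout):
        # Process every socket that is ready right now (timeout is in ms).
        for fd, event in poller.poll(timeout):
            if event & select.POLLIN:
                by_fd[fd].recv(16)
                rtts[fd] = time.monotonic() - sent_at[fd]
            else:
                rtts[fd] = None  # error or hang-up on this socket
            poller.unregister(fd)

    for start in range(0, len(conns), batch_size):
        for client, server in conns[start:start + batch_size]:
            fd = client.fileno()
            by_fd[fd] = client
            poller.register(fd, select.POLLIN)
            sent_at[fd] = time.monotonic()
            client.send(b"ping")
            server.send(server.recv(16))  # simulated backend: echo at once
        drain(0)  # zero timeout: pick up ready replies without blocking

    # All tasks dispatched: fall back to the regular polling loop.
    deadline = time.monotonic() + timeout_ms / 1000
    while len(rtts) < len(conns) and time.monotonic() < deadline:
        drain(timeout_ms)
    return rtts


if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(65)]
    results = run_ping_cycle(pairs)
    print(len(results), "pings completed")
    for a, b in pairs:
        a.close()
        b.close()
```

Because replies are drained after every batch, a fast reply is timestamped at most one batch later rather than after the entire dispatch phase.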
2. Moved log writing to a thread pool
- Instead of writing logs synchronously (blocking the main thread), log writes are now queued into a thread pool.
- The main monitoring loop immediately continues polling without waiting for the log writes to complete.
- This removes the I/O bottleneck that was skewing timing measurements.
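A minimal sketch of the hand-off, again in Python and under stated assumptions: `write_ping_log` and `monitor_loop` are hypothetical names, an in-memory list stands in for the mysql_server_ping_log table, and a short sleep simulates the blocking write. The actual change uses the project's own thread pool; this only shows the shape of the idea.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

ping_log = []                      # stand-in for mysql_server_ping_log
ping_log_lock = threading.Lock()   # protects ping_log across worker threads


def write_ping_log(rows):
    """Simulated blocking log write; runs on a pool thread."""
    time.sleep(0.05)               # pretend the insert is slow
    with ping_log_lock:
        ping_log.extend(rows)
    return len(rows)


def monitor_loop(result_batches, pool):
    """Queue each batch of results for logging and keep going."""
    futures = []
    for rows in result_batches:
        # submit() returns immediately; the loop does not wait for the write.
        futures.append(pool.submit(write_ping_log, rows))
        # ... the next poll() call would happen here, undelayed ...
    return futures


if __name__ == "__main__":
    batches = [[("host1", 3306, 120)], [("host2", 3306, 95)]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futs = monitor_loop(batches, pool)
        written = sum(f.result() for f in futs)
    print(written, "rows logged")
```

The monitoring thread only pays the cost of enqueueing the work, so the interval between poll() calls no longer depends on how long a log write takes.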