MDBF-1149 Make Zabbix Calls Non-Blocking #885
@fauust with this implementation, any errors related to buildbot -> zabbix calls should show up in the buildbot logs :)
Background
The Zabbix Python module performs synchronous API calls, which block the reactor’s event loop and can cause the master to freeze. Worse, a single call can block execution for up to 10 seconds, the configured timeout.

After reviewing the official integration documentation (https://www.zabbix.com/integrations/python), I concluded that no Twisted-compatible module exists that allows asynchronous API calls. Writing such a module from scratch would be excessive for such a small component. Since no Twisted integration is available, I also see no benefit in switching to a different Zabbix module; the current one has served us well.

Changes
The solution is to run the synchronous code in a separate thread by passing `getMetric` to `deferToThread`. This prevents the main thread from being blocked.

The critical section is wrapped in a `try/except` block so that any failure inside `getMetric` (for example, Zabbix unavailability, missing metrics, or network issues) does not prevent the build from starting.

I also reduced the `timeout` to 3 seconds. Although the main thread is no longer blocked, Buildbot’s `BuildRequestDistributor` will not proceed to the next builder’s `build request` until `canStartBuild` has completed. Reducing the `timeout` prevents unnecessary delays in processing the build request queue.
Do you think that you can test this by adding the following on

diff --git a/nftables.conf b/nftables.conf
index 7615939..30ae173 100644
--- a/nftables.conf
+++ b/nftables.conf
@@ -25,6 +25,8 @@ table inet filter {
+ ip saddr 2a01:4f8:c17:905a::1 drop
+ ip saddr 78.47.143.59 drop
tcp dport { 80, 443 } accept

And restarting the firewall
A couple of tests for the s390x master in different simulated configurations.

[OK] Normal operation

Before from
2025-12-15 07:26:50+0000 [-] WOLOLO Worker s390x-bbw1-docker-ubuntu-2204 load is 1.574385

[OK] No Zabbix Item

Added a non-existent item

    try:
        load = yield threads.deferToThread(
            getMetric, worker_name, "this_item_does_not_exist"
        )

To see if we get

Got
2025-12-15 07:32:58+0000 [-] Zabbix Error: Check configuration for ibm-s390x-ubuntu2404-03
Traceback (most recent call last):
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
_inlineCallbacks(r, gen, status, context)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
result = context.run(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/srv/buildbot/master/utils.py", line 310, in canStartBuild
log.err(e, f"Zabbix Error: Check configuration for {worker_name}")
--- <exception caught here> ---
File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
load = yield threads.deferToThread(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
return func(*args, **kw)
File "/srv/buildbot/master/utils.py", line 635, in getMetric
raise ZabbixNoItemFound
utils.ZabbixNoItemFound:

[OK] No host

Modified

    private["worker_name_mapping"] = {
        "s390x-bbw1": "this_host_does_not_exist",

Got
2025-12-15 07:37:44+0000 [-] Zabbix Error: Check configuration for this_host_does_not_exist
Traceback (most recent call last):
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
_inlineCallbacks(r, gen, status, context)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
result = context.run(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/srv/buildbot/master/utils.py", line 310, in canStartBuild
log.err(e, f"Zabbix Error: Check configuration for {worker_name}")
--- <exception caught here> ---
File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
load = yield threads.deferToThread(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
return func(*args, **kw)
File "/srv/buildbot/master/utils.py", line 628, in getMetric
raise ZabbixNoHostFound
utils.ZabbixNoHostFound:

[OK] Network errors (Zabbix API)

I didn't modify the

    private["zabbix_server"] = "https://doesnotexist.mariadb.org"

Got:
2025-12-15 07:46:16+0000 [-] Zabbix Error: Unexpected error when fetching data for ibm-s390x-ubuntu2404-03
Traceback (most recent call last):
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
_inlineCallbacks(r, gen, status, context)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
result = context.run(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/srv/buildbot/master/utils.py", line 316, in canStartBuild
log.err(
--- <exception caught here> ---
File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
load = yield threads.deferToThread(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
return func(*args, **kw)
File "/srv/buildbot/master/utils.py", line 619, in getMetric
zapi.login(api_token=private_config["private"]["zabbix_token"])
File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 123, in login
self.version = Version(self.api_version())
File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 187, in api_version
return self.apiinfo.version()
File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 278, in __call__
return self._parent.do_request(self._method, args or kwargs)["result"]
File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 216, in do_request
resp = self.session.post(
File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 637, in post
return self.request("POST", url, data=data, json=json, **kwargs)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='doesnotexist.mariadb.org', port=443): Max retries exceeded with url: /api_jsonrpc.php (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f5f701faee0>: Failed to resolve 'doesnotexist.mariadb.org' ([Errno -2] Name or service not known)"))