@RazvanLiviuVarzaru
Collaborator

Background

The Zabbix Python module performs synchronous API calls, which block the reactor’s event loop and can cause the master to freeze. Worse, with the configured timeout, an unresponsive call can block execution for up to 10 seconds.

After reviewing the official integration documentation (https://www.zabbix.com/integrations/python), I concluded that no Twisted-compatible module exists that allows asynchronous API calls. Writing such a module from scratch would be excessive for such a small component. Since no Twisted integration is available, I also see no benefit in switching to a different Zabbix module; the current one has served us well.

Changes

The solution is to run the synchronous code in a separate thread by passing getMetric to deferToThread. This prevents the main thread from being blocked.
The critical section is wrapped in a try/except block to ensure that any failure inside getMetric (for example, Zabbix unavailability, missing metrics, or network issues) does not prevent the build from starting.
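
To make the effect of moving a blocking call off the loop thread concrete, here is a minimal sketch. It is not the code from this PR: the PR uses Twisted's deferToThread, while the sketch uses the standard library's asyncio.to_thread as a stand-in so it runs without Twisted installed. The names get_metric_blocking, count_ticks, and ticks_during, as well as the delay and the returned load value, are made up for illustration.

```python
import asyncio
import time


def get_metric_blocking(delay_s: float) -> float:
    """Hypothetical stand-in for getMetric: a synchronous call that blocks its thread."""
    time.sleep(delay_s)  # simulates a slow HTTP round trip to Zabbix
    return 1.5  # made-up load value


async def count_ticks(stop: asyncio.Event, interval_s: float = 0.01) -> int:
    """Count how often the event loop gets to run this task until `stop` is set."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(interval_s)
        ticks += 1
    return ticks


async def ticks_during(offload: bool, delay_s: float = 0.2) -> int:
    """Run the blocking call (on the loop thread or in a worker thread) and report ticks."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        # Analogous in spirit to `yield threads.deferToThread(getMetric, ...)`.
        await asyncio.to_thread(get_metric_blocking, delay_s)
    else:
        # Runs on the loop thread: nothing else is scheduled until it returns.
        get_metric_blocking(delay_s)
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks with the call on the loop thread:", asyncio.run(ticks_during(False)))
    print("ticks with the call in a worker thread:", asyncio.run(ticks_during(True)))
```

The tick count stands in for everything else the master would like to do while the Zabbix request is in flight. With the call on the loop thread the ticker is starved; with the call in a worker thread it keeps running.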

I also reduced the timeout to 3 seconds.
Although the main thread is no longer blocked, Buildbot’s BuildRequestDistributor will not proceed to the next builder’s build request until canStartBuild has completed. Reducing the timeout prevents unnecessary delays in processing the build request queue.
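
To put rough numbers on the queue delay, the following back-of-the-envelope sketch compares the two timeouts. It assumes the checks are processed one after another and that each one waits for the full timeout exactly once; the function name and the figure of 20 pending checks are hypothetical, not measurements from the real master.

```python
def worst_case_stall_s(pending_checks: int, timeout_s: float) -> float:
    """Upper bound on total waiting time if every check runs into the timeout in turn."""
    return pending_checks * timeout_s


if __name__ == "__main__":
    pending = 20  # hypothetical number of queued canStartBuild checks
    for timeout in (10, 3):
        stall = worst_case_stall_s(pending, timeout)
        print(f"timeout={timeout}s -> up to {stall:.0f}s for {pending} checks")
```

Under these assumptions, an unreachable Zabbix server costs up to 200 seconds of queue processing with the old timeout and up to 60 seconds with the new one.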

@RazvanLiviuVarzaru force-pushed the zbx-non-blocking branch 9 times, most recently from a480dfa to f3484ad, on December 12, 2025 13:41
@RazvanLiviuVarzaru
Collaborator Author

@fauust with this implementation, any errors related to Buildbot -> Zabbix calls should show up in the Buildbot logs :)

@fauust
Collaborator

fauust commented Dec 12, 2025

Do you think that you can test this by adding the following on hz-monit:

diff --git a/nftables.conf b/nftables.conf
index 7615939..30ae173 100644
--- a/nftables.conf
+++ b/nftables.conf
@@ -25,6 +25,8 @@ table inet filter {
     
+    ip6 saddr 2a01:4f8:c17:905a::1 drop
+    ip saddr 78.47.143.59 drop
     tcp dport { 80, 443 } accept

And then restarting the firewall (sudo systemctl restart nftables)?
IPs are from BB DEV.

@RazvanLiviuVarzaru
Collaborator Author

@fauust

A couple of tests for the s390x master in different simulated configurations.
Please see the cases below. The patch is working as intended.

[OK] Normal operation

Before the if float(load) > 60: check, I added
log.msg(f"WOLOLO Worker {worker.name} load is {load}") just to show normal operation.

From s390x-master-0.log I got:

2025-12-15 07:26:50+0000 [-] WOLOLO Worker s390x-bbw1-docker-ubuntu-2204 load is 1.574385

[OK] No Zabbix Item

Added a non-existent item

    try:
        load = yield threads.deferToThread(
            getMetric, worker_name, "this_item_does_not_exist"
        )

This checks that ZabbixNoItemFound is raised and that the build still proceeds as a fallback.

Got

2025-12-15 07:32:58+0000 [-] Zabbix Error: Check configuration for ibm-s390x-ubuntu2404-03
	Traceback (most recent call last):
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
	    _inlineCallbacks(r, gen, status, context)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
	    result = context.run(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/srv/buildbot/master/utils.py", line 310, in canStartBuild
	    log.err(e, f"Zabbix Error: Check configuration for {worker_name}")
	--- <exception caught here> ---
	  File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
	    load = yield threads.deferToThread(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
	    result = inContext.theWork()  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
	    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
	    return func(*args, **kw)
	  File "/srv/buildbot/master/utils.py", line 635, in getMetric
	    raise ZabbixNoItemFound
	utils.ZabbixNoItemFound: 

[OK] No host

Modified master-private to map to a non-existent Zabbix host. The build should proceed as a fallback.

private["worker_name_mapping"] = {
    "s390x-bbw1": "this_host_does_not_exist",

Got

2025-12-15 07:37:44+0000 [-] Zabbix Error: Check configuration for this_host_does_not_exist
	Traceback (most recent call last):
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
	    _inlineCallbacks(r, gen, status, context)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
	    result = context.run(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/srv/buildbot/master/utils.py", line 310, in canStartBuild
	    log.err(e, f"Zabbix Error: Check configuration for {worker_name}")
	--- <exception caught here> ---
	  File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
	    load = yield threads.deferToThread(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
	    result = inContext.theWork()  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
	    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
	    return func(*args, **kw)
	  File "/srv/buildbot/master/utils.py", line 628, in getMetric
	    raise ZabbixNoHostFound
	utils.ZabbixNoHostFound: 

[OK] Network errors (Zabbix API)

I didn't modify nftables on hz-monit; instead, I gave
Buildbot a fake Zabbix server. That code path should catch any network-related errors.
The build should start as a fallback.

private["zabbix_server"] = "https://doesnotexist.mariadb.org"

Got:

2025-12-15 07:46:16+0000 [-] Zabbix Error: Unexpected error when fetching data for ibm-s390x-ubuntu2404-03
	Traceback (most recent call last):
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1792, in gotResult
	    _inlineCallbacks(r, gen, status, context)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
	    result = context.run(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
	    return g.throw(self.type, self.value, self.tb)
	  File "/srv/buildbot/master/utils.py", line 316, in canStartBuild
	    log.err(
	--- <exception caught here> ---
	  File "/srv/buildbot/master/utils.py", line 306, in canStartBuild
	    load = yield threads.deferToThread(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
	    result = inContext.theWork()  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
	    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
	    return self.currentContext().callWithContext(ctx, func, *args, **kw)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
	    return func(*args, **kw)
	  File "/srv/buildbot/master/utils.py", line 619, in getMetric
	    zapi.login(api_token=private_config["private"]["zabbix_token"])
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 123, in login
	    self.version = Version(self.api_version())
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 187, in api_version
	    return self.apiinfo.version()
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 278, in __call__
	    return self._parent.do_request(self._method, args or kwargs)["result"]
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/pyzabbix/api.py", line 216, in do_request
	    resp = self.session.post(
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 637, in post
	    return self.request("POST", url, data=data, json=json, **kwargs)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
	    resp = self.send(prep, **send_kwargs)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
	    r = adapter.send(request, **kwargs)
	  File "/opt/buildbot/.venv/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
	    raise ConnectionError(e, request=request)
	requests.exceptions.ConnectionError: HTTPSConnectionPool(host='doesnotexist.mariadb.org', port=443): Max retries exceeded with url: /api_jsonrpc.php (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f5f701faee0>: Failed to resolve 'doesnotexist.mariadb.org' ([Errno -2] Name or service not known)"))
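
Taken together, the three failure cases above log one of two messages and still let the build start. The sketch below shows that fail-open shape in plain synchronous Python. It is an illustration under assumptions, not the code in utils.py: the real canStartBuild yields on deferToThread and logs via Twisted's log.err, whereas this sketch takes any callable and uses the standard logging module. The exception names, the two log messages, and the threshold of 60 are taken from the excerpts and tracebacks above; may_start_build, fetch_load, and the assumption that a load above the threshold defers the build are hypothetical.

```python
import logging
from typing import Callable

log = logging.getLogger("zabbix-sketch")

LOAD_THRESHOLD = 60.0  # from the `float(load) > 60` check quoted above


class ZabbixNoHostFound(Exception):
    """Name taken from the traceback; the real class lives in utils.py."""


class ZabbixNoItemFound(Exception):
    """Name taken from the traceback; the real class lives in utils.py."""


def may_start_build(fetch_load: Callable[[], object], worker_name: str) -> bool:
    """Return False only when a load value was fetched and exceeds the threshold."""
    try:
        load = float(fetch_load())
    except (ZabbixNoHostFound, ZabbixNoItemFound):
        # Misconfiguration: log it, but do not hold the build back.
        log.exception("Zabbix Error: Check configuration for %s", worker_name)
        return True
    except Exception:
        # Anything else (for example a network error): same fallback.
        log.exception(
            "Zabbix Error: Unexpected error when fetching data for %s", worker_name
        )
        return True
    # Assumption: a load above the threshold means the build should wait.
    return load <= LOAD_THRESHOLD
```

Catching the two configuration errors separately keeps the log message actionable, while the broad except clause ensures that monitoring problems never stop builds from being scheduled.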

@fauust merged commit 583be7c into MariaDB:dev on Dec 15, 2025
3 checks passed
@RazvanLiviuVarzaru changed the title from "Make Zabbix Calls Non-Blocking" to "MDBF-1149 Make Zabbix Calls Non-Blocking" on Dec 15, 2025