Description
I am forwarding an issue from Red Hat's internal Jira that I debugged in May 2024 but have not resolved yet.
Current Behavior:
An OSH task hung indefinitely on an OSH worker while its child process was blocked writing to stdout/stderr. The kobo worker logging thread was blocked indefinitely in the TLS handshake:
```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/usr/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib64/python3.9/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib64/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib64/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1321, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1291, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 369, in _single_request3
    h = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 477, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/site-packages/kobo/client/__init__.py", line 510, in upload_task_log
    self._hub.worker.upload_task_log(task_id, remote_file_name, mode, chunk_start, chunk_len, chunk_checksum, encoded_chunk)
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 65, in run
    self._hub.upload_task_log(BytesIO(self._send_data), self._task_id, "stdout.log", append=True)
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()
```

The Python code where the thread was blocked seems to support a timeout for the TLS handshake, but the kobo/xmlrpc stack does not set one:
```
(gdb) py-list
1338            self._check_connected()
1339            timeout = self.gettimeout()
1340            try:
1341                if timeout == 0.0 and block:
1342                    self.settimeout(None)
>1343                self._sslobj.do_handshake()
1344            finally:
1345                self.settimeout(timeout)
1346
1347        def _real_connect(self, addr, connect_ex):
1348            if self.server_side:
```

Expected Behavior:
The task should either fail or stop transferring the captured output to the hub, but it should not hang indefinitely.
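A minimal sketch of the "fail instead of hang" behavior using only the standard library: giving the `http.client.HTTPSConnection` a timeout makes the socket, and therefore the `do_handshake()` call in the traces above, give up with `socket.timeout`. `TimeoutSafeTransport`, `make_proxy` and the 60-second default are hypothetical; kobo's own transport classes in `kobo/xmlrpc.py` are not shown, and wiring the timeout into them this way is an assumption.

```python
import socket
import xmlrpc.client


class TimeoutSafeTransport(xmlrpc.client.SafeTransport):
    """Hypothetical SafeTransport whose HTTPS connections carry a socket timeout.

    http.client.HTTPSConnection passes its ``timeout`` attribute to
    socket.create_connection(), and the SSLSocket created by wrap_socket()
    inherits that timeout, so a stalled TLS handshake raises socket.timeout
    instead of blocking forever.
    """

    def __init__(self, timeout, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # connect() reads self.timeout when it opens the socket, so setting
        # it here takes effect before the TLS handshake starts.
        conn.timeout = self._timeout
        return conn


def make_proxy(url, timeout=60.0):
    """Hypothetical helper: an XML-RPC proxy whose socket operations time out."""
    return xmlrpc.client.ServerProxy(url, transport=TimeoutSafeTransport(timeout))


if __name__ == "__main__":
    # Stand-in for an unresponsive hub: the kernel completes the TCP handshake
    # for queued connections, but nothing ever answers the TLS ClientHello.
    silent = socket.create_server(("127.0.0.1", 0))
    url = "https://127.0.0.1:%d/RPC2" % silent.getsockname()[1]
    try:
        make_proxy(url, timeout=1.0).system.listMethods()
    except socket.timeout as exc:
        print("call failed instead of hanging:", exc)
    finally:
        silent.close()
```

The timeout bounds individual socket operations (connect, handshake, send, receive) rather than the total duration of an XML-RPC call, so a slow but progressing upload is not cut off. A blunter alternative is `socket.setdefaulttimeout()`, which affects every socket the worker process creates afterwards.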
Steps to reproduce:
I am not sure how it happened, but I suspect it was caused by an intermittent network issue.
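If the cause was indeed a network stall, it can be approximated locally with a peer that completes the TCP handshake but never answers the TLS ClientHello. This sketch is an assumption about the failure mode, not a confirmed reproduction of the OSH incident; `start_silent_server`, `tls_handshake` and `handshake_still_blocked` are hypothetical helpers. Without a timeout the handshake stays blocked, like the logging thread in the `py-bt` output; with one it fails.

```python
import socket
import ssl
import threading


def start_silent_server():
    """Listen on an ephemeral localhost port but never accept or read.

    The kernel still completes the TCP handshake for queued connections, so
    the client's connect() succeeds, yet nobody answers the TLS ClientHello.
    """
    srv = socket.create_server(("127.0.0.1", 0))
    return srv, srv.getsockname()[1]


def tls_handshake(port, timeout):
    """Connect and run the TLS handshake; timeout=None means block forever."""
    ctx = ssl.create_default_context()
    raw = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        # wrap_socket() performs the handshake immediately
        # (do_handshake_on_connect=True) and inherits raw's timeout.
        with ctx.wrap_socket(raw, server_hostname="localhost"):
            return "connected"
    except socket.timeout:
        return "timed out"
    except OSError as exc:  # e.g. reset when the listener is closed
        return "failed: %r" % exc
    finally:
        raw.close()  # no-op if wrap_socket() already took over the socket


def handshake_still_blocked(port, wait=2.0):
    """Run a timeout-less handshake in a daemon thread; report if it is stuck."""
    worker = threading.Thread(target=tls_handshake, args=(port, None), daemon=True)
    worker.start()
    worker.join(wait)
    return worker.is_alive()


if __name__ == "__main__":
    srv, port = start_silent_server()
    print("no timeout, still blocked after 2 s:", handshake_still_blocked(port))
    print("timeout=1 s:", tls_handshake(port, timeout=1.0))
    srv.close()
```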
Impact Statement:
Such OSH tasks unnecessarily block the OSH scanning queue.