Description
We have observed the following problem:
When using the mssim TCTI, “Resource temporarily unavailable” errors occur regularly. Often 2 out of 3 runs will fail!
For example, it looks like this:
```
tpm2_load -T 'mssim:host=192.168.178.47,port=2323' -C 0x81000006 -P 12345 -u ecc.pub -r ecc.priv -c ecc.ctx
```
Error message:
```
WARNING:tcti:src/util-io/io.c:66:read_all() read on fd 3 failed with errno 11: Resource temporarily unavailable
ERROR:esys:src/tss2-esys/api/Esys_ContextSave.c:251:Esys_ContextSave_Finish() Received a non-TPM Error
ERROR:esys:src/tss2-esys/api/Esys_ContextSave.c:92:Esys_ContextSave() Esys Finish ErrorCode (0x000a000a)
ERROR: Esys_ContextSave(0xA000A) - tcti:IO failure
```
Note that “Resource temporarily unavailable” is the message for the EAGAIN error (errno 11 on Linux).
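As a small illustration (not code from tpm2-tss), the hypothetical helper is_would_block() below treats EAGAIN and EWOULDBLOCK as "try again later" conditions rather than hard I/O failures, and the equally hypothetical format_errno() renders an errno value in the same "errno N: message" shape as the warning above. POSIX allows the two constants to differ, so both are checked; on common Linux architectures EWOULDBLOCK is defined as EAGAIN, which is why only "errno 11" shows up in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `err` only means "no data right now,
 * try again later" on a non-blocking descriptor. */
static bool is_would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}

/* Hypothetical helper: formats an errno value as "errno N: message",
 * similar in shape to the read_all() warning in the log. */
static void format_errno(char *out, size_t len, int err)
{
    assert(out != NULL && len > 0);
    snprintf(out, len, "errno %d: %s", err, strerror(err));
}
```

With glibc on x86-64, format_errno(buf, sizeof buf, EAGAIN) yields "errno 11: Resource temporarily unavailable".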
I think the reason this can happen lies in how tcti_mssim_receive() is currently implemented: it first calls poll() on the network socket until the socket becomes "ready for reading", and then attempts to recv() the full response message in one go. The receive itself is wrapped in the socket_recv_buf() function, which simply calls read_all().
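The receive path described above can be reduced to a small, self-contained sketch. It does not reproduce tcti_mssim_receive() or read_all() from tpm2-tss; read_all_sketch(), poll_then_read_all() and demo_partial_message() are hypothetical names, and a local AF_UNIX socketpair stands in for the TCP connection to the simulator. The demo delivers only 4 of 10 expected bytes: poll() reports the socket as readable, the first recv() returns those 4 bytes, and the next recv() on the O_NONBLOCK socket fails with EAGAIN.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for read_all(): insists on reading
 * exactly `size` bytes and gives up on the first error other than EINTR. */
static ssize_t read_all_sketch(int fd, uint8_t *buf, size_t size)
{
    size_t done = 0;

    assert(buf != NULL || size == 0);
    while (done < size) {
        ssize_t n = recv(fd, buf + done, size - done, 0);
        if (n < 0 && errno == EINTR)
            continue;            /* only EINTR is retried */
        if (n < 0)
            return -1;           /* EAGAIN on an O_NONBLOCK socket ends up here */
        if (n == 0)
            break;               /* peer closed the connection */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical receive path: wait until the socket is readable, then try
 * to read the complete response in one go. */
static ssize_t poll_then_read_all(int fd, uint8_t *buf, size_t size, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc == 0)
        errno = ETIMEDOUT;
    if (rc <= 0)
        return -1;
    /* "Readable" only guarantees that at least one byte is available. */
    return read_all_sketch(fd, buf, size);
}

/* Usage example: only 4 of the expected 10 bytes have arrived.
 * Returns the errno of the failed read, 0 if it succeeded, -1 on setup failure. */
static int demo_partial_message(void)
{
    int sv[2];
    uint8_t buf[10];
    int result = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK) != 0 ||
        write(sv[1], "abcd", 4) != 4)
        result = -1;
    else if (poll_then_read_all(sv[0], buf, sizeof buf, 100) < 0)
        result = errno;          /* saved before close() can change errno */
    close(sv[0]);
    close(sv[1]);
    return result;
}
```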
There are, to my understanding, at least two ways this can go wrong:
- If poll() signals that the network socket is "ready for reading", this means that some bytes can be read now, but it does not guarantee that the full message is available yet. Nonetheless, the subsequent read_all() always attempts to read the full message by repeatedly calling recv(). This fails if the full message cannot be read right away: because the socket was opened in O_NONBLOCK mode, read_all() fails with an EAGAIN error (instead of blocking and waiting) when not enough data is available yet. And that is, I suppose, precisely what we are seeing.
- At least on Linux, poll() and select() may produce a so-called "spurious readiness notification": a socket may be reported as "ready for reading", yet a subsequent read() would still block because no data is actually available. In O_NONBLOCK mode, recv() or read() fails with EAGAIN in this situation.
  For reference, please see the "BUGS" sections at:
At the core of the problem is that the TEMP_RETRY macro does not currently handle EAGAIN (or EWOULDBLOCK) errors, at least on Linux. There appears to be some handling on FreeBSD already 🤔
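To make the difference concrete, the sketch below places the POSIX branch of TEMP_RETRY (taken from the removed lines of the workaround patch in this issue) next to the patched variant, renamed TEMP_RETRY_PATCHED here so that both can coexist. fake_recv() is a hypothetical stand-in for recv() on a non-blocking socket that fails with EAGAIN a given number of times before returning data, and defining SOCKET_ERROR as -1 is an assumption for POSIX builds. With the current macro, the first EAGAIN goes straight back to the caller; the patched macro keeps retrying.

```c
#define _DEFAULT_SOURCE          /* exposes usleep() in glibc */
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#ifndef SOCKET_ERROR
#define SOCKET_ERROR -1          /* assumption: value used on POSIX builds */
#endif

/* Current POSIX behaviour: only EINTR triggers another attempt. */
#define TEMP_RETRY(dest, exp) \
{ int __ret; \
    do { \
        __ret = exp; \
    } while (__ret == SOCKET_ERROR && errno == EINTR); \
    ((dest)) =__ret; }

/* Patched behaviour: EAGAIN/EWOULDBLOCK are retried as well, with a
 * 100 us pause between attempts and at most 32767 attempts in total. */
#define TEMP_RETRY_PATCHED(dest, exp) \
{ int __ret, __err = 0; \
    do { \
        if (__err > 0) usleep(100U); \
        __ret = exp; \
    } while ((__ret == SOCKET_ERROR) && \
             (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) && \
             (++__err < 32767)); \
    ((dest)) = __ret; }

/* Hypothetical stand-in for recv(): fails with EAGAIN while *pending > 0,
 * then pretends to have read 42 bytes. Counts every call in *calls. */
static int fake_recv(int *pending, int *calls)
{
    assert(pending != NULL && calls != NULL);
    (*calls)++;
    if (*pending > 0) {
        (*pending)--;
        errno = EAGAIN;
        return -1;
    }
    return 42;
}
```

Starting with three pending failures, TEMP_RETRY returns -1 after a single call with errno still set to EAGAIN, whereas TEMP_RETRY_PATCHED succeeds on the fourth call. In the worst case the patched loop sleeps roughly 32766 × 100 µs ≈ 3.3 s before giving up.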
The following patch contains a simple workaround that has fixed the “Resource temporarily unavailable” problem for us:
```diff
diff --git a/src/util-io/io.h b/src/util-io/io.h
index 595177d3..dc9a35fa 100644
--- a/src/util-io/io.h
+++ b/src/util-io/io.h
@@ -44,11 +44,12 @@ typedef SSIZE_T ssize_t;
 dest =__ret; }
 #else
 #define TEMP_RETRY(dest, exp) \
-{ int __ret; \
+{ int __ret, __err = 0; \
 do { \
+ if (__err > 0) usleep(100U); \
 __ret = exp; \
- } while (__ret == SOCKET_ERROR && errno == EINTR); \
- ((dest)) =__ret; }
+ } while ((__ret == SOCKET_ERROR) && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) && (++__err < 32767)); \
+ ((dest)) = __ret; }
 #endif
 #ifdef __cplusplus
```

I think the preferable solution would be to go back to polling when it turns out that no data, or not enough data, is available for reading, while keeping the part of the message that has already been read. But that would probably require more significant changes.
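One possible shape for that approach, sketched under assumptions: recv_state, set_nonblocking(), recv_resume() and demo_two_pieces() are hypothetical names, not tpm2-tss APIs. The receive state remembers how many bytes have already arrived; when recv() reports EAGAIN or EWOULDBLOCK, recv_resume() goes back to poll() instead of failing, and if the poll times out it returns with the partial data still stored, so a later call can continue where the previous one stopped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical receive state that survives across calls. */
typedef struct {
    uint8_t *buf;
    size_t size;   /* bytes expected */
    size_t done;   /* bytes received so far */
} recv_state;

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return (flags < 0) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 when the message is complete, 0 if poll() timed out (partial
 * data is kept in *st), or -1 on error with errno set. */
static int recv_resume(int fd, recv_state *st, int timeout_ms)
{
    assert(st->done <= st->size);
    while (st->done < st->size) {
        ssize_t n = recv(fd, st->buf + st->done, st->size - st->done, 0);
        if (n > 0) {
            st->done += (size_t)n;
            continue;
        }
        if (n == 0) {                    /* peer closed the connection */
            errno = ECONNRESET;
            return -1;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        /* Not enough data yet: wait for more instead of failing. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc < 0)
            return -1;
        if (rc == 0)
            return 0;
    }
    return 1;
}

/* Usage example: a 10-byte response arrives in two pieces (4 + 6 bytes). */
static int demo_two_pieces(void)
{
    int sv[2];
    uint8_t buf[10];
    recv_state st = { buf, sizeof buf, 0 };
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    ok = set_nonblocking(sv[0]) == 0 &&
         write(sv[1], "abcd", 4) == 4 &&
         recv_resume(sv[0], &st, 50) == 0 &&     /* times out, 4 bytes kept */
         st.done == 4 &&
         write(sv[1], "efghij", 6) == 6 &&
         recv_resume(sv[0], &st, 50) == 1 &&     /* message now complete */
         memcmp(buf, "abcdefghij", sizeof buf) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

This sketch simply reuses the same timeout for every poll(); a real receive path would probably also need an overall deadline across calls.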
Regards.