aio: message not associated on FreeBSD #2226

@interkosmos

Describe the bug
The setup consists of a server program and multiple clients that connect over the TCP transport using the request/reply protocol (reqrep0). Each client synchronously sends a single request message to the server. The server spins up a number of asynchronous workers to respond to the requests. The program in which the bug is observed is written in Fortran and uses interface bindings to libnng.

On FreeBSD 15, the function nng_aio_get_msg() sometimes returns a NULL pointer when multiple messages are received in parallel. On Linux (Debian 12), the returned message pointer is always valid, as expected. Additionally, on FreeBSD it is necessary to call nng_sleep_aio() with an arbitrary delay after nng_aio_result(); otherwise the server receives no message and the clients run into a timeout.

Expected behavior
The function nng_aio_get_msg() should return the received message.

Actual Behavior
The function nng_aio_get_msg() may return NULL on FreeBSD.

To Reproduce
The Fortran code in which the bug occurs follows the structure of the server_cb callback procedure from the example demo/async/server.c:

void server_cb(void *arg)
{
    struct work *work = arg;
    nng_msg *    msg;
    int          rv;
    uint32_t     when;

    switch (work->state) {
    case INIT:
        work->state = RECV;
        nng_ctx_recv(work->ctx, work->aio);
        break;
    case RECV:
        if ((rv = nng_aio_result(work->aio)) != 0) {
            fatal("nng_ctx_recv", rv);
        }
        nng_sleep_aio(0, work->aio);      /* <-- somehow necessary on FreeBSD */
        msg = nng_aio_get_msg(work->aio); /* <-- the returned pointer is NULL */
        if ((rv = nng_msg_trim_u32(msg, &when)) != 0) {
            nng_msg_free(msg);
            nng_ctx_recv(work->ctx, work->aio);
            return;
        }
        work->msg   = msg;
        work->state = WAIT;
        nng_sleep_aio(when, work->aio);
        break;
    case WAIT:
        nng_aio_set_msg(work->aio, work->msg);
        work->msg   = NULL;
        work->state = SEND;
        nng_ctx_send(work->ctx, work->aio);
        break;
    case SEND:
        if ((rv = nng_aio_result(work->aio)) != 0) {
            nng_msg_free(work->msg);
            fatal("nng_ctx_send", rv);
        }
        work->state = RECV;
        nng_ctx_recv(work->ctx, work->aio);
        break;
    default:
        fatal("bad state!", NNG_ESTATE);
        break;
    }
} 

On FreeBSD, the message pointer returned by nng_aio_get_msg() may be NULL if multiple clients send messages to the server in parallel. When tested with 10 clients, either all messages are received successfully or roughly one third to one half of them are lost. The bug occurs regardless of the number of parallel aio workers started by the server (tested with 10 and 100 workers). It is present with both GCC (gcc, gfortran) and LLVM (clang, flang).
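
In the RECV branch above, no NULL check precedes the call to nng_msg_trim_u32(). The following is a minimal sketch, in plain C without libnng, of the kind of guard that would let a worker tell a missing message apart from a malformed one. The helper decode_when() is hypothetical and not part of NNG. It assumes the request body starts with a 4-byte delay in network byte order (big-endian), which is how nng_msg_append_u32() stores values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libnng): read the 32-bit delay from the
 * start of a request body.
 * Returns 0 on success, -1 if the body is missing (NULL) or shorter than
 * four bytes.
 */
int decode_when(const uint8_t *body, size_t len, uint32_t *when)
{
    assert(when != NULL); /* the caller must supply an output location */

    if (body == NULL || len < sizeof(uint32_t)) {
        return -1; /* nothing to decode: the message was absent or truncated */
    }

    /* Assemble the value from big-endian (network order) bytes. */
    *when = ((uint32_t) body[0] << 24) | ((uint32_t) body[1] << 16) |
            ((uint32_t) body[2] << 8) | (uint32_t) body[3];
    return 0;
}
```

In the callback itself, the corresponding guard would presumably be a comparison of msg against NULL before nng_msg_trim_u32() is called. This only avoids dereferencing the NULL pointer and does not explain why the message is missing on FreeBSD.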

Environment Details

  • NNG v1.11.0
  • FreeBSD 15 (amd64)
  • GCC 15, LLVM 21
  • libnng.so
