Skip to content

Concurrency issue causes crash in SLIRP code #48

@rhalkyard

Description

@rhalkyard

Arculator's SLIRP interface appears to have a concurrency issue - the queue structure that it uses is not inherently thread-safe, and it is possible for the main Arculator thread to remove a packet from the queue before the SLIRP thread has finished inserting it, causing it to read bogus data and usually crash.

Any long-running heavy network activity should be able to replicate the issue, but the particular case that caused this for me was NFS activity on an emulated A440/1 under RISCiX. The chance of it occurring is still pretty low, but when reading a large file over NFS, I would usually hit it within 5 or so minutes.

I hacked a mutex into podules/common/net/net_slirp.c to serialise queue insertions and removals, and while I'm not sure if this is the only place where locking is necessary, it seems to have resolved the crashes I was seeing. Diff is below:

diff --git a/podules/common/net/net_slirp.c b/podules/common/net/net_slirp.c
index 58da9e6..ff1c05b 100644
--- a/podules/common/net/net_slirp.c
+++ b/podules/common/net/net_slirp.c
@@ -13,6 +13,8 @@ typedef struct net_slirp_t
        queueADT slirpq;
 } net_slirp_t;

+static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
+
 static void *slirp_poll_thread(void *p)
 {
        net_t *net = p;
@@ -47,7 +49,9 @@ static void *slirp_poll_thread(void *p)
 static int net_slirp_read(net_t *net, packet_t *packet)
 {
        net_slirp_t *slirp = net->p;
+       int ret = -1;

+       pthread_mutex_lock(&queue_mutex);
        if (QueuePeek(slirp->slirpq) > 0)
        {
                struct queuepacket *qp = QueueDelete(slirp->slirpq);
@@ -56,10 +60,11 @@ static int net_slirp_read(net_t *net, packet_t *packet)
                packet->data = qp->data;
                packet->len = qp->len;

-               return 0;
+               ret = 0;
        }

-       return -1;
+       pthread_mutex_unlock(&mutex);
+       return ret;
 }

 static void net_slirp_write(net_t *net, uint8_t *data, int size)
@@ -80,7 +85,9 @@ void slirp_output(const unsigned char *pkt, int pkt_len)
        p = (struct queuepacket *)malloc(sizeof(struct queuepacket));
        p->len = pkt_len;
        memcpy(p->data, pkt, pkt_len);
+       pthread_mutex_lock(&queue_mutex);
        QueueEnter(*g_slirpq, p);
+       pthread_mutex_unlock(&queue_mutex);
 //        aeh54_log("slirp_output %d @%d\n",pkt_len,p);
 }
 int slirp_can_output(void)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions