core: Return BAD_STATEID for NFSv4.0 special "stateless" stateids #161

kohlschuetter · 2025-07-08T18:37:54Z

Under some rare circumstances, especially when under I/O pressure, NFS clients such as those on macOS may try to send NFS requests with an invalid state id (seq=-1, other:ffffffffffffffffffffffff).

This usually happens after the NFS server has been restarted and lost information about the previous client state. Unfortunately, the macOS client does not recover from this situation and keeps making requests to the server. This slows down both server and client to a halt.

Send a "bad session" error upon occurring a seqid of -1. This lets the NFS client reconnect and the server establish the correct state.

Fixes: #160

kofemann · 2025-07-09T15:18:56Z

core/src/main/java/org/dcache/nfs/v4/NFSv4StateHandler.java

@@ -238,6 +238,10 @@ public NFS4Client getClient(clientid4 clientid) throws StaleClientidException {
    }

    public NFS4Client getClientIdByStateId(stateid4 stateId) throws ChimeraNFSException {
+        if (stateId.seqid == -1) {
+            // invalid; force reconnect
+            throw new BadSessionException();


AFAIK, OSX doesn't support NFSv4.1, thus can't understand BadSession error. Probably BadClientid must be thrown. I will check what spec says about all-ones stateid

or probably NFS4ERR_EXPIRED ?

fwiw, the situation is a bit tricky to test. I was able to test the effectiveness of the change once the device was in state, but afterwards couldn't get it back in state.

kofemann · 2025-07-10T07:09:01Z

All-one stateid is a special stateid that the client can present to 'anonymously' read the file:

https://datatracker.ietf.org/doc/html/rfc7530#section-9.1.4.3

   READ Bypass Stateid:  When "other" and seqid are both all ones, the
      stateid is a special READ bypass stateid.  When this value is used
      in WRITE or SETATTR, it is treated like the anonymous value.  When
      used in READ, the server MAY grant access, even if access would
      normally be denied to READ requests.

In the dCache code, we have special handling for them:

    if (Stateids.isStateLess(stateid)) {

        /*
         * As there was no open, we have to check  permissions.
         */
        if (context.getFs().access(context.getSubject(), inode, nfs4_prot.ACCESS4_READ) == 0) {
            throw new AccessException();
        }
       // perform read
    } else {
       // check stateid and perform read
    }

The same logic should be applied to OperationREAD

kohlschuetter · 2025-07-10T11:55:26Z

Interesting! So throwing an error code the macOS client does not understand is effectively an "access denied".

I wonder why we don't leave the decision to grant/deny access to the file system? Right now, the read/write API is "stateless" by default because we don't pass the stateid, and the NFS "OPEN" is translated merely to "lookup".

Also, we should probably catch ChimeraNFSExceptions that are NFSv4.1+ and translate them to NFSv4.0 error messages when the client is known to now support them.

kofemann · 2025-07-10T12:59:08Z

Well, it's not only permissions, right? Another client might have opened the same file in write mode with SHARE_DENY, so the filesystem doesn't know about it.

READs and WRITEs are stateful and require OPEN. Stateless READs for be sounds like a protocol bug of a weird workaround. In reality, only the OSX client does it, which is broken and supports only NFSv4.0. The stack trace in #160 clearly demonstrates that OSX client uses it for WRITE. This is NFS spec violation. But there are many other problems with it as well, that Apple never tries to fix.

kohlschuetter · 2025-07-10T13:08:05Z

Yes, I was referring to VirtualFileSystem.read and .write not being stateful:

int read(Inode inode, ByteBuffer data, long offset) throws IOException;
WriteResult write(Inode inode, ByteBuffer data, long offset, StabilityLevel stabilityLevel) throws IOException;

if we had the stateid as a parameter (and corresponding open and close methods), then the VirtualFileSystem implementation can take care of it (but still, NFSv4StateHandler should also check if the stateid is "stateless")

kofemann · 2025-07-10T16:39:15Z

File systems are not aware of nfs states. This can work only if NFSv4StateHandler is part of the file systems. Otherwise, I have no idea how NFS semantics, like delegation, pNFS layout, and locking, can be implemented. In pNFS case, reads and writes happen on a data server, which never sees OPEN.

kohlschuetter · 2025-07-10T19:07:48Z

I think what I want is the ability for filesystems to track "OPEN" and "CLOSE", and associate "READ" and "WRITE" operations with the corresponding OPEN/CLOSE stateids. This is what typical POSIX filesystems do, and it will greatly help in my use case.

Right now, VirtualFileSystem implementations get "read" and "write" calls to an Inode out of the blue, meaning they can't properly keep file handles open for the duration of the open-read/write-close session on the client side. This may severely impact performance as one would have to open/close the underlying resource for each read/write request (or infer this state via some suboptimal caching strategy). It may also pose a problem with respect to shared accesses on the same file.

I don't know how pNFS handles this, but my guess is the "stateid4" is properly communicated between open and read/write and across involved systems.

kofemann · 2025-07-14T14:10:47Z

NFS4 squashes multiple OPENs by the same clientid+clientowner and uses a single CLOSE to close all of them. Handling all this in the VFS turns it into a state manager.

kohlschuetter · 2025-07-14T15:27:49Z

State manager: Yeah I think that's what I want for my VFS. Those VFS that don't need it don't have to worry about.
We should continue that part of the discussion in #162

Under some rare circumstances, especially when under I/O pressure, NFS clients such as those on macOS may try to send NFS requests with an invalid state id (seq=-1, other:ffffffffffffffffffffffff; or seq:0, other:0). This usually happens after the NFS server has been restarted and lost information about the previous client state. Unfortunately, the macOS client does not recover from this situation and keeps making requests to the server. This slows down both server and client to a halt. Send a "bad stateid" error upon occurring such a sequence. This lets the NFS client reconnect and re-establish the correct state with the server. Fixes: dCache#160 Related: https://datatracker.ietf.org/doc/html/rfc7530#section-9.1.4.3 Signed-off-by: Christian Kohlschütter <[email protected]>

kohlschuetter · 2025-07-15T17:58:47Z

READs and WRITEs are stateful and require OPEN. Stateless READs for be sounds like a protocol bug of a weird workaround. In reality, only the OSX client does it, which is broken and supports only NFSv4.0. The stack trace in #160 clearly demonstrates that OSX client uses it for WRITE. This is NFS spec violation. But there are many other problems with it as well, that Apple never tries to fix.

I think it's OK by spec (RFC 7530 section 9.1.4.3), but NFS4.0 servers do not need to support these stateIds.
If really only the macOS NFS client does this then the fix is obvious.

I've revised my patch to check Stateids.isStateLess(stateid) and throw a BadStateidException.

kofemann · 2025-07-16T11:06:44Z

addressed in commit c8d57f2

kohlschuetter · 2025-07-16T14:05:31Z

@kofemann Will c8d57f2 get the macOS client back in a defined state or will all subsequent calls be made as stateless requests?

kofemann · 2025-07-16T14:26:33Z

I don't know what OSX client will do. However, the server should now handle both cases. So the client can mix open and anonymous state IDs.

kohlschuetter · 2025-07-16T14:39:54Z

The spec allows us to reject such calls, I don't see the upside of supporting them.

We should get the server back in state as this situation is not normal, that's why I'm hesitant to just keep supporting stateless stateids. It also makes it impossible to properly order these requests in a log because there is no seqid other than "-1" (or 0)

kofemann · 2025-07-16T15:01:37Z

Reject with NFS4ERR_ACCESS, not with a NFS4ERR_BAD_STATEID

In general, I would prefer to remove NFSv4.0 support, but some people still use it.

kohlschuetter · 2025-07-16T15:57:47Z

Reject with NFS4ERR_ACCESS, not with a NFS4ERR_BAD_STATEID

NFS4ERR_ACCESS will not cause a reconnect, as explained in macOS NFS' nfs_mount_state_error_should_restart https://github.com/apple-oss-distributions/NFS/blob/NFS-327.120.3/kext/nfs4_vnops.c#L1863L1880

The spec requires to return NFS4ERR_BAD_STATEID in all undefined cases. While it does not indicate what response to use for a defined, yet unsupported case, it is reasonable to also use it.

In general, I would prefer to remove NFSv4.0 support, but some people still use it.

Since macOS only supports NFSv4.0, this would immediately make me fork nfs4j. And I know the one or the other macOS-specific project that will probably do the same.

This might be better from an agility standpoint, but my hope is that bringing these changes back to upstream will benefit the project as a whole.

kofemann reviewed Jul 9, 2025

View reviewed changes

kohlschuetter force-pushed the ck/issue/160 branch from 97c4bec to d899322 Compare July 15, 2025 17:41

kohlschuetter changed the title ~~core: Force bad-session error upon seqid=-1~~ core: Return BAD_STATEID for NFSv4.0 special "stateless" stateids Jul 15, 2025

kohlschuetter force-pushed the ck/issue/160 branch from d899322 to 726e545 Compare July 15, 2025 17:46

kofemann closed this Jul 16, 2025

kohlschuetter mentioned this pull request Jul 18, 2025

NFS on macOS may not terminate old connections after server restart #160

Closed

core: Return BAD_STATEID for NFSv4.0 special "stateless" stateids #161

core: Return BAD_STATEID for NFSv4.0 special "stateless" stateids #161

Uh oh!

Conversation

kohlschuetter commented Jul 8, 2025

Uh oh!

kofemann Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

kohlschuetter Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

kofemann commented Jul 10, 2025

Uh oh!

kohlschuetter commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kofemann commented Jul 10, 2025

Uh oh!

kohlschuetter commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kofemann commented Jul 10, 2025

Uh oh!

kohlschuetter commented Jul 10, 2025

Uh oh!

kofemann commented Jul 14, 2025

Uh oh!

kohlschuetter commented Jul 14, 2025

Uh oh!

kohlschuetter commented Jul 15, 2025

Uh oh!

kofemann commented Jul 16, 2025

Uh oh!

kohlschuetter commented Jul 16, 2025

Uh oh!

kofemann commented Jul 16, 2025

Uh oh!

kohlschuetter commented Jul 16, 2025

Uh oh!

kofemann commented Jul 16, 2025

Uh oh!

kohlschuetter commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kohlschuetter commented Jul 10, 2025 •

edited

Loading

kohlschuetter commented Jul 10, 2025 •

edited

Loading

kohlschuetter commented Jul 16, 2025 •

edited

Loading