Skip to content

Conversation

kohlschuetter
Copy link
Contributor

Under some rare circumstances, especially when under I/O pressure, NFS clients such as those on macOS may try to send NFS requests with an invalid state id (seq=-1, other:ffffffffffffffffffffffff).

This usually happens after the NFS server has been restarted and lost information about the previous client state. Unfortunately, the macOS client does not recover from this situation and keeps making requests to the server. This slows down both server and client to a halt.

Send a "bad session" error upon occurring a seqid of -1. This lets the NFS client reconnect and the server establish the correct state.

Fixes: #160

@@ -238,6 +238,10 @@ public NFS4Client getClient(clientid4 clientid) throws StaleClientidException {
}

public NFS4Client getClientIdByStateId(stateid4 stateId) throws ChimeraNFSException {
if (stateId.seqid == -1) {
// invalid; force reconnect
throw new BadSessionException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, OSX doesn't support NFSv4.1, thus can't understand BadSession error. Probably BadClientid must be thrown. I will check what spec says about all-ones stateid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or probably NFS4ERR_EXPIRED ?

fwiw, the situation is a bit tricky to test. I was able to test the effectiveness of the change once the device was in state, but afterwards couldn't get it back in state.

@kofemann
Copy link
Member

All-one stateid is a special stateid that the client can present to 'anonymously' read the file:

https://datatracker.ietf.org/doc/html/rfc7530#section-9.1.4.3

   READ Bypass Stateid:  When "other" and seqid are both all ones, the
      stateid is a special READ bypass stateid.  When this value is used
      in WRITE or SETATTR, it is treated like the anonymous value.  When
      used in READ, the server MAY grant access, even if access would
      normally be denied to READ requests.

In the dCache code, we have special handling for them:

    if (Stateids.isStateLess(stateid)) {

        /*
         * As there was no open, we have to check  permissions.
         */
        if (context.getFs().access(context.getSubject(), inode, nfs4_prot.ACCESS4_READ) == 0) {
            throw new AccessException();
        }
       // perform read
    } else {
       // check stateid and perform read
    }

The same logic should be applied to OperationREAD

@kohlschuetter
Copy link
Contributor Author

kohlschuetter commented Jul 10, 2025

Interesting! So throwing an error code the macOS client does not understand is effectively an "access denied".

I wonder why we don't leave the decision to grant/deny access to the file system? Right now, the read/write API is "stateless" by default because we don't pass the stateid, and the NFS "OPEN" is translated merely to "lookup".

Also, we should probably catch ChimeraNFSExceptions that are NFSv4.1+ and translate them to NFSv4.0 error messages when the client is known to now support them.

@kofemann
Copy link
Member

Well, it's not only permissions, right? Another client might have opened the same file in write mode with SHARE_DENY, so the filesystem doesn't know about it.

READs and WRITEs are stateful and require OPEN. Stateless READs for be sounds like a protocol bug of a weird workaround. In reality, only the OSX client does it, which is broken and supports only NFSv4.0. The stack trace in #160 clearly demonstrates that OSX client uses it for WRITE. This is NFS spec violation. But there are many other problems with it as well, that Apple never tries to fix.

@kohlschuetter
Copy link
Contributor Author

kohlschuetter commented Jul 10, 2025

Yes, I was referring to VirtualFileSystem.read and .write not being stateful:

int read(Inode inode, ByteBuffer data, long offset) throws IOException;
WriteResult write(Inode inode, ByteBuffer data, long offset, StabilityLevel stabilityLevel) throws IOException;

if we had the stateid as a parameter (and corresponding open and close methods), then the VirtualFileSystem implementation can take care of it (but still, NFSv4StateHandler should also check if the stateid is "stateless")

@kofemann
Copy link
Member

File systems are not aware of nfs states. This can work only if NFSv4StateHandler is part of the file systems. Otherwise, I have no idea how NFS semantics, like delegation, pNFS layout, and locking, can be implemented. In pNFS case, reads and writes happen on a data server, which never sees OPEN.

@kohlschuetter
Copy link
Contributor Author

I think what I want is the ability for filesystems to track "OPEN" and "CLOSE", and associate "READ" and "WRITE" operations with the corresponding OPEN/CLOSE stateids. This is what typical POSIX filesystems do, and it will greatly help in my use case.

Right now, VirtualFileSystem implementations get "read" and "write" calls to an Inode out of the blue, meaning they can't properly keep file handles open for the duration of the open-read/write-close session on the client side. This may severely impact performance as one would have to open/close the underlying resource for each read/write request (or infer this state via some suboptimal caching strategy). It may also pose a problem with respect to shared accesses on the same file.

I don't know how pNFS handles this, but my guess is the "stateid4" is properly communicated between open and read/write and across involved systems.

@kofemann
Copy link
Member

NFS4 squashes multiple OPENs by the same clientid+clientowner and uses a single CLOSE to close all of them. Handling all this in the VFS turns it into a state manager.

@kohlschuetter
Copy link
Contributor Author

State manager: Yeah I think that's what I want for my VFS. Those VFS that don't need it don't have to worry about.
We should continue that part of the discussion in #162

@kohlschuetter kohlschuetter changed the title core: Force bad-session error upon seqid=-1 core: Return BAD_STATEID for NFSv4.0 special "stateless" stateids Jul 15, 2025
Under some rare circumstances, especially when under I/O pressure, NFS
clients such as those on macOS may try to send NFS requests with an
invalid state id (seq=-1, other:ffffffffffffffffffffffff; or seq:0,
other:0).

This usually happens after the NFS server has been restarted and lost
information about the previous client state. Unfortunately, the macOS
client does not recover from this situation and keeps making requests to
the server. This slows down both server and client to a halt.

Send a "bad stateid" error upon occurring such a sequence. This lets the
NFS client reconnect and re-establish the correct state with the server.

Fixes: dCache#160
Related: https://datatracker.ietf.org/doc/html/rfc7530#section-9.1.4.3
Signed-off-by: Christian Kohlschütter <[email protected]>
@kohlschuetter
Copy link
Contributor Author

READs and WRITEs are stateful and require OPEN. Stateless READs for be sounds like a protocol bug of a weird workaround. In reality, only the OSX client does it, which is broken and supports only NFSv4.0. The stack trace in #160 clearly demonstrates that OSX client uses it for WRITE. This is NFS spec violation. But there are many other problems with it as well, that Apple never tries to fix.

I think it's OK by spec (RFC 7530 section 9.1.4.3), but NFS4.0 servers do not need to support these stateIds.
If really only the macOS NFS client does this then the fix is obvious.

I've revised my patch to check Stateids.isStateLess(stateid) and throw a BadStateidException.

@kofemann
Copy link
Member

addressed in commit c8d57f2

@kofemann kofemann closed this Jul 16, 2025
@kohlschuetter
Copy link
Contributor Author

@kofemann Will c8d57f2 get the macOS client back in a defined state or will all subsequent calls be made as stateless requests?

@kofemann
Copy link
Member

I don't know what OSX client will do. However, the server should now handle both cases. So the client can mix open and anonymous state IDs.

@kohlschuetter
Copy link
Contributor Author

The spec allows us to reject such calls, I don't see the upside of supporting them.

We should get the server back in state as this situation is not normal, that's why I'm hesitant to just keep supporting stateless stateids. It also makes it impossible to properly order these requests in a log because there is no seqid other than "-1" (or 0)

@kofemann
Copy link
Member

Reject with NFS4ERR_ACCESS, not with a NFS4ERR_BAD_STATEID

In general, I would prefer to remove NFSv4.0 support, but some people still use it.

@kohlschuetter
Copy link
Contributor Author

kohlschuetter commented Jul 16, 2025

Reject with NFS4ERR_ACCESS, not with a NFS4ERR_BAD_STATEID

NFS4ERR_ACCESS will not cause a reconnect, as explained in macOS NFS' nfs_mount_state_error_should_restart https://github.com/apple-oss-distributions/NFS/blob/NFS-327.120.3/kext/nfs4_vnops.c#L1863L1880

The spec requires to return NFS4ERR_BAD_STATEID in all undefined cases. While it does not indicate what response to use for a defined, yet unsupported case, it is reasonable to also use it.

In general, I would prefer to remove NFSv4.0 support, but some people still use it.

Since macOS only supports NFSv4.0, this would immediately make me fork nfs4j. And I know the one or the other macOS-specific project that will probably do the same.

This might be better from an agility standpoint, but my hope is that bringing these changes back to upstream will benefit the project as a whole.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NFS on macOS may not terminate old connections after server restart
2 participants