Conversation

kohlschuetter (Contributor)

Currently, read and write operations on the `VirtualFileSystem` are "stateless" in the sense that there is no corresponding "open" and "close" exposed. At the NFSv4 level this exists, and nfs4j already tracks it internally (`stateid4.opaque` byte sequences), but it is not exposed to `VirtualFileSystem`.
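
A minimal sketch of what such an extension could look like; the signatures below (and the assumption that `read`/`write` return a byte count) are illustrative, not the actual nfs4j API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

import org.dcache.nfs.util.Opaque; // nfs4j types; package paths assumed
import org.dcache.nfs.vfs.Inode;

// Hypothetical stateful additions to the VFS API, for illustration only.
public interface StatefulVfsSketch {

    // Returns an opaque, NFS-version-agnostic handle for the open state.
    Opaque open(Inode inode, int shareAccess) throws IOException;

    // read/write carry the state handle obtained at open time, so the
    // filesystem can associate them with an access check done at OPEN.
    int read(Inode inode, Opaque stateId, ByteBuffer data, long offset)
            throws IOException;

    int write(Inode inode, Opaque stateId, ByteBuffer data, long offset)
            throws IOException;

    void close(Inode inode, Opaque stateId) throws IOException;
}
```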

This is a problem not only for scenarios where an explicit open/close is required; it is also a performance problem for the currently implemented scenarios: `PseudoFs` calls `checkAccess(inode, ACE4_READ_DATA/ACE4_WRITE_DATA)` for each `read`/`write`, which in turn triggers a `Stat` for each read/write operation.

This incurs an unnecessary performance penalty of more than 20%.

The access check is unnecessary because a successful check upon "open" remains valid for the entire lifetime of the open state, just as it does for a file descriptor opened under POSIX.

To properly track granted access, we can leverage the data stored in `stateid4` and the existing work in `FileTracker`.

Since `stateid4` exists in NFSv4 only, we make sure there is no performance degradation in NFSv3 and no exposure of NFSv4-only internals in the `VirtualFileSystem` API:

  1. Expose `Opaque` stateids to `VirtualFileSystem` read/write/commit; add open/close for custom FS interop.
  2. Rework `stateid4`, which currently conflates the stateid and seqid; directly reference the `stateid4.other` `Opaque` when possible.
  3. In `PseudoFs`, check the `SharedAccess` state from `FileTracker` and use it to determine access during read/write when available (see the sketch after this list).
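
A minimal sketch of the shortcut in item 3, as it could look inside `PseudoFs`; the `FileTracker` lookup method is a hypothetical name, while the `nfs4_prot` constants and `AccessException` are existing nfs4j names:

```java
// Sketch of the per-read access shortcut (item 3). getShareAccessIfOpen(...)
// is a hypothetical FileTracker lookup used for illustration.
private void checkReadAccess(Inode inode, Opaque stateId) throws IOException {
    Integer shareAccess = (stateId == null)
            ? null
            : fileTracker.getShareAccessIfOpen(stateId); // hypothetical

    if (shareAccess != null) {
        // Access was validated at OPEN time; only the share mode needs
        // checking, so no per-call stat() is required.
        if ((shareAccess & nfs4_prot.OPEN4_SHARE_ACCESS_READ) == 0) {
            throw new AccessException();
        }
        return;
    }

    // No open state available (e.g. NFSv3): fall back to the full check.
    checkAccess(inode, nfs4_prot.ACE4_READ_DATA);
}
```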

With the change, for sustained reads I'm seeing 854 MB/s instead of 712 MB/s on `LocalFileSystem`, and 2000 MB/s instead of 1666 MB/s on a custom implementation.

We sometimes want to check a certain byte or long as part of an `Opaque`.

Let's add helper methods so we don't have to convert to `byte[]`.

Signed-off-by: Christian Kohlschütter <[email protected]>
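
A sketch of what such helpers could look like, assuming a simple `byte[]`-backed `Opaque`; the backing field and the big-endian semantics are assumptions for illustration:

```java
// Byte/long accessors on a byte[]-backed Opaque (illustrative).
public final class OpaqueSketch {
    private final byte[] bytes;

    public OpaqueSketch(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Read a single byte without exposing or copying the whole array.
    public byte byteAt(int index) {
        return bytes[index];
    }

    // Interpret 8 bytes at the given offset as a big-endian long.
    public long longAt(int offset) {
        long value = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xffL);
        }
        return value;
    }
}
```
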
When specifying lambdas for cleanup operations, it is good practice to
minimize the set of instances referenced ("pinned") by the lambda.

Right now, we reference unnecessary objects, adding to heap
fragmentation.

Reduce the scope of referenced objects by resolving the required
references before entering the dispose-listener lambda.

Signed-off-by: Christian Kohlschütter <[email protected]>
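
A self-contained illustration of the pattern; all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class DisposeScopeExample {
    interface Resource { void release(); }

    static class Session {
        private final Resource resource = () -> System.out.println("released");
        Resource getResource() { return resource; }
        // ... plus other fields we do not want to keep reachable
    }

    static final List<Runnable> DISPOSE_LISTENERS = new ArrayList<>();

    static void register(Session session) {
        // Bad: pins the whole Session until the listener runs:
        //   DISPOSE_LISTENERS.add(() -> session.getResource().release());

        // Better: resolve the reference first, so only the Resource is pinned.
        Resource resource = session.getResource();
        DISPOSE_LISTENERS.add(resource::release);
    }

    public static void main(String[] args) {
        register(new Session());
        DISPOSE_LISTENERS.forEach(Runnable::run); // prints "released"
    }
}
```
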
`context.getFs()` may be null upon close...

Signed-off-by: Christian Kohlschütter <[email protected]>
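
Presumably a defensive check along these lines; the exact call site is an assumption, and `close(inode, stateId)` follows this PR's hypothetical VFS signature:

```java
// Guard against a null VirtualFileSystem during close (illustrative).
void closeIfPossible(CompoundContext context, Inode inode, Opaque stateId)
        throws IOException {
    VirtualFileSystem fs = context.getFs();
    if (fs != null) { // getFs() may return null at this point
        fs.close(inode, stateId);
    }
}
```
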
Apparently `stateid4` needs to be serialized for dCache, so let's make
sure this works again.

Signed-off-by: Christian Kohlschütter <[email protected]>
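
One way this could look, assuming `other` is now an `Opaque` that is not itself `Serializable`; the field layout and the `Opaque` accessors are assumptions for illustration:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of keeping a stateid4-like class Java-serializable.
public class Stateid4Sketch implements Serializable {
    private static final long serialVersionUID = 1L; // illustrative value

    public int seqid;
    public transient Opaque other; // not assumed to be Serializable itself

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(other.toBytes()); // hypothetical accessor
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        other = Opaque.forBytes((byte[]) in.readObject()); // hypothetical factory
    }
}
```
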
As per reviewer comment, this may still be used somewhere.

Signed-off-by: Christian Kohlschütter <[email protected]>
@kofemann (Member)

There are too many changes. Some make sense, some don't. Cleanups, like `new stateid` => `stateid.clone()`, are good. `byteAt` should be `stateid.type()`. The VFS layer is deliberately designed to be thin. If OPEN/READ/WRITE/CLOSE must pass extra information to the custom filesystem, then those operations can be overridden.

@kohlschuetter (Contributor, Author) commented Jul 14, 2025

I'm not sure I follow.

What can be removed? We're not going to expose `byte[]` or `stateid4` directly to the VFS API (and wrapping them into `Opaque` on each call also doesn't make much sense), so deep cuts have to be made.

How can the operations be overridden by the VFS if that information is not otherwise exposed?

Which changes don't make sense?

(It may be easier to follow if you review the changes commit by commit.)

With the introduction of `stateid4.getType`, let's not make any further
assumptions about how the type is stored in the `Opaque`.

Add a static `getType` variant in `stateid4` and call that instead of
`Opaque.byteAt`.

Signed-off-by: Christian Kohlschütter <[email protected]>
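
Roughly along these lines; the first-byte encoding is an assumption, and the point is that only `stateid4` knows it:

```java
// Centralize knowledge of where the type byte lives inside the stateid's
// Opaque, so callers no longer use Opaque.byteAt directly (illustrative).
public static int getType(Opaque other) {
    return other.byteAt(0) & 0xff;
}
```
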
... to reduce the number of changes in dCache#162

Signed-off-by: Christian Kohlschütter <[email protected]>
@kohlschuetter (Contributor, Author)

@kofemann I've cut a few changes, but that's as much as I think we can cut.

kohlschuetter added a commit to kohlschuetter/nfs4j that referenced this pull request Jul 14, 2025
to reduce the number of changes in dCache#162

We may also need the reference to `Inode` should we decide to call
`Vfs.close` from here.

Signed-off-by: Christian Kohlschütter <[email protected]>
@kohlschuetter (Contributor, Author)

Follow-up on #163
