Conversation

kohlschuetter (Contributor)

Currently, read and write operations on the `VirtualFileSystem` are "stateless" in the sense that there is no corresponding "open" and "close" exposed. At the NFSv4 level this exists, and nfs4j already tracks it internally (`stateid4.opaque` byte sequences), but it is not exposed to `VirtualFileSystem`.
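
A minimal sketch of what such an extension could look like; the signatures below (and the assumption that `read`/`write` return a byte count) are illustrative, not the actual nfs4j API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

import org.dcache.nfs.util.Opaque; // nfs4j types; package paths assumed
import org.dcache.nfs.vfs.Inode;

// Hypothetical stateful additions to the VFS API, for illustration only.
public interface StatefulVfsSketch {

    // Returns an opaque, NFS-version-agnostic handle for the open state.
    Opaque open(Inode inode, int shareAccess) throws IOException;

    // read/write carry the state handle obtained at open time, so the
    // filesystem can associate them with an access check done at OPEN.
    int read(Inode inode, Opaque stateId, ByteBuffer data, long offset)
            throws IOException;

    int write(Inode inode, Opaque stateId, ByteBuffer data, long offset)
            throws IOException;

    void close(Inode inode, Opaque stateId) throws IOException;
}
```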

This is a problem not only for scenarios where an explicit open/close is required; it is also a performance problem for the currently implemented scenarios: `PseudoFs` calls `checkAccess(inode, ACE4_READ_DATA/ACE4_WRITE_DATA)` for each `read`/`write`, which in turn triggers a `Stat` for each read/write operation.

This incurs an unnecessary performance penalty of more than 20%.

The access check is unnecessary because a successful check upon "open" remains valid for the entire lifetime of the open state, just as it does for a file descriptor opened under POSIX.

To properly track granted access, we can leverage the data stored in `stateid4` and the existing work in `FileTracker`.

Since `stateid4` exists in NFSv4 only, we make sure there is no performance degradation in NFSv3 and no exposure of NFSv4-only internals in the `VirtualFileSystem` API:

  1. Expose `Opaque` stateids to `VirtualFileSystem` read/write/commit; add open/close for custom FS interop.
  2. Rework `stateid4`, which currently conflates the stateid and seqid; directly reference the `stateid4.other` `Opaque` when possible.
  3. In `PseudoFs`, check the `SharedAccess` state from `FileTracker` and use it to determine access during read/write when available (see the sketch after this list).
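
A minimal sketch of the shortcut in item 3, as it could look inside `PseudoFs`; the `FileTracker` lookup method is a hypothetical name, while the `nfs4_prot` constants and `AccessException` are existing nfs4j names:

```java
// Sketch of the per-read access shortcut (item 3). getShareAccessIfOpen(...)
// is a hypothetical FileTracker lookup used for illustration.
private void checkReadAccess(Inode inode, Opaque stateId) throws IOException {
    Integer shareAccess = (stateId == null)
            ? null
            : fileTracker.getShareAccessIfOpen(stateId); // hypothetical

    if (shareAccess != null) {
        // Access was validated at OPEN time; only the share mode needs
        // checking, so no per-call stat() is required.
        if ((shareAccess & nfs4_prot.OPEN4_SHARE_ACCESS_READ) == 0) {
            throw new AccessException();
        }
        return;
    }

    // No open state available (e.g. NFSv3): fall back to the full check.
    checkAccess(inode, nfs4_prot.ACE4_READ_DATA);
}
```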

With the change, for sustained reads I'm seeing 854 MB/s instead of 712 MB/s on `LocalFileSystem`, and 2000 MB/s instead of 1666 MB/s on a custom implementation.

We sometimes want to check a certain byte or long as part of an `Opaque`.

Let's add helper methods so we don't have to convert to `byte[]`.

Signed-off-by: Christian Kohlschütter <[email protected]>
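
A sketch of what such helpers could look like, assuming a simple `byte[]`-backed `Opaque`; the backing field and the big-endian semantics are assumptions for illustration:

```java
// Byte/long accessors on a byte[]-backed Opaque (illustrative).
public final class OpaqueSketch {
    private final byte[] bytes;

    public OpaqueSketch(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Read a single byte without exposing or copying the whole array.
    public byte byteAt(int index) {
        return bytes[index];
    }

    // Interpret 8 bytes at the given offset as a big-endian long.
    public long longAt(int offset) {
        long value = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xffL);
        }
        return value;
    }
}
```
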
When specifying lambdas for cleanup operations, it is good practice to
minimize the set of instances referenced ("pinned") by the lambda.

Right now, we reference unnecessary objects, adding to heap
fragmentation.

Reduce the scope of referenced objects by resolving the required
references before entering the dispose-listener lambda.

Signed-off-by: Christian Kohlschütter <[email protected]>
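
A self-contained illustration of the pattern; all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class DisposeScopeExample {
    interface Resource { void release(); }

    static class Session {
        private final Resource resource = () -> System.out.println("released");
        Resource getResource() { return resource; }
        // ... plus other fields we do not want to keep reachable
    }

    static final List<Runnable> DISPOSE_LISTENERS = new ArrayList<>();

    static void register(Session session) {
        // Bad: pins the whole Session until the listener runs:
        //   DISPOSE_LISTENERS.add(() -> session.getResource().release());

        // Better: resolve the reference first, so only the Resource is pinned.
        Resource resource = session.getResource();
        DISPOSE_LISTENERS.add(resource::release);
    }

    public static void main(String[] args) {
        register(new Session());
        DISPOSE_LISTENERS.forEach(Runnable::run); // prints "released"
    }
}
```
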
`context.getFs()` may be null upon close...

Signed-off-by: Christian Kohlschütter <[email protected]>
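
Presumably a defensive check along these lines; the exact call site is an assumption, and `close(inode, stateId)` follows this PR's hypothetical VFS signature:

```java
// Guard against a null VirtualFileSystem during close (illustrative).
void closeIfPossible(CompoundContext context, Inode inode, Opaque stateId)
        throws IOException {
    VirtualFileSystem fs = context.getFs();
    if (fs != null) { // getFs() may return null at this point
        fs.close(inode, stateId);
    }
}
```
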
Apparently `stateid4` needs to be serialized for dCache, so let's make
sure this works again.

Signed-off-by: Christian Kohlschütter <[email protected]>
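
One way this could look, assuming `other` is now an `Opaque` that is not itself `Serializable`; the field layout and the `Opaque` accessors are assumptions for illustration:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of keeping a stateid4-like class Java-serializable.
public class Stateid4Sketch implements Serializable {
    private static final long serialVersionUID = 1L; // illustrative value

    public int seqid;
    public transient Opaque other; // not assumed to be Serializable itself

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(other.toBytes()); // hypothetical accessor
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        other = Opaque.forBytes((byte[]) in.readObject()); // hypothetical factory
    }
}
```
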
As per reviewer comment, this may still be used somewhere.

Signed-off-by: Christian Kohlschütter <[email protected]>
@kofemann (Member)

There are too many changes. Some make sense, some don't. Cleanups, like `new stateid` => `stateid.clone()`, are good. `byteAt` should be `stateid.type()`. The VFS layer is deliberately designed to be thin. If OPEN/READ/WRITE/CLOSE must pass extra information to the custom filesystem, then those operations can be overridden.

@kohlschuetter (Contributor, Author) commented Jul 14, 2025

I'm not sure I follow.

What can be removed? We're not going to expose `byte[]` or `stateid4` directly to the VFS API (and wrapping them into `Opaque` on each call also doesn't make much sense), so deep cuts have to be made.

How can the operations be overridden by the VFS if that information is not otherwise exposed?

Which changes don't make sense?

(It may be easier to follow if you review the changes commit by commit.)

With the introduction of `stateid4.getType`, let's not make any further
assumptions about how the type is stored in the `Opaque`.

Add a static `getType` variant in `stateid4` and call that instead of
`Opaque.byteAt`.

Signed-off-by: Christian Kohlschütter <[email protected]>
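
Roughly along these lines; the first-byte encoding is an assumption, and the point is that only `stateid4` knows it:

```java
// Centralize knowledge of where the type byte lives inside the stateid's
// Opaque, so callers no longer use Opaque.byteAt directly (illustrative).
public static int getType(Opaque other) {
    return other.byteAt(0) & 0xff;
}
```
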
... to reduce the number of changes in dCache#162

Signed-off-by: Christian Kohlschütter <[email protected]>
@kohlschuetter (Contributor, Author)

@kofemann I've cut a few changes, but that's as much as I think we can cut.

kohlschuetter added a commit to kohlschuetter/nfs4j that referenced this pull request Jul 14, 2025
to reduce the number of changes in dCache#162

We may also need the reference to `Inode` should we decide to call
`Vfs.close` from here.

Signed-off-by: Christian Kohlschütter <[email protected]>
@kohlschuetter (Contributor, Author)

Follow-up on #163
