Skip to content

issue-5293: fixed caching of stale GetNodeAttr responses in case of concurrent WriteData/SetNodeAttr/AllocateData requests#5308

Open
qkrorlqr wants to merge 16 commits intomainfrom
users/qkrorlqr/fix-set-get-attr-race
Open

issue-5293: fixed caching of stale GetNodeAttr responses in case of concurrent WriteData/SetNodeAttr/AllocateData requests#5308
qkrorlqr wants to merge 16 commits intomainfrom
users/qkrorlqr/fix-set-get-attr-race

Conversation

@qkrorlqr
Copy link
Collaborator

@qkrorlqr qkrorlqr commented Feb 24, 2026

Notes

  • introducing a per-fs atomic GlobalAttrVersion counter - incrementing it every time upon attr-modifying operation completion
  • invalidating node attr in TNodeCache upon each attr-modifying operation completion and setting the just-incremented GlobalAttrVersion as this node's version
  • loading current node attr version from TNodeCache upon attr-reading operation start
  • resetting attr_timeout in the attrs returned to the guest if upon attr-reading operation completion we find out that the version of the node in TNodeCache has changed since the start of the operation

extra:

  • moved TDirectoryBuilder to a separate .h/.cpp file pair and added unit tests for it
  • moved NodeCacheLock into TNodeCache so TNodeCache is now thread-safe
  • TNodeCache benchmark
  • implemented TNodeCache sharding by node id to reduce contention (the results can be checked via the added benchmark)
// 16 client threads, 1 shard (equivalent to non-sharded version)
----------- UpdateNode16 ---------------
 samples:       4364
 iterations:    11758825
 iterations hr:    11.8M
 run time:      5.000951105
 per iteration: 1235.532165 cycles

// 16 client threads, 16 shards
----------- UpdateNode16_16 ---------------
 samples:       10080
 iterations:    62736801
 iterations hr:    62.7M
 run time:      5.001321319
 per iteration: 156.7914719 cycles

Issue

#5293

@qkrorlqr qkrorlqr added large-tests Launch large tests for PR filestore Add this label to run only cloud/filestore build and tests on PR labels Feb 24, 2026
@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 6939s): all tests PASSED for commit 6a2ca31.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
3439 3439 0 0 0 0 0

@qkrorlqr qkrorlqr added asan Launch builds with address sanitizer along with regular build tsan Launch builds with thread sanitizer along with regular build msan Launch builds with memory sanitizer along with regular build ubsan Launch builds with undefined behaviour sanitizer along with regular build labels Feb 25, 2026
@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-release-asan target: cloud/filestore/ (test time: 5176s): all tests PASSED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2911 2911 0 0 0 0 0

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-release-msan target: cloud/filestore/ (test time: 5050s): all tests PASSED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2911 2911 0 0 0 0 0

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 7008s): some tests FAILED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
3441 3440 0 1 0 0 0

🔴 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 184s): some tests FAILED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2 1 0 1 0 0 0

🔴 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 180s): some tests FAILED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2 1 0 1 0 0 0

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-release-tsan target: cloud/filestore/ (test time: 5460s): all tests PASSED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2901 2901 0 0 0 0 0

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-release-ubsan target: cloud/filestore/ (test time: 4967s): all tests PASSED for commit 3b9e79d.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2942 2942 0 0 0 0 0

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-release-asan target: cloud/filestore/ (test time: 16962s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2796 2190 0 191 4 347 64

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-release-msan target: cloud/filestore/ (test time: 17153s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2796 2190 0 191 4 347 64

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-release-tsan target: cloud/filestore/ (test time: 17333s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2786 2179 0 192 4 347 64

@github-actions
Copy link
Contributor

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-release-ubsan target: cloud/filestore/ (test time: 18194s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
2825 2202 0 209 4 346 64

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

Note

This is an automated comment that will be appended during run.

🔴 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 27185s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
3324 2483 0 298 4 475 64

🔴 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 23869s): some tests FAILED for commit d9dfe9e.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
949 109 0 300 0 476 64

@qkrorlqr qkrorlqr force-pushed the users/qkrorlqr/fix-set-get-attr-race branch from d9dfe9e to b430327 Compare February 27, 2026 10:43
@qkrorlqr qkrorlqr marked this pull request as ready for review February 27, 2026 12:44

NProto::TError ResetAttrTimeout(
char* data,
ui64 len,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remainingLen

const TNodeIdVisitor& visitor)
{
#if defined(FUSE_VIRTIO)
while (len > sizeof(fuse_direntplus)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably add commit that expect no name with 0 lens

content.GetSize());
TBuffer c(content.GetData(), content.GetSize());

ResetAttrTimeout(c.Data(), c.Size(), [&] (ui64 ino) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it work as expected for '.' and '..'


if (self->CheckResponse(self, *callContext, req, response)) {
//
// Disallow result caching for all concurrently running operations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we take care about WriteBufLocal as well?

struct TContent
{
TBufferPtr Buffer;
ui64 AttrVersion = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not serialize it, that means we will reset it, meaning, that after restart we will have stale caches. probably discussion is needed

entry.entry_timeout = GetEntryCacheTimeout(attrs).SecondsFloat();

ConvertAttr(Config->GetPreferredBlockSize(), node->Attrs, entry.attr);
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear why, probably some comment here can help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

asan Launch builds with address sanitizer along with regular build filestore Add this label to run only cloud/filestore build and tests on PR large-tests Launch large tests for PR msan Launch builds with memory sanitizer along with regular build tsan Launch builds with thread sanitizer along with regular build ubsan Launch builds with undefined behaviour sanitizer along with regular build

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants