fix: Eliminate RwLock in SequenceCounter to avoid read-write deadlock issues #19432

Merged
KKould merged 7 commits into databendlabs:main from KKould:refactor/eliminate_sequnece_counter_rwlock
Feb 11, 2026

@KKould KKould commented Feb 9, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

The linked issue points out that the current use of RwLock in SequenceCounter can cause deadlocks, or degrade performance through blocking. It also suggests using tokio::spawn to avoid blocking.

In this PR, instead of introducing a larger refactor, I aimed to address the root cause of the problem by minimizing the reliance on RwLock. The counter is now initialized and refilled under a refill_lock, using double-checked locking on the slow path. As a result, locking is only required during the slow-path refill/initialization, while the normal fast-path sequence increment remains lock-free.
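The fast-path/slow-path split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CachedCounter`, `try_claim`, the `step` field, and the `fetch` callback are hypothetical names, and it uses a blocking `std::sync::Mutex` where the real code is async. It also assumes that `fetch` returns the start of a fresh range that never lies below the previous one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the cached sequence counter.
pub struct CachedCounter {
    current: AtomicU64,     // next value to hand out
    max: AtomicU64,         // exclusive upper bound of the cached range
    refill_lock: Mutex<()>, // taken only on the slow path
    step: u64,              // how many values to reserve per refill
}

impl CachedCounter {
    pub fn new(step: u64) -> Self {
        Self {
            current: AtomicU64::new(0),
            max: AtomicU64::new(0),
            refill_lock: Mutex::new(()),
            step,
        }
    }

    /// Fast path: claim one cached value with a CAS loop, no lock involved.
    fn try_claim(&self) -> Option<u64> {
        let mut cur = self.current.load(Ordering::Acquire);
        loop {
            if cur >= self.max.load(Ordering::Acquire) {
                return None; // cache exhausted (or not initialized yet)
            }
            match self.current.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(cur),
                Err(actual) => cur = actual,
            }
        }
    }

    /// `fetch(n)` is an assumed callback that reserves `n` values remotely
    /// and returns the first one.
    pub fn next_val<F: Fn(u64) -> u64>(&self, fetch: F) -> u64 {
        loop {
            if let Some(v) = self.try_claim() {
                return v;
            }
            // Slow path: serialize refills.
            let _guard = self.refill_lock.lock().unwrap();
            // Second check: another thread may have refilled while we waited.
            if let Some(v) = self.try_claim() {
                return v;
            }
            let start = fetch(self.step);
            // Publish `current` before `max` so a reader that sees the new
            // bound also sees the new start.
            self.current.store(start, Ordering::Release);
            self.max.store(start + self.step, Ordering::Release);
        }
    }
}
```

Callers that find values in the cache never touch `refill_lock`, so a thread waiting on a refill cannot block them.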

This PR also adds two unit tests based on the reproduction case from the issue: test_no_stall_when_refill_lock_waiting and test_high_concurrency_fast_path_progress_during_refill_contention.

Special thanks to @YZL0v3ZZ for the careful review and valuable feedback.

Tips:

  • As far as I know, the logic between next_val_v0 and next_val_v1 is not fully consistent yet, and the migration is still ongoing. A larger refactor at this stage could further increase the complexity. Therefore, this change tries to preserve the original structure and behavior as much as possible.
  • We may advance the cached counter (claim_up_to) before awaiting the remote fetch to avoid duplicates under concurrency. If the fetch fails, the claimed cached values are skipped, which can introduce small sequence gaps.
    • Sequences are not guaranteed to be gapless, and fetch errors are expected to be rare.
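One way to read the gap behavior in the second tip is sketched below. Only the name `claim_up_to` comes from the description above; the `Cache` struct, the `refill` helper, the `fetch` callback, and the error type are assumptions made for illustration. The sketch also assumes the caller already holds the refill lock.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical cached range: values in `current..max` are still unissued.
pub struct Cache {
    pub current: AtomicU64,
    pub max: AtomicU64,
}

impl Cache {
    /// Atomically advance `current` to `limit`, returning the range that was
    /// claimed. After this call no fast-path caller can hand those values out.
    pub fn claim_up_to(&self, limit: u64) -> Range<u64> {
        let prev = self.current.fetch_max(limit, Ordering::AcqRel);
        prev.min(limit)..limit
    }

    /// Claim the leftovers first, then run the (possibly failing) fetch.
    /// `fetch` is an assumed callback returning a freshly reserved range.
    pub fn refill<F>(&self, fetch: F) -> Result<Range<u64>, String>
    where
        F: FnOnce() -> Result<Range<u64>, String>,
    {
        let leftovers = self.claim_up_to(self.max.load(Ordering::Acquire));
        // If this fails, `leftovers` is dropped: those values are skipped,
        // which leaves a small gap but never produces a duplicate.
        let fetched = fetch()?;
        self.current.store(fetched.start, Ordering::Release);
        self.max.store(fetched.end, Ordering::Release);
        Ok(leftovers)
    }
}
```

Claiming before the fetch trades a possible gap for the guarantee that two callers never receive the same value, which matches the note that sequences are not expected to be gapless.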

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@KKould KKould requested a review from sundy-li February 9, 2026 07:23
@KKould KKould self-assigned this Feb 9, 2026
@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Feb 9, 2026
@KKould KKould marked this pull request as ready for review February 9, 2026 08:49

KKould commented Feb 10, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6018ad90c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

KKould commented Feb 10, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d06f74c98

KKould commented Feb 10, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37f0d12ff5

@KKould KKould force-pushed the refactor/eliminate_sequnece_counter_rwlock branch from e66f769 to c759f70 Compare February 11, 2026 02:26
@KKould KKould merged commit 1631b6c into databendlabs:main Feb 11, 2026
421 of 429 checks passed

Labels

pr-bugfix this PR patches a bug in codebase

Development

Successfully merging this pull request may close these issues.

bug: Potential Deadlock in TransformAsyncFunction with Shared RwLock

2 participants