Skip to content

Conversation

andyyang890
Copy link
Collaborator

This patch fixes a bug where the memory monitor wouldn't reclaim the
memory allocated to events corresponding to unwatched column families
for a changefeed that targets only a subset of a table's families.

Fixes #154776

Release note (bug fix): A bug where a changefeed targeting only a subset
of a table's column families could become stuck has been fixed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andyyang890 andyyang890 force-pushed the 20251003-fix-alloc-leak branch from fb86bc0 to 3804925 Compare October 4, 2025 02:01
@andyyang890 andyyang890 changed the title changefeedccl: fix unwatched column families memory bug changefeedccl: fix unwatched column families memory monitoring bug Oct 4, 2025
@andyyang890 andyyang890 added backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.3.x Flags PRs that need to be backported to 24.3 backport-25.2.x Flags PRs that need to be backported to 25.2 backport-25.3.x Flags PRs that need to be backported to 25.3 backport-25.4.x Flags PRs that need to be backported to 25.4 labels Oct 4, 2025
@andyyang890 andyyang890 force-pushed the 20251003-fix-alloc-leak branch 3 times, most recently from 445e667 to fc8482b Compare October 4, 2025 12:54
@andyyang890 andyyang890 requested a review from aerfrei October 4, 2025 13:01
@andyyang890 andyyang890 marked this pull request as ready for review October 4, 2025 13:01
@andyyang890 andyyang890 requested a review from a team as a code owner October 4, 2025 13:01
}

// Update watched column again to verify the feed is still progressing.
// If the memory allocations leaked, this assertion would time out.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to get this to fail with a clearer symptom than a timeout (at least for enterprise feeds)? This is my first time looking at the memory monitor, but my sense was that since this would set a low PerChangefeedMemLimit, that the memory monitors should fail the feed once the limit is passed. But when I tried to check that on the status of the feed in this test, I saw that it was still running.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the allocations leaked, the memory monitor wouldn't fail; instead, the changefeed would get stuck, which is why the assertion would time out. I added some more explanatory comments to the test.

This patch fixes a bug where the memory monitor wouldn't reclaim the
memory allocated to events corresponding to unwatched column families
for a changefeed that targets only a subset of a table's families.

Release note (bug fix): A bug where a changefeed targeting only a subset
of a table's column families could become stuck has been fixed.
@andyyang890 andyyang890 force-pushed the 20251003-fix-alloc-leak branch from fc8482b to a14e13a Compare October 6, 2025 16:52
@andyyang890 andyyang890 requested a review from asg0451 October 6, 2025 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.3.x Flags PRs that need to be backported to 24.3 backport-25.2.x Flags PRs that need to be backported to 25.2 backport-25.3.x Flags PRs that need to be backported to 25.3 backport-25.4.x Flags PRs that need to be backported to 25.4
Projects
None yet
Development

Successfully merging this pull request may close these issues.

changefeedccl: feed excluding column families leaks mem mon allocs for ignored events
4 participants