fix(meta): ignore stale refresh finish events after source drop #24983

Open
tabVersion wants to merge 1 commit into tabVersion/fix-24829 from bob/fix-refresh-drop-race

tabVersion (Contributor) commented Mar 5, 2026

Summary

  • Handle stale refresh completion events (ListFinish/LoadFinish) that arrive after the associated source has already been dropped.
  • Treat CatalogIdNotFound("object", ..) from get_object_database_id(associated_source_id) as an idempotent skip instead of letting it bubble up as a barrier completion failure.
  • Keep the handling of all other error types unchanged.
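
As a rough illustration of the second and third bullets, the skip decision can be expressed as a small predicate. This is only a sketch: the enum below is a stand-in for the real MetaErrorInner (which has more variants and whose exact field types are assumed here), and the Internal variant is hypothetical. Only the function name and the CatalogIdNotFound("object", ..) shape come from this PR.

```rust
// Stand-in for the meta error type; the real `MetaErrorInner` has many more
// variants, and the field types used here are assumptions.
#[derive(Debug)]
enum MetaErrorInner {
    // (kind of catalog entity, id that was not found)
    CatalogIdNotFound(&'static str, String),
    // Hypothetical placeholder for every other failure mode.
    Internal(String),
}

/// Returns true only for a missing "object" entry, the one case that is
/// treated as an idempotent skip. Everything else must still propagate.
fn should_skip_refresh_finish_for_missing_object(err: &MetaErrorInner) -> bool {
    matches!(err, MetaErrorInner::CatalogIdNotFound("object", _))
}
```

Keeping the match this narrow means a missing database or schema, or any unrelated failure, is still reported rather than silently dropped.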

Why

When REFRESH TABLE is immediately followed by DROP TABLE, the refresh finish events can arrive after the catalog entry has been cleaned up. Previously the failed lookup propagated as "Failed to get database id for table -> object id not found", which could block command completion and trigger recovery noise.
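
The ordering that triggers the old failure can be replayed with a toy model. Everything here is hypothetical (the Catalog struct, the u32 ids, complete_barrier_strict, replay_race); only the get_object_database_id name and the two error strings are taken from this PR, and the real code path is considerably more involved.

```rust
use std::collections::HashMap;

/// Toy catalog mapping object id -> database id (for illustration only).
struct Catalog {
    objects: HashMap<u32, u32>,
}

impl Catalog {
    fn get_object_database_id(&self, object_id: u32) -> Result<u32, String> {
        self.objects
            .get(&object_id)
            .copied()
            .ok_or_else(|| format!("object id not found: {}", object_id))
    }
}

/// Old behaviour in miniature: any lookup failure fails the whole completion.
fn complete_barrier_strict(catalog: &Catalog, finished_source_ids: &[u32]) -> Result<usize, String> {
    let mut handled = 0;
    for &id in finished_source_ids {
        catalog
            .get_object_database_id(id)
            .map_err(|e| format!("Failed to get database id for table -> {}", e))?;
        handled += 1;
    }
    Ok(handled)
}

/// Replays the race: the drop lands first, the finish event arrives late.
fn replay_race() -> Result<usize, String> {
    let mut catalog = Catalog { objects: HashMap::from([(7, 1)]) };
    // DROP TABLE wins the race and removes the catalog entry...
    catalog.objects.remove(&7);
    // ...and only then does the refresh finish event for source 7 arrive.
    complete_barrier_strict(&catalog, &[7])
}
```

In this model replay_race() returns an error even though nothing is actually wrong: the source is gone, so there is simply no work left for the finish event to do.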

Tests

  • cargo check -p risingwave_meta --lib
  • cargo test -p risingwave_meta --lib skip_refresh_finish_when_associated_source_missing
  • cargo test -p risingwave_meta --lib do_not_skip_refresh_finish_for_other_not_found_types

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a race condition where refresh completion events could arrive after a source object has been dropped, leading to errors that block command completion. By introducing a new helper function and specific error handling, the system now gracefully skips processing these stale events, improving robustness and preventing unnecessary recovery noise in such scenarios.

Highlights

  • Stale Refresh Events: Handled stale refresh completion events (ListFinish/LoadFinish) that may arrive after their associated source has already been dropped.
  • Error Handling: Treated CatalogIdNotFound("object", ..) errors from get_object_database_id as an idempotent skip, preventing them from causing barrier completion failures.
  • Error Preservation: Ensured that other types of errors continue to be propagated as before, only specifically handling the CatalogIdNotFound("object", ..) case.

Changelog
  • src/meta/src/barrier/context/context_impl.rs
    • Imported MetaErrorInner for more granular error matching.
    • Refactored the logic for retrieving the associated source's database ID during ListFinish and LoadFinish command processing into a new helper function.
    • Added get_source_database_id_for_refresh_stage to encapsulate the lookup and error handling, returning None and logging a warning if the associated source is not found (specifically for 'object' type CatalogIdNotFound errors).
    • Implemented should_skip_refresh_finish_for_missing_object to identify MetaErrorInner::CatalogIdNotFound("object", _) errors.
    • Included unit tests to verify the correct behavior of should_skip_refresh_finish_for_missing_object for both matching and non-matching error types.
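
The helper described in the changelog above might look roughly like the following. This is a sketch under several assumptions: LookupError is a stand-in for the real error type, ids are plain u32, the catalog lookup is passed in as a closure, and eprintln! stands in for whatever logging the real code uses. Only the function name and the Option-returning behaviour come from the changelog.

```rust
// Stand-in error type; the variant shape mirrors the PR text, the rest is assumed.
#[derive(Debug, PartialEq)]
enum LookupError {
    CatalogIdNotFound(&'static str, u32),
    Other(String),
}

/// Ok(Some(db)) = proceed, Ok(None) = stale event to skip, Err = real failure.
fn get_source_database_id_for_refresh_stage(
    lookup: impl Fn(u32) -> Result<u32, LookupError>,
    associated_source_id: u32,
    stage: &str,
) -> Result<Option<u32>, LookupError> {
    match lookup(associated_source_id) {
        Ok(database_id) => Ok(Some(database_id)),
        // The source was dropped before the finish event arrived: warn and skip.
        Err(LookupError::CatalogIdNotFound("object", id)) => {
            eprintln!("skipping stale {} event: source {} not found", stage, id);
            Ok(None)
        }
        // Any other error, including other not-found kinds, is propagated as-is.
        Err(other) => Err(other),
    }
}
```

A caller handling ListFinish or LoadFinish would then return early on Ok(None) and keep using `?` for the error case, so the skip stays confined to the dropped-source race.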
Activity
  • The author, tabVersion, created this pull request to fix a meta-related issue.
  • The author provided a detailed summary, rationale, and testing instructions.
  • No human review comments or additional activity have been recorded yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a race condition where a refresh operation finishes after its associated source has been dropped. The changes correctly handle this by treating a CatalogIdNotFound("object", ...) error as a skippable event, preventing it from causing a barrier completion failure. A new helper function encapsulates this logic, making the code cleaner and more robust. The changes are well-tested and effectively resolve the issue of unnecessary recovery noise.
