Skip to content

[ENH]: Get database name from get_collections_to_gc mcmr endpoint (WIP)#6392

Open
tanujnay112 wants to merge 1 commit intomainfrom
gc_threading
Open

[ENH]: Get database name from get_collections_to_gc mcmr endpoint (WIP)#6392
tanujnay112 wants to merge 1 commit intomainfrom
gc_threading

Conversation

@tanujnay112
Copy link
Contributor

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • ...
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@tanujnay112 tanujnay112 marked this pull request as ready for review February 10, 2026 00:46
@tanujnay112 tanujnay112 changed the title [ENH]: Get database name from get_collections_to_gc mcmr endpoint [ENH]: Get database name from get_collections_to_gc mcmr endpoint (WIP) Feb 10, 2026
@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Feb 10, 2026

Add database-aware GC listing and orchestrator routing

This PR wires the ListCollectionsToGc flow through rust-sysdb and the garbage collector so GC jobs know which database/topology a collection belongs to. New request/response types, backend plumbing, and a Spanner query implementation expose collections to GC along with database metadata, and the gRPC server now serves the endpoint. The garbage collector stack (runtime component, orchestrator, tool, and coordinator proto) consumes the database name to route operations correctly, while logging/error handling improvements cover MCMR fan-out failures and parse errors.

Key Changes

• Introduced ListCollectionsToGcRequest/Response types plus runnable/assignable plumbing in rust-sysdb, implemented the Spanner query, and exposed the endpoint through SysdbService.
• Augmented gRPC clients (rust/sysdb) to fan out GC listing calls across single-region and MCMR backends with proper error logging and database-name propagation.
• Extended garbage collector orchestrator/component/tooling to carry an optional DatabaseName per collection when deleting files/logs or finalizing sysdb cleanup, and updated coordinator.proto to include database_name in CollectionToGcInfo.

Possible Issues

• Combined single-region + MCMR results bypass the requested limit, potentially doubling GC workload or starving later collections.
• No deduplication between the two sources may cause the same collection to be processed twice if present in both backends.
database_name is optional throughout; orchestrator paths converting it via DatabaseName::new still produce None if legacy data lacks names, which may break downstream operations expecting a string.

This summary was automatically generated by @propel-code-bot

Comment on lines 1640 to 1644
let mcmr_collections: Vec<CollectionToGcInfo> = mcmr_res
.into_inner()
.collections
.into_iter()
.filter_map(|c| c.try_into().ok())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Reliability] Silent failure on parsing errors: filter_map(|c| c.try_into().ok()) silently drops collections that fail to parse. If there's a data inconsistency or schema mismatch, collections will be missing from the GC list without any error indication, potentially causing memory leaks as those collections are never cleaned up.

Log parsing failures:

let mcmr_collections: Vec<CollectionToGcInfo> = mcmr_res
    .into_inner()
    .collections
    .into_iter()
    .filter_map(|c| {
        c.try_into()
            .map_err(|e| {
                tracing::warn!("Failed to parse collection from mcmr_client: {:?}", e);
                e
            })
            .ok()
    })
    .collect();
Context for Agents
Silent failure on parsing errors: `filter_map(|c| c.try_into().ok())` silently drops collections that fail to parse. If there's a data inconsistency or schema mismatch, collections will be missing from the GC list without any error indication, potentially causing memory leaks as those collections are never cleaned up.

Log parsing failures:

```rust
let mcmr_collections: Vec<CollectionToGcInfo> = mcmr_res
    .into_inner()
    .collections
    .into_iter()
    .filter_map(|c| {
        c.try_into()
            .map_err(|e| {
                tracing::warn!("Failed to parse collection from mcmr_client: {:?}", e);
                e
            })
            .ok()
    })
    .collect();
```

File: rust/sysdb/src/sysdb.rs
Line: 1644

@blacksmith-sh
Copy link
Contributor

blacksmith-sh bot commented Feb 12, 2026

OOM Events Detected

  • Job Python tests / test-rust-bindings-stress (3.9) has run into an OOM error.

Comment on lines +1642 to +1643

// This doesn't respect limit anymore:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

[Logic] Logic error in limit handling: When mcmr_client is present, the total number of collections returned can exceed the requested limit (up to 2x). The comment acknowledges this but doesn't handle it.

If limit=100, you could get 100 from primary + 100 from mcmr = 200 total, violating the caller's expectation.

if let Some(limit) = req.limit {
    let remaining = (limit as usize).saturating_sub(collections.len());
    if remaining == 0 {
        return Ok(collections);
    }
    // Adjust req.limit before calling mcmr_client
    let mut mcmr_req = req.clone();
    mcmr_req.limit = Some(remaining as u64);
    match mcmr_client.list_collections_to_gc(mcmr_req).await { /* ... */ }
}
Context for Agents
Logic error in limit handling: When `mcmr_client` is present, the total number of collections returned can exceed the requested `limit` (up to 2x). The comment acknowledges this but doesn't handle it.

If `limit=100`, you could get 100 from primary + 100 from mcmr = 200 total, violating the caller's expectation.

```rust
if let Some(limit) = req.limit {
    let remaining = (limit as usize).saturating_sub(collections.len());
    if remaining == 0 {
        return Ok(collections);
    }
    // Adjust req.limit before calling mcmr_client
    let mut mcmr_req = req.clone();
    mcmr_req.limit = Some(remaining as u64);
    match mcmr_client.list_collections_to_gc(mcmr_req).await { /* ... */ }
}
```

File: rust/sysdb/src/sysdb.rs
Line: 1643

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant