
feat(catalog): support for multiple catalog pods #2169

Open: pboyd wants to merge 2 commits into kubeflow:main from pboyd:multi-catalog-pods

pboyd (Member) commented Jan 29, 2026

Description

Enables horizontal scaling through PostgreSQL-based leader election that coordinates database writes across multiple pods.

All pods serve read requests, from in-memory data and from database queries. Only the leader writes to the database: it fetches models, writes updates, and cleans up orphaned data. If the leader fails, leadership transfers automatically to another pod.

Implementation:

  • Leader election package using pglock for distributed locking
  • Loader split into StartReadOnly() and StartLeader() modes
  • Configuration: CATALOG_LEADER_LOCK_DURATION and CATALOG_LEADER_HEARTBEAT environment variables
  • Integration tests for multi-pod scenarios

How Has This Been Tested?

In a local dev environment, the deployment can be scaled up to multiple pods. Killing the leader pod, or deleting its entry from the locks table, is enough to trigger failover.

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages
  • Automated tests are provided as part of the PR for major new functionalities; testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

google-oss-prow (bot) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from pboyd. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot requested a review from jonburdo January 29, 2026 20:06
pboyd force-pushed the multi-catalog-pods branch 3 times, most recently from bb1378d to 907a106, February 2, 2026 14:14
pboyd marked this pull request as ready for review February 2, 2026 14:17
pboyd added 2 commits February 6, 2026 13:17
Signed-off-by: Paul Boyd <paul@pboyd.io>
Signed-off-by: Paul Boyd <paul@pboyd.io>
pboyd force-pushed the multi-catalog-pods branch from 907a106 to 56a20d0, February 6, 2026 18:18