[feat][pip] PIP-454: Metadata Store Migration Framework#25196
[feat][pip] PIP-454: Metadata Store Migration Framework#25196merlimat wants to merge 2 commits intoapache:masterfrom
Conversation
|
|
||
| **Broker/Bookie actions (automatic, triggered by watching the flag):** | ||
| 1. Detect migration flag via watch on `/pulsar/migration-coordinator/migration` | ||
| 2. Defer non-critical metadata writes (e.g., ledger rollovers, bundle ownership changes) |
There was a problem hiding this comment.
would this use the existing solution? a SessionEvent.ConnectionLost/SessionEvent.SessionLost event sets a flag metadataServiceAvailable that is used for this purpose in many locations.
It seems that currently ledger trimming, ledger rollover and loadbalancer load shedding are using the the metadataServiceAvailable flag in ManagedLedgerFactoryImpl.
There's also a dependency on the event directly:
would the coordinator send a SessionEvent.ConnectionLost event when migration starts so that it remains compatible with the existing solution?
AbstractMetadataStore has a flag isConnected which could also be useful? it's not currently used within Pulsar, just for metadata store caching decisions. I guess it would be necessary to skip cache refreshs while the migration is on-going.
| Each broker and bookie registers itself as a migration participant by creating a sequential ephemeral node: | ||
| - Path: `/pulsar/migration-coordinator/participants/id-NNNN` (sequential) | ||
| - This allows the coordinator to know how many participants exist before migration starts |
There was a problem hiding this comment.
Which node is selected as the migration coordinator?
If it's the broker, what if the migration takes a lot more memory than the broker usually does and causes OOME.
Would it be possible to deploy a dedicated coordinator or run the coordinator in-process, let's say in a pod with sufficient resources, running in a Pulsar cluster?
There was a problem hiding this comment.
Some additional questions:
Does the same single coordinator instance migrate both Pulsar and Bookkeeper metadata? Why does the coordinator need to know how many participants exist?
lhotari
left a comment
There was a problem hiding this comment.
LGTM, just a few comments / questions.
eolivelli
left a comment
There was a problem hiding this comment.
A couple of questions:
- is it possible to also migrate back from Oxia to ZooKeeper ?
- this PIP is not about the metadata store to coordinate geo replication, should we state it explicitly ?
|
Good questions.
While the mechanism can be quite generic, the tricky part to make it work seamlessly is to carry over the same version ids on each metadata record. That allows brokers and bookies conditional writes to not fail in the migration. In Oxia we were able to add explicit override for this specific purpose. With ZK it's unfortunately not as easy.
Good point. I'll highlight this more. |
Motivation
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: