eleonoradgr
Contributor

What changed?
[WIP] Requesting early feedback on the approach.

Logic implemented in the shard distributor
when receiving a heartbeat:

  • Local passthrough: return an error, since we expect no external calls in this mode.
  • Local passthrough shadow and distributed passthrough behave the same way: check the shard assignment for the executor. If nothing changed, return it; if it changed, delete the executor and add the shards again. In these modes the namespace reconciliation loop does not run for the namespace, so shards are not reassigned while the executor is being deleted.
  • Onboarded: the previously expected flow.
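
To make the per-mode behaviour above concrete, here is a minimal sketch of the dispatch. It is not the code in this PR: `MigrationMode`, `Store`, `HandleHeartbeat`, and `ErrUnexpectedHeartbeat` are hypothetical names, and the store is a plain in-memory map rather than the real storage layer.

```go
package main

import (
	"errors"
	"sort"
)

// MigrationMode is a hypothetical enum mirroring the modes named in the description.
type MigrationMode int

const (
	ModeLocalPassthrough MigrationMode = iota
	ModeLocalPassthroughShadow
	ModeDistributedPassthrough
	ModeOnboarded
)

// ErrUnexpectedHeartbeat is a hypothetical error for heartbeats in local passthrough mode.
var ErrUnexpectedHeartbeat = errors.New("heartbeat received for namespace in local passthrough mode")

// Store is an in-memory stand-in for the storage: executor ID -> assigned shard IDs.
type Store map[string][]string

// sameShards compares two shard sets while ignoring order.
func sameShards(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// HandleHeartbeat returns the shards to send back to the executor and
// whether the stored assignment was rewritten.
func HandleHeartbeat(store Store, mode MigrationMode, executorID string, reported []string) ([]string, bool, error) {
	switch mode {
	case ModeLocalPassthrough:
		// No external calls are expected in this mode.
		return nil, false, ErrUnexpectedHeartbeat
	case ModeLocalPassthroughShadow, ModeDistributedPassthrough:
		if current, ok := store[executorID]; ok && sameShards(current, reported) {
			return current, false, nil // nothing changed, return it as is
		}
		// Assignment changed: delete the executor, then add the shards again.
		delete(store, executorID)
		store[executorID] = append([]string(nil), reported...)
		return store[executorID], true, nil
	default:
		// Onboarded: the distributor owns the assignment, so return what is stored.
		return store[executorID], false, nil
	}
}
```

The sketch treats the executor's report as the source of truth in both passthrough modes, which is why the reconciliation loop has to stay out of the way for those namespaces.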

Logic that will be implemented in a follow-up PR for the executor library:

  • Create two modules to instantiate the executor: one with local passthrough, one that communicates with the shard distributor.
  • The mode is assigned after the first heartbeat request (except local passthrough, which is statically assigned).
  • Local passthrough shadow: check the response against the current request and do not write it back to the internal state.
  • Distributed passthrough: before putting a new shard assignment into place, send a heartbeat and apply the assignment only after receiving the response.
  • Onboarded: normal flow.
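
The executor-side plan could look roughly like the sketch below. Everything here is an assumption for illustration (`Mode`, `Heartbeat`, `Executor`, `Apply`); the follow-up PR may structure it quite differently.

```go
package main

// Mode is a hypothetical enum for the executor's migration mode.
type Mode int

const (
	LocalPassthrough Mode = iota
	LocalPassthroughShadow
	DistributedPassthrough
	Onboarded
)

// Heartbeat is a hypothetical callback that reports shards to the shard
// distributor and returns the assignment it answers with.
type Heartbeat func(reported []string) ([]string, error)

// Executor holds the mode and the internal shard state.
type Executor struct {
	Mode   Mode
	Shards []string
}

// equal is an order-sensitive comparison, kept simple for the sketch.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Apply puts a proposed assignment into place according to the mode.
// It returns true when the distributor's answer diverged from the proposal,
// which is only reported in shadow mode.
func (e *Executor) Apply(proposed []string, hb Heartbeat) (bool, error) {
	switch e.Mode {
	case LocalPassthrough:
		// Statically assigned mode: never talks to the distributor.
		e.Shards = proposed
		return false, nil
	case LocalPassthroughShadow:
		// Apply locally, then compare the answer without writing it back.
		e.Shards = proposed
		answer, err := hb(proposed)
		if err != nil {
			return false, err
		}
		return !equal(answer, proposed), nil
	case DistributedPassthrough:
		// Heartbeat first; apply only after the response comes back.
		answer, err := hb(proposed)
		if err != nil {
			return false, err // keep the previous assignment
		}
		e.Shards = answer
		return false, nil
	default:
		// Onboarded: report current state and take the distributor's assignment.
		answer, err := hb(e.Shards)
		if err != nil {
			return false, err
		}
		e.Shards = answer
		return false, nil
	}
}
```

Returning the divergence flag from shadow mode gives a natural place to emit a metric or log line before switching a namespace to the next mode.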

Next-next PR
Create a canary test for each of the cases.

Why?

How did you test it?

Potential risks

Release notes

Documentation Changes

Member

@jakobht jakobht left a comment

Looks great!
Looking at this I think we need to consider the switch to fully onboarded again.
Doing a gradual rollout of this will cause inconsistency in shard ownership, I think.

return nil, fmt.Errorf("delete executors: %w", err)
}
for shard := range request.GetShardStatusReports() {
err = h.storage.AssignShard(ctx, request.GetNamespace(), request.GetExecutorID(), shard)
Member

This is OK for now, but I'm slightly worried about transactionality. Maybe we should have an "assignShardsToExecutor" function in the storage?

Member

Since we just deleted the executor from the store, I'm concerned that this call will simply fail because the executor is no longer there.
I think a new store function that's transactional sounds good.
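
For discussion, here is a minimal sketch of what an all-or-nothing "assignShardsToExecutor" could look like. It uses an in-memory map guarded by a mutex as a stand-in for the real storage; `MemStore` and its fields are hypothetical, and a real implementation would rely on the backing store's own transaction support instead.

```go
package main

import (
	"fmt"
	"sync"
)

// MemStore is a hypothetical in-memory store: shard ID -> owning executor ID.
type MemStore struct {
	mu     sync.Mutex
	owners map[string]string
}

// NewMemStore returns an empty store.
func NewMemStore() *MemStore {
	return &MemStore{owners: make(map[string]string)}
}

// AssignShardsToExecutor replaces the executor's shard set in one critical
// section. It validates first, so a conflict leaves the store untouched.
func (s *MemStore) AssignShardsToExecutor(executorID string, shards []string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Validate before mutating anything.
	for _, shard := range shards {
		if owner, ok := s.owners[shard]; ok && owner != executorID {
			return fmt.Errorf("assign shard %q: already owned by %q", shard, owner)
		}
	}
	// Drop the executor's previous shards (deleting during range is safe in Go).
	for shard, owner := range s.owners {
		if owner == executorID {
			delete(s.owners, shard)
		}
	}
	// Record the new assignment.
	for _, shard := range shards {
		s.owners[shard] = executorID
	}
	return nil
}
```

With a single call like this, the handler would no longer need a separate delete followed by a loop of per-shard assignments, so there is no window where the executor exists without its shards.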

continue
}
if p.namespaceCfg.Mode != config.MigrationModeONBOARDED {
p.logger.Info("Namespace not onboarded, rebalance not triggered", tag.ShardNamespace(p.namespaceCfg.Name))
Member

This might log a lot, but we can always adjust if it's too much
