Feat: Async Batched Backfill #606

    #[Attribute(Attribute::TARGET_CLASS)]
    class ProjectionExecution
    {
        public function __construct(

Changed from ProjectionBatchSize attribute
jlabedo left a comment
Nice!

    FROM {$streamTable}
    WHERE metadata->>'_aggregate_type' = ?
    SQL, [$this->aggregateType]);
    ORDER BY aggregate_id

This will cause a full table scan for each batch.
For partitioned projections, I think we may have to handle all the partitioning in the projecting module. When you call "prepareForBackfill", you read all the events from the event store once and write all aggregate ids into an ephemeral table, which acts like a queue. Each worker can then read the first unlocked row, lock it, project it, and delete the row.
This kind of system may open the road to more advanced partitioning strategies, like partitioning by any header.
WDYT?
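
A minimal sketch of the queue idea described above, assuming PostgreSQL and a hypothetical ephemeral table `projection_backfill_queue` with a single `aggregate_id` column. The table name, the column name and the function `projectNextPartition` are illustrative assumptions, not part of Ecotone. Each worker claims the first unlocked row with `FOR UPDATE SKIP LOCKED`, projects it, and deletes it in the same transaction.

```php
<?php
declare(strict_types=1);

/**
 * Claims one partition from the hypothetical queue table, projects it and removes it.
 * Returns false when no unlocked row is left for this worker.
 *
 * @param callable(string): void $project projects a single aggregate id
 */
function projectNextPartition(PDO $pdo, callable $project): bool
{
    $pdo->beginTransaction();
    try {
        // SKIP LOCKED makes concurrent workers skip rows already claimed by others.
        $statement = $pdo->query(
            'SELECT aggregate_id
             FROM projection_backfill_queue
             ORDER BY aggregate_id
             LIMIT 1
             FOR UPDATE SKIP LOCKED'
        );
        $aggregateId = $statement->fetchColumn();

        if ($aggregateId === false) {
            $pdo->commit();
            return false;
        }

        $project((string) $aggregateId);

        // The row lock is held until commit, so no other worker can take this row.
        $delete = $pdo->prepare('DELETE FROM projection_backfill_queue WHERE aggregate_id = ?');
        $delete->execute([$aggregateId]);

        $pdo->commit();
        return true;
    } catch (Throwable $exception) {
        // On failure the row is released and can be picked up again.
        $pdo->rollBack();
        throw $exception;
    }
}
```

A worker would call `projectNextPartition()` in a loop until it returns `false`. Filling the queue table still needs one pass over the event store, but each batch no longer re-reads it.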
> For partitioned projections, I think we may have to handle all the partitioning in the projecting module.
Why is that?
In general, the trick here is that prepareBackfill should be as simple as possible. If you take a look at it, it only needs to know the count (a single SQL query), and then it generates all the messages for the rebuild within the PHP process. This way, we can generate everything for the rebuild within seconds, even for large-scale event streams.
The rest is done by async Message Consumers.
Therefore we need to ensure that we can go through the partitions deterministically in batches (limit and offset provide the next aggregate ids). So if the current approach causes a full table scan, maybe we can do it differently, or require adding a custom index for it?
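
As a rough illustration of the point above, the batch messages can be derived from the partition count alone. The function name `planBackfillBatches` and the `offset`/`limit` payload keys are hypothetical and only sketch the idea; this is not the actual prepareBackfill implementation.

```php
<?php
declare(strict_types=1);

/**
 * Builds one message payload per batch from the total partition count.
 * No event data is read here; only the count is needed.
 *
 * @return list<array{offset: int, limit: int}>
 */
function planBackfillBatches(int $partitionCount, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }

    $batches = [];
    for ($offset = 0; $offset < $partitionCount; $offset += $batchSize) {
        // Each consumer later resolves its aggregate ids with ORDER BY + LIMIT/OFFSET.
        $batches[] = ['offset' => $offset, 'limit' => $batchSize];
    }

    return $batches;
}
```

Because every payload carries only an offset and a limit, the ordering used by the consumers has to be stable between batches, which is why the query needs a deterministic `ORDER BY`.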

    */
    declare(strict_types=1);

    namespace Ecotone\EventSourcing\Attribute;

For me, those attributes are related to the PDO event sourcing module. If you want to project from another event source (let's say another event store implementation), you would create another attribute that would register its event store as an event source.
They do not hold anything database (PDO) specific; they just state which stream should be used (maybe the aggregate type is a bit specific, but it could be delivered as a separate attribute).
So I think it's fine to allow them to be part of the generic Projection API?

    /**
     * @deprecated Use prepareBackfill() instead. This method is kept for backward compatibility.
     */
    public function backfill(): void

This is not public for now, so we could skip the deprecation and simply remove the method until the API is stable?
ProjectingManager is internal, so it won't be used by end users, right?
Why is this change proposed?
Batched backfill is a feature in Ecotone's projection system that allows rebuilding projections by processing partitions in configurable batches. It supports both synchronous and asynchronous execution modes, enabling efficient processing of large datasets without overwhelming system resources.
For global tracked projections, it allows the rebuild to run asynchronously.
For partitioned projections, it allows scaling the whole rebuild process by rebuilding different partitions concurrently, which speeds up the rebuild depending on the number of message consumers.
Backfill attribute
Configures backfill behavior for a projection:
- backfillPartitionBatchSize: Number of partitions to process in a single batch (default: 100, minimum: 1)
- asyncChannelName: Optional async channel name for asynchronous backfill execution (null = synchronous)
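
To make the two parameters concrete, here is a self-contained sketch that uses a stand-in attribute class. `BackfillSketch`, `TicketListProjection` and the channel name `backfill_channel` are hypothetical names for illustration only; the real attribute lives in Ecotone and may differ in name and namespace. Only the parameter names, the default of 100 and the minimum of 1 are taken from the description above.

```php
<?php
declare(strict_types=1);

// Stand-in for the backfill attribute; NOT the actual Ecotone class.
#[Attribute(Attribute::TARGET_CLASS)]
final class BackfillSketch
{
    public function __construct(
        public readonly int $backfillPartitionBatchSize = 100,
        public readonly ?string $asyncChannelName = null, // null = synchronous
    ) {
        if ($backfillPartitionBatchSize < 1) {
            throw new InvalidArgumentException('backfillPartitionBatchSize must be at least 1');
        }
    }
}

// Hypothetical projection that backfills two partitions per batch over an async channel.
#[BackfillSketch(backfillPartitionBatchSize: 2, asyncChannelName: 'backfill_channel')]
final class TicketListProjection
{
}

// Reading the configuration back through reflection, as a framework would.
$attribute = (new ReflectionClass(TicketListProjection::class))
    ->getAttributes(BackfillSketch::class)[0]
    ->newInstance();
```

Omitting `asyncChannelName` in this sketch leaves it as `null`, which corresponds to the synchronous mode described above.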
Example: 5 Partitions with Batch Size 2
Configuration:
Execution:
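
A minimal sketch of how five partitions split with a batch size of 2. The partition ids are made up for illustration, and `array_chunk` only stands in for the batching step; this is not the actual execution code.

```php
<?php
declare(strict_types=1);

// Five hypothetical partitions (aggregate ids), already in a deterministic order.
$partitions = ['ticket-1', 'ticket-2', 'ticket-3', 'ticket-4', 'ticket-5'];
$batchSize = 2;

// ceil(5 / 2) = 3 batches; the last batch holds the remaining single partition.
$batches = array_chunk($partitions, $batchSize);

foreach ($batches as $index => $batch) {
    printf("Batch %d: %s\n", $index + 1, implode(', ', $batch));
}
// Batch 1: ticket-1, ticket-2
// Batch 2: ticket-3, ticket-4
// Batch 3: ticket-5
```

In synchronous mode the three batches would run one after another in the same process. With an async channel configured, each batch would be a separate message that different consumers can pick up concurrently.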