Possibility for a data race in the view? #185
-
You are correct that race conditions can occur in queries even when the command side is completely protected. This can even be common in some scenarios, because CQRS is designed explicitly to allow these two sides to be very far apart, both logically and physically. For instance, you might have an Account aggregate but a query that collects Customer information by gathering any relevant events and placing them on a Kafka stream.

This is similar to the problem that CQRS provides no guarantee that an event is successfully processed by all queries: its guarantee ends at the event committal, and further protections are needed within the query itself. It's good practice to always pass ordering information (usually aggregate type + aggregate ID + sequence number) all the way down to the lowest level of logic, so that queries can recognize when events arrive out of order. With that information there are a variety of ways to deal with the problem.
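As a minimal sketch of that idea (the names here are hypothetical, not part of this crate's API): a view records the last sequence number it applied and rejects anything that doesn't follow directly, leaving the caller free to buffer, replay, or alert.

```rust
/// Hypothetical view state that tracks the last sequence number it applied.
struct AccountView {
    last_sequence: usize,
    balance: i64,
}

/// An incoming event envelope carrying the ordering information
/// (aggregate type + aggregate ID + sequence number) mentioned above.
struct Envelope {
    aggregate_type: String,
    aggregate_id: String,
    sequence: usize,
    event: AccountEvent,
}

enum AccountEvent {
    Deposited { amount: i64 },
    Withdrawn { amount: i64 },
}

enum ApplyError {
    /// The event did not arrive with sequence `last_sequence + 1`; the
    /// caller can buffer it, re-read the event stream, or raise an alert.
    OutOfOrder { expected: usize, got: usize },
}

impl AccountView {
    fn apply(&mut self, envelope: &Envelope) -> Result<(), ApplyError> {
        let expected = self.last_sequence + 1;
        if envelope.sequence != expected {
            return Err(ApplyError::OutOfOrder { expected, got: envelope.sequence });
        }
        match envelope.event {
            AccountEvent::Deposited { amount } => self.balance += amount,
            AccountEvent::Withdrawn { amount } => self.balance -= amount,
        }
        self.last_sequence = envelope.sequence;
        Ok(())
    }
}
```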
-
Hi,
I've been trying to figure out if I'm seeing a potential for a data race. I could be wrong of course, so I'd appreciate your feedback~
This specific routine: cqrs/src/cqrs.rs, lines 166 to 190 in 5f31437.
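To make the discussion concrete, here is a simplified, self-contained model of the shape I believe this routine has (an illustration only, not the actual code). The comments mark what I'll refer to below as three separate consistency scopes:

```rust
struct EventStore;
struct ViewRepo;

impl EventStore {
    fn load_aggregate(&self, _id: &str) -> Vec<String> {
        vec![] // scope #1: read the aggregate's events
    }
    fn commit(&self, _id: &str, events: Vec<String>) -> Vec<String> {
        // scope #2: append events; the optimistic check here
        // protects the command side only
        events
    }
}

impl ViewRepo {
    fn load_view(&self, _id: &str) -> Vec<String> {
        vec![] // scope #3 begins: read the current view...
    }
    fn save_view(&self, _id: &str, _view: Vec<String>) {
        // ...and write it back; nothing ties this scope to scope #2
    }
}

fn execute(store: &EventStore, views: &ViewRepo, id: &str, command: &str) {
    let _history = store.load_aggregate(id);      // scope #1
    let new_events = vec![format!("event-for-{command}")];
    let committed = store.commit(id, new_events); // scope #2
    // <-- a second instance can interleave anywhere around here -->
    let mut view = views.load_view(id);           // scope #3
    view.extend(committed);
    views.save_view(id, view);
}

fn main() {
    execute(&EventStore, &ViewRepo, "account-1", "deposit");
}
```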
Consider the following flow, where two separate Lambda instances each handle a separate command at around the same time, and both commands end up updating the same aggregate. With incrementing timestamps, roughly:

1. Lambda 1 loads the aggregate, handles its command, and commits its events.
2. Lambda 1 experiences a network delay just before it's about to read the latest view.
3. In the meantime, Lambda 2 loads the aggregate, handles its own command, and commits its events.
Note that there was no lock contention here; the two commits succeeded in sequence. This is essentially saying that both lambdas are now in dispatch(), waiting to fetch the view (cqrs/src/cqrs.rs, line 187 in 5f31437).
In the meantime, Lambda 1's network delay has been resolved. Now we have a race: both lambdas read the current view, apply their own events to their copy, and write it back. Depending on the view repository implementation, this could either result in one of the two writes failing, or potentially silently allow both writes to happen, with the later write clobbering the earlier one.
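A tiny self-contained demonstration of the silent case, using a naive last-write-wins repository (purely for illustration):

```rust
use std::collections::HashMap;

/// A naive view repository with last-write-wins semantics (illustration only).
#[derive(Clone, Debug)]
struct View {
    applied_events: Vec<String>,
    version: usize,
}

struct NaiveViewRepo {
    views: HashMap<String, View>,
}

impl NaiveViewRepo {
    fn load(&self, id: &str) -> View {
        self.views.get(id).cloned().unwrap_or(View {
            applied_events: vec![],
            version: 0,
        })
    }
    /// No version check: a stale writer silently overwrites a newer view.
    fn save(&mut self, id: &str, view: View) {
        self.views.insert(id.to_string(), view);
    }
}

fn main() {
    let mut repo = NaiveViewRepo { views: HashMap::new() };

    // Both lambdas load the view before either has written (the race window).
    let mut view_in_lambda_1 = repo.load("account-1");
    let mut view_in_lambda_2 = repo.load("account-1");

    // Lambda 2 applies its event and writes first.
    view_in_lambda_2.applied_events.push("event-B".into());
    view_in_lambda_2.version += 1;
    repo.save("account-1", view_in_lambda_2);

    // Lambda 1's delayed write then clobbers it: event-B is lost.
    view_in_lambda_1.applied_events.push("event-A".into());
    view_in_lambda_1.version += 1;
    repo.save("account-1", view_in_lambda_1);

    // The view now records only event-A, though both events were committed.
    println!("{:?}", repo.load("account-1"));
}
```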
The dynamo-es crate has some protection against this in its update_view() routine, where it checks for the proper version / sequence. That could potentially make execute_with_metadata() return an error for one of the two lambdas. The client code could then try to re-apply the command that failed, which sounds good on the surface (as mentioned here). But there's a problem: we've already committed the event, so if we push the command again we'll end up with a duplicate event in the store.
It seems like views can drift and become out of sync with events.
I don't have a proof of concept for this yet though. I'd appreciate some feedback before I work on the PoC.
But the problems, or potential solutions, that I see here are:

- The execute_with_metadata() routine doesn't hold a single event store lock for the aggregate throughout its running time. Instead, three separate locks are held at different times: when the aggregate is being loaded, when the events are being committed, and when the view is being projected (the three scopes marked in the sketch above).

Thanks a lot~