-
Notifications
You must be signed in to change notification settings - Fork 7
Separate outbox records by endpoint name to enable properly sharing the same collection between endpoints #799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| using NUnit.Framework; | ||
|
|
||
| // TODO Can we also move this test to Core? | ||
| public class When_subscribers_handles_the_same_event : NServiceBusAcceptanceTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreasohlund I took over the test you wrote. I think this should be moved into Core ideally. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
| settings.Set(transactionalSessionOptions); | ||
| settings.EnableFeatureByDefault<MongoTransactionalSession>(); | ||
|
|
||
| if (!string.IsNullOrEmpty(transactionalSessionOptions.ProcessorEndpoint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreasohlund @cquirosj @poornimanayar FYI I'm adopting a similar pattern to the other persisters, which requires InternalsVisibleTo but that is standard practice with the transactional session integration. Just pinging you here in case you can think of anything I might have missed.
| /// <summary> | ||
| /// Configures the amount of time to keep outbox deduplication data. | ||
| /// </summary> | ||
| [ObsoleteMetadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: When releasing, these need to be moved down
| class OutboxPersister : IOutboxStorage | ||
| { | ||
| public OutboxPersister(IMongoClient client, string databaseName, MongoDatabaseSettings databaseSettings, Func<Type, string> collectionNamingConvention, MongoCollectionSettings collectionSettings) | ||
| public OutboxPersister(IMongoClient client, string partitionKey, bool readFallbackEnabled, string databaseName, MongoDatabaseSettings databaseSettings, Func<Type, string> collectionNamingConvention, MongoCollectionSettings collectionSettings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor gets bigger and bigger. At some point it might make sense to clean this up a bit, but I wanted to stay focused on the task at hand for now.
| { | ||
| UsesDefaultClassMap = usesDefaultClassMap, | ||
| TimeToKeepDeduplicationData = context.Settings.GetTimeToKeepOutboxDeduplicationData(), | ||
| #pragma warning disable IDE0037 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This keeps the left side consistent and allows us to rename the internal property.
|
|
||
| var outboxRecordId = new OutboxRecordId { MessageId = messageId, PartitionKey = partitionKey }; | ||
|
|
||
| var equalityPredicateWithFallback = CreateOutboxRecordFilterPredicate(outboxRecordId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the fallback is enabled, it is theoretically possible under a GUID collision (highly unlikely) or during a mixed migration that an entry is stored under the same message ID once in the legacy format and once in the new format. The only way to end up with both a legacy (string _id) and a structured (document _id) record for the same message is during a mixed-version migration. In that rare case, SetAsDispatched might update the legacy record instead of the structured one because the UpdateOneAsync with the OR is non-deterministic. In a regular outbox interaction, we always do a Get first to retrieve the deduplication record. If that is the case, the SingleOrDefault would throw, which safeguards us from doing the wrong thing.
In the transactional session cases, we currently do not support setting the "message id" on the outbox record so that part is not affected at this stage (even though we are not doing a Get there).
The alternative would be to get tighter control over the flow and do explicit roundtrips with fallbacks. This would remove the roundtrip optimization but might lead to more correct flows under any circumstances. I'm leaning towards this approach for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See b07e368
de9fbe7 to
16bbb42
Compare
|
|
||
| public bool ReadFallbackEnabled { get; set; } = true; | ||
|
|
||
| public string PartitionKey { get; set; } = null!; // will be set by defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should call it ParitionKey. It's leaking internal implementation. Not even sure if this should be exposed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the name should be questioned. Would you prefer to name it what it represents currently which is EndpointName?
Regarding the "not sure it should be exposed at all". The current thinking of mine was that it is not exposed to the user but exposed as a configuration option so that there is a single path on which the configuration options flow into the persister. This helps bridge the settings over to a simple configuration class and pass that into the persister where needed. At one point I was thinking we could even streamline all the configuration that is currently passed into the outbox persister including the database settings with defaults and only pass in this configuration object as sort of a "creation option" into the type.
This would allow addressing this comment too https://github.com/Particular/NServiceBus.Storage.MongoDB/pull/799/files/16bbb427918ec8675550884a307eb28be54af7df#r2301681637 but I was hesitant to jump the gun.
Are you thinking along the lines that this class should only represent the properties that are exposed to the users and all the "internal stuff" should not be added here to clearly separate those configuration paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking along the lines that this class should only represent the properties that are exposed to the users and all the "internal stuff" should not be added here to clearly separate those configuration paths?
Yep, that was my thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think that is a valid angle. Will change that in the next update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been thinking about this a bit more and concluded it is not worth changing. The OutboxPersistenceConfiguration PartitionKey currently is either by default the endpoint name or in the case of the transactional session the remote processor endpoint name. This is what we currently use and therefore it presents a configuration option (even when only used "internally". Given that the OutboxRecordId also encodes the partition code as "pk" (partition key) and not "ep" (endpoint name) we seem to be using consistently the same "language".
I'd be more than happy to rediscuss the entire "verbiage" here in case we are no longer convinced partition key is the right term in general.
… the persistence tests have their own infrastructureˆ
a5814ad to
379c950
Compare
Fixes #767
This PR introduces a separation of the outbox records with a partition key that defaults to the endpoint name. The chosen approach is to use a structured ID that encodes the partition key into the document’s _id. This ensures that deduplication happens per endpoint and opens the door for future scenarios where the outbox collection may need to be sharded.
By making the partition key an explicit part of the identity { pk, mid }, we avoid the need for additional unique indexes (reducing storage and operational overhead) while still enforcing uniqueness at the correct granularity. The structured ID keeps reads and writes efficient by targeting _id, allows direct queries and indexes on subfields (e.g. _id.pk or _id.mid), and preserves flexibility in choosing shard keys later on. Compared to concatenated string IDs or separate compound indexes, this model is both more transparent and more naturally aligned with MongoDB’s compound key design.
Api Changes
TimeToKeepOutboxDeduplicationDataextension method as obsolete and redirected usage to the newMongoOutboxSettingsExtensions.TimeToKeepOutboxDeduplicationDatato align with the settings root they belong toMongoOutboxSettingsExtensionsfor disabling read fallback and configuring deduplication data retention under the outbox settingsImplementation tradeoffs
Replaced the $or filter used for lookups and updates with an explicit two-step flow: first attempt the structured _id, and only if no match is found (and fallback is enabled) attempt the legacy _id. The $or version was slightly more efficient in terms of roundtrips, but it left a small ambiguity: if both legacy and structured documents existed for the same message during migration, MongoDB could match either one, making it non-deterministic which record was updated. The explicit fallback removes that ambiguity and guarantees we always prefer the structured record, at the cost of an extra roundtrip only in the rare case where the legacy record is still present. This provides clearer semantics and safer migration behavior without meaningful performance impact.