Skip to content

Conversation

@danielmarbach
Copy link
Contributor

@danielmarbach danielmarbach commented Aug 26, 2025

Fixes #767

This PR introduces a separation of the outbox records with a partition key that defaults to the endpoint name. The chosen approach is to use a structured ID that encodes the partition key into the document’s _id. This ensures that deduplication happens per endpoint and opens the door for future scenarios where the outbox collection may need to be sharded.

By making the partition key an explicit part of the identity { pk, mid }, we avoid the need for additional unique indexes (reducing storage and operational overhead) while still enforcing uniqueness at the correct granularity. The structured ID keeps reads and writes efficient by targeting _id, allows direct queries and indexes on subfields (e.g. _id.pk or _id.mid), and preserves flexibility in choosing shard keys later on. Compared to concatenated string IDs or separate compound indexes, this model is both more transparent and more naturally aligned with MongoDB’s compound key design.

Api Changes

  • Marked the old TimeToKeepOutboxDeduplicationData extension method as obsolete and redirected usage to the new MongoOutboxSettingsExtensions.TimeToKeepOutboxDeduplicationData to align with the settings root they belong to
  • Added new extension methods to MongoOutboxSettingsExtensions for disabling read fallback and configuring deduplication data retention under the outbox settings

Implementation tradeoffs

Replaced the $or filter used for lookups and updates with an explicit two-step flow: first attempt the structured _id, and only if no match is found (and fallback is enabled) attempt the legacy _id. The $or version was slightly more efficient in terms of roundtrips, but it left a small ambiguity: if both legacy and structured documents existed for the same message during migration, MongoDB could match either one, making it non-deterministic which record was updated. The explicit fallback removes that ambiguity and guarantees we always prefer the structured record, at the cost of an extra roundtrip only in the rare case where the legacy record is still present. This provides clearer semantics and safer migration behavior without meaningful performance impact.

using NUnit.Framework;

// TODO Can we also move this test to Core?
public class When_subscribers_handles_the_same_event : NServiceBusAcceptanceTest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreasohlund I took over the test you wrote. I think this should be moved into Core ideally. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

settings.Set(transactionalSessionOptions);
settings.EnableFeatureByDefault<MongoTransactionalSession>();

if (!string.IsNullOrEmpty(transactionalSessionOptions.ProcessorEndpoint))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreasohlund @cquirosj @poornimanayar FYI I'm adopting a similar pattern to the other persisters, which requires InternalsVisibleTo but that is standard practice with the transactional session integration. Just pinging you here in case you can think of anything I might have missed.

/// <summary>
/// Configures the amount of time to keep outbox deduplication data.
/// </summary>
[ObsoleteMetadata(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: When releasing, these need to be moved down

class OutboxPersister : IOutboxStorage
{
public OutboxPersister(IMongoClient client, string databaseName, MongoDatabaseSettings databaseSettings, Func<Type, string> collectionNamingConvention, MongoCollectionSettings collectionSettings)
public OutboxPersister(IMongoClient client, string partitionKey, bool readFallbackEnabled, string databaseName, MongoDatabaseSettings databaseSettings, Func<Type, string> collectionNamingConvention, MongoCollectionSettings collectionSettings)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor gets bigger and bigger. At some point it might make sense to clean this up a bit, but I wanted to stay focused on the task at hand for now.

{
UsesDefaultClassMap = usesDefaultClassMap,
TimeToKeepDeduplicationData = context.Settings.GetTimeToKeepOutboxDeduplicationData(),
#pragma warning disable IDE0037
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This keeps the left side consistent and allows us to rename the internal property.


var outboxRecordId = new OutboxRecordId { MessageId = messageId, PartitionKey = partitionKey };

var equalityPredicateWithFallback = CreateOutboxRecordFilterPredicate(outboxRecordId);
Copy link
Contributor Author

@danielmarbach danielmarbach Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the fallback is enabled, it is theoretically possible under a GUID collision (highly unlikely) or during a mixed migration that an entry is stored under the same message ID once in the legacy format and once in the new format. The only way to end up with both a legacy (string _id) and a structured (document _id) record for the same message is during a mixed-version migration. In that rare case, SetAsDispatched might update the legacy record instead of the structured one because the UpdateOneAsync with the OR is non-deterministic. In a regular outbox interaction, we always do a Get first to retrieve the deduplication record. If that is the case, the SingleOrDefault would throw, which safeguards us from doing the wrong thing.

In the transactional session cases, we currently do not support setting the "message id" on the outbox record so that part is not affected at this stage (even though we are not doing a Get there).

The alternative would be to get tighter control over the flow and do explicit roundtrips with fallbacks. This would remove the roundtrip optimization but might lead to more correct flows under any circumstances. I'm leaning towards this approach for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See b07e368

@danielmarbach danielmarbach changed the title Outbox record seperation Separate outbox records by endpoint name to enable properly sharing the same collection between endpoints Aug 27, 2025
@danielmarbach danielmarbach changed the title Separate outbox records by endpoint name to enable properly sharing the same collection between endpoints Seperate outbox records by endpoint name to enable properly sharing the same collection between endpoints Aug 27, 2025
@danielmarbach danielmarbach changed the title Seperate outbox records by endpoint name to enable properly sharing the same collection between endpoints Separate outbox records by endpoint name to enable properly sharing the same collection between endpoints Aug 27, 2025
@danielmarbach danielmarbach force-pushed the outbox-record-seperation branch from de9fbe7 to 16bbb42 Compare August 27, 2025 10:32
@danielmarbach danielmarbach requested a review from afprtclr August 27, 2025 10:35
@danielmarbach danielmarbach marked this pull request as ready for review August 27, 2025 10:35

public bool ReadFallbackEnabled { get; set; } = true;

public string PartitionKey { get; set; } = null!; // will be set by defaults

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should call it ParitionKey. It's leaking internal implementation. Not even sure if this should be exposed at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the name should be questioned. Would you prefer to name it what it represents currently which is EndpointName?

Regarding the "not sure it should be exposed at all". The current thinking of mine was that it is not exposed to the user but exposed as a configuration option so that there is a single path on which the configuration options flow into the persister. This helps bridge the settings over to a simple configuration class and pass that into the persister where needed. At one point I was thinking we could even streamline all the configuration that is currently passed into the outbox persister including the database settings with defaults and only pass in this configuration object as sort of a "creation option" into the type.

This would allow addressing this comment too https://github.com/Particular/NServiceBus.Storage.MongoDB/pull/799/files/16bbb427918ec8675550884a307eb28be54af7df#r2301681637 but I was hesitant to jump the gun.

Are you thinking along the lines that this class should only represent the properties that are exposed to the users and all the "internal stuff" should not be added here to clearly separate those configuration paths?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking along the lines that this class should only represent the properties that are exposed to the users and all the "internal stuff" should not be added here to clearly separate those configuration paths?

Yep, that was my thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I think that is a valid angle. Will change that in the next update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking about this a bit more and concluded it is not worth changing. The OutboxPersistenceConfiguration PartitionKey currently is either by default the endpoint name or in the case of the transactional session the remote processor endpoint name. This is what we currently use and therefore it presents a configuration option (even when only used "internally". Given that the OutboxRecordId also encodes the partition code as "pk" (partition key) and not "ep" (endpoint name) we seem to be using consistently the same "language".

I'd be more than happy to rediscuss the entire "verbiage" here in case we are no longer convinced partition key is the right term in general.

@danielmarbach danielmarbach force-pushed the outbox-record-seperation branch from a5814ad to 379c950 Compare September 1, 2025 07:06
@danielmarbach danielmarbach merged commit 72d1e47 into master Sep 1, 2025
4 checks passed
@danielmarbach danielmarbach deleted the outbox-record-seperation branch September 1, 2025 14:15
@danielmarbach danielmarbach added this to the 6.0.0 milestone Sep 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Endpoints subscribed to the same event will experience message loss if sharing the same database

4 participants