Skip to content

Conversation

@agarwal-navin
Copy link
Contributor

@agarwal-navin agarwal-navin commented Jan 28, 2026

#26099 added a data corruption error "Found two messages with non-increasing clientSequenceNumber for a given client".
The error logs don't include clientSequenceNumber which will be useful in debugging. This PR adds it to the erorr logs for both the messages.

Also, added a create function to DataCorruptionError similar to DataProcessingError and used it here which logs important properties from the message that causes the corruption.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances error logging for data corruption detection by adding clientSequenceNumber to the message payload string used in error reports. This additional information will help debug issues where clientSequenceNumbers fail to increase properly for a given client.

Changes:

  • Added clientSequenceNumber field to the comparableMessagePayload method output, which is used when logging data corruption errors

@anthony-murphy
Copy link
Contributor

just a note, but it doesn't seem like this data corruption error use the factories or the extract safe message properties function from telemetry. this could create problems for partners that depend on things like message timestamp to dedupe.

@markfields
Copy link
Member

Let's double-check that this won't interfere with this logic:

image

@markfields
Copy link
Member

markfields commented Jan 28, 2026

Let's double-check that this won't interfere with this logic:

<img>

So, I wonder if we just want to use that logic, rather than the new check which you recently added. i.e. revert your previous PR and just do this one?

Chatted offline, the logic is different. That one is about two different ops getting the same sequence number. This one is about a client's single op being sequenced twice (different sequence numbers)

@agarwal-navin
Copy link
Contributor Author

Let's double-check that this won't interfere with this logic:

image

I reverted the logic of comparableMessagePayload to what it was before adding this new error.

@agarwal-navin
Copy link
Contributor Author

just a note, but it doesn't seem like this data corruption error use the factories or the extract safe message properties function from telemetry. this could create problems for partners that depend on things like message timestamp to dedupe.

Good point. I added a factory for DataCorruptionError (shared code with DataProcessingError) and used it here.

// Note: It's possible for a duplicate op to be broadcasted and have everything the same except the timestamp.
private comparableMessagePayload(m: ISequencedDocumentMessage): string {
return `${m.clientId}-${m.type}-${m.sequenceNumber}-${m.minimumSequenceNumber}-${m.referenceSequenceNumber}-${m.timestamp}`;
return `${m.clientId}-${m.type}-${m.minimumSequenceNumber}-${m.referenceSequenceNumber}-${m.timestamp}`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reverted back to the state before #26099.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants