♻️ [PANA-5123] Assign node ids in preorder when serializing #4002

Merged
sethfowler-datadog merged 4 commits into main from
seth.fowler/PANA-5123-assign-node-ids-top-down-when-serializing
Dec 15, 2025

@sethfowler-datadog sethfowler-datadog commented Dec 5, 2025

Motivation

The current DOM serialization algorithm serializes a node's children before assigning it an id. This results in a post-order assignment, like this:

#document  // id: 5
  <html>  // id: 4
    <body>  // id: 3
      <div></div>  // id: 1
      <p></p>  // id: 2

This is a fine solution for the current data format; node ids are recorded explicitly, so we can assign them using whatever scheme we wish. However, the new format assigns node ids implicitly in the order that the nodes appear, which means that parent nodes will naturally be assigned a node id before their descendants. The new format uses a pre-order assignment, which looks like this:

#document  // id: 0
  <html>  // id: 1
    <body>  // id: 2
      <div></div>  // id: 3
      <p></p>  // id: 4

This difference in node id numbering is the biggest source of complexity when comparing the output of the current DOM serialization algorithm with the output of the new one. It'd be ideal if both algorithms used the same ids; this would make translation between their representations straightforward.

The good news is that, because the current data format records node ids explicitly, there's no backwards compatibility issue with simply changing how node ids are assigned. So, we can switch the existing algorithm to use a pre-order assignment, and eliminate this impedance mismatch between the two algorithms.
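
To make the two numbering schemes concrete, here is a minimal sketch that reproduces both trees above. The `TreeNode` type and the two functions are hypothetical illustrations, not the SDK's actual serialization code; the only difference between them is whether the id is assigned before or after visiting the children.

```typescript
// Hypothetical, simplified tree type; not the SDK's real node representation.
interface TreeNode {
  name: string
  children: TreeNode[]
}

// Post-order: every child receives its id before its parent does.
function assignPostOrder(root: TreeNode, firstId = 1): Map<string, number> {
  const ids = new Map<string, number>()
  let nextId = firstId
  const visit = (node: TreeNode): void => {
    node.children.forEach((child) => visit(child))
    ids.set(node.name, nextId++)
  }
  visit(root)
  return ids
}

// Pre-order: the parent receives its id before any of its descendants.
function assignPreOrder(root: TreeNode, firstId = 0): Map<string, number> {
  const ids = new Map<string, number>()
  let nextId = firstId
  const visit = (node: TreeNode): void => {
    ids.set(node.name, nextId++)
    node.children.forEach((child) => visit(child))
  }
  visit(root)
  return ids
}

const tree: TreeNode = {
  name: '#document',
  children: [
    {
      name: 'html',
      children: [
        {
          name: 'body',
          children: [
            { name: 'div', children: [] },
            { name: 'p', children: [] },
          ],
        },
      ],
    },
  ],
}

console.log(assignPostOrder(tree)) // #document gets the highest id
console.log(assignPreOrder(tree)) // #document gets the lowest id
```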

Changes

This PR:

  1. Moves the code for assigning an id to a node from serializeNodeWithId() to SerializationTransaction#assignId().
  2. Updates each of the DOM node serialization functions to assign ids when each serialized node is constructed, using SerializationTransaction#assignId() to do the work.
  3. Removes serializeNodeWithId(), which now does nothing, and replaces calls to it with direct calls to serializeNode().
  4. Changes the first node id we'll assign from 1 to 0, for better alignment between the old and new algorithms.
  5. Updates the small number of tests that needed to be adjusted.
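
The shape of changes 1 to 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SketchTransaction`, `SketchNode`, `SerializedSketchNode`, and `serializeSketchNode()` are hypothetical names, and the real `SerializationTransaction#assignId()` may have a different signature. The point is only that the id is assigned at the moment the serialized node is constructed, before the children are serialized.

```typescript
// Hypothetical stand-ins for the SDK's types (assumptions, not the real API).
interface SketchNode {
  tagName: string
  children: SketchNode[]
}

interface SerializedSketchNode {
  id: number
  tagName: string
  childNodes: SerializedSketchNode[]
}

class SketchTransaction {
  private nextId: number
  private readonly ids = new Map<SketchNode, number>()

  // The first id defaults to 0, mirroring the FIRST_ID change in this PR.
  constructor(firstId = 0) {
    this.nextId = firstId
  }

  // Hands out the next id and remembers which node received it.
  assignId(node: SketchNode): number {
    const id = this.nextId++
    this.ids.set(node, id)
    return id
  }
}

function serializeSketchNode(node: SketchNode, transaction: SketchTransaction): SerializedSketchNode {
  // The id is assigned while constructing the serialized node, so the parent
  // is numbered before any of its children are visited (pre-order).
  const serialized: SerializedSketchNode = {
    id: transaction.assignId(node),
    tagName: node.tagName,
    childNodes: [],
  }
  for (const child of node.children) {
    serialized.childNodes.push(serializeSketchNode(child, transaction))
  }
  return serialized
}
```

Because the id is assigned inside each serialization function, no wrapper that assigns ids afterwards is needed in this sketch, which is the same reason `serializeNodeWithId()` becomes redundant.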

Test instructions

You can test the changes using the "Live Replay" tab of the browser SDK extension.

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.

@sethfowler-datadog sethfowler-datadog requested a review from a team as a code owner December 5, 2025 15:41
Comment on lines +476 to 478
const expectedHost = expectNewNode({ type: NodeType.Element, tagName: 'div' })
const shadowRootNode = expectNewNode({ type: NodeType.DocumentFragment, isShadowRoot: true })
const child = expectNewNode({ type: NodeType.Element, tagName: 'span' })
Contributor Author

The helper function expectNewNode() assigns ids to each node in order, so we have to reverse the order of these calls to get ids that match the new approach.
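
As a rough illustration of why the call order matters, here is a hypothetical stand-in for such a helper. `createExpectNewNodeSketch()` is an assumption made for this example and is not the test suite's actual `expectNewNode()`; it only captures the property described above, namely that each call consumes the next id.

```typescript
// Hypothetical helper: each call returns the expected node with the next id.
type ExpectedNodeSketch = { id: number; [key: string]: unknown }

function createExpectNewNodeSketch(firstId = 0) {
  let nextId = firstId
  return (fields: Record<string, unknown>): ExpectedNodeSketch => ({ ...fields, id: nextId++ })
}

const expectNewNodeSketch = createExpectNewNodeSketch()

// With pre-order assignment, expectations must be created top-down:
// host first, then its shadow root, then the shadow root's child.
const hostSketch = expectNewNodeSketch({ tagName: 'div' })
const shadowRootSketch = expectNewNodeSketch({ isShadowRoot: true })
const childSketch = expectNewNodeSketch({ tagName: 'span' })

console.log(hostSketch.id, shadowRootSketch.id, childSketch.id)
```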


expect(serializeDocumentNode(document, NodePrivacyLevel.ALLOW, transaction)).toEqual({
  type: NodeType.Document,
  id: 0,
Contributor Author

It was necessary to add this because serializeDocumentNode() used to return a node with no id assigned, but now it returns a node with an id. Conveniently, the id of the document node is always zero with pre-order assignment!

export * from '../types'

- export { serializeNodeWithId } from '../domain/record'
+ export { serializeNode, serializeNode as serializeNodeWithId } from '../domain/record'
Contributor Author

There are some known external callers of this function in Datadog-internal code. I've exported the function under both the old and new names for compatibility, but I plan to update those callers once this change ships, and then we can remove the old name.


export const enum NodeIdConstants {
-  FIRST_ID = 1,
+  FIRST_ID = 0,

Why does it need to start at 0? Is that a serialization requirement?
Is this change backwards compatible?
Are we positive we are not using 0 elsewhere as a sentinel value?

Sorry if these don't make sense; this change seems risky to someone without all the context (like me 😄).

Contributor Author

> Why does it need to start at 0? Is that a serialization requirement?

It's because in the new data format, the ids can also be interpreted as array indices; it'd be awkward if they didn't start at 0.

> Is this change backwards compatible? Are we positive we are not using 0 elsewhere as a sentinel value?

For the data format, yes. For code which might confuse 0 with other values like undefined (since 0 is falsy), there is definitely a potential risk. I've updated the code in both this repo and in Datadog-internal repos to handle this case correctly. (Almost everything already did, fortunately, but there were a few small issues that have been hammered out.)
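
The kind of pitfall described here can be sketched as follows. The function names and the `nodeNamesById` array are hypothetical and do not come from the SDK; the sketch only shows why a truthiness check breaks once 0 is a valid id, and how an id can double as an array index when numbering starts at 0.

```typescript
// Buggy once ids start at 0: the id 0 is falsy, so it is treated as "no id".
function hasIdBuggy(id: number | undefined): boolean {
  return Boolean(id)
}

// Safe: compares against undefined explicitly, so 0 counts as a valid id.
function hasId(id: number | undefined): id is number {
  return id !== undefined
}

// With ids starting at 0, an id can be used directly as an array index.
const nodeNamesById: string[] = ['#document', 'html', 'body']

function lookupName(id: number | undefined): string | undefined {
  return hasId(id) ? nodeNamesById[id] : undefined
}

console.log(hasIdBuggy(0), hasId(0), lookupName(0))
```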

Base automatically changed from seth.fowler/PANA-5105-serialize-all-dom-attribute-values-as-strings to main December 15, 2025 18:13
@sethfowler-datadog sethfowler-datadog requested a review from a team as a code owner December 15, 2025 18:13
…wn-when-serializing

* main: (26 commits)
  ♻️ [PANA-5105] Serialize all DOM attribute values as strings (#3999)
  🎨 [PANA-5053] Separate DOM and virtual attribute serialization (#3998)
  🐛 clear chain after finalize (#4027)
  🎨 [PANA-5222] Make CODEOWNERS more accurate for recording code (#4034)
  ♻️ Replace longTaskRegistry by longTaskContexts (#4013)
  👷 Bump staging to staging-51
  Add flagEvaluationEndpointBuilder to TransportConfiguration interface. (#4025)
  👷 Enable more renovate flags... (#4023)
  🐛 fix developer extension packaging (#4024)
  👷 add a script to easily create an access token (#4020)
  👷 Ensure that renovate do a full install (#4021)
  [RUM Browser Profiler] stop profiler when session expires (#4011)
  👷 Ensure to have tarballs built at install (#4019)
  👷: migrate config renovate.json (#4016)
  👷 Update dependency vite to v5.4.21 [SECURITY] (#4015)
  👷 Configure test apps dependencies  (#4014)
  v6.25.0 (#4012)
  👷 Update dependency vite to v5.4.21 [SECURITY] (#4010)
  👷 Include test apps in renovate scan (#4009)
  👷 restore canary deployment (#4008)
  ...
cit-pr-commenter bot commented Dec 15, 2025

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 164.33 KiB | 164.33 KiB | 0 B | 0.00% | |
| Rum Profiler | 4.32 KiB | 4.32 KiB | 0 B | 0.00% | |
| Rum Recorder | 19.87 KiB | 20.02 KiB | +146 B | +0.72% | |
| Logs | 56.14 KiB | 56.14 KiB | 0 B | 0.00% | |
| Flagging | 944 B | 944 B | 0 B | 0.00% | |
| Rum Slim | 121.61 KiB | 121.61 KiB | 0 B | 0.00% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
|---|---|---|---|
| RUM - add global context | 0.0053 | 0.0044 | -16.98% |
| RUM - add action | 0.0184 | 0.0158 | -14.13% |
| RUM - add error | 0.0169 | 0.0165 | -2.37% |
| RUM - add timing | 0.0046 | 0.0046 | 0.00% |
| RUM - start view | 0.0056 | 0.0044 | -21.43% |
| RUM - start/stop session replay recording | 0.0011 | 0.0008 | -27.27% |
| Logs - log message | 0.021 | 0.0156 | -25.71% |

🧠 Memory Performance

| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| RUM - add global context | 25.21 KiB | 25.26 KiB | +58 B |
| RUM - add action | 48.11 KiB | 48.33 KiB | +233 B |
| RUM - add timing | 24.58 KiB | 24.80 KiB | +221 B |
| RUM - add error | 54.10 KiB | 54.64 KiB | +551 B |
| RUM - start/stop session replay recording | 23.63 KiB | 24.01 KiB | +388 B |
| RUM - start view | 423.78 KiB | 421.76 KiB | -2.02 KiB |
| Logs - log message | 44.06 KiB | 43.70 KiB | -375 B |

🔗 RealWorld

@datadog-official

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 100.00%
Overall Coverage: 92.67% (-0.02%)

🔗 Commit SHA: cfee5f0

@sethfowler-datadog sethfowler-datadog merged commit ae688e5 into main Dec 15, 2025
21 checks passed
@sethfowler-datadog sethfowler-datadog deleted the seth.fowler/PANA-5123-assign-node-ids-top-down-when-serializing branch December 15, 2025 22:02
@github-actions github-actions bot locked and limited conversation to collaborators Dec 15, 2025
3 participants