⚗ [RUMF-902] implement a new mutation observer #810
159bf2e to 11779f9
Codecov Report
@@            Coverage Diff             @@
##           master     #810      +/-   ##
==========================================
+ Coverage   86.24%   86.81%   +0.57%
==========================================
  Files          78       80       +2
  Lines        3737     3869     +132
  Branches      848      879      +31
==========================================
+ Hits         3223     3359     +136
+ Misses        514      510       -4
This commit introduces a few utility functions to store serialized node ids on DOM nodes and retrieve them. The equivalent tools (`mirror`, `INode`...) that are only used by the 'mutation' module are intentionally left as-is, and will be removed in a future PR when we enable the new mutation observer. Before this commit, some 'observer' strategies didn't check whether `mirror.getId` returned a valid `id` before emitting a record. In practice this shouldn't happen, but it did happen in unit tests because we artificially emitted 'input' events on a DOM node that wasn't in the document. This is why some tests were slightly adjusted to reflect reality more accurately.
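To make the "check the id before emitting a record" point concrete, here is a minimal sketch. `hasSerializedNode` and `getSerializedNodeId` are names used in this PR, but their bodies below are assumptions. The `__serializedNodeId` property, the `NodeLike` type, `setSerializedNodeId`, and `recordInput` are hypothetical and only serve the illustration.

```typescript
// Hypothetical stand-in for a DOM node that may carry a serialized id.
interface NodeLike {
  __serializedNodeId?: number
}

function hasSerializedNode(node: NodeLike): boolean {
  return node.__serializedNodeId !== undefined
}

function getSerializedNodeId(node: NodeLike): number | undefined {
  return node.__serializedNodeId
}

// Hypothetical setter, called when a node gets serialized.
function setSerializedNodeId(node: NodeLike, id: number): void {
  node.__serializedNodeId = id
}

// Hypothetical observer strategy: emit nothing for nodes that were never serialized.
function recordInput(node: NodeLike, value: string): { id: number; value: string } | undefined {
  const id = getSerializedNodeId(node)
  if (id === undefined) {
    return undefined
  }
  return { id, value }
}
```

With this shape, an 'input' event fired on a node that is not in the serialized document simply yields no record, which is the situation the adjusted unit tests cover.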
11779f9 to 3863583
83cdea4 to 2630513
Co-authored-by: Bastien Caudan <bastien.caudan@datadoghq.com>
2630513 to 6a232ed
We chose to remove this test because its intent is a bit obscure and the new mutation observer has good enough test coverage. I kept the test for the old mutation observer implementation since its coverage is still lacking. It will be removed when we enable the new mutation observer.
1404e07 to 581bc9d
Co-authored-by: Bastien Caudan <bastien.caudan@datadoghq.com>
vlad-mh left a comment
nice, super exciting perf wins! let's test it out
some nits, but mostly ideas for further improvements
document.contains(mutation.target) &&
hasSerializedNode(mutation.target) &&
!nodeOrAncestorsIsIgnored(mutation.target) &&
!nodeOrAncestorsShouldBeHidden(mutation.target)
some optimization thoughts, probably for a later PR:
- both `nodeOrAncestorsIsIgnored` and `nodeOrAncestorsShouldBeHidden` walk the parent hierarchy of `mutation.target`. Ultimately, we should do it only once.
- `document.contains` by definition must be O(N), N being number of dom elements? If we walk the parent hierarchy of `mutation.target` anyway, we might as well just check the oldest parent is `document`?
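The "walk the hierarchy only once" idea could look roughly like the sketch below. It is not code from this PR: `TreeNode`, `AncestorChecks`, `inspectAncestors`, and the two predicate parameters are hypothetical, and a plain `parentNode` chain stands in for real DOM nodes. Whether it beats `document.contains` plus two separate walks would need measuring.

```typescript
// Hypothetical minimal node shape: only the parent link matters here.
interface TreeNode {
  parentNode: TreeNode | null
}

interface AncestorChecks {
  attached: boolean // the walk reached `root`
  ignored: boolean // the node or one of its ancestors is ignored
  hidden: boolean // the node or one of its ancestors should be hidden
}

// Single pass over the node and its ancestors, collecting all three answers.
function inspectAncestors(
  node: TreeNode,
  root: TreeNode,
  isIgnored: (n: TreeNode) => boolean,
  isHidden: (n: TreeNode) => boolean
): AncestorChecks {
  const result: AncestorChecks = { attached: false, ignored: false, hidden: false }
  let current: TreeNode | null = node
  while (current) {
    if (current === root) {
      result.attached = true
    }
    if (isIgnored(current)) {
      result.ignored = true
    }
    if (isHidden(current)) {
      result.hidden = true
    }
    current = current.parentNode
  }
  return result
}
```

A mutation would then be kept when `attached` is true and both `ignored` and `hidden` are false, in addition to the `hasSerializedNode` check.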
Yes I agree, this is still something I have in mind: optimizing this loop or using the 'cache' optimization could still be useful here. In practice, I observed that this whole filter loop takes less than 1ms even when hundreds of mutations occur.
`document.contains` is fast. It is not O(N), and the browser probably uses internal tricks to avoid walking through all the parents. See for yourself, run this in a busy page:
{ const list = document.querySelectorAll('*'); const start = performance.now(); list.forEach(e => { while (e) { e = e.parentNode }; }); console.log(performance.now() - start) }
{ const list = document.querySelectorAll('*'); const start = performance.now(); list.forEach(e => document.contains(e)); console.log(performance.now() - start) }

if (!addedAndMovedNodes.has(node)) {
  removedNodes.set(node, mutation.target)
}
addedAndMovedNodes.delete(node)
I'm nitpicking at this point, so feel free to ignore but if you processed mutation.removedNodes prior to mutation.addedNodes I think you would, in your second loop, only end up doing one lookup in the set rather than two?
in my mind:
forEach(mutation.removedNodes, (node) => {
  removedNodes.set(node, mutation.target)
})
forEach(mutation.addedNodes, (node) => {
  if (!removedNodes.has(node)) {
    addedAndMovedNodes.set(node, mutation.target)
  }
})
A node cannot be both removed and added in a single mutation, so the processing order within a mutation doesn't matter.
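A rough sketch of how this bookkeeping behaves across a whole batch of mutations, built around the reviewed lines. The added-nodes loop, the `MutationLike` type, the function name, and the choice of a `Set` for `addedAndMovedNodes` are assumptions, since only the removed-nodes branch is visible in this thread.

```typescript
// Hypothetical, DOM-free stand-in for a childList MutationRecord.
interface MutationLike<N> {
  target: N
  addedNodes: N[]
  removedNodes: N[]
}

function collectChildListChanges<N>(mutations: MutationLike<N>[]) {
  const addedAndMovedNodes = new Set<N>()
  const removedNodes = new Map<N, N>() // removed node -> parent it was removed from

  for (const mutation of mutations) {
    // Assumed: every added node is tracked as "added or moved".
    for (const node of mutation.addedNodes) {
      addedAndMovedNodes.add(node)
    }
    for (const node of mutation.removedNodes) {
      // A node added earlier in the same batch was never recorded,
      // so its removal cancels the addition instead of being reported.
      if (!addedAndMovedNodes.has(node)) {
        removedNodes.set(node, mutation.target)
      }
      addedAndMovedNodes.delete(node)
    }
  }
  return { addedAndMovedNodes, removedNodes }
}
```

Under these assumptions, a node added and then removed within one batch produces no output at all, while a node removed from one parent and later added to another shows up in both collections. The order that matters is the order of the mutations in the batch, not the order of the two inner loops.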
// Deduplicate mutations based on their target node
const handledNodes = new Set<Node>()
const filteredMutations = mutations.filter((mutation) => {
can we live with just 1 loop here? 🤔
const textMutations: TextMutation[] = []
const handledNodes = new Set<Node>()
for (const mutation of mutations) {
  if (handledNodes.has(mutation.target)) {
    continue
  }
  handledNodes.add(mutation.target)
  const value = mutation.target.textContent
  if (value === mutation.oldValue) {
    continue
  }
  textMutations.push({
    id: getSerializedNodeId(mutation.target),
    value,
  })
}
This was my first version for both `processCharacterDataMutations` and `processAttributesMutations`. But, for attributes, the loop got a bit hairy... so I split the loop into "deduplication" and "emission". To have a consistent pattern between the two functions, I applied the same structure to character data mutations. I find it a bit easier to read, and it shouldn't matter performance-wise.
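For comparison with the single-loop suggestion, the "deduplication" then "emission" structure might be sketched as follows. This is an approximation, not the PR's code: the `TextNodeLike` and `CharacterDataMutationLike` types are hypothetical, and a plain `id` field replaces the real `getSerializedNodeId` lookup.

```typescript
// Hypothetical stand-ins for a text node and a characterData MutationRecord.
interface TextNodeLike {
  id: number // assumed: stands in for getSerializedNodeId(node)
  textContent: string | null
}

interface CharacterDataMutationLike {
  target: TextNodeLike
  oldValue: string | null
}

interface TextMutation {
  id: number
  value: string | null
}

function processCharacterDataMutations(mutations: CharacterDataMutationLike[]): TextMutation[] {
  // Pass 1, deduplication: keep only the first mutation for each target node.
  const handledNodes = new Set<TextNodeLike>()
  const filteredMutations = mutations.filter((mutation) => {
    if (handledNodes.has(mutation.target)) {
      return false
    }
    handledNodes.add(mutation.target)
    return true
  })

  // Pass 2, emission: report the current text unless it equals the value before the batch.
  const textMutations: TextMutation[] = []
  for (const mutation of filteredMutations) {
    const value = mutation.target.textContent
    if (value === mutation.oldValue) {
      continue
    }
    textMutations.push({ id: mutation.target.id, value })
  }
  return textMutations
}
```

Keeping the first mutation per node means its `oldValue` is the text from before the batch, so a node whose text changed and then changed back emits nothing.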
Motivation
The current mutation observer logic is complicated and inefficient. Let's implement a new mutation observer.
Changes
Testing
Manual, unit/e2e
Perf impact:
Before:
SDK time: about 1900ms
After:
SDK time: about 560ms
I have gone over the contributing documentation.