Skip to content

Conversation

enisdenjo
Copy link
Member

@enisdenjo enisdenjo commented Aug 2, 2025

Ref GW-276

Perform 5 runs of load testing on each memtest; take a heap snapshot at the end of each "calmdown" phase1; compare every heap snapshot consecutively and find leaks by find all objects that grew in size in every2 snapshot.

Because of increased accuracy, there is no need to retry tests as there are not flakey. Also, the tests can run for a shorter time since we're looking for leaks by comparing heap snapshots instead of detecting high frames from allocation sampling. Tests now run for at least half, sometimes third, of the time.

Background

We were using heap allocation sampling profiles that can guess leaks. It does so because it measures how much memory a certain frame allocates over time. If that frame, in total, allocates and frees memory - it wont show up in the test results, but if it allocates and does not free (enough) memory it will show up.

This approach works, but is not very accurate, because there can be frames that are under constant pressure during load-testing that do allocate a lot, but eventually (slower than the others) all of their memory gets freed.

A much more accurate approach would be heap snapshot comparison, looking at which classes/constructors have been growing in allocation size over the compared snapshots.

However, there's not much out there in the ecosystem helping with parsing or comparing heap snapshots.

  • memlab from Meta is centred around the browser and CLI usage, their API is fidgety and lacking - and their parsing+diffing algorithm is incredibly slow.
  • @vscode/v8-heap-parser is a faster in parsing, but is giving inaccurate results that are different from results than what you would see in Chrome DevTools; even so, it's not meant to be used for comparing snapshots and doing manual comparison in Node is giga slow (since a class under pressure can have 50k+ nodes that all have to be compared to calculate freed/allocated memory).

The only option at the moment is to use Chrome DevTools heap snapshot worker which is fast (leveraging workers), gives accurate results and has primitives for efficiently comparing snapshots. It is tightly coupled for the browser and Chrome machinery. Meaning, it needs to be forked, analysed how the heap snapshot worker is used (DevTools have no programmatic documentation, just code) and finally stripped of its browser requirements implementing it for Node (and other JS) environments - which has been done in this PR.

TODO

  • Upload heapsnapshots as artifacts when tests fail for investigation
  • Compress heap snapshots for smaller git size

Footnotes

  1. After each load testing, there is a "calmdown" phase which triggers garbage collection (through the DevTools inspector protocol) and waits for a while to let the GC do it thing.

  2. If an object grew in one heap snapshot, but not another - we do not consider it a leak.

@enisdenjo enisdenjo changed the title Heapsnap Heap snapshot diffs for accurate memtest leak detection Aug 2, 2025
@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (alpha)

The latest changes of this PR are available as alpha on npm (based on the declared changesets):

Package Version Info
@graphql-hive/gateway 1.16.3-alpha-b036abed553a3f749bb6b76a78885feaca2ce77a npm ↗︎ unpkg ↗︎
@graphql-hive/nestjs 1.0.22-alpha-b036abed553a3f749bb6b76a78885feaca2ce77a npm ↗︎ unpkg ↗︎
@graphql-hive/plugin-aws-sigv4 1.0.18-alpha-b036abed553a3f749bb6b76a78885feaca2ce77a npm ↗︎ unpkg ↗︎

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Binary for macOS-ARM64)

The latest changes of this PR are available for download (based on the declared changesets).

Download

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Binary for Linux-ARM64)

The latest changes of this PR are available for download (based on the declared changesets).

Download

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Binary for Linux-X64)

The latest changes of this PR are available for download (based on the declared changesets).

Download

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Binary for macOS-X64)

The latest changes of this PR are available for download (based on the declared changesets).

Download

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Bun Docker Image)

The latest changes of this PR are available as image on GitHub Container Registry (based on the declared changesets):

ghcr.io/graphql-hive/gateway:1.16.3-alpha-b036abed553a3f749bb6b76a78885feaca2ce77a-bun

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Node Docker Image)

The latest changes of this PR are available as image on GitHub Container Registry (based on the declared changesets):

ghcr.io/graphql-hive/gateway:1.16.3-alpha-b036abed553a3f749bb6b76a78885feaca2ce77a

@theguild-bot
Copy link
Collaborator

theguild-bot commented Aug 2, 2025

🚀 Snapshot Release (Binary for Windows-X64)

The latest changes of this PR are available for download (based on the declared changesets).

Download

@Urigo
Copy link

Urigo commented Aug 2, 2025

Wow

@enisdenjo enisdenjo marked this pull request as ready for review August 2, 2025 23:30
@Copilot Copilot AI review requested due to automatic review settings August 2, 2025 23:30
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the existing heap allocation sampling approach with a more accurate heap snapshot diffing method for memory leak detection in memtests. Instead of relying on allocation profiles that can produce false positives, the system now takes heap snapshots at the end of each test run and compares them to identify objects that consistently grow across all snapshots.

  • Implements heap snapshot parsing and diffing using a forked Chrome DevTools heap snapshot worker
  • Changes default memtest behavior to use heap snapshots instead of heap sampling profiles
  • Updates memtest configuration with new defaults and flags for better control

Reviewed Changes

Copilot reviewed 31 out of 55 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/perf/src/memtest.ts Core memtest logic updated to use heap snapshot diffing instead of heap sampling analysis
internal/perf/src/loadtest.ts Modified to support both heap snapshots and sampling with improved file naming
internal/perf/src/heapsnapshot.ts New module for heap snapshot diffing and leak detection logic
internal/perf/src/heapsampling.ts Cleaned up heap sampling code, removed unused heap snapshot analysis
internal/heapsnapshot/ Complete new package containing forked Chrome DevTools heap snapshot worker
internal/perf/tests/heapsnapshot.test.ts Test suite for heap snapshot leak detection functionality
Comments suppressed due to low confidence (1)

internal/perf/src/loadtest.ts:284

  • [nitpick] The variable name 'filenameSafeStartTime' is misleading since it's not the start time itself but a formatted version of it. Consider renaming to 'formattedStartTime' or 'filenameSafeTimestamp'.
  const filenameSafeStartTime = startTime

@@ -1,6 +1,7 @@
node_modules/
.DS_Store
dist/
!/internal/heapsnapshot/dist/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps indicate here why it's commited ?

Copy link
Collaborator

@EmrysMyrddin EmrysMyrddin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks amazing ! Great work !

@enisdenjo enisdenjo force-pushed the heapsnap branch 2 times, most recently from ab101c1 to 5e052af Compare August 18, 2025 13:02
@enisdenjo enisdenjo enabled auto-merge (squash) August 18, 2025 13:06
@enisdenjo enisdenjo merged commit d7a9de9 into main Aug 18, 2025
62 checks passed
@enisdenjo enisdenjo deleted the heapsnap branch August 18, 2025 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants