Improve query memory usage #365

Yogu · 2025-10-30T15:07:07Z

regression tests store AQL in files
new regression tests with "memory pitfalls" that show us where a document is loaded into memory even if it should not
(work in progress) changes in AQL generation so it's simpler and fixes a few memory issues
- avoid patterns that disable the reduce-extraction-to-projection opitimizations
- avoid subqueries
- generally simplify queries

…data

…ression-tests=

Regression tests usually don't have many or big objects, so they shouldn't take a lot of memory. If they do, the query is likely not optimal. If the value (10 MB) does not work we can also tweak it later. This can also be used by special tests that have a large "payload" field to ensure this field never makes it into query memory. Needed to refactor one query because it was one query that performed a lot of filters.

The original code incorrectly assumed that a log4js logger had its own level. Instead, changing a logger's level changes the level of the logger's category system-wide. That leads to very confusing behavior that depends on the order in which you create projects or database adapters. We're still changing global state for the trace level, but in a direct and more understandable way.

This was not buggy, but it's also unnecessarily complicated in these situations.

The two ways introduce unnecessary complexity. We also will add a feature to output AQL strings for each operation, which requires operation names. Also, some of the anonymous operations were better split into multiple named operations.

There are quite a lot of files possible per test, so this is easier to scan. Also, we will add AQL files where multiple files will be generated per test (one per operation). Also, it's now easier to copy a test by just copying the directory. access-restrictions/logistics-reader was accidentally deleted in eed1127, restored.

This has mainly two use cases - Getting a general feeling of how queries are translated into AQL to discover potential for optimizations and simplifications - Double-checking that a code change does not change AQL in a bad way, even if you don't notice it in the result with the test data

Copilot

Pull Request Overview

This PR improves query memory usage by simplifying AQL generation and storing regression test queries in files. The changes focus on optimizing query patterns to enable better database-level optimizations and avoid unnecessary memory consumption.

Key changes:

Regression tests now store AQL queries in separate .aql files for better tracking and comparison
Refactored query generation to eliminate patterns that prevent reduce-extraction-to-projection optimizations
Introduced new regression tests demonstrating memory-efficient query patterns

Reviewed Changes

Copilot reviewed 191 out of 1032 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
spec/regression/initialization.ts	Refactored to use new InitTestDataContext class for test data initialization
spec/regression/init-test-data-context.ts	New context class encapsulating GraphQL execution logic for test initialization
spec/regression/list-limits/tests//aql/.aql	New AQL files for list limits regression tests
spec/regression/keywords/tests//aql/.aql	New AQL files for keyword escaping tests
spec/regression/collect/tests//aql/.aql	New AQL files for collection operation tests
spec/regression/access-restrictions/tests//aql/.aql	New AQL files for access restriction tests
spec/regression/access-groups/tests//aql/.aql	New AQL files for access group tests
spec/regression/collect/tests/*/result.json	Updated result files showing work-in-progress error states
spec/regression/access-restrictions/tests/accounting/test.graphql	Added query name to GraphQL test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

spec/regression/collect/tests/relation-and-field-aggregation/result.json

spec/regression/collect/tests/field-aggregation/result.json

spec/regression/collect/tests/distinct-aggregation/result.json

…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix

…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.

This is the recommended value for Node 18: https://github.com/tsconfig/bases/blob/main/bases/node18.json It includes findLastIndex() which is useful for us

This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests

The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization.

… in ArangoDB through dangling-edge filter We currently do not trust edges - if the document they point to does not exist, we ignore the edge. In the past, this access was done via $node != null. This has the problem that ArangoDB considers it a full document access and no longer applies the reduce-extraction-to-projection optimization rule. This rule is very important to limit query memory limit usage. Removing the dangling edges filter completely would be breaking, but we can optimize them so the reduce-extraction-to-projection optimization still works.

will add some more to this commit later

Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one subquery. WIP: InMemoryAdapter (do this last when we're sure about the query tree API)

Yogu force-pushed the refactor-traversal branch 2 times, most recently from 70c7e4b to 1b60bac Compare October 30, 2025 16:32

Yogu added 11 commits October 30, 2025 17:37

style: format regression test files with prettier

9a65c03

test: allow regression suites to have a typescript file to init test …

07e376c

…data

test: include suite name (as suite/test) in filter specified by --reg…

f59c831

…ression-tests=

test: support suite-global meta file

b9667c3

refactor: avoid using globalContext only to get the logger

8209b31

This was not buggy, but it's also unnecessarily complicated in these situations.

test: make clearing the regression test database more explicit

0384bfe

Yogu force-pushed the refactor-traversal branch 3 times, most recently from 7e03cc3 to 47468d5 Compare October 30, 2025 16:46

Yogu requested a review from Copilot October 30, 2025 16:48

Copilot AI reviewed Oct 30, 2025

View reviewed changes

Yogu force-pushed the refactor-traversal branch 6 times, most recently from fe0a03a to 0dbe80e Compare October 31, 2025 15:51

Yogu added 3 commits November 3, 2025 13:28

chore: fix husky configuration

66c9460

style: reformat regression test result files with prettier

766560d

test: insert trailing newline in files generated with --save-actual-a…

2cc271f

…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix

Yogu force-pushed the refactor-traversal branch 3 times, most recently from 053beb9 to db1f075 Compare November 3, 2025 16:40

Yogu added 3 commits November 4, 2025 17:15

test: add tests for excessive memory usage in traversal and collect f…

f3e1135

…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.

chore: upgrade lib to ES2023

7951835

This is the recommended value for Node 18: https://github.com/tsconfig/bases/blob/main/bases/node18.json It includes findLastIndex() which is useful for us

refactor: consistently use decapitalized type names for variables

4a6419d

This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests

Yogu force-pushed the refactor-traversal branch 6 times, most recently from 81c67b6 to 707cfd0 Compare November 6, 2025 17:20

Yogu added 3 commits November 6, 2025 18:20

chore: update mocha

92df6ec

perf: skip null check for parent and root fields

f715cb2

The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization.

Yogu force-pushed the refactor-traversal branch from 707cfd0 to 519a613 Compare November 6, 2025 17:21

Yogu added 2 commits November 7, 2025 17:28

wip: test: add more test for traversals

59c7ab7

will add some more to this commit later

wip: perf: improve query memory usage of collect fields

ad8f739

Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one subquery. WIP: InMemoryAdapter (do this last when we're sure about the query tree API)

Yogu force-pushed the refactor-traversal branch from 519a613 to ad8f739 Compare November 7, 2025 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve query memory usage #365

Improve query memory usage #365

Uh oh!

Yogu commented Oct 30, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Improve query memory usage #365

Are you sure you want to change the base?

Improve query memory usage #365

Uh oh!

Conversation

Yogu commented Oct 30, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants