-
Notifications
You must be signed in to change notification settings - Fork 16
Improve query memory usage #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Yogu
commented
Oct 30, 2025
- regression tests store AQL in files
- new regression tests with "memory pitfalls" that show us where a document is loaded into memory even if it should not
- (work in progress) changes in AQL generation so it's simpler and fixes a few memory issues
- avoid patterns that disable the reduce-extraction-to-projection opitimizations
- avoid subqueries
- generally simplify queries
70c7e4b to
1b60bac
Compare
Regression tests usually don't have many or big objects, so they shouldn't take a lot of memory. If they do, the query is likely not optimal. If the value (10 MB) does not work we can also tweak it later. This can also be used by special tests that have a large "payload" field to ensure this field never makes it into query memory. Needed to refactor one query because it was one query that performed a lot of filters.
The original code incorrectly assumed that a log4js logger had its own level. Instead, changing a logger's level changes the level of the logger's category system-wide. That leads to very confusing behavior that depends on the order in which you create projects or database adapters. We're still changing global state for the trace level, but in a direct and more understandable way.
This was not buggy, but it's also unnecessarily complicated in these situations.
The two ways introduce unnecessary complexity. We also will add a feature to output AQL strings for each operation, which requires operation names. Also, some of the anonymous operations were better split into multiple named operations.
There are quite a lot of files possible per test, so this is easier to scan. Also, we will add AQL files where multiple files will be generated per test (one per operation). Also, it's now easier to copy a test by just copying the directory. access-restrictions/logistics-reader was accidentally deleted in eed1127, restored.
This has mainly two use cases - Getting a general feeling of how queries are translated into AQL to discover potential for optimizations and simplifications - Double-checking that a code change does not change AQL in a bad way, even if you don't notice it in the result with the test data
7e03cc3 to
47468d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves query memory usage by simplifying AQL generation and storing regression test queries in files. The changes focus on optimizing query patterns to enable better database-level optimizations and avoid unnecessary memory consumption.
Key changes:
- Regression tests now store AQL queries in separate
.aqlfiles for better tracking and comparison - Refactored query generation to eliminate patterns that prevent reduce-extraction-to-projection optimizations
- Introduced new regression tests demonstrating memory-efficient query patterns
Reviewed Changes
Copilot reviewed 191 out of 1032 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/regression/initialization.ts | Refactored to use new InitTestDataContext class for test data initialization |
| spec/regression/init-test-data-context.ts | New context class encapsulating GraphQL execution logic for test initialization |
| spec/regression/list-limits/tests//aql/.aql | New AQL files for list limits regression tests |
| spec/regression/keywords/tests//aql/.aql | New AQL files for keyword escaping tests |
| spec/regression/collect/tests//aql/.aql | New AQL files for collection operation tests |
| spec/regression/access-restrictions/tests//aql/.aql | New AQL files for access restriction tests |
| spec/regression/access-groups/tests//aql/.aql | New AQL files for access group tests |
| spec/regression/collect/tests/*/result.json | Updated result files showing work-in-progress error states |
| spec/regression/access-restrictions/tests/accounting/test.graphql | Added query name to GraphQL test |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
spec/regression/collect/tests/relation-and-field-aggregation/result.json
Outdated
Show resolved
Hide resolved
fe0a03a to
0dbe80e
Compare
…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix
053beb9 to
db1f075
Compare
…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.
This is the recommended value for Node 18: https://github.com/tsconfig/bases/blob/main/bases/node18.json It includes findLastIndex() which is useful for us
This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests
81c67b6 to
707cfd0
Compare
The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization.
… in ArangoDB through dangling-edge filter We currently do not trust edges - if the document they point to does not exist, we ignore the edge. In the past, this access was done via $node != null. This has the problem that ArangoDB considers it a full document access and no longer applies the reduce-extraction-to-projection optimization rule. This rule is very important to limit query memory limit usage. Removing the dangling edges filter completely would be breaking, but we can optimize them so the reduce-extraction-to-projection optimization still works.
707cfd0 to
519a613
Compare
will add some more to this commit later
Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one subquery.
WIP: InMemoryAdapter (do this last when we're sure about the query tree API)
519a613 to
ad8f739
Compare