Adding a NEO4J_SAMPLE parameter to enable control of apoc.meta.schema sample size #211
+157
−13
Introduces a --sample CLI argument and NEO4J_SAMPLE environment variable to control the sample size used in APOC schema inspection queries. This allows limiting the number of nodes scanned for schema operations, improving performance on large graphs. Includes updates to config processing, server logic, and unit tests for sample precedence and validation.
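The precedence described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `resolve_sample` and `build_schema_query` are hypothetical, the rule that the value must be a positive integer is an assumption about the validation, and passing `sample` in the `apoc.meta.schema` config map is assumed to match the installed APOC version.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def resolve_sample(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Resolve the sample size: --sample wins over NEO4J_SAMPLE, else None.

    Hypothetical helper; the positive-integer rule is an assumption.
    """
    raw = cli_value if cli_value is not None else env.get("NEO4J_SAMPLE")
    if raw is None or raw == "":
        return None  # unset: leave APOC's default sampling in place
    try:
        sample = int(raw)
    except ValueError as exc:
        raise ValueError(f"sample must be an integer, got {raw!r}") from exc
    if sample <= 0:
        raise ValueError(f"sample must be positive, got {sample}")
    return sample


def build_schema_query(sample: Optional[int]) -> Tuple[str, Dict[str, Any]]:
    """Return a Cypher query and parameters for schema inspection.

    Assumes apoc.meta.schema accepts a config map with a `sample` key.
    """
    if sample is None:
        return "CALL apoc.meta.schema() YIELD value RETURN value", {}
    return (
        "CALL apoc.meta.schema({sample: $sample}) YIELD value RETURN value",
        {"sample": sample},
    )
```

For example, `resolve_sample("5", {"NEO4J_SAMPLE": "10"})` would pick the CLI value, while `resolve_sample(None, {"NEO4J_SAMPLE": "10"})` would fall back to the environment variable.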
Description
Type of Change
Complexity
How Has This Been Tested?
NOTE
Manual tests consisted of using the built Python module in another project to confirm that the MCP server still functions and that the schema loads both when the sample setting is None and when it is set to a value. The schema appears to shrink when the sample is set to a low value (e.g. 1), but this has not been validated against an actual large graph to determine the effect of reducing the sample size.
A further improvement would be to log the node and property counts of the resulting schema, and possibly to expose an MCP tool that helps the user find a sample size that yields a schema with enough context to inform Cypher generation without a long load time.
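The count logging suggested here might look something like the sketch below. It is not part of this PR: `summarize_schema` is a hypothetical helper, and the input shape (a map keyed by label or relationship type, each entry carrying a `type` of `"node"` or `"relationship"` and a `properties` map) is an assumption about what `apoc.meta.schema` returns.

```python
import logging
from typing import Any, Dict, Mapping

logger = logging.getLogger("schema_summary")


def summarize_schema(schema: Mapping[str, Mapping[str, Any]]) -> Dict[str, int]:
    """Count node labels, relationship types, and properties in a schema map.

    Hypothetical helper; the entry shape is an assumed apoc.meta.schema layout.
    """
    summary = {"node_labels": 0, "relationship_types": 0, "properties": 0}
    for entry in schema.values():
        kind = entry.get("type")
        if kind == "node":
            summary["node_labels"] += 1
        elif kind == "relationship":
            summary["relationship_types"] += 1
        # Properties are counted for both nodes and relationships.
        summary["properties"] += len(entry.get("properties", {}))
    logger.info(
        "schema loaded: %d node labels, %d relationship types, %d properties",
        summary["node_labels"],
        summary["relationship_types"],
        summary["properties"],
    )
    return summary
```

Comparing these counts across a few sample sizes would give a rough indication of how much schema detail is lost as the sample shrinks.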
Checklist
The following requirements should have been met (depending on the changes in the branch):