Adding a NEO4J_SCHEMA_SAMPLE_SIZE parameter to enable control of apoc.meta.schema sample size (#211)
* Add configurable sample size for schema operations
Introduces a --sample CLI argument and NEO4J_SAMPLE environment variable to control the sample size used in APOC schema inspection queries. This allows limiting the number of nodes scanned for schema operations, improving performance on large graphs. Includes updates to config processing, server logic, and unit tests for sample precedence and validation.
* Updating readme to add in sample parameter
Updating readme docs for sampling parameter
* replace NEO4J_SAMPLE with NEO4J_SCHEMA_SAMPLE_SIZE
* update config to use schema_sample_size key
* update `get_neo4j_schema` tool and changelog
* fix server constructor fn, update get schema docstring, test with claude
* update default values in utils, update docstring from testing with claude
* add Field to sample_size
* update get_schema docstring
---------
Co-authored-by: alex <[email protected]>
servers/mcp-neo4j-cypher/README.md
Lines changed: 60 additions & 1 deletion
@@ -44,8 +44,10 @@ The server offers these core tools:
 - `get_neo4j_schema`
   - Get a list of all nodes types in the graph database, their attributes with name, type and relationships to other node types
-  - No input required
+  - Input:
+    - `sample_param` (integer, optional): Number of nodes to sample for schema analysis. Overrides server default if provided.
   - Returns: JSON serialized list of node labels with two dictionaries: one for attributes and one for relationships
+  - **Performance**: Uses sampling by default (1000 nodes per label). Reduce number for faster analysis on large databases. To stop sampling, set to -1.

 ### 🏷️ Namespacing
@@ -105,6 +107,62 @@ When a response exceeds the token limit, it will be automatically truncated to f
 **Note**: Token limits only apply to `read_neo4j_cypher` responses. Schema queries and write operations return summary information and are not affected.

+#### 🔍 Schema Sampling
+
+Control the performance and scope of schema inspection with the `sample` parameter for the `get_neo4j_schema` tool:
+
+**Command Line:**
+```bash
+mcp-neo4j-cypher --sample 1000  # Sample 1000 nodes per label
+```
+
+**Environment Variable:**
+```bash
+export NEO4J_SCHEMA_SAMPLE_SIZE=1000
+```
+
+**Docker:**
+```bash
+docker run -e NEO4J_SCHEMA_SAMPLE_SIZE=1000 mcp-neo4j-cypher:latest
+```
+
+The `sample` parameter controls how many nodes are examined when generating the database schema:
+
+- **Default**: `1000` nodes per label are sampled for schema analysis
+- **Performance**: Lower values (`100`, `500`) provide faster schema inspection on large databases
+- **Accuracy**: Higher values (`5000`, `10000`) provide more comprehensive schema coverage
+- **Full Scan**: Set to `-1` to examine all nodes (can be very slow on large databases)
+- **Per-Call Override**: The `get_neo4j_schema` tool accepts a `sample_param` parameter to override the server default
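
The commit message mentions "sample precedence and validation" without spelling out the rules. As a rough illustration only, the sketch below assumes that a CLI value wins over `NEO4J_SCHEMA_SAMPLE_SIZE`, which wins over the documented default of `1000`, and that valid values are `-1` or a positive integer. The helper name `resolve_sample_size` and its signature are hypothetical and are not taken from the repository.

```python
import os
from typing import Mapping, Optional

DEFAULT_SAMPLE_SIZE = 1000  # default stated in the README text above


def resolve_sample_size(
    cli_value: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: choose the schema sample size.

    Assumed precedence: CLI argument, then environment variable, then default.
    """
    env = os.environ if env is None else env
    raw = env.get("NEO4J_SCHEMA_SAMPLE_SIZE")

    if cli_value is not None:
        candidate = cli_value
    elif raw:
        try:
            candidate = int(raw)
        except ValueError as exc:
            raise ValueError("NEO4J_SCHEMA_SAMPLE_SIZE must be an integer") from exc
    else:
        candidate = DEFAULT_SAMPLE_SIZE

    # Assumed validation rule: -1 means "scan everything"; otherwise the
    # value must be a positive node count.
    if candidate != -1 and candidate < 1:
        raise ValueError(
            f"sample size must be -1 or a positive integer, got {candidate}"
        )
    return candidate
```

For example, `resolve_sample_size(cli_value=500, env={"NEO4J_SCHEMA_SAMPLE_SIZE": "200"})` returns the CLI value under this assumed ordering; the real server may resolve the setting differently.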
List all nodes, their attributes and their relationships to other nodes in the neo4j database.
-This requires that the APOC plugin is installed and enabled.
+async def get_neo4j_schema(sample_size: int = Field(default=config_sample_size, description="The sample size used to infer the graph schema. Larger samples are slower, but more accurate. Smaller samples are faster, but might miss information.")) -> list[ToolResult]:
 """
+Returns nodes, their properties (with types and indexed flags), and relationships
+using APOC's schema inspection.
+
+You should only provide a `sample_size` value if requested by the user, or tuning the retrieval performance.

-get_schema_query = """
-CALL apoc.meta.schema();
+Performance Notes:
+- If `sample_size` is not provided, uses the server's default sample setting defined in the server configuration.
+- If retrieving the schema times out, try lowering the sample size, e.g. `sample_size=100`.
+- To sample the entire graph use `sample_size=-1`.
 """

+# Use provided sample_size, otherwise fall back to server default - 1000
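
The diff stops at the comment about falling back to the server default, so the code that applies the sample size is not visible here. The following is a minimal sketch of how that step might look, assuming the value is passed to `apoc.meta.schema` through its `sample` config key as a Cypher parameter. The function name `build_schema_query`, the `server_default` argument, and the use of `None` to mean "not provided" are illustrative assumptions, not the repository's code.

```python
from typing import Any, Dict, Optional, Tuple

SERVER_DEFAULT_SAMPLE_SIZE = 1000  # default stated in the README and the comment above


def build_schema_query(
    sample_size: Optional[int] = None,
    server_default: int = SERVER_DEFAULT_SAMPLE_SIZE,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return the Cypher text and parameters for schema inspection."""
    # Use the per-call value when given, otherwise fall back to the server default.
    effective = server_default if sample_size is None else sample_size

    # The sample size travels as a query parameter instead of being
    # formatted into the query string.
    query = "CALL apoc.meta.schema({sample: $sample_size}) YIELD value RETURN value"
    return query, {"sample_size": effective}
```

Passing `sample_size=-1` leaves the value untouched, which matches the docstring's description of sampling the entire graph.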