Commit 4582599

fix: Add default name property to node types defined without a properties key (#499)
* Add default prop
* Update changelog
* Fix mypy errors
* Update user guide
1 parent bbf50ce commit 4582599

3 files changed: +61 -29 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@

 - MarkdownLoader (experimental): added a Markdown loader to support `.md` and `.markdown` files.

+### Fixed
+
+- `NodeType`: a node type defined without a `properties` key (e.g. `{"label": "Person"}` or `{"label": "Person", "description": "..."}`) now automatically gets a default `name: STRING` property and `additional_properties=True`, preventing a `ValidationError` from the `min_length=1` constraint. This matches the existing behaviour for string input. Auto-addition is skipped when `properties` is explicitly provided (including as an empty list) or when `additional_properties` is explicitly set to `False`.
+
 ### Changed

 - SimpleKG pipeline (experimental): the `from_pdf` parameter is deprecated in favor of `from_file` (PDF and Markdown inputs). `from_pdf` still works but emits a deprecation warning and will be removed in a future version.

docs/source/user_guide_kg_builder.rst

Lines changed: 28 additions & 15 deletions
@@ -79,18 +79,18 @@ with, optionally, a list of their expected properties)
 and instructions on how to connect them (patterns).
 Node and relationship types can be represented
 as either simple strings (for their labels) or dictionaries. If using a dictionary,
-it must include a label key and can optionally include description and properties keys,
+it must include a ``label`` key and can optionally include ``description`` and ``properties`` keys,
 as shown below:

 .. code:: python

     NODE_TYPES = [
-        # node types can be defined with a simple label...
+        # node types can be defined with a simple label string...
         "Person",
-        # ... or with a dict if more details are needed,
-        # such as a description:
+        # ... or with a dict for more detail such as a description.
+        # When no properties key is provided, a default "name" property is added automatically.
         {"label": "House", "description": "Family the person belongs to"},
-        # or a list of properties the LLM will try to attach to the entity:
+        # or with an explicit list of properties the LLM will try to attach to the entity:
         {"label": "Planet", "properties": [{"name": "name", "type": "STRING", "required": True}, {"name": "weather", "type": "STRING"}]},
     ]
     # same thing for relationships:
@@ -912,19 +912,32 @@ For improved reliability with :ref:`OpenAILLM <openaillm>` or :ref:`VertexAILLM
 Schema Validation and Node Properties
 --------------------------------------

-**Important:** All node types must have at least one property defined. When using string shorthand for node types (e.g., ``"Person"``), a default ``"name"`` property is automatically added with ``additional_properties=True`` to allow flexible LLM extraction:
+All node types must have at least one property defined. When no properties are provided,
+a default ``name: STRING`` property is added automatically and ``additional_properties``
+is set to ``True`` to allow the LLM to extract additional properties freely.
+
+This applies to both the **string shorthand** and the **long dict syntax** when the
+``properties`` key is omitted:

 .. code:: python

-    # String shorthand - automatically gets default property
-    NodeType("Person")  # Becomes: properties=[{"name": "name", "type": "STRING"}], additional_properties=True
-
-    # Explicit definition - must include at least one property
-    NodeType(
-        label="Person",
-        properties=[PropertyType(name="name", type="STRING")],
-        additional_properties=True  # Allow LLM to extract additional properties
-    )
+    # String shorthand — "name" property added automatically
+    "Person"
+    # equivalent to:
+    NodeType(label="Person", properties=[PropertyType(name="name", type="STRING")], additional_properties=True)
+
+    # Long syntax without a properties key — same auto-addition applies
+    {"label": "House", "description": "Family the person belongs to"}
+    # equivalent to:
+    NodeType(label="House", description="Family the person belongs to",
+             properties=[PropertyType(name="name", type="STRING")], additional_properties=True)
+
+Passing ``properties`` explicitly as an empty list raises a ``ValidationError``:
+
+.. code:: python
+
+    # Raises ValidationError — empty list is not auto-filled
+    {"label": "House", "properties": []}

 **Relationship types** with no properties automatically set ``additional_properties=True`` to preserve LLM-extracted properties during graph construction.

src/neo4j_graphrag/experimental/components/schema.py

Lines changed: 29 additions & 14 deletions
@@ -15,54 +15,52 @@
 from __future__ import annotations

 import json
-import re
-
-import neo4j
 import logging
+import re
 import warnings
+from pathlib import Path
 from typing import (
     Any,
+    Callable,
     Dict,
     Iterator,
     List,
     Literal,
     Optional,
+    Sequence,
     Tuple,
     Union,
-    Sequence,
-    Callable,
     cast,
 )
-from pathlib import Path

+import neo4j
 from pydantic import (
     BaseModel,
+    ConfigDict,
+    Field,
     PrivateAttr,
+    ValidationError,
     field_validator,
     model_validator,
     validate_call,
-    ConfigDict,
-    ValidationError,
-    Field,
 )
 from typing_extensions import Self

 from neo4j_graphrag.exceptions import (
-    SchemaValidationError,
     LLMGenerationError,
     SchemaExtractionError,
+    SchemaValidationError,
 )
 from neo4j_graphrag.experimental.pipeline.component import Component, DataModel
 from neo4j_graphrag.experimental.pipeline.types.schema import (
     EntityInputType,
     RelationInputType,
 )
-from neo4j_graphrag.generation import SchemaExtractionTemplate, PromptTemplate
+from neo4j_graphrag.generation import PromptTemplate, SchemaExtractionTemplate
 from neo4j_graphrag.llm import LLMInterface
-from neo4j_graphrag.types import LLMMessage
-from neo4j_graphrag.utils.file_handler import FileHandler, FileFormat
 from neo4j_graphrag.schema import get_structured_schema
-
+from neo4j_graphrag.types import LLMMessage
+from neo4j_graphrag.utils.file_handler import FileFormat, FileHandler

 logger = logging.getLogger(__name__)

@@ -146,6 +144,23 @@ def validate_input_if_string(cls, data: EntityInputType) -> EntityInputType:
                 # allow LLM to extract additional properties beyond the default "name"
                 "additional_properties": True,  # type: ignore[dict-item]
             }
+        if isinstance(data, dict) and "properties" not in data:
+            if data.get("additional_properties") is False:  # type: ignore[comparison-overlap]
+                return data
+            label = data.get("label", "")
+            logger.info(
+                f"No properties defined for NodeType '{label}'. "
+                f"Adding default 'name' property and additional_properties=True "
+                f"to allow flexible property extraction."
+            )
+            return {
+                **data,
+                # added to satisfy the model validation (min_length=1 for properties of node types)
+                "properties": [{"name": "name", "type": "STRING"}],
+                # allow LLM to extract additional properties beyond the default "name"
+                "additional_properties": True,  # type: ignore[dict-item]
+            }
+
         return data

     @model_validator(mode="after")
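
The new branch can be tried out without the library. The sketch below is a stdlib-only stand-in for the pre-validation step, not the library's code: `apply_default_properties`, `check_min_length` and `NodeInput` are hypothetical names, the string branch is reconstructed from the user guide's "equivalent to" example, and a plain `ValueError` stands in for pydantic's `ValidationError`.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for the library's EntityInputType (label string or dict).
NodeInput = Union[str, Dict[str, Any]]


def _default_properties() -> List[Dict[str, str]]:
    # Fresh list on every call so callers never share mutable state.
    return [{"name": "name", "type": "STRING"}]


def apply_default_properties(data: NodeInput) -> Dict[str, Any]:
    """Mirror the pre-validation step from the diff, on plain dicts."""
    if isinstance(data, str):
        # String shorthand: label plus the default "name" property
        # (assumed shape, based on the user guide example).
        return {
            "label": data,
            "properties": _default_properties(),
            "additional_properties": True,
        }
    if "properties" not in data:
        # An explicit additional_properties=False opts out of the auto-addition.
        if data.get("additional_properties") is False:
            return data
        return {
            **data,
            "properties": _default_properties(),
            "additional_properties": True,
        }
    # An explicit properties key, even an empty list, is left untouched.
    return data


def check_min_length(node: Dict[str, Any]) -> None:
    """Stand-in for the min_length=1 constraint on node type properties."""
    if len(node.get("properties", [])) < 1:
        raise ValueError(
            f"node type {node.get('label')!r} needs at least one property"
        )


if __name__ == "__main__":
    house = apply_default_properties(
        {"label": "House", "description": "Family the person belongs to"}
    )
    check_min_length(house)  # passes: the default property was added
    print(house)
```

In this sketch, a dict with an explicit empty `properties` list passes through unchanged and then fails `check_min_length`, which corresponds to the empty-list case the user guide documents as raising a `ValidationError`.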
