Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@

 - MarkdownLoader (experimental): added a Markdown loader to support `.md` and `.markdown` files.

+### Fixed
+
+- `NodeType`: a node type defined without a `properties` key (e.g. `{"label": "Person"}` or `{"label": "Person", "description": "..."}`) now automatically gets a default `name: STRING` property and `additional_properties=True`, preventing a `ValidationError` from the `min_length=1` constraint. This matches the existing behaviour for string input. Auto-addition is skipped when `properties` is explicitly provided (including as an empty list) or when `additional_properties` is explicitly set to `False`.
+
 ### Changed

 - SimpleKG pipeline (experimental): the `from_pdf` parameter is deprecated in favor of `from_file` (PDF and Markdown inputs). `from_pdf` still works but emits a deprecation warning and will be removed in a future version.
43 changes: 28 additions & 15 deletions docs/source/user_guide_kg_builder.rst
@@ -79,18 +79,18 @@ with, optionally, a list of their expected properties)
 and instructions on how to connect them (patterns).
 Node and relationship types can be represented
 as either simple strings (for their labels) or dictionaries. If using a dictionary,
-it must include a label key and can optionally include description and properties keys,
+it must include a ``label`` key and can optionally include ``description`` and ``properties`` keys,
 as shown below:

 .. code:: python

     NODE_TYPES = [
-        # node types can be defined with a simple label...
+        # node types can be defined with a simple label string...
         "Person",
-        # ... or with a dict if more details are needed,
-        # such as a description:
+        # ... or with a dict for more detail such as a description.
+        # When no properties key is provided, a default "name" property is added automatically.
         {"label": "House", "description": "Family the person belongs to"},
-        # or a list of properties the LLM will try to attach to the entity:
+        # or with an explicit list of properties the LLM will try to attach to the entity:
         {"label": "Planet", "properties": [{"name": "name", "type": "STRING", "required": True}, {"name": "weather", "type": "STRING"}]},
     ]
     # same thing for relationships:
@@ -912,19 +912,32 @@ For improved reliability with :ref:`OpenAILLM <openaillm>` or :ref:`VertexAILLM
 Schema Validation and Node Properties
 --------------------------------------

-**Important:** All node types must have at least one property defined. When using string shorthand for node types (e.g., ``"Person"``), a default ``"name"`` property is automatically added with ``additional_properties=True`` to allow flexible LLM extraction:
+All node types must have at least one property defined. When no properties are provided,
+a default ``name: STRING`` property is added automatically and ``additional_properties``
+is set to ``True`` to allow the LLM to extract additional properties freely.
+
+This applies to both the **string shorthand** and the **long dict syntax** when the
+``properties`` key is omitted:

 .. code:: python

-    # String shorthand - automatically gets default property
-    NodeType("Person")  # Becomes: properties=[{"name": "name", "type": "STRING"}], additional_properties=True
-
-    # Explicit definition - must include at least one property
-    NodeType(
-        label="Person",
-        properties=[PropertyType(name="name", type="STRING")],
-        additional_properties=True  # Allow LLM to extract additional properties
-    )
+    # String shorthand — "name" property added automatically
+    "Person"
+    # equivalent to:
+    NodeType(label="Person", properties=[PropertyType(name="name", type="STRING")], additional_properties=True)
+
+    # Long syntax without a properties key — same auto-addition applies
+    {"label": "House", "description": "Family the person belongs to"}
+    # equivalent to:
+    NodeType(label="House", description="Family the person belongs to",
+             properties=[PropertyType(name="name", type="STRING")], additional_properties=True)
+
+Passing ``properties`` explicitly as an empty list raises a ``ValidationError``:
+
+.. code:: python
+
+    # Raises ValidationError — empty list is not auto-filled
+    {"label": "House", "properties": []}

 **Relationship types** with no properties automatically set ``additional_properties=True`` to preserve LLM-extracted properties during graph construction.

43 changes: 29 additions & 14 deletions src/neo4j_graphrag/experimental/components/schema.py
@@ -15,54 +15,52 @@
 from __future__ import annotations

 import json
-import re
-
-import neo4j
 import logging
+import re
 import warnings
+from pathlib import Path
 from typing import (
     Any,
+    Callable,
     Dict,
     Iterator,
     List,
     Literal,
     Optional,
+    Sequence,
     Tuple,
     Union,
-    Sequence,
-    Callable,
     cast,
 )
-from pathlib import Path

+import neo4j
 from pydantic import (
     BaseModel,
+    ConfigDict,
+    Field,
     PrivateAttr,
+    ValidationError,
     field_validator,
     model_validator,
     validate_call,
-    ConfigDict,
-    ValidationError,
-    Field,
 )
 from typing_extensions import Self

 from neo4j_graphrag.exceptions import (
-    SchemaValidationError,
     LLMGenerationError,
     SchemaExtractionError,
+    SchemaValidationError,
 )
 from neo4j_graphrag.experimental.pipeline.component import Component, DataModel
 from neo4j_graphrag.experimental.pipeline.types.schema import (
     EntityInputType,
     RelationInputType,
 )
-from neo4j_graphrag.generation import SchemaExtractionTemplate, PromptTemplate
+from neo4j_graphrag.generation import PromptTemplate, SchemaExtractionTemplate
 from neo4j_graphrag.llm import LLMInterface
-from neo4j_graphrag.types import LLMMessage
-from neo4j_graphrag.utils.file_handler import FileHandler, FileFormat
 from neo4j_graphrag.schema import get_structured_schema
-
+from neo4j_graphrag.types import LLMMessage
+from neo4j_graphrag.utils.file_handler import FileFormat, FileHandler

 logger = logging.getLogger(__name__)

@@ -146,6 +144,23 @@ def validate_input_if_string(cls, data: EntityInputType) -> EntityInputType:
                 # allow LLM to extract additional properties beyond the default "name"
                 "additional_properties": True,  # type: ignore[dict-item]
             }
+        if isinstance(data, dict) and "properties" not in data:
+            if data.get("additional_properties") is False:  # type: ignore[comparison-overlap]
+                return data
+            label = data.get("label", "")
+            logger.info(
+                f"No properties defined for NodeType '{label}'. "
+                f"Adding default 'name' property and additional_properties=True "
+                f"to allow flexible property extraction."
+            )
+            return {
+                **data,
+                # added to satisfy the model validation (min_length=1 for properties of node types)
+                "properties": [{"name": "name", "type": "STRING"}],
+                # allow LLM to extract additional properties beyond the default "name"
+                "additional_properties": True,  # type: ignore[dict-item]
+            }
+
         return data

     @model_validator(mode="after")
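
The branching added in the hunk above can be tried in isolation. The following is a minimal stand-alone sketch, not the library's code: it uses plain dictionaries instead of pydantic, and the names `normalize_node_type_input` and `DEFAULT_NAME_PROPERTY` are hypothetical, chosen only for illustration. The string branch is an assumption based on the documentation text, since the diff shows only its last lines.

```python
from typing import Any

# Hypothetical constant mirroring the default property shown in the diff.
DEFAULT_NAME_PROPERTY = {"name": "name", "type": "STRING"}


def normalize_node_type_input(data: Any) -> Any:
    """Sketch of the pre-validation step: fill in a default "name" property.

    Hypothetical helper for illustration only; the real logic lives in a
    pydantic validator on NodeType.
    """
    if isinstance(data, str):
        # String shorthand: assumed (from the docs) to expand to a label
        # plus the default property and additional_properties=True.
        return {
            "label": data,
            "properties": [dict(DEFAULT_NAME_PROPERTY)],
            "additional_properties": True,
        }
    if isinstance(data, dict) and "properties" not in data:
        # An explicit additional_properties=False opts out of auto-addition.
        if data.get("additional_properties") is False:
            return data
        return {
            **data,
            "properties": [dict(DEFAULT_NAME_PROPERTY)],
            "additional_properties": True,
        }
    # "properties" was given explicitly (even as []): leave the input alone.
    return data


if __name__ == "__main__":
    print(normalize_node_type_input({"label": "House", "description": "Family"}))
    print(normalize_node_type_input({"label": "House", "properties": []}))
```

In this sketch an explicit empty `properties` list passes through unchanged, so any later length check (such as the `min_length=1` constraint mentioned in the changelog) would still reject it.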