- `MarkdownLoader` (experimental): added a Markdown loader to support `.md` and `.markdown` files.
- SimpleKG pipeline (experimental): the `from_pdf` parameter is deprecated in favor of `from_file` (PDF and Markdown inputs). `from_pdf` still works but emits a deprecation warning and will be removed in a future version.
- Data loaders (experimental): the `PdfDocument` type name is deprecated in favor of `LoadedDocument`; `PdfDocument` remains available as a backward-compatible alias with a deprecation warning.
- `NodeType` and `RelationshipType` now reject labels and types that start or end with double underscores (`__`), e.g. `__Person__`. This convention is reserved for internal Neo4j GraphRAG labels. A `ValidationError` is raised on construction.
- SimpleKG pipeline (experimental): Markdown inputs (`.md`/`.markdown`) are supported alongside PDF via the default extension-based file loader when building from a file path.
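The following is a rough illustration of extension-based loader selection. It maps file suffixes to loader names. The `pick_loader` helper and the `LOADERS_BY_SUFFIX` table are hypothetical and only mirror the suffixes named in these entries. The library's default file loader may be organised differently.

```python
from pathlib import Path

# Hypothetical mapping from lower-cased file suffix to a loader name.
# The library's real default loader may be structured differently.
LOADERS_BY_SUFFIX = {
    ".pdf": "PdfLoader",
    ".md": "MarkdownLoader",
    ".markdown": "MarkdownLoader",
}


def pick_loader(file_path: str) -> str:
    """Return the loader name for a file path based on its extension."""
    suffix = Path(file_path).suffix.lower()
    try:
        return LOADERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '<none>'}") from None


print(pick_loader("notes/README.md"))  # MarkdownLoader
```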
- `SchemaExtractionTemplate` prompt updated to explicitly instruct the LLM not to use `__` as a prefix or suffix in node labels or relationship types.
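The reserved-name rule for `NodeType` and `RelationshipType` boils down to a prefix/suffix check. This is a minimal sketch with hypothetical `is_reserved_label` and `validate_label` helpers. It raises a plain `ValueError` rather than the Pydantic `ValidationError` that the models actually raise.

```python
def is_reserved_label(label: str) -> bool:
    """Return True if the label starts or ends with a double underscore."""
    return label.startswith("__") or label.endswith("__")


def validate_label(label: str) -> str:
    """Hypothetical stand-in for the check run when a schema type is built."""
    if is_reserved_label(label):
        raise ValueError(f"{label!r} uses the reserved '__' prefix or suffix")
    return label


print(validate_label("Person"))  # Person
```

Labels that only contain `__` in the middle (e.g. `My__Label`) pass this check, since the rule concerns the start and end of the name.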
- Fixed a `ValueError` in `Neo4jGraphParquetFormatter` when nodes of the same label have mixed property types (e.g. `str` and `int` for the same property), which caused `pa.Table.from_pylist()` to fail. Mixed-type columns are now coerced to a consistent type before Parquet table creation.
- Fixed a bug where the rate limit handler was not being called in the `VertexAILLM` and `MistralAILLM` `__invoke_v2` and `__ainvoke_v2` methods.
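One way to picture the mixed-type fix is to scan each column and check the types of its non-null values. If a column holds more than one Python type, cast all of its values to a single type before building the table. The sketch below assumes `str` as that common type and uses a hypothetical `coerce_mixed_columns` helper. It does not import pyarrow and is not the formatter's actual code.

```python
from typing import Any


def coerce_mixed_columns(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Cast every non-null value of a mixed-type column to str (assumed target type)."""
    # Collect the set of Python types seen per column, ignoring nulls.
    types_per_column: dict[str, set[type]] = {}
    for row in rows:
        for key, value in row.items():
            if value is not None:
                types_per_column.setdefault(key, set()).add(type(value))
    mixed = {key for key, types in types_per_column.items() if len(types) > 1}
    # Rebuild the rows, converting only the mixed columns and keeping None as-is.
    return [
        {
            key: (str(value) if key in mixed and value is not None else value)
            for key, value in row.items()
        }
        for row in rows
    ]


rows = [{"name": "Alice", "age": 42}, {"name": "Bob", "age": "unknown"}]
print(coerce_mixed_columns(rows))
# [{'name': 'Alice', 'age': '42'}, {'name': 'Bob', 'age': 'unknown'}]
```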
- Parquet export (experimental): `ParquetWriter` (extends `KGWriter`), `Neo4jGraphParquetFormatter`, and `FilenameCollisionHandler` for writing knowledge graphs to Parquet (one file per node label and per relationship type).
- Updated examples, default values, and documentation to use `gpt-4.1`/`gpt-4.1-mini` instead of deprecated GPT-4* models (e.g. `gpt-4o`, `gpt-4`).
- Breaking: `SimpleKGPipeline` now automatically enables structured output when the `LLMInterface` supports structured output (so far, `OpenAILLM` and `VertexAILLM`). This takes precedence over any `response_format` configured in `model_params` (e.g., `{"type": "json_object"}`), which will be ignored.
- Fixed invalid lexical graph relationships causing "Relationship references unknown start node" errors during parquet import when nodes are pruned.
- Made the rate limit handler configurable as to which exceptions it retries on.
- Fixed the initialization of the Vertex AI LLM class.
- Support for structured output in `OpenAILLM` and `VertexAILLM` via the `response_format` parameter. Accepts Pydantic models (requires `ConfigDict(extra="forbid")`) or JSON schemas.
- Added a `use_structured_output` parameter to `LLMEntityRelationExtractor` for improved entity extraction reliability with OpenAI/VertexAI LLMs.
- Added a `use_structured_output` parameter to `SchemaFromTextExtractor` for improved schema generation with OpenAI/VertexAI LLMs. Enforces the `GraphSchema` structure via Pydantic model validation and includes automatic cleanup of invalid patterns/constraints.
- Added a `supports_structured_output` capability flag to `LLMInterface` for forward-compatible detection of structured output support across LLM implementations.
- Support for async embeddings in the `Embedder` base class, with an implementation for `OllamaEmbedding`.
- Breaking: made `Neo4jNode`, `Neo4jRelationship`, and `Neo4jGraph` stricter: the properties field now uses the typed `PropertyValue` (Neo4j primitives, temporal values, lists, `GeoPoint`).
- Breaking: `NodeType.properties` now requires at least one property (`min_length=1`). String-based node definitions (e.g., `NodeType("Person")`) automatically receive a default "name" property with `additional_properties=True`.
- Breaking: a `RelationshipType` with empty properties and `additional_properties=False` is now auto-corrected to `additional_properties=True` to prevent pruning of LLM-extracted properties.
- Introduced a `Pattern` Pydantic model for internal storage of graph patterns, replacing the tuple format. Public APIs maintain backward compatibility by accepting both tuples and `Pattern` objects.
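The two schema defaults described in these entries can be sketched with plain dataclasses. `NodeSpec` and `RelationshipSpec` are hypothetical, simplified stand-ins for `NodeType` and `RelationshipType`, which are Pydantic models. Only the defaulting rules stated here are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSpec:
    """Hypothetical, simplified stand-in for NodeType."""

    label: str
    properties: list[str] = field(default_factory=list)
    additional_properties: bool = False

    @classmethod
    def from_label(cls, label: str) -> "NodeSpec":
        # A bare label gets a default "name" property and accepts extra properties.
        return cls(label=label, properties=["name"], additional_properties=True)

    def __post_init__(self) -> None:
        # Mirrors the min_length=1 rule on node properties.
        if not self.properties:
            raise ValueError("a node type needs at least one property")


@dataclass
class RelationshipSpec:
    """Hypothetical, simplified stand-in for RelationshipType."""

    label: str
    properties: list[str] = field(default_factory=list)
    additional_properties: bool = False

    def __post_init__(self) -> None:
        # No declared properties plus additional_properties=False would prune every
        # extracted property, so the flag is flipped instead.
        if not self.properties and not self.additional_properties:
            self.additional_properties = True


person = NodeSpec.from_label("Person")
print(person.properties, person.additional_properties)  # ['name'] True
print(RelationshipSpec("KNOWS").additional_properties)  # True
```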
- Support for Python 3.14
- Support for version 6.0.0 of the Neo4j Python driver
- Switched project/dependency management from Poetry to uv.
- Dropped support for Python 3.9 (EOL)
- Added an optional `node_label_neo4j` parameter in the external retrievers to speed up the search query in Neo4j.
- Exposed an optional `sample` parameter on `get_schema` and `get_structured_schema` to control APOC sampling for schema discovery.
- Added an optional `id_property_getter` callable parameter in the Qdrant retriever to allow for custom ID retrieval.
- Added automatic rate limiting with retry logic and exponential backoff for all Embedding providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.
- The JSON response returned to `SchemaFromTextExtractor` is cleansed of any Markdown code blocks before being loaded.
- Tool calling support for `OllamaLLM`.
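Cleansing a fenced LLM response before parsing it can be as simple as stripping the surrounding code fence when one is present. The `load_llm_json` helper below is a hypothetical sketch, not the extractor's code. The pattern writes the fence as a regex repetition of the backtick character.

```python
import json
import re

# A whole response wrapped in a code fence, optionally tagged "json".
# `{3} matches three backticks in a row.
_FENCED = re.compile(r"^\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*$", re.DOTALL | re.IGNORECASE)


def load_llm_json(text: str) -> object:
    """Strip a surrounding Markdown code fence, if any, then parse the JSON."""
    match = _FENCED.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


print(load_llm_json('{"node_types": []}'))  # {'node_types': []}
```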
- Added a `ToolsRetriever` retriever that uses an LLM to decide which tools to use to find the relevant data.
- Added a `convert_to_tool` method to the `Retriever` interface to convert a Retriever to a Tool so it can be used within the `ToolsRetriever`. This is useful, for example, when you want both a `VectorRetriever` and a `Text2CypherRetriever` as a fallback.
- Added a `schema_visualization` function to visualize a graph schema using neo4j-viz.
- Fixed an edge case where the LLM can output a property with type 'map', which was causing errors during import as it is not a valid property type in Neo4j.
- The Document node is now always created when running `SimpleKGPipeline`, even if `from_pdf=False`.
- Document metadata is exposed in the `SimpleKGPipeline` run method.
- Fixed documentation for PdfLoader
- Fixed a bug where the `format` argument for `OllamaLLM` was not propagated to the client.
- Fixed an `AttributeError` in `SchemaFromTextExtractor` when filtering out node/relationship types with no labels.
- Fixed an import error in `VertexAIEmbeddings`.
- Fixed a bug where Session nodes were duplicated.
- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.
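Retry with exponential backoff, as described in the rate limiting entries, can be sketched without any dependency. This stand-alone version replaces tenacity with a hand-written loop. `call_with_backoff`, `RateLimitError` and the default delays are assumptions for illustration, not the `RateLimitHandler` API. The `retry_on` tuple shows what it means to make the retried exception types configurable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's "too many requests" error."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: tuple[type[BaseException], ...] = (RateLimitError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on retry_on with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2**attempt)  # 1s, 2s, 4s, ... before jitter
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(call_with_backoff(flaky, sleep=lambda _: None))  # ok
```

Injecting `sleep` keeps the example fast and the helper easy to test. Jitter, a random factor on each delay, keeps many clients from retrying in lockstep. Exceptions outside `retry_on` propagate immediately.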
- Support for Python 3.13
- Added support for automatic schema extraction from text using LLMs. In the `SimpleKGPipeline`, automatic schema extraction is enabled by default when the user provides no schema.
- Added the ability to return a user-defined message if the context is empty in GraphRAG (which skips the LLM call).
- Fixed a bug where `spacy` and `rapidfuzz` needed to be installed even if the relevant entity resolvers were not used.
- Fixed a bug where `VertexAILLM.(a)invoke_with_tools` called with multiple tools would raise an error.
- Strict mode in `SimpleKGPipeline`: the `enforce_schema` option is removed and replaced by schema-driven pruning.
- The `SchemaEntity` model has been renamed `NodeType`.
- The `SchemaRelation` model has been renamed `RelationshipType`.
- The `SchemaProperty` model has been renamed `PropertyType`.
- `SchemaConfig` has been removed in favor of `GraphSchema` (used in the `SchemaBuilder` and `EntityRelationExtractor` classes). The `entities`, `relations` and `potential_schema` fields have also been renamed `node_types`, `relationship_types` and `patterns` respectively.
- The reserved `id` property on `__KGBuilder__` nodes is removed.
- The `chunk_index` property on `__Entity__` nodes is removed. Use the `FROM_CHUNK` relationship instead.
- The `__entity__id` index is not used anymore and can be dropped from the database (it has been replaced by `__entity__tmp_internal_id`).
- Added tool calling functionality to the LLM base class with OpenAI and VertexAI implementations, enabling structured parameter extraction and function calling.
- Added support for multi-vector collection in Qdrant driver.
- Added a `Pipeline.stream` method to stream pipeline progress.
- Added a new semantic match resolver to the KG Builder for entity resolution based on spaCy embeddings and cosine similarity so that nodes with similar textual properties get merged.
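Semantic matching merges nodes whose text embeddings point in nearly the same direction. The sketch uses tiny hand-written toy vectors in place of spaCy embeddings. The `similar_pairs` helper and its 0.9 threshold are hypothetical. The sketch only shows the cosine-similarity test that decides which pairs are merge candidates.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_pairs(
    vectors: dict[str, list[float]], threshold: float = 0.9
) -> list[tuple[str, str]]:
    """Return name pairs whose embedding cosine similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


# Toy vectors for illustration only; they are not real spaCy embeddings.
toy = {
    "NYC": [0.9, 0.1, 0.0],
    "New York City": [0.88, 0.12, 0.01],
    "Paris": [0.0, 0.2, 0.95],
}
print(similar_pairs(toy))  # [('NYC', 'New York City')]
```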
- Added a new fuzzy match resolver to the KG Builder for entity resolution based on RapidFuzz string fuzzy matching.
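Fuzzy string matching works the same way but scores the strings themselves. This sketch uses the standard library's `difflib.SequenceMatcher` instead of RapidFuzz, so its scores will not match RapidFuzz's. The `fuzzy_pairs` helper and the threshold of 85 are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity score in [0, 100]."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_pairs(names: list[str], threshold: float = 85.0) -> list[tuple[str, str]]:
    """Return pairs of names similar enough to be merge candidates."""
    return [(a, b) for a, b in combinations(names, 2) if fuzzy_ratio(a, b) >= threshold]


print(fuzzy_pairs(["Acme Corp", "ACME Corp.", "Globex"]))  # [('Acme Corp', 'ACME Corp.')]
```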
- Improved log output readability in Retrievers and GraphRAG and added embedded vector to retriever result metadata for debugging.
- Switched from pygraphviz to neo4j-viz
- Now renders an interactive graph in HTML instead of PNG.
- Removed the `get_pygraphviz_graph` method.
- Fixed a bug where the `$nin` operator for metadata pre-filtering in retrievers would create an invalid Cypher query.
- Added the `run_with_context` method to `Component`. This method includes a `context_` parameter, which provides information about the pipeline from which the component is executed (e.g., the `run_id`). It also enables the component to send events to the pipeline's callback function.
- Added an `enforce_schema` parameter to `SimpleKGPipeline` for optional schema enforcement.
- Added optional schema enforcement as a validation layer after entity and relation extraction.
- Introduced a linear hybrid search ranker for HybridRetriever and HybridCypherRetriever, allowing customizable ranking with an `alpha` parameter.
- Introduced `SearchQueryParseError` for handling invalid Lucene query strings in HybridRetriever and HybridCypherRetriever.
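A linear hybrid ranker blends the vector and full-text score lists with a weight `alpha`. The sketch below assumes that each list is first max-normalized to [0, 1] and that a node missing from one list scores 0 there. The library's exact normalization may differ, and `linear_hybrid_rank` is a hypothetical name.

```python
def linear_hybrid_rank(
    vector_scores: dict[str, float],
    fulltext_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank ids by alpha * vector + (1 - alpha) * full-text, after max-normalization."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Divide by the best score so both lists share a [0, 1] scale.
        top = max(scores.values(), default=0.0)
        return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)

    vec, ft = normalize(vector_scores), normalize(fulltext_scores)
    combined = {
        node_id: alpha * vec.get(node_id, 0.0) + (1 - alpha) * ft.get(node_id, 0.0)
        for node_id in vec.keys() | ft.keys()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranked = linear_hybrid_rank({"a": 0.9, "b": 0.6}, {"b": 4.0, "c": 2.0}, alpha=0.7)
print([node_id for node_id, _ in ranked])  # ['b', 'a', 'c']
```

Raising `alpha` toward 1 favours the vector similarity, and lowering it toward 0 favours the full-text match.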
- Fixed config loading after module reload (usage in Jupyter notebooks).
- The Qdrant retriever now falls back on the point ID if the `external_id_property` is not found in the payload.
- Updated a few dependencies, mainly `pypdf`, `anthropic` and `cohere`.
- Utility functions to retrieve metadata for vector and full-text indexes.
- Support for effective_search_ratio parameter in vector and hybrid searches.
- Introduced upsert_vectors utility function for batch upserting embeddings to vector indexes.
- Introduced an `extract_cypher` function to enhance Cypher query extraction and formatting in `Text2CypherRetriever`.
- Introduced `Neo4jMessageHistory` and `InMemoryMessageHistory` classes for managing LLM message histories.
- Added examples and documentation for using message history with Neo4j and in-memory storage.
- Updated LLM and GraphRAG classes to support new message history classes.
- Refactored index-related functions for improved compatibility and functionality.
- Added deprecation warnings to upsert_vector, upsert_vector_on_relationship functions in favor of upsert_vectors.
- Added deprecation warnings to async_upsert_vector, async_upsert_vector_on_relationship functions notifying developers that they will be removed in a future release.
- Added support for database, timeout, and sanitize arguments in schema functions.
- Resolved an issue with an incorrectly hard-coded node alias in the `_handle_field_filter` function.
- Ability to add an event listener to get notifications about Pipeline progress.
- Added py.typed so that mypy knows to use type annotations from the neo4j-graphrag package.
- Support for creating enhanced schemas with detailed property statistics.
- New utility functions for schema formatting and value sanitization.
- Updated unit and integration tests to cover enhanced schema functionality.
- Changed the default behaviour of `FixedSizeSplitter` to avoid cutting words off at chunk boundaries whenever possible.
- Refactored schema creation code to reduce duplication and improve maintainability.
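A fixed-size splitter that avoids cutting words can move each cut back to the last space inside the window. It falls back to a hard cut when a single word is longer than the chunk. `split_text` below is a hypothetical sketch of that idea, not `FixedSizeSplitter` itself. It also moves an overlap start that lands mid-word forward to the next word and trims surrounding whitespace.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Fixed-size chunks with overlap that avoid cutting words when possible."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text) and not text[end].isspace() and not text[end - 1].isspace():
            # The window ends inside a word: cut at the last space instead, if any.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].rstrip())
        if end == len(text):
            break
        next_start = end - chunk_overlap
        # An overlap start that lands mid-word is moved forward to the next word.
        while 0 < next_start < end and not text[next_start - 1].isspace():
            next_start += 1
        start = max(next_start, start + 1)  # always make progress
        # Do not begin a chunk with whitespace.
        while start < len(text) and text[start].isspace():
            start += 1
    return chunks


print(split_text("the quick brown fox jumps", chunk_size=10, chunk_overlap=4))
# ['the quick', 'brown fox', 'fox jumps']
```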
- Removed the `uuid` package from dependencies (not needed with Python 3).
- Fixed a bug in the `AnthropicLLM` class preventing it from being used in the `GraphRAG` pipeline.
- Fixed a bug where the `OllamaEmbedder` would return a `list[list[float]]` instead of the expected `list[float]`.
- PyYAML dependency was missing and has been added.
- Weaviate was unintentionally added as a mandatory dependency in the previous version; this has been reverted.
- PyPDF and fsspec are no longer optional, so that SimpleKGPipeline examples can run out of the box (they only require a separate installation of the `openai` Python package if using OpenAI).
- Support for conversations with message history, including a new `message_history` parameter for LLM interactions.
- Ability to include system instructions in the LLM `invoke` method.
- Summarization of chat history to enhance query embedding and context handling in GraphRAG.
- Updated LLM implementations to handle message history consistently across providers.
- The `id_prefix` parameter in the `LexicalGraphConfig` is deprecated.
- IDs for the Document and Chunk nodes in the lexical graph are now randomly generated and unique across multiple runs, fixing issues in the lexical graph where relationships were created between chunks that were created by different pipeline runs.
- Improved logging for a better debugging experience: long lists and strings are now truncated. The maximum length can be controlled using the `LOGGING__MAX_LIST_LENGTH` and `LOGGING__MAX_STRING_LENGTH` environment variables.
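Truncating long values for logs might look like the sketch below. The two environment variable names come from the entry above. The fallback defaults (5 items, 200 characters), the marker text and the `truncate_for_log` helper are assumptions.

```python
import os
from typing import Any

# Variable names are from the changelog entry; the fallback defaults are assumptions.
MAX_LIST_LENGTH = int(os.environ.get("LOGGING__MAX_LIST_LENGTH", "5"))
MAX_STRING_LENGTH = int(os.environ.get("LOGGING__MAX_STRING_LENGTH", "200"))


def truncate_for_log(value: Any) -> Any:
    """Recursively shorten long strings and lists before they are logged."""
    if isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
        return value[:MAX_STRING_LENGTH] + f"... ({len(value)} chars)"
    if isinstance(value, list):
        head = [truncate_for_log(item) for item in value[:MAX_LIST_LENGTH]]
        if len(value) > MAX_LIST_LENGTH:
            head.append(f"... ({len(value) - MAX_LIST_LENGTH} more items)")
        return head
    if isinstance(value, dict):
        return {key: truncate_for_log(item) for key, item in value.items()}
    return value
```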
- Integrated the `json-repair` package to handle and repair invalid JSON generated by LLMs.
- Introduced an `InvalidJSONError` exception for handling cases where JSON repair fails.
- Ability to create a Pipeline or SimpleKGPipeline from a config file. See the example.
- Added `OllamaLLM` and `OllamaEmbeddings` classes to make Ollama support more explicit. Implementations using the `OpenAILLM` and `OpenAIEmbeddings` classes will still work.
- Updated LLM prompt for Entity and Relation extraction to include stricter instructions for generating valid JSON.
- Added schema functions to the documentation.
- Introduced an optional lexical graph configuration for `SimpleKGPipeline`, enhancing flexibility in customizing node labels and relationship types in the lexical graph.
- Introduced an optional `neo4j_database` parameter for `SimpleKGPipeline`, `Neo4jChunkReader` and `Text2CypherRetriever`.
- Ability to provide a description and a list of properties for entities and relations in the `SimpleKGPipeline` constructor.
- The `neo4j_database` parameter is now used for all queries in the `Neo4jWriter`.
- Updated all examples to use the `neo4j_database` parameter instead of an undocumented neo4j driver constructor.
- All `READ` queries are now routed to a reader replica (for clusters). This impacts all retrievers and the `Neo4jChunkReader` and `SinglePropertyExactMatchResolver` components.
- Made `relations` and `potential_schema` optional in `SchemaBuilder`.
- Added a check to prevent the use of deprecated Cypher syntax for Neo4j versions 5.23.0 and above.
- Added a `LexicalGraphBuilder` component to enable the import of the lexical graph (document, chunks) without performing entity and relation extraction.
- Added a `Neo4jChunkReader` component to be able to read chunk text from the database.
- Vector and Hybrid retrievers used with `return_properties` now also return the node labels (`nodeLabels`) and the node's element ID (`id`).
- `HybridRetriever` now filters out the embedding property index in `self.vector_index_name` from the retriever result by default.
- Removed support for `neo4j.AsyncDriver` in the KG creation pipeline, affecting `Neo4jWriter` and related components.
- Updated examples and unit tests to reflect the removal of async driver support.
- Resolved an issue with `AzureOpenAIEmbeddings` incorrectly inheriting from `OpenAIEmbeddings`; it now inherits from `BaseOpenAIEmbeddings`.
- Introduced a `fail_if_exist` option to index creation functions to control behavior when an index already exists.
- Added a Qdrant retriever in `neo4j_graphrag.retrievers`.
- Comprehensive rewrite of the README to improve clarity and provide detailed usage examples.
- Fixed a bug where the `openai` Python client and `numpy` were required to import any embedder or LLM.
- The value associated with the enum field `OnError.IGNORE` has been changed from "CONTINUE" to "IGNORE" to follow the convention and match the field name.
- Added a `SinglePropertyExactMatchResolver` component that merges entities with exactly the same property value (e.g. name).
- Added the `SimpleKGPipeline` class, a simplified abstraction layer to streamline knowledge graph building from text documents.
- Added a `SinglePropertyExactMatchResolver` component that merges entities with exactly the same property value (e.g. name).
- Added AzureOpenAILLM and AzureOpenAIEmbeddings to support Azure-served OpenAI models.
- Added `template` validation in the `PromptTemplate` class upon construction.
- Examples demonstrating the use of Mistral embeddings and LLM in RAG pipelines.
- Added a feature to include kwargs in `Text2CypherRetriever.search()` that will be injected into a custom prompt, if provided.
- Added validation to the `custom_prompt` parameter of `Text2CypherRetriever` to ensure that the `query_text` placeholder exists in the prompt.
- Introduced a fixed-size text splitter component for splitting text into fixed-size chunks with overlap. Updated examples and tests to use this new component.
- Introduced Vertex AI LLM class for integrating Vertex AI models.
- Added unit tests for the Vertex AI LLM class.
- Added support for Cohere LLM and embeddings; added an optional dependency on `cohere`.
- Added support for Anthropic LLM; added an optional dependency on `anthropic`.
- Added support for MistralAI LLM; added an optional dependency on `mistralai`.
- Added support for Qdrant; added an optional dependency on `qdrant-client`.
- Resolved import issue with the Vertex AI Embeddings class.
- Fixed a bug in `Text2CypherRetriever` using the `custom_prompt` arg where the `search` method would not inject the `query_text` content.
- The `custom_prompt` arg is now converted to the `Text2CypherTemplate` class within the `Text2CypherRetriever.get_search_results` method.
- The `Text2CypherTemplate` and `RAGTemplate` prompt templates now require the `query_text` arg and will raise an error if it is not present. Previous `query_text` aliases may be used, but will emit a deprecation warning.
- Resolved an issue where the Neo4jWriter component would raise an error if the start or end node ID was not defined properly in the input.
- Resolved an issue where relationship types were not escaped in the insert Cypher query.
- Improved query performance in Neo4jWriter: created nodes now have a generic `__KGBuilder__` label and an index is created on the `__KGBuilder__.id` property. Moreover, insertion queries are now batched. The batch size can be controlled using the `batch_size` parameter in the `Neo4jWriter` component.
- Moved the Embedder class to the neo4j_graphrag.embeddings directory for better organization alongside other custom embedders.
- Removed the `query` argument from the GraphRAG class' `.search` method; users must now use `query_text`.
- The Neo4jWriter component now runs a single query to merge a node and set its embeddings, if any.
- Nodes created by the `Neo4jWriter` now have an extra `__KGBuilder__` label. Nodes from the entity graph also have an `__Entity__` label.
- Dropped support for Python 3.8 (end of life).
- Updated documentation links in README.
- Renamed deprecated package references in documentation.
- Introduction page to the documentation content tree.
- Introduced a new Vertex AI embeddings class for generating text embeddings using Vertex AI.
- Updated documentation to include OpenAI and Vertex AI embeddings classes.
- Added google-cloud-aiplatform as an optional dependency for Vertex AI embeddings.
- Made `pygraphviz` an optional dependency; it is now only required when calling `pipeline.draw`.
- Moved pygraphviz to optional dependencies under [tool.poetry.extras] in pyproject.toml to resolve an issue where pip install neo4j-graphrag incorrectly required pygraphviz as a mandatory dependency.
- Officially renamed neo4j-genai to neo4j-graphrag. For the final release version of neo4j-genai, please visit https://pypi.org/project/neo4j-genai/.
- The `neo4j-genai` package is now deprecated. Users are advised to switch to the new package `neo4j-graphrag`.
- Ability to visualise a pipeline with `my_pipeline.draw("pipeline.png")`.
- `LexicalGraphBuilder` component to create the lexical graph without entity-relation extraction.
- Pipelines now return correct results when the same pipeline is run in parallel.
- The Pipeline run method now returns a `PipelineResult` object.
- Improved parameter validation for pipelines (#124). Pipelines now raise an error before a run starts if:
  - the same parameter is mapped twice, or
  - a parameter is defined in the mapping but is not a valid component input.
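Both validation rules can be checked before any component runs. The `(component, parameter, source)` tuple format and the `validate_mapping` helper are hypothetical simplifications of a pipeline's mapping. They are used here only to show the two checks.

```python
from collections import Counter


def validate_mapping(
    mappings: list[tuple[str, str, str]],
    component_inputs: dict[str, set[str]],
) -> list[str]:
    """Return the problems found in (component, parameter, source) mappings."""
    errors: list[str] = []
    counts = Counter((component, param) for component, param, _ in mappings)
    for (component, param), n in counts.items():
        if n > 1:
            errors.append(f"{component}.{param} is mapped {n} times")
        if param not in component_inputs.get(component, set()):
            errors.append(f"{component}.{param} is not a valid input")
    return errors


inputs = {"splitter": {"text"}, "writer": {"graph"}}
mappings = [
    ("splitter", "text", "loader.text"),
    ("writer", "graph", "extractor.graph"),
    ("writer", "graph", "resolver.graph"),
    ("writer", "chunks", "splitter.chunks"),
]
print(validate_mapping(mappings, inputs))
# ['writer.graph is mapped 2 times', 'writer.chunks is not a valid input']
```

A pipeline following this approach would refuse to start whenever the returned list is non-empty.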
- PDF-to-graph pipeline for knowledge graph construction in experimental mode
- Introduced support for Component/Pipeline flexible architecture.
- Added new components for knowledge graph construction, including text splitters, schema builders, entity-relation extractors, and Neo4j writers.
- Implemented end-to-end tests for the new knowledge graph builder pipeline.
- When saving the lexical graph in a KG creation pipeline, the document is also saved as a specific node, together with relationships between each chunk and the document they were created from.
- Corrected the hybrid retriever query to ensure proper normalization of scores in vector search results.
- Add optional custom_prompt arg to the Text2CypherRetriever class.
- The first parameter of the `GraphRAG.search` method has been renamed `query_text` (was `query`) for consistency with the retrievers interface.
- Made the `GraphRAG.search` method backwards compatible with the `query` parameter, raising warnings to encourage using `query_text` instead.
- Corrected initialization to allow specifying the embedding model name.
- Removed sentence_transformers from `embeddings/__init__.py` to avoid an `ImportError` when the package is not installed.
- Stopped embeddings from being returned when searching with `VectorRetriever`. Added `nodeLabels` and `id` to the metadata of `VectorRetriever` results.
- Added an `upsert_vector` utility function for attaching vectors to node properties.
- Introduced `Neo4jInsertionError` for handling insertion failures in Neo4j.
- Included Pinecone and Weaviate retrievers in `neo4j_graphrag.retrievers`.
- Introduced the GraphRAG object, enabling a full RAG (Retrieval-Augmented Generation) pipeline with context retrieval, prompt formatting, and answer generation.
- Added PromptTemplate and RagTemplate for customizable prompt generation.
- Added LLMInterface with implementation for OpenAI LLM.
- Updated project configuration to support multiple Python versions (3.8 to 3.12) in CI workflows.
- Improved developer experience by copying the docstring from the `Retriever.get_search_results` method to the `Retriever.search` method.
- Support for specifying database names in index handling methods and retrievers.
- User Guide in documentation.
- Introduced result_formatter argument to all retrievers, allowing custom formatting of retriever results.
- Refactored import paths for retrievers to neo4j_graphrag.retrievers.
- Implemented exception chaining for all re-raised exceptions to improve stack trace readability.
- Made error messages in `index.py` more consistent.
- Renamed `Retriever._get_search_results` to `Retriever.get_search_results`.
- Updated retrievers and index handling methods to accept optional database names.
- Removed Pinecone and Weaviate retrievers from `__init__.py` to prevent an `ImportError` when optional dependencies are not installed.
- Moved few-shot examples in `Text2CypherRetriever` to the constructor for better initialization and usage. Updated unit tests and example script accordingly.
- Fixed regex warnings in E2E tests for Weaviate and Pinecone retrievers.
- Corrected HuggingFaceEmbeddings import in E2E tests.
- Introduced custom exceptions for improved error handling, including `RetrieverInitializationError`, `SearchValidationError`, `FilterValidationError`, `EmbeddingRequiredError`, `RecordCreationError`, `Neo4jIndexError`, and `Neo4jVersionError`.
- Retrievers that integrate with a Weaviate vector database: `WeaviateNeo4jRetriever`.
- New return types that help with getting retriever results: `RetrieverResult` and `RetrieverResultItem`.
- Supported wrapper embedder object for sentence-transformers embeddings: `SentenceTransformerEmbeddings`.
- `Text2CypherRetriever` object which allows for the retrieval of records from a Neo4j database using natural language.
- Replaced `ValueError` with custom exceptions across various modules for clearer and more specific error messages.
- Updated documentation to include new custom exceptions.
- Improved the use of Pydantic for input data validation for retriever objects.