Commit bf68808

Merge branch 'main' into improve-llm-generated-json-fixing
2 parents fadde05 + 6ac97b7 commit bf68808

64 files changed

+1390
-151
lines changed

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -2,17 +2,37 @@

 ## Next

+## 1.2.1
+
+### Added
+- Introduced optional lexical graph configuration for `SimpleKGPipeline`, enhancing flexibility in customizing node labels and relationship types in the lexical graph.
+- Introduced optional `neo4j_database` parameter for `SimpleKGPipeline`, `Neo4jChunkReader` and `Text2CypherRetriever`.
+- Ability to provide description and list of properties for entities and relations in the `SimpleKGPipeline` constructor.
+
+### Fixed
+- `neo4j_database` parameter is now used for all queries in the `Neo4jWriter`.
+
+### Changed
+- Updated all examples to use `neo4j_database` parameter instead of an undocumented neo4j driver constructor.
+- All `READ` queries are now routed to a reader replica (for clusters). This impacts all retrievers, the `Neo4jChunkReader` and `SinglePropertyExactMatchResolver` components.
+
+
+## 1.2.0
+
 ### Added
 - Made `relations` and `potential_schema` optional in `SchemaBuilder`.
 - Added a check to prevent the use of deprecated Cypher syntax for Neo4j versions 5.23.0 and above.
 - Added a `LexicalGraphBuilder` component to enable the import of the lexical graph (document, chunks) without performing entity and relation extraction.
+- Added a `Neo4jChunkReader` component to be able to read chunk text from the database.

 ### Changed
 - Vector and Hybrid retrievers used with `return_properties` now also return the node labels (`nodeLabels`) and the node's element ID (`id`).
 - `HybridRetriever` now filters out the embedding property index in `self.vector_index_name` from the retriever result by default.
 - Removed support for neo4j.AsyncDriver in the KG creation pipeline, affecting Neo4jWriter and related components.
 - Updated examples and unit tests to reflect the removal of async driver support.

+### Fixed
+- Resolved issue with `AzureOpenAIEmbeddings` incorrectly inheriting from `OpenAIEmbeddings`, now inherits from `BaseOpenAIEmbeddings`.

 ## 1.1.0
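
Note on the `neo4j_database` and `READ` routing entries above: the changelog says the database name is now passed per query and that read queries are routed to a reader. The commit's actual implementation is not shown in this view, so the following is only a minimal sketch of the idea. `run_read_query` and `RecordingDriver` are hypothetical names; the keyword arguments `database_` and `routing_` are the ones accepted by `execute_query` in the Neo4j Python driver 5.x, and a recording stand-in replaces a live connection.

```python
from typing import Any, Optional, Protocol


class SupportsExecuteQuery(Protocol):
    """Anything exposing an execute_query method shaped like the Neo4j driver's."""

    def execute_query(self, query_: str, **kwargs: Any) -> Any: ...


def run_read_query(
    driver: SupportsExecuteQuery,
    query: str,
    neo4j_database: Optional[str] = None,
) -> Any:
    """Hypothetical helper: run a read-only query against a chosen database."""
    return driver.execute_query(
        query,
        database_=neo4j_database,  # None leaves the choice to the server default
        routing_="r",  # "r" asks for a reader; "w" would ask for a writer
    )


class RecordingDriver:
    """Stand-in for a real driver; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def execute_query(self, query_: str, **kwargs: Any) -> Any:
        self.calls.append((query_, kwargs))
        return []


fake = RecordingDriver()
run_read_query(fake, "MATCH (c:Chunk) RETURN count(c)", neo4j_database="movies")
print(fake.calls)
```

Passing the database with each query, rather than when the driver is built, lets one driver instance serve several databases, which matches the direction of the example changes later in this commit.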

docs/source/api.rst

Lines changed: 8 additions & 0 deletions
@@ -58,6 +58,14 @@ LexicalGraphBuilder
     :members:
     :exclude-members: component_inputs, component_outputs

+
+Neo4jChunkReader
+================
+
+.. autoclass:: neo4j_graphrag.experimental.components.neo4j_reader.Neo4jChunkReader
+    :members:
+    :exclude-members: component_inputs, component_outputs
+
 SchemaBuilder
 =============

docs/source/index.rst

Lines changed: 0 additions & 7 deletions
@@ -295,10 +295,3 @@ Further information

 - `The official Neo4j Python driver <https://github.com/neo4j/neo4j-python-driver>`_
 - `Neo4j GenAI integrations <https://neo4j.com/docs/cypher-manual/current/genai-integrations/>`_
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`

docs/source/user_guide_kg_builder.rst

Lines changed: 42 additions & 1 deletion
@@ -16,7 +16,7 @@ unstructured data.
 Pipeline Structure
 ******************

-A Knowledge Graph (KG) construction pipeline requires a few components:
+A Knowledge Graph (KG) construction pipeline requires a few components (some of the below components are optional):

 - **Document parser**: extract text from files (PDFs, ...).
 - **Document chunker**: split the text into smaller pieces of text, manageable by the LLM context window (token limit).
@@ -205,6 +205,47 @@ Example usage:
 See :ref:`kg-writer-section` to learn how to write the resulting nodes and relationships to Neo4j.


+Neo4j Chunk Reader
+==================
+
+The Neo4j chunk reader component is used to read text chunks from Neo4j. Text chunks can be created
+by the lexical graph builder or another process.
+
+.. code:: python
+
+    import neo4j
+    from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
+    from neo4j_graphrag.experimental.components.types import LexicalGraphConfig
+
+    reader = Neo4jChunkReader(driver)
+    result = await reader.run()
+
+
+Configure node labels and relationship types
+---------------------------------------------
+
+Optionally, the document and chunk node labels can be configured using a `LexicalGraphConfig` object:
+
+.. code:: python
+
+    from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
+    from neo4j_graphrag.experimental.components.types import LexicalGraphConfig, TextChunks
+
+    # optionally, define a LexicalGraphConfig object
+    # shown below with the default values
+    config = LexicalGraphConfig(
+        id_prefix="",  # used to prefix the chunk and document IDs
+        chunk_node_label="Chunk",
+        document_node_label="Document",
+        chunk_to_document_relationship_type="PART_OF_DOCUMENT",
+        next_chunk_relationship_type="NEXT_CHUNK",
+        node_to_chunk_relationship_type="PART_OF_CHUNK",
+        chunk_embedding_property="embeddings",
+    )
+    reader = Neo4jChunkReader(driver)
+    result = await reader.run(lexical_graph_config=config)
+
+
 Schema Builder
 ==============
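
Note on the configurable labels documented above: the diff shows which `LexicalGraphConfig` fields exist, but not how the reader turns them into a query. As a rough illustration only, the sketch below uses a small stand-in dataclass (`LexicalGraphConfigSketch`) and a hypothetical `build_chunk_query` function to show how a configured chunk label could end up in the Cypher text. Neither name comes from the library, and the query shape is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalGraphConfigSketch:
    """Illustrative stand-in mirroring a few of the fields shown in the diff."""

    chunk_node_label: str = "Chunk"
    document_node_label: str = "Document"
    chunk_to_document_relationship_type: str = "PART_OF_DOCUMENT"


def quote_identifier(name: str) -> str:
    """Wrap a label in backticks, doubling any backtick it already contains."""
    return "`" + name.replace("`", "``") + "`"


def build_chunk_query(config: LexicalGraphConfigSketch) -> str:
    """Hypothetical query builder: match every node carrying the chunk label."""
    label = quote_identifier(config.chunk_node_label)
    return f"MATCH (c:{label}) RETURN c"


default_query = build_chunk_query(LexicalGraphConfigSketch())
custom_query = build_chunk_query(LexicalGraphConfigSketch(chunk_node_label="TextPart"))
print(default_query)
print(custom_query)
```

Quoting the label keeps a value containing spaces or backticks from changing the structure of the query, which matters once labels become user-configurable.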

examples/README.md

Lines changed: 2 additions & 0 deletions
@@ -92,6 +92,8 @@ are listed in [the last section of this file](#customize).
 - [End to end example with explicit components and text input](./customize/build_graph/pipeline/kg_builder_from_text.py)
 - [End to end example with explicit components and PDF input](./customize/build_graph/pipeline/kg_builder_from_pdf.py)
 - [Process multiple documents](./customize/build_graph/pipeline/kg_builder_two_documents_entity_resolution.py)
+- [Export lexical graph creation into another pipeline](./customize/build_graph/pipeline/text_to_lexical_graph_to_entity_graph_two_pipelines.py)
+

 #### Components

examples/build_graph/simple_kg_builder_from_pdf.py

Lines changed: 2 additions & 1 deletion
@@ -50,6 +50,7 @@ async def define_and_run_pipeline(
         entities=ENTITIES,
         relations=RELATIONS,
         potential_schema=POTENTIAL_SCHEMA,
+        neo4j_database=DATABASE,
     )
     return await kg_builder.run_async(file_path=str(file_path))

@@ -62,7 +63,7 @@ async def main() -> PipelineResult:
             "response_format": {"type": "json_object"},
         },
     )
-    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
+    with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
         res = await define_and_run_pipeline(driver, llm)
     await llm.async_client.close()
     return res

examples/build_graph/simple_kg_builder_from_text.py

Lines changed: 27 additions & 4 deletions
@@ -3,6 +3,8 @@

 This example assumes a Neo4j db is up and running. Update the credentials below
 if needed.
+
+NB: when building a KG from text, no 'Document' node is created in the Knowledge Graph.
 """

 import asyncio
@@ -11,6 +13,10 @@
 from neo4j_graphrag.embeddings import OpenAIEmbeddings
 from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
 from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
+from neo4j_graphrag.experimental.pipeline.types import (
+    EntityInputType,
+    RelationInputType,
+)
 from neo4j_graphrag.llm import LLMInterface
 from neo4j_graphrag.llm.openai_llm import OpenAILLM

@@ -21,12 +27,28 @@

 # Text to process
 TEXT = """The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House Atreides,
-an aristocratic family that rules the planet Caladan."""
+an aristocratic family that rules the planet Caladan, the rainy planet, since 10191."""

 # Instantiate Entity and Relation objects. This defines the
 # entities and relations the LLM will be looking for in the text.
-ENTITIES = ["Person", "House", "Planet"]
-RELATIONS = ["PARENT_OF", "HEIR_OF", "RULES"]
+ENTITIES: list[EntityInputType] = [
+    # entities can be defined with a simple label...
+    "Person",
+    # ... or with a dict if more details are needed,
+    # such as a description:
+    {"label": "House", "description": "Family the person belongs to"},
+    # or a list of properties the LLM will try to attach to the entity:
+    {"label": "Planet", "properties": [{"name": "weather", "type": "STRING"}]},
+]
+# same thing for relationships:
+RELATIONS: list[RelationInputType] = [
+    "PARENT_OF",
+    {
+        "label": "HEIR_OF",
+        "description": "Used for inheritor relationship between father and sons",
+    },
+    {"label": "RULES", "properties": [{"name": "fromYear", "type": "INTEGER"}]},
+]
 POTENTIAL_SCHEMA = [
     ("Person", "PARENT_OF", "Person"),
     ("Person", "HEIR_OF", "House"),
@@ -47,6 +69,7 @@ async def define_and_run_pipeline(
         relations=RELATIONS,
         potential_schema=POTENTIAL_SCHEMA,
         from_pdf=False,
+        neo4j_database=DATABASE,
     )
     return await kg_builder.run_async(text=TEXT)

@@ -59,7 +82,7 @@ async def main() -> PipelineResult:
             "response_format": {"type": "json_object"},
         },
     )
-    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
+    with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
         res = await define_and_run_pipeline(driver, llm)
     await llm.async_client.close()
     return res
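
Note on the mixed `ENTITIES` / `RELATIONS` lists above: each item may now be a bare label or a dict carrying a description and properties. How the pipeline handles the two shapes internally is not visible in this diff. The sketch below shows one plausible way to bring both shapes to a uniform dict. `EntityInput` and `normalize_entity` are hypothetical names, and the default values chosen for missing keys are assumptions.

```python
from typing import Any, Union

# Assumed input shape: a bare label, or a dict with a "label" key plus
# optional "description" and "properties" keys, as in the lists above.
EntityInput = Union[str, dict[str, Any]]


def normalize_entity(entity: EntityInput) -> dict[str, Any]:
    """Hypothetical helper: return a dict with label, description, properties."""
    if isinstance(entity, str):
        return {"label": entity, "description": "", "properties": []}
    if "label" not in entity:
        raise ValueError(f"entity definition is missing a 'label': {entity!r}")
    return {
        "label": entity["label"],
        "description": entity.get("description", ""),
        "properties": list(entity.get("properties", [])),
    }


ENTITIES: list[EntityInput] = [
    "Person",
    {"label": "House", "description": "Family the person belongs to"},
    {"label": "Planet", "properties": [{"name": "weather", "type": "STRING"}]},
]

normalized = [normalize_entity(e) for e in ENTITIES]
for item in normalized:
    print(item)
```

Normalizing early means later steps only deal with one shape, while callers can keep the short string form for simple cases.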

examples/customize/answer/custom_prompt.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,6 @@
 driver = neo4j.GraphDatabase.driver(
     URI,
     auth=AUTH,
-    database=DATABASE,
 )

 embedder = OpenAIEmbeddings()
@@ -33,6 +32,7 @@
     index_name=INDEX,
     retrieval_query="WITH node, score RETURN node.title as title, node.plot as plot",
     embedder=embedder,
+    neo4j_database=DATABASE,
 )

 llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

examples/customize/answer/langchain_compatiblity.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,6 @@
 driver = neo4j.GraphDatabase.driver(
     URI,
     auth=AUTH,
-    database=DATABASE,
 )

 embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
@@ -31,6 +30,7 @@
     index_name=INDEX,
     retrieval_query="WITH node, score RETURN node.title as title, node.plot as plot",
     embedder=embedder,  # type: ignore[arg-type, unused-ignore]
+    neo4j_database=DATABASE,
 )

 llm = ChatOpenAI(model="gpt-4o", temperature=0)
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+import asyncio
+
+import neo4j
+from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
+from neo4j_graphrag.experimental.components.types import LexicalGraphConfig, TextChunks
+
+
+async def main(driver: neo4j.Driver) -> TextChunks:
+    config = LexicalGraphConfig(  # only needed to overwrite the default values
+        chunk_node_label="TextPart",
+    )
+    reader = Neo4jChunkReader(driver)
+    result = await reader.run(lexical_graph_config=config)
+    return result
+
+
+if __name__ == "__main__":
+    with neo4j.GraphDatabase.driver(
+        "bolt://localhost:7687", auth=("neo4j", "password")
+    ) as driver:
+        print(asyncio.run(main(driver)))
