Description
When deleting episodes via remove_episode(), entities that lack MENTIONS relationships are not cleaned up, resulting in orphaned Entity nodes accumulating in the Neo4j database over time.
Impact
- Orphaned entities persist indefinitely in the knowledge graph
- Database grows unnecessarily with unreachable entities
- No automatic cleanup mechanism exists for these orphans
Root Cause
The issue stems from LLM non-determinism during knowledge graph extraction. When Graphiti processes documents:
- The LLM sometimes creates Entity nodes without establishing corresponding MENTIONS relationships from Episodic nodes
- The current remove_episode() implementation only checks entities returned by get_mentioned_nodes()
- get_mentioned_nodes() requires MENTIONS relationships to exist, so entities without them are invisible to the cleanup logic
- These orphaned entities are never considered for deletion
Relevant Code (graphiti_core/graphiti.py, lines 1251-1263):
# Find nodes mentioned by the episode
nodes = await get_mentioned_nodes(self.driver, [episode])
# We should delete all nodes that are only mentioned in the deleted episode
nodes_to_delete: list[EntityNode] = []
for node in nodes:
    query: LiteralString = 'MATCH (e:Episodic)-[:MENTIONS]->(n:Entity {uuid: $uuid}) RETURN count(*) AS episode_count'
    records, _, _ = await self.driver.execute_query(query, uuid=node.uuid, routing_='r')
    for record in records:
        if record['episode_count'] == 1:
            nodes_to_delete.append(node)

The problem: get_mentioned_nodes() only returns entities WITH MENTIONS relationships. Entities with 0 MENTIONS relationships are never checked.
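The gap can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical in-memory `ToyGraph` class that only mimics the shape of the logic quoted above; none of its names are part of Graphiti's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical in-memory stand-in for the Neo4j graph (not Graphiti's API)."""

    entities: set[str] = field(default_factory=set)
    # (episode, entity) pairs standing in for (:Episodic)-[:MENTIONS]->(:Entity)
    mentions: set[tuple[str, str]] = field(default_factory=set)

    def mentioned_nodes(self, episode: str) -> set[str]:
        # Like get_mentioned_nodes() as described above: only entities reachable
        # through a MENTIONS edge from the episode are returned.
        return {entity for ep, entity in self.mentions if ep == episode}

    def episode_count(self, entity: str) -> int:
        # Number of episodes that mention this entity.
        return sum(1 for _, ent in self.mentions if ent == entity)

    def remove_episode(self, episode: str) -> None:
        # Same shape as the current logic: delete entities mentioned only by
        # this episode, then drop the episode's MENTIONS edges.
        to_delete = {
            e for e in self.mentioned_nodes(episode) if self.episode_count(e) == 1
        }
        self.entities -= to_delete
        self.mentions = {(ep, ent) for ep, ent in self.mentions if ep != episode}


graph = ToyGraph(
    entities={"React", "state"},
    mentions={("ep1", "React")},  # "state" was created without a MENTIONS edge
)
graph.remove_episode("ep1")
remaining = graph.entities  # "state" survives because it was never visited
```

In this toy model the entity without a MENTIONS edge is never a candidate for deletion, which matches the behavior reported here.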
Reproduction Steps
- Ingest multiple documents into Graphiti (LLM will probabilistically create some entities without MENTIONS relationships)
- Delete the collection/episodes
- Run this Cypher query to find orphans:
MATCH (n:Entity) WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) } RETURN n
- Observe orphaned Entity nodes that should have been cleaned up
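To script the orphan check from the steps above, something like the following could be used. It is a sketch that assumes the official `neo4j` Python driver (5.x) and placeholder connection details; the helper name `find_orphans` is hypothetical and not part of Graphiti.

```python
ORPHAN_QUERY = """
MATCH (n:Entity)
WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) }
RETURN n.uuid AS uuid, n.name AS name
"""


def find_orphans(driver) -> list[dict]:
    """Return uuid/name pairs for Entity nodes with no incoming MENTIONS edge."""
    # With the neo4j 5.x driver, execute_query returns (records, summary, keys).
    records, _, _ = driver.execute_query(ORPHAN_QUERY, routing_="r")
    return [{"uuid": r["uuid"], "name": r["name"]} for r in records]


def main() -> None:
    # Imported here so the helper above can be exercised without the driver installed.
    from neo4j import GraphDatabase

    # Placeholder URI and credentials (assumptions); replace with your own.
    uri, auth = "bolt://localhost:7687", ("neo4j", "password")
    with GraphDatabase.driver(uri, auth=auth) as driver:
        for orphan in find_orphans(driver):
            print(orphan["uuid"], orphan["name"])
```

Calling `main()` after deleting the episodes prints one line per orphaned entity; an empty result means nothing was left behind.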
Real-World Test Case
In our testing with RAG Memory (which uses Graphiti):
- Before fix: Ingested 9 pages from React documentation, deleted collection → 2 orphaned entities remained ("state", "GeeksforGeeks")
- After fix: Clean slate test with same data → 0 orphaned entities, all cleaned up successfully
Proposed Fix
Add orphan cleanup logic to remove_episode() that runs after deleting the episode. This must run after episode deletion because entities mentioned only by the deleted episode will still have MENTIONS relationships if checked before deletion.
Add this to graphiti_core/graphiti.py:remove_episode() after line 1268 (await episode.delete(self.driver)):
await Edge.delete_by_uuids(self.driver, [edge.uuid for edge in edges_to_delete])
await Node.delete_by_uuids(self.driver, [node.uuid for node in nodes_to_delete])
await episode.delete(self.driver)

# Additional orphan cleanup: Find and delete entities with no MENTIONS relationships
# This handles entities that were created during ingestion but never got MENTIONS relationships
# due to LLM non-determinism or partial write failures
# IMPORTANT: This must run AFTER deleting the episode, otherwise entities mentioned only by
# this episode will still have MENTIONS relationships and won't be detected as orphans
orphan_query: LiteralString = """
    MATCH (n:Entity)
    WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) }
    AND n.group_id = $group_id
    RETURN n.uuid AS uuid, n.name AS name
"""
orphan_records, _, _ = await self.driver.execute_query(
    orphan_query, group_id=episode.group_id, routing_='r'
)
orphaned_uuids = [record['uuid'] for record in orphan_records]
if orphaned_uuids:
    logger.warning(
        f"Found {len(orphaned_uuids)} orphaned entities during episode deletion cleanup: "
        f"{[record['name'] for record in orphan_records]}"
    )
    await Node.delete_by_uuids(self.driver, orphaned_uuids)

Why This Fix Works
- Timing: Runs after episode deletion, so entities only mentioned by the deleted episode now have 0 MENTIONS relationships
- Comprehensive: Uses NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) } to find ALL orphans, not just those with episode_count == 1
- Scoped: Filters by group_id to only clean up entities in the same collection
- Observable: Logs a warning with entity names for debugging and monitoring
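The timing point can be illustrated with a small set-based sketch. It is a toy model, not Graphiti code: the `orphans` helper is hypothetical, and it assumes that deleting an episode also removes its MENTIONS edges.

```python
def orphans(entities: set[str], mentions: set[tuple[str, str]]) -> set[str]:
    """Entities with no (episode, entity) pair, i.e. no incoming MENTIONS edge."""
    mentioned = {entity for _, entity in mentions}
    return entities - mentioned


entities = {"JSX", "useState"}
mentions = {("ep1", "JSX")}  # "useState" never received a MENTIONS edge

# Check BEFORE deleting ep1: "JSX" still has its edge, so only "useState" is found.
before = orphans(entities, mentions)

# Assumption: deleting the episode also removes its MENTIONS edges.
mentions_after = {(ep, ent) for ep, ent in mentions if ep != "ep1"}

# Check AFTER deleting ep1: both entities now count as orphans.
after = orphans(entities, mentions_after)
```

Running the orphan check before the episode is deleted misses any entity whose only MENTIONS edge comes from that episode, which is why the proposed query is placed after `episode.delete()`.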
Testing Evidence
Clean slate verification (completely empty databases):
Before deletion:
- 83 nodes (75 Entity + 8 Episodic)
- 221 relationships
- 126 entities created from 8 ingested pages
After deletion with fix:
- 0 nodes remaining (verified with MATCH (n) RETURN count(n))
- Fix detected and cleaned up 3 orphans: "JSX", "useState", "useEffect"
Without the fix, these 3 entities (plus others from previous tests) would have remained orphaned indefinitely.
Environment
- Graphiti version: Current main branch (as of January 2025)
- Neo4j version: 5.x
- Integration: RAG Memory MCP server using graphiti-core