Orphaned entities not cleaned up during episode deletion due to missing MENTIONS relationships #1083

@codingthefuturewithai

Description

When deleting episodes via remove_episode(), entities that lack MENTIONS relationships are not cleaned up, resulting in orphaned Entity nodes accumulating in the Neo4j database over time.

Impact

  • Orphaned entities persist indefinitely in the knowledge graph
  • Database grows unnecessarily with unreachable entities
  • No automatic cleanup mechanism exists for these orphans

Root Cause

The issue stems from LLM non-determinism during knowledge graph extraction. When Graphiti processes documents:

  1. The LLM sometimes creates Entity nodes without establishing corresponding MENTIONS relationships from Episodic nodes
  2. The current remove_episode() implementation only checks entities returned by get_mentioned_nodes()
  3. get_mentioned_nodes() requires MENTIONS relationships to exist - entities without them are invisible to the cleanup logic
  4. These orphaned entities are never considered for deletion

Relevant Code (graphiti_core/graphiti.py, lines 1251-1263):

# Find nodes mentioned by the episode
nodes = await get_mentioned_nodes(self.driver, [episode])
# We should delete all nodes that are only mentioned in the deleted episode
nodes_to_delete: list[EntityNode] = []
for node in nodes:
    query: LiteralString = 'MATCH (e:Episodic)-[:MENTIONS]->(n:Entity {uuid: $uuid}) RETURN count(*) AS episode_count'
    records, _, _ = await self.driver.execute_query(query, uuid=node.uuid, routing_='r')

    for record in records:
        if record['episode_count'] == 1:
            nodes_to_delete.append(node)

The problem: get_mentioned_nodes() only returns entities WITH MENTIONS relationships. Entities with 0 MENTIONS relationships are never checked.
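
The blind spot can be reproduced without a database. What follows is a minimal in-memory sketch, not Graphiti code: `mentions` is a hypothetical stand-in for the MENTIONS relationships, and `mentioned_nodes()` and `nodes_to_delete()` only imitate the shape of the logic quoted above.

```python
# In-memory stand-in for the graph. All names are illustrative, not Graphiti APIs.
entities = {"React", "JSX", "state"}

# episode id -> entities it MENTIONS.
# "state" was created during ingestion but never received a MENTIONS edge.
mentions = {
    "ep1": {"React", "JSX"},
    "ep2": {"React"},
}


def mentioned_nodes(episode: str) -> set[str]:
    """Imitates get_mentioned_nodes(): only follows existing MENTIONS edges."""
    return set(mentions.get(episode, set()))


def nodes_to_delete(episode: str) -> set[str]:
    """Imitates the cleanup loop: keep candidates mentioned by exactly one episode."""
    result = set()
    for node in mentioned_nodes(episode):
        episode_count = sum(1 for targets in mentions.values() if node in targets)
        if episode_count == 1:
            result.add(node)
    return result


deleted = nodes_to_delete("ep1")
# Entities that no episode mentions never enter the loop above.
never_checked = entities - set().union(*mentions.values())

print(deleted)        # {'JSX'}
print(never_checked)  # {'state'}
```

"JSX" is removed because only "ep1" mentions it, "React" survives because "ep2" still mentions it, and "state" is never even considered.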

Reproduction Steps

  1. Ingest multiple documents into Graphiti (the LLM will occasionally create entities without MENTIONS relationships)
  2. Delete the collection/episodes
  3. Run this Cypher query to find orphans:
    MATCH (n:Entity) 
    WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) } 
    RETURN n
  4. Observe orphaned Entity nodes that should have been cleaned up
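
To make step 3 repeatable, the query can be wrapped in a small helper. This is a sketch under assumptions: it expects a synchronous driver object whose `execute_query()` returns a `(records, summary, keys)` triple, as the official Neo4j Python driver does. `find_orphans` and the optional `group_id` filter are illustrative names, not part of Graphiti.

```python
from typing import Any, Optional

# Same pattern as the query in step 3, with an optional group_id filter (an assumption).
ORPHAN_QUERY = """
MATCH (n:Entity)
WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) }
  AND ($group_id IS NULL OR n.group_id = $group_id)
RETURN n.uuid AS uuid, n.name AS name
"""


def find_orphans(driver: Any, group_id: Optional[str] = None) -> list[dict]:
    """Return uuid/name pairs for Entity nodes that no Episodic node mentions.

    `driver` is assumed to expose execute_query(query, **params) and to return
    a (records, summary, keys) triple whose records support key lookup.
    """
    records, _, _ = driver.execute_query(ORPHAN_QUERY, group_id=group_id, routing_="r")
    return [{"uuid": record["uuid"], "name": record["name"]} for record in records]
```

Calling `find_orphans(driver)` before and after deleting the episodes gives a direct comparison of how many orphans the deletion left behind.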

Real-World Test Case

In our testing with RAG Memory (which uses Graphiti):

  • Before fix: Ingested 9 pages from React documentation, deleted collection → 2 orphaned entities remained ("state", "GeeksforGeeks")
  • After fix: Clean slate test with same data → 0 orphaned entities, all cleaned up successfully

Proposed Fix

Add orphan cleanup logic to remove_episode() that runs after the episode itself is deleted. The order matters: before deletion, entities mentioned only by this episode still have a MENTIONS relationship, so an orphan check at that point would not detect them.

Add this to graphiti_core/graphiti.py:remove_episode() after line 1268 (await episode.delete(self.driver)):

await Edge.delete_by_uuids(self.driver, [edge.uuid for edge in edges_to_delete])
await Node.delete_by_uuids(self.driver, [node.uuid for node in nodes_to_delete])

await episode.delete(self.driver)

# Additional orphan cleanup: Find and delete entities with no MENTIONS relationships
# This handles entities that were created during ingestion but never got MENTIONS relationships
# due to LLM non-determinism or partial write failures
# IMPORTANT: This must run AFTER deleting the episode, otherwise entities mentioned only by
# this episode will still have MENTIONS relationships and won't be detected as orphans
orphan_query: LiteralString = """
    MATCH (n:Entity)
    WHERE NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) }
    AND n.group_id = $group_id
    RETURN n.uuid AS uuid, n.name AS name
"""
orphan_records, _, _ = await self.driver.execute_query(
    orphan_query, group_id=episode.group_id, routing_='r'
)

orphaned_uuids = [record['uuid'] for record in orphan_records]
if orphaned_uuids:
    logger.warning(
        f"Found {len(orphaned_uuids)} orphaned entities during episode deletion cleanup: "
        f"{[record['name'] for record in orphan_records]}"
    )
    await Node.delete_by_uuids(self.driver, orphaned_uuids)

Why This Fix Works

  1. Timing: Runs after episode deletion, so entities only mentioned by the deleted episode now have 0 MENTIONS relationships
  2. Comprehensive: Uses NOT EXISTS { (e:Episodic)-[:MENTIONS]->(n) } to find ALL orphans, not just those with episode_count == 1
  3. Scoped: Filters by group_id to only clean up entities in the same collection
  4. Observable: Logs a warning with the entity names for debugging and monitoring
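
The timing point can be illustrated with a small in-memory model. This is a sketch only: `orphans()` is a hypothetical helper that mirrors the NOT EXISTS check, and the sample data is invented for illustration.

```python
def orphans(entities: set[str], mentions: dict[str, set[str]]) -> set[str]:
    """Entities that no remaining episode mentions (mirrors the NOT EXISTS check)."""
    mentioned = set().union(*mentions.values())
    return entities - mentioned


entities = {"React", "JSX", "state"}
mentions = {"ep1": {"React", "JSX"}, "ep2": {"React"}}

# Checked BEFORE deleting ep1: "JSX" still has a MENTIONS edge and is missed.
before = orphans(entities, mentions)

# Checked AFTER deleting ep1: its MENTIONS edges are gone, so "JSX" is detected too.
remaining = {ep: targets for ep, targets in mentions.items() if ep != "ep1"}
after = orphans(entities, remaining)

print(sorted(before))  # ['state']
print(sorted(after))   # ['JSX', 'state']
```

In this model, "React" is never reported because "ep2" still mentions it, which corresponds to the behavior the proposed query relies on.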

Testing Evidence

Clean slate verification (completely empty databases):

Before deletion:

  • 83 nodes (75 Entity + 8 Episodic)
  • 221 relationships
  • 126 entities created from 8 ingested pages

After deletion with fix:

  • 0 nodes remaining (verified with MATCH (n) RETURN count(n))
  • Fix detected and cleaned up 3 orphans: "JSX", "useState", "useEffect"

Without the fix, these 3 entities (plus others from previous tests) would have remained orphaned indefinitely.

Environment

  • Graphiti version: Current main branch (as of January 2025)
  • Neo4j version: 5.x
  • Integration: RAG Memory MCP server using graphiti-core
