@ydzhu98 ydzhu98 commented Feb 2, 2026

Description

This PR implements Layer 4 of the Knowledge Graph Query Library for Mellea: Graph Backend Abstraction. This layer provides the foundation for executing graph queries across different database systems, starting with Neo4j support.

What's Included

1. Core Data Structures (base.py)

Pure dataclasses for representing graph data:

  • GraphNode: Represents a node with id, label, and properties
  • GraphEdge: Represents an edge connecting two nodes
  • GraphPath: Represents a path through the graph

Key features:

  • Simple dataclasses (not Components)
  • Factory methods for creating from Neo4j objects
  • Full type safety with type hints
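As a rough illustration of the shape described above, the three structures could look like the following. This is a minimal sketch, not the PR's code: only the node's id, label, and properties come from the description, and every other field name (`source`, `target`, `nodes`, `edges`) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GraphNode:
    """A node with an id, a label, and arbitrary properties."""
    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphEdge:
    """An edge connecting two nodes (field names here are assumed)."""
    id: str
    label: str
    source: GraphNode
    target: GraphNode
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GraphPath:
    """A path through the graph, stored as its nodes and edges in order."""
    nodes: list[GraphNode] = field(default_factory=list)
    edges: list[GraphEdge] = field(default_factory=list)


# Illustrative data only
alice = GraphNode(id="1", label="Person", properties={"name": "Alice"})
movie = GraphNode(id="2", label="Movie", properties={"title": "Example"})
acted_in = GraphEdge(id="10", label="ACTED_IN", source=alice, target=movie)
path = GraphPath(nodes=[alice, movie], edges=[acted_in])
```

Because these are plain dataclasses, equality is field-by-field, which is what makes the equality tests listed under Testing straightforward.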

2. Abstract Backend Interface (graph_dbs/base.py)

GraphBackend ABC following Mellea's Backend pattern:

class GraphBackend(ABC):
    - execute_query(query) -> GraphResult
    - get_schema() -> dict
    - validate_query(query) -> (bool, error)
    - supports_query_type(type) -> bool
    - close()

Design principles:

  • Follows Mellea's Backend(model_id, model_options) pattern
  • Abstract methods for core operations
  • Backend-agnostic interface
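The outline above could be written as a real abstract base class along these lines. This is a sketch under assumptions: the constructor arguments follow the `backend_id`/`backend_options` naming mentioned under Design Decisions, the query and result types are left as `Any`, and which methods are coroutines is inferred from the API examples (`supports_query_type` being synchronous is a guess).

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class GraphBackend(ABC):
    """Sketch of a backend-agnostic graph interface (signatures are assumed)."""

    def __init__(self, backend_id: str, backend_options: Optional[dict[str, Any]] = None):
        self.backend_id = backend_id
        self.backend_options = backend_options or {}

    @abstractmethod
    async def execute_query(self, query: Any) -> Any:
        """Run a query and return a result object."""

    @abstractmethod
    async def get_schema(self) -> dict:
        """Describe node types, edge types, and properties."""

    @abstractmethod
    async def validate_query(self, query: Any) -> tuple[bool, Optional[str]]:
        """Return (is_valid, error_message)."""

    @abstractmethod
    def supports_query_type(self, query_type: str) -> bool:
        """Report whether this backend can run the given query language."""

    @abstractmethod
    async def close(self) -> None:
        """Release any connections held by the backend."""


# The ABC itself cannot be instantiated; only complete subclasses can.
try:
    GraphBackend("demo")
except TypeError as exc:
    print("cannot instantiate:", exc)
```

Marking every operation abstract means a new backend that forgets one of them fails at construction time rather than at the first query.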

3. Neo4j Backend Implementation (graph_dbs/neo4j.py)

Full Neo4j integration:

Features:

  • ✅ Execute Cypher queries with parameters
  • ✅ Parse Neo4j results into GraphNode/GraphEdge/GraphPath
  • ✅ Retrieve graph schema (node types, edge types, properties)
  • ✅ Validate Cypher syntax using EXPLAIN
  • ✅ Automatic deduplication of nodes and edges
  • ✅ Support for paths
  • ✅ Async/await throughout
  • ✅ Proper connection management

Key implementation details:

  • Uses both sync and async Neo4j drivers
  • Caches nodes during parsing to handle relationships
  • Deduplicates results across multiple records
  • Handles Neo4j-specific types (Node, Relationship, Path)
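The EXPLAIN-based validation listed above relies on the fact that prefixing a Cypher statement with EXPLAIN makes the database plan it without executing it, so a syntax error surfaces as an exception. A minimal sketch of that idea follows; the `run` callable and `fake_run` are hypothetical stand-ins for a driver call so the example works without a database, and `fake_run`'s rule is not real Cypher validation.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical runner type: takes a Cypher string and parameters, raises on error.
Runner = Callable[[str, dict], Awaitable[None]]


async def validate_with_explain(
    run: Runner, cypher: str, parameters: Optional[dict] = None
) -> tuple[bool, Optional[str]]:
    """Ask the database to plan the query without executing it."""
    try:
        await run(f"EXPLAIN {cypher}", parameters or {})
    except Exception as exc:  # a real backend would catch the driver's error types
        return False, str(exc)
    return True, None


async def fake_run(cypher: str, parameters: dict) -> None:
    """Stand-in for a driver call; rejects anything without a RETURN clause."""
    if "RETURN" not in cypher.upper():
        raise ValueError("query has no RETURN clause")


ok = asyncio.run(validate_with_explain(fake_run, "MATCH (n) RETURN n"))
bad = asyncio.run(validate_with_explain(fake_run, "MATCH (n)"))
print(ok)   # (True, None)
print(bad)  # (False, 'query has no RETURN clause')
```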

4. Mock Backend for Testing (graph_dbs/mock.py)

Testing utility:

Features:

  • Predefined mock data
  • Query history tracking
  • Reports every query as valid
  • Supports all query types
  • No database connection required

Use cases:

  • Unit testing without database
  • Development without Neo4j
  • CI/CD pipelines
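To make the mock's behavior concrete, here is a small stand-alone sketch with the listed features: predefined data, query history, unconditional validation, and no database connection. The class name, attribute names, and the use of plain strings for queries are assumptions for illustration, not the contents of graph_dbs/mock.py.

```python
import asyncio
from typing import Any, Optional


class SketchMockBackend:
    """In-memory stand-in for a graph backend (hypothetical, for illustration)."""

    def __init__(self, nodes: Optional[list[Any]] = None, schema: Optional[dict] = None):
        self.nodes = list(nodes or [])
        self.schema = schema or {"node_types": [], "edge_types": []}
        self.query_history: list[str] = []

    async def execute_query(self, query_string: str) -> list[Any]:
        # Record the query, then return the predefined data unchanged.
        self.query_history.append(query_string)
        return list(self.nodes)

    async def validate_query(self, query_string: str) -> tuple[bool, Optional[str]]:
        # The mock never rejects a query.
        return True, None

    def supports_query_type(self, query_type: str) -> bool:
        return True

    async def get_schema(self) -> dict:
        return dict(self.schema)

    async def close(self) -> None:
        return None


async def demo() -> tuple[list[Any], list[str]]:
    backend = SketchMockBackend(nodes=[{"id": "1", "label": "Person"}])
    rows = await backend.execute_query("MATCH (n) RETURN n")
    await backend.close()
    return rows, backend.query_history


rows, history = asyncio.run(demo())
print(rows)     # [{'id': '1', 'label': 'Person'}]
print(history)  # ['MATCH (n) RETURN n']
```

A test can then assert on `query_history` to check what a component sent to the backend without needing Neo4j.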

5. Minimal Component Stubs

Temporary implementations for testing Layer 4:

  • components/query.py: Minimal GraphQuery class
  • components/result.py: Minimal GraphResult class
  • components/traversal.py: Minimal GraphTraversal class

Note: These will be replaced with full Component implementations in Layer 2.

Testing

Test Structure

test/contribs/kg/
├── test_base.py              # Data structure tests (9 tests)
├── test_mock_backend.py      # Mock backend tests (7 tests)
└── test_neo4j_backend.py     # Neo4j integration tests (14 tests)

Test Coverage

Total: 30 tests

  • 18 passing (without Neo4j)
  • 12 skipped (require running Neo4j instance)

Test Categories

  1. Base Data Structures (9 tests)

    • Node/Edge/Path creation
    • Property handling
    • Equality testing
  2. Mock Backend (7 tests)

    • Creation and configuration
    • Schema retrieval
    • Query validation
    • History tracking
  3. Neo4j Backend (14 tests)

    • Connection management
    • Query execution
    • Parameter binding
    • Result parsing
    • Schema retrieval
    • Query validation
    • Error handling

Running Tests

# All tests (mocks pass, Neo4j tests skip if no instance)
uv run pytest test/contribs/kg/ -v

# With Neo4j running:
docker run --rm -p 7687:7687 -p 7474:7474 \
    -e NEO4J_AUTH=neo4j/testpassword \
    neo4j:latest

uv run pytest test/contribs/kg/test_neo4j_backend.py -v
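One way the Neo4j tests could decide whether to skip is a quick check that the Bolt port accepts connections. The PR does not say how its skip logic works, so the helper below is only a sketch of that idea; the function name and the marker shown in the comment are hypothetical.

```python
import socket


def neo4j_reachable(host: str = "localhost", port: int = 7687, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on the Bolt port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# In a test module this could drive a skip marker, for example:
#   requires_neo4j = pytest.mark.skipif(
#       not neo4j_reachable(), reason="Neo4j is not running"
#   )
```

A TCP check only shows that the port is open; it does not confirm that the credentials passed via NEO4J_AUTH are accepted.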

Module Structure

Following the design document requirements:

mellea/contribs/kg/
├── __init__.py                # Public API exports
├── base.py                    # Core data structures
├── graph_dbs/                 # Backend implementations
│   ├── __init__.py
│   ├── base.py                # GraphBackend ABC
│   ├── neo4j.py               # Neo4jBackend
│   └── mock.py                # MockGraphBackend
├── components/                # Minimal stubs for Layer 4
│   ├── __init__.py
│   ├── query.py               # GraphQuery (minimal)
│   ├── result.py              # GraphResult (minimal)
│   └── traversal.py           # GraphTraversal (minimal)
├── sampling/                  # Empty (Layer 3)
│   └── __init__.py
├── requirements/              # Empty (Layer 3)
│   └── __init__.py
└── README.md                  # Documentation

Design Decisions

1. Data Structures vs Components

  • GraphNode, GraphEdge, GraphPath are dataclasses, not Components
  • Simple, pure data representation
  • Components (with format_for_llm()) come in Layer 2

2. Backend Pattern

  • Follows Mellea's Backend abstraction for LLMs
  • backend_id and backend_options similar to model_id and model_options
  • Abstract methods for core operations
  • Easy to add new backends (Neptune, RDF, etc.)

3. Neo4j Element IDs

  • Uses element_id instead of the deprecated id property
  • Compatible with Neo4j 5.x and 6.x
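A factory method built on element_id might look like the sketch below. In the Neo4j Python driver, node objects expose `element_id`, `labels`, and mapping-style `items()`; the factory name `from_neo4j_node`, the choice of the first sorted label, and the `FakeNode` stand-in (used so the example runs without the driver) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GraphNode:
    id: str
    label: str
    properties: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_neo4j_node(cls, node: Any) -> "GraphNode":
        """Build a GraphNode from a driver node, keyed by its string element_id."""
        labels = sorted(node.labels)
        return cls(
            id=node.element_id,                 # string id, not the deprecated integer id
            label=labels[0] if labels else "",  # assumed policy: first label alphabetically
            properties=dict(node.items()),
        )


class FakeNode:
    """Stand-in that mimics the attributes used above; not the driver's class."""

    def __init__(self, element_id: str, labels: set[str], properties: dict[str, Any]):
        self.element_id = element_id
        self.labels = frozenset(labels)
        self._properties = properties

    def items(self):
        return self._properties.items()


# The element id string below is made up for the example.
node = GraphNode.from_neo4j_node(FakeNode("node-42", {"Person"}, {"name": "Alice"}))
print(node)
```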

4. Result Deduplication

  • Automatically deduplicates nodes and edges
  • Handles UNION queries correctly
  • Maintains node cache for efficient edge creation
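The deduplication described above can be sketched with a node cache keyed by id: each node is stored once, and edges look up their endpoints in the cache instead of creating new node objects. Records are modeled here as plain dictionaries with hypothetical keys (`nodes`, `edges`, `source_id`, `target_id`); the real backend parses driver objects instead.

```python
from typing import Any


def deduplicate(records: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Collapse repeated nodes and edges across records, keyed by id."""
    node_cache: dict[str, dict] = {}
    edges: dict[str, dict] = {}
    for record in records:
        # Cache nodes first so edges in the same record can resolve their endpoints.
        for node in record.get("nodes", []):
            node_cache.setdefault(node["id"], node)
        for edge in record.get("edges", []):
            if edge["id"] in edges:
                continue
            edges[edge["id"]] = {
                "id": edge["id"],
                "source": node_cache[edge["source_id"]],
                "target": node_cache[edge["target_id"]],
            }
    return list(node_cache.values()), list(edges.values())


# Two records that share the same Person node (illustrative data).
records = [
    {"nodes": [{"id": "p1", "label": "Person"}, {"id": "m1", "label": "Movie"}],
     "edges": [{"id": "e1", "source_id": "p1", "target_id": "m1"}]},
    {"nodes": [{"id": "p1", "label": "Person"}, {"id": "m2", "label": "Movie"}],
     "edges": [{"id": "e2", "source_id": "p1", "target_id": "m2"}]},
]
nodes, unique_edges = deduplicate(records)
print(len(nodes), len(unique_edges))  # 3 2
```

Both edges end up pointing at the same cached `p1` object, which is the property that makes edge creation cheap and keeps UNION-style duplicates out of the result.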

5. Async-First

  • All I/O operations are async
  • Uses Neo4j's AsyncGraphDatabase
  • Maintains sync driver for compatibility

API Examples

Basic Usage

import asyncio

from mellea.contribs.kg.graph_dbs import Neo4jBackend
from mellea.contribs.kg.components import GraphQuery


async def main() -> None:
    # Connect
    backend = Neo4jBackend(
        connection_uri="bolt://localhost:7687",
        auth=("neo4j", "password"),
    )
    try:
        # Query
        query = GraphQuery(
            query_string="MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p, m",
            parameters={},
        )
        result = await backend.execute_query(query)

        # Results
        for node in result.nodes:
            print(f"{node.label}: {node.properties}")
    finally:
        # Release the connection even if the query fails
        await backend.close()


asyncio.run(main())

Validation

query = GraphQuery(query_string="MATCH (n) RETURN n")
is_valid, error = await backend.validate_query(query)

if not is_valid:
    print(f"Invalid: {error}")

Schema

schema = await backend.get_schema()
print(f"Node types: {schema['node_types']}")
print(f"Edge types: {schema['edge_types']}")

Documentation

Next Steps

After this PR is merged:

  1. Layer 2 PR: Implement full Graph Query Components

    • Convert minimal stubs to full Components
    • Add format_for_llm() implementations
    • Implement fluent Cypher query builder
    • Add multiple result format styles
  2. Layer 3 PR: Add LLM-Guided Query Construction

    • Implement @generative functions
    • Add validation strategies
    • Create query requirements
  3. Layer 1 PR: Application Examples

    • End-to-end usage examples
    • KGRag integration
    • Best practices documentation

Commits

This PR contains the following components:

  1. Core data structures (base.py)
  2. Abstract backend interface
  3. Neo4j backend implementation
  4. Mock backend for testing
  5. Minimal component stubs
  6. Comprehensive test suite
  7. Documentation

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

mergify bot commented Feb 2, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
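The rule above can be checked locally before renaming the pull request. The sketch below applies the same regular expression with Python's `re` module; the helper name and the two sample titles are made up for illustration.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_PATTERN.search(title) is not None


print(is_conventional("feat(kg): add graph backend abstraction"))  # True
print(is_conventional("Add graph backend abstraction"))            # False
```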
