
Evaluate and Integrate Neo4j GraphRAG for Enhanced Knowledge Graph Capabilities #18

@ma3u


🎯 Objective

Evaluate and, if the results justify it, integrate the Neo4j GraphRAG Python package (v1.10.0) to add graph intelligence, entity extraction, and multi-hop reasoning to our production RAG system while preserving our existing custom optimizations.

📋 Context

Current System (Production):

  • ✅ 30,006 chunks across 12 technical books
  • ✅ 417x optimized vector search (110ms queries)
  • ✅ Hybrid deployment (local Docker + Neo4j Aura)
  • ✅ Azure AI Foundry integration
  • ❌ No entity extraction or graph traversal
  • ❌ Limited cross-document reasoning

Neo4j GraphRAG (Official Package):

  • ✅ VectorCypherRetriever (vector + graph traversal)
  • ✅ SimpleKGPipeline (entity extraction)
  • ✅ Multi-LLM support (Azure OpenAI, Ollama, custom)
  • ✅ Hybrid environment compatible
  • ❓ Performance impact unknown

🚀 Implementation Phases

Phase 1: Evaluation & Testing (Week 1) - Priority: HIGH

Tasks:

  • Install neo4j-graphrag[openai,sentence-transformers]==1.10.0
  • Test compatibility with existing Aura instance (6b870b04)
  • Create Neo4jGraphRAGRetriever wrapper class
  • Run side-by-side comparison: current vs GraphRAG (10 test queries)
  • Measure latency impact (target: total query latency <200ms)
  • Validate hybrid deployment (local + cloud)
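
A possible starting point for the wrapper task is sketched below. It assumes the wrapped retriever exposes `search(query_text=..., top_k=...)` and returns an object whose `items` each carry `content` and `metadata`, which is our understanding of the neo4j-graphrag retrievers and should be verified against v1.10.0. The `Hit` dataclass and `FakeRetriever` are hypothetical and exist only so the sketch runs without a database.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class Hit:
    """Normalised search hit (hypothetical shape, not part of neo4j-graphrag)."""
    text: str
    metadata: dict = field(default_factory=dict)


class Neo4jGraphRAGRetriever:
    """Thin wrapper so callers do not depend on the library's result types."""

    def __init__(self, retriever: Any, top_k: int = 5):
        self._retriever = retriever  # e.g. a VectorCypherRetriever instance
        self._top_k = top_k

    def search(self, query: str) -> list[Hit]:
        # Assumed contract: search(query_text=..., top_k=...) -> object with .items
        result = self._retriever.search(query_text=query, top_k=self._top_k)
        return [
            Hit(text=str(item.content), metadata=dict(item.metadata or {}))
            for item in result.items
        ]


class FakeRetriever:
    """Stand-in for tests; mimics the assumed search() contract."""

    def search(self, query_text: str, top_k: int):
        items = [
            SimpleNamespace(content=f"chunk {i} for {query_text}", metadata={"rank": i})
            for i in range(top_k)
        ]
        return SimpleNamespace(items=items)


hits = Neo4jGraphRAGRetriever(FakeRetriever(), top_k=3).search("what is a graph?")
print(len(hits), hits[0].text)
```

Keeping the wrapper duck-typed means the same class can sit in front of the current retriever during the side-by-side comparison.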

Deliverables:

  • neo4j-rag-demo/src/neo4j_graphrag_retriever.py (wrapper)
  • neo4j-rag-demo/tests/test_graphrag_performance.py (benchmarks)
  • Performance report document

Success Criteria:

  • GraphRAG works with our existing data (no schema changes needed)
  • Latency overhead <90ms (total <200ms)
  • Works in both local Docker and Aura environments
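
The latency criterion above can be checked mechanically. The sketch below is one way to do it, assuming each retriever is reachable as a plain callable that takes a query string; the function names are hypothetical and the thresholds are the ones stated in this phase.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(search: Callable[[str], object],
                      queries: Sequence[str],
                      repeats: int = 3) -> float:
    """Median wall-clock latency in milliseconds over all queries and repeats."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def within_budget(baseline_ms: float,
                  candidate_ms: float,
                  max_overhead_ms: float = 90.0,
                  max_total_ms: float = 200.0) -> bool:
    """Phase 1 criterion: overhead under 90 ms and total under 200 ms."""
    overhead_ms = candidate_ms - baseline_ms
    return overhead_ms < max_overhead_ms and candidate_ms < max_total_ms


# Example with the current 110 ms baseline and a hypothetical 180 ms GraphRAG result.
print(within_budget(110.0, 180.0))
```

The median is used here so that a single slow cold-start query does not dominate the comparison; a p95 figure may be worth reporting alongside it.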

Phase 2: Entity Extraction PoC (Week 2) - Priority: MEDIUM

Tasks:

  • Select 1-2 representative books (~2,500 chunks)
  • Configure SimpleKGPipeline with Azure OpenAI (gpt-4o-mini)
  • Extract entities: Technology, Concept, Framework, Author
  • Manual quality assessment (review 50 random entities)
  • Measure cost (actual vs estimated $2-3)
  • Test VectorCypherRetriever with entity graph

Deliverables:

  • neo4j-rag-demo/scripts/extract_entities_graphrag.py
  • Entity extraction quality report
  • Cost analysis

Success Criteria:

  • Entity extraction accuracy >80%
  • Entities provide useful relationships
  • Cost within budget (<$5 for PoC)
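
For the manual quality assessment, a reproducible sample and a simple accuracy score are enough. The sketch below assumes the extracted entities are available as a list of names and that the reviewer records one true/false verdict per sampled entity; the helper names are hypothetical.

```python
import random
from typing import Sequence


def sample_for_review(entities: Sequence[str], k: int = 50, seed: int = 42) -> list[str]:
    """Pick up to k entities uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(entities), min(k, len(entities)))


def accuracy(verdicts: Sequence[bool]) -> float:
    """Share of reviewed entities the reviewer marked as correct."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(verdicts) / len(verdicts)


def meets_phase2_bar(verdicts: Sequence[bool], threshold: float = 0.80) -> bool:
    """Phase 2 criterion: accuracy strictly above 80%."""
    return accuracy(verdicts) > threshold


# Placeholder entity names; real ones would come from the extraction run.
entities = [f"entity-{i}" for i in range(200)]
review_batch = sample_for_review(entities)
print(len(review_batch))
```

Fixing the seed lets a second reviewer assess exactly the same 50 entities.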

Phase 3: Integration & A/B Testing (Week 3) - Priority: MEDIUM

Tasks:

  • Integrate GraphRAG retriever into main system
  • Add API flag: ?use_graphrag=true
  • Create 20 test queries (simple + complex)
  • A/B test: current vs GraphRAG on quality + latency
  • Document trade-offs (performance vs context richness)
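
The `?use_graphrag=true` flag could be handled with a small routing helper like the one below. This is a framework-agnostic sketch: the `/search` path, the accepted truthy spellings, and the helper names are assumptions, not part of the existing API.

```python
from typing import Callable
from urllib.parse import parse_qs, urlsplit

# Accepted spellings for "on" (an assumption; the issue only specifies "true").
TRUTHY = {"1", "true", "yes", "on"}


def wants_graphrag(url: str) -> bool:
    """True when the request URL carries use_graphrag with a truthy value."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("use_graphrag", [])
    return bool(values) and values[-1].strip().lower() in TRUTHY


def pick_retriever(url: str,
                   current: Callable[[str], object],
                   graphrag: Callable[[str], object]) -> Callable[[str], object]:
    """Route to GraphRAG only when explicitly requested; default to the current path."""
    return graphrag if wants_graphrag(url) else current


print(wants_graphrag("/search?q=raft&use_graphrag=true"))
```

Defaulting to the current retriever keeps the optimized path untouched for every caller that does not opt in.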

Deliverables:

  • Updated neo4j-rag-demo/src/neo4j_rag.py (hybrid support)
  • A/B test results document
  • Decision recommendation

Success Criteria:

  • GraphRAG improves answer quality by ≥20%
  • Performance acceptable for production use
  • Clear recommendation: proceed or defer
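
How the 20% quality improvement is computed is not fixed in this issue. One option, sketched below, is to have a reviewer score each answer on a numeric scale and compare the mean scores of the two systems over the same queries. The scoring scale and the function name are assumptions.

```python
from statistics import mean
from typing import Sequence


def relative_improvement(baseline_scores: Sequence[float],
                         graphrag_scores: Sequence[float]) -> float:
    """Relative gain in mean score; 0.20 corresponds to +20%."""
    if not baseline_scores or len(baseline_scores) != len(graphrag_scores):
        raise ValueError("need one score per query for both systems")
    base = mean(baseline_scores)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return (mean(graphrag_scores) - base) / base


# Illustrative reviewer scores only, not measured results.
gain = relative_improvement([3, 3, 3, 3], [4, 4, 3, 4])
print(f"{gain:.0%}")
```

Scoring the same 20 queries for both systems keeps the comparison paired, so differences in query difficulty cancel out.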

Phase 4: Production Rollout (Optional, Week 4+) - Priority: LOW

Tasks (if Phase 3 is successful):

  • Extract entities for all 30K chunks (~$5-10)
  • Update Azure AI Foundry Assistant functions
  • Deploy to Azure Container Apps
  • Update documentation (README, Cypher queries)
  • Add monitoring for GraphRAG queries
  • Update NODES2025 presentation materials

Deliverables:

  • Full entity graph in Aura instance
  • Updated deployment documentation
  • Monitoring dashboards

Success Criteria:

  • Zero production issues
  • User feedback positive
  • Latency within SLA

📊 Comparison: Current vs GraphRAG

| Feature | Current System | With GraphRAG |
|---|---|---|
| Vector search | ✅ 110ms | ✅ 150-200ms (est.) |
| Entity extraction | ❌ None | ✅ Automated |
| Graph traversal | ❌ None | ✅ Multi-hop |
| Cross-document links | ❌ Limited | ✅ Rich |
| Maintenance | ⚠️ Custom | ✅ Neo4j supported |
| Performance | ✅ Optimized | ❓ To be tested |

See detailed comparison: docs/GRAPHRAG_COMPARISON.md


💰 Cost Analysis

One-Time Costs:

  • Entity extraction (30K chunks): ~$5-10 (gpt-4o-mini)
  • Development time: 2-3 weeks
  • Testing & validation: 1 week

Ongoing Costs:

  • $0 additional (uses existing infrastructure)
  • Entity graph stored in same Aura instance (+10% storage)

Alternative (Free):

  • Use Ollama/BitNet for local entity extraction (slower but $0)
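
The extraction estimate can be sanity-checked from token counts. The sketch below is a back-of-the-envelope calculator; the per-chunk token counts and per-million-token prices in the example are placeholders and should be replaced with measured values and current Azure OpenAI pricing.

```python
def extraction_cost_usd(chunks: int,
                        input_tokens_per_chunk: float,
                        output_tokens_per_chunk: float,
                        usd_per_million_input: float,
                        usd_per_million_output: float) -> float:
    """Estimated LLM cost of running entity extraction once over all chunks."""
    input_cost = chunks * input_tokens_per_chunk / 1_000_000 * usd_per_million_input
    output_cost = chunks * output_tokens_per_chunk / 1_000_000 * usd_per_million_output
    return input_cost + output_cost


# All values except the chunk count are placeholders, not measured figures.
estimate = extraction_cost_usd(
    chunks=30_006,
    input_tokens_per_chunk=600,    # assumed prompt + chunk size
    output_tokens_per_chunk=200,   # assumed size of the extracted entities
    usd_per_million_input=0.15,    # placeholder price
    usd_per_million_output=0.60,   # placeholder price
)
print(f"${estimate:.2f}")
```

Running the same function with the token counts observed in the Phase 2 PoC gives a grounded figure for the full rollout.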

🔗 Resources

Documentation:

Neo4j Resources:

NODES2025:


✅ Decision Criteria

Proceed to Phase 4 if:

  • ✅ GraphRAG improves answer quality by ≥20%
  • ✅ Latency increase acceptable (<200ms total)
  • ✅ Entity extraction provides clear value
  • ✅ Hybrid deployment works seamlessly
  • ✅ Cost justified (~$10 one-time)

Defer if:

  • ❌ Minimal quality improvement (<10%)
  • ❌ Performance issues (>300ms queries)
  • ❌ Entity extraction quality poor (<70%)
  • ❌ Integration too complex

Keep as optional feature if:

  • ⚠️ Quality improvement moderate (10-20%)
  • ⚠️ Performance acceptable but not optimal
  • ⚠️ Useful for some queries but not all
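
The three outcomes above can be encoded so the final decision is reproducible from the measured numbers. The sketch below covers only the quantitative criteria. Two things in it are assumptions: "clear value" is approximated by the Phase 2 accuracy bar (>80%), and any result that is neither a clear "proceed" nor a clear "defer" falls through to "optional". Qualitative items such as integration complexity still need a human call.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    quality_gain: float     # relative gain, 0.20 == +20%
    latency_ms: float       # total query latency with GraphRAG
    entity_accuracy: float  # 0.0-1.0 from the manual review
    hybrid_ok: bool         # works in both local Docker and Aura


def decide(e: Evaluation) -> str:
    """Map measured results to 'proceed', 'optional', or 'defer'."""
    # Any single "defer" condition is enough to defer.
    if e.quality_gain < 0.10 or e.latency_ms > 300 or e.entity_accuracy < 0.70:
        return "defer"
    # All "proceed" conditions must hold together.
    if (e.quality_gain >= 0.20 and e.latency_ms < 200
            and e.entity_accuracy > 0.80 and e.hybrid_ok):
        return "proceed"
    # Everything in between is treated as an optional feature (assumption).
    return "optional"


# Illustrative numbers only.
print(decide(Evaluation(quality_gain=0.15, latency_ms=180, entity_accuracy=0.85, hybrid_ok=True)))
```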

🎯 Success Metrics

| Metric | Current | Target | Measurement |
|---|---|---|---|
| Query Latency | 110ms | <200ms | Benchmark tests |
| Answer Quality | Baseline | +20% | Manual evaluation |
| Cross-Doc Links | 0 | >50 entities | Graph analysis |
| Entity Accuracy | N/A | >80% | Manual review |
| Deployment | Working | Working | Smoke tests |
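
The Cross-Doc Links metric needs a concrete definition before it can be measured. One reading, sketched below, counts entities that are mentioned in at least two different books. The input shape (entity name mapped to the books it appears in) and the sample data are hypothetical; in practice this mapping would be exported from the graph.

```python
from typing import Iterable, Mapping


def cross_document_entities(entity_books: Mapping[str, Iterable[str]],
                            min_books: int = 2) -> list[str]:
    """Names of entities that appear in at least min_books distinct books."""
    return sorted(
        name for name, books in entity_books.items()
        if len(set(books)) >= min_books
    )


# Hypothetical sample; real data would come from a query against the entity graph.
sample = {
    "Neo4j": ["Book A", "Book B", "Book B"],
    "Cypher": ["Book A"],
    "Vector index": ["Book B", "Book C"],
}
linked = cross_document_entities(sample)
print(len(linked), linked)
```

Under this definition the target is met when the returned list has more than 50 entries.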

🏷️ Labels

  • enhancement - New feature addition
  • evaluation - Needs evaluation/testing
  • graphrag - Related to GraphRAG integration
  • performance - Performance impact consideration
  • documentation - Requires documentation updates

👥 Stakeholders

  • Developer: Technical implementation
  • Aura Instance: 6b870b04 (production)
  • NODES2025: Potential presentation update

📅 Timeline

Week 1 (Phase 1): Evaluation & testing
Week 2 (Phase 2): Entity extraction PoC
Week 3 (Phase 3): A/B testing & decision
Week 4+ (Phase 4): Production rollout (if approved)

Total Estimate: 3-4 weeks from start to production (if proceeding)


🔄 Updates

This issue will be updated with:

  • Phase 1 results
  • Phase 2 quality assessment
  • Phase 3 A/B test findings
  • Final decision and rationale
