The ultimate Graph + Vector + Text Retrieval Engine for InterSystems IRIS.
IRIS Vector Graph is a general-purpose graph utility built on InterSystems IRIS that supports and demonstrates knowledge graph construction and query techniques. It combines graph traversal, HNSW vector similarity, and lexical search in a single, unified database.
- Multi-Query Power: Query your graph via SQL, openCypher (v1.3 with DML), or GraphQL — all on the same data.
- Transactional Engine: Beyond retrieval, with support for CREATE, DELETE, and MERGE operations.
- Blazing Fast Vectors: Native HNSW indexing delivering ~1.7ms search latency (vs 5.8s standard).
- Zero-Dependency Integration: Built with IRIS Embedded Python — no external vector DBs or graph engines required.
- Production-Ready: The engine behind iris-vector-rag for advanced RAG pipelines.
pip install iris-vector-graph

Note: Requires InterSystems IRIS 2025.1+ with the irispython runtime enabled.
# 1. Clone & Sync
git clone https://github.com/intersystems-community/iris-vector-graph.git && cd iris-vector-graph
uv sync
# 2. Spin up IRIS
docker-compose up -d
# 3. Start API
uvicorn api.main:app --reload

Visit:
- GraphQL Playground: http://localhost:8000/graphql
- API Docs: http://localhost:8000/docs
IRIS Vector Graph features a custom recursive-descent Cypher parser supporting multi-stage queries and transactional updates:
// Complex fraud analysis with WITH and Aggregations
MATCH (a:Account)-[r]->(t:Transaction)
WITH a, count(t) AS txn_count
WHERE txn_count > 5
MATCH (a)-[:OWNED_BY]->(p:Person)
RETURN p.name, txn_count

Supported Clauses: MATCH, OPTIONAL MATCH, WITH, WHERE, RETURN, UNWIND, CREATE, DELETE, DETACH DELETE, MERGE, SET, REMOVE.
query {
  protein(id: "PROTEIN:TP53") {
    name
    interactsWith(first: 5) { id name }
    similar(limit: 3) { protein { name } similarity }
  }
}

SELECT TOP 10 id,
       kg_RRF_FUSE(id, vector, 'cancer suppressor') as score
FROM nodes
ORDER BY score DESC

The integration of a native HNSW (Hierarchical Navigable Small World) functional index directly into InterSystems IRIS provides massive scaling benefits for hybrid graph-vector workloads.
By keeping the vector index in-process with the graph data, we achieve subsecond multi-modal queries that would otherwise require complex application-side joins across multiple databases.
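The kg_RRF_FUSE call in the SQL example above presumably refers to reciprocal rank fusion (RRF), a common way to merge vector and text rankings into a single score. The following is a minimal pure-Python sketch of that general technique, not the engine's implementation: the function name rrf_fuse, the constant k=60, and the sample IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists with reciprocal rank fusion.

    Each ID scores the sum over lists of 1 / (k + rank), with rank starting at 1.
    The name and the default k are illustrative, not part of iris-vector-graph.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a vector search and a text search.
vector_hits = ["TP53", "MDM2", "BRCA1"]
text_hits = ["BRCA1", "TP53", "ATM"]
fused = rrf_fuse([vector_hits, text_hits])
```

IDs that appear near the top of both lists (here TP53 and BRCA1) end up ahead of IDs that appear in only one, which is the property that makes RRF useful for hybrid retrieval.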
- High-Speed Traversal: ~1.84M TEPS (Traversed Edges Per Second).
- Low-Latency Traversal: 2-hop BFS on 10k nodes in <40ms.
- RDF 1.2 Support: Native support for Quoted Triples (Metadata on edges) via subject-referenced properties.
- Query Signatures: O(1) hop-rejection using ASQ-inspired Master Label Sets.
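To make the 2-hop BFS figure above concrete, here is a minimal sketch of a hop-bounded breadth-first search over an in-memory adjacency map. It only illustrates the traversal pattern being measured; the function k_hop_bfs and the sample graph are hypothetical, and the engine itself runs traversals inside IRIS rather than in Python dictionaries.

```python
from collections import deque


def k_hop_bfs(adjacency, start, max_hops=2):
    """Return {node: hop_distance} for every node within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical directed graph: A -> B, A -> C, B -> D, C -> D, D -> E.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
reachable = k_hop_bfs(edges, "A")
```

Node E sits three hops from A, so it is excluded from the result.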
Consider a "Find-and-Follow" query common in fraud detection:
- Find the top 10 accounts most semantically similar to a known fraudulent pattern (Vector Search).
- Follow all outbound transactions from those 10 accounts to identify the next layer of the money laundering ring (Graph Hop).
In a standard database without HNSW, the first step (vector search) can take several seconds as the dataset grows, blocking the subsequent graph traversals. With iris-vector-graph, the vector lookup is reduced to ~1.7ms, enabling the entire hybrid traversal to complete in a fraction of a second.
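The two steps of the Find-and-Follow query can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's API: find_and_follow, the toy embeddings, and the account IDs are hypothetical, and the brute-force cosine scan stands in for the slow, index-free vector search that HNSW replaces.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_and_follow(embeddings, out_edges, pattern, top_k=10):
    """Step 1: rank accounts by similarity to pattern. Step 2: follow outbound edges."""
    # Brute-force scan over every embedding (the step an HNSW index accelerates).
    ranked = sorted(
        embeddings,
        key=lambda node: cosine(embeddings[node], pattern),
        reverse=True,
    )
    seeds = ranked[:top_k]
    # One graph hop from each seed; seeds themselves are excluded from the next layer.
    next_layer = {dst for seed in seeds for dst in out_edges.get(seed, ())}
    return seeds, next_layer - set(seeds)


# Hypothetical data: three accounts with 2-d embeddings and a few transactions.
embeddings = {"acct1": [1.0, 0.0], "acct2": [0.9, 0.1], "acct3": [0.0, 1.0]}
out_edges = {"acct1": ["acct3"], "acct2": ["acct4"], "acct3": ["acct1"]}
seeds, next_layer = find_and_follow(embeddings, out_edges, [1.0, 0.0], top_k=2)
```

Because the scan in step 1 touches every embedding, its cost grows linearly with the dataset, which is why the graph hop in step 2 ends up waiting on it when no vector index is available.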
Experience the power of IRIS Vector Graph through our interactive demo applications.
Explore protein-protein interaction networks with vector similarity and D3.js visualization.
Real-time fraud scoring with transaction networks, Cypher-based pattern matching, and bitemporal audit trails.
To run the CLI demos:
export PYTHONPATH=$PYTHONPATH:.
# Cypher-powered fraud detection
python3 examples/demo_fraud_detection.py
# SQL-powered "drop down" example
python3 examples/demo_fraud_detection_sql.py

To run the Web Visualization demos:
# Start the demo server
uv run uvicorn src.iris_demo_server.app:app --port 8200 --host 0.0.0.0

Visit http://localhost:8200 to begin.
IRIS Vector Graph is the core engine powering iris-vector-rag. You can use it in your RAG pipelines like this:
from iris_vector_rag import create_pipeline
# Create a GraphRAG pipeline powered by this engine
pipeline = create_pipeline('graphrag')
# Combined vector + text + graph retrieval
result = pipeline.query(
    "What are the latest cancer treatment approaches?",
    top_k=5
)

- Detailed Architecture
- Biomedical Domain Examples
- Full Test Suite
- iris-vector-rag Integration
- Verbose README (Legacy)
- High-Performance Batch API: New get_nodes(node_ids) reduces database round-trips by 100x+ for large result sets
- Advanced Substring Search: Integrated IRIS iFind indexing for sub-20ms CONTAINS queries on 10,000+ records
- GraphQL Acceleration: Implemented GenericNodeLoader to eliminate N+1 query patterns in GQL traversals
- Transactional Batching: Optimized bulk_create_nodes/edges with executemany and unified transactions
- Functional Indexing: Native JSON-based edge confidence indexing for fast complex filtering
- Schema Cleanup: Removed invalid VECTOR_DIMENSION call from schema utilities
- Refinement: Engine now relies solely on inference and explicit config for dimensions
- Robust Embeddings: Fixed embedding dimension detection for IRIS Community 2025.1
- API Improvements: Added embedding_dimension param to IRISGraphEngine for manual override
- Auto-Inference: Automatically infers dimension from input if detection fails
- Code Quality: Major cleanup of engine.py to remove legacy duplicates
- Engine Acceleration: Ported high-performance SQL paths for get_node() and count_nodes()
- Bulk Loading: New bulk_create_nodes() and bulk_create_edges() methods with %NOINDEX support
- Performance: Verified 80x speedup for single-node reads and 450x for counts vs standard Cypher
- Extreme Performance: Verified 38ms latency for 5,000-node property queries (at 10k entity scale)
- Subquery Stability: Optimized REPLACE string aggregation to avoid IRIS %QPAR optimizer bugs
- Scale Verified: Robust E2E stress tests confirm industrial-grade performance for 10,000+ nodes
- Exact Collation: Added %EXACT to VARCHAR columns for case-sensitive matching
- Performance: Prevents default UPPER collation behavior in IRIS 2024.2+
UPPERcollation behavior in IRIS 2024.2+ - Case Sensitivity: Ensures node IDs, labels, and property keys are case-sensitive
- Fix SUBSCRIPT error: Removed idx_props_key_val which caused errors with large values
- Improved Performance: Maintained composite indexes that don't include large VARCHAR columns
- Revert to VARCHAR(64000): LONGVARCHAR broke REPLACE; VARCHAR(64000) keeps compatibility
- Large Values: 64KB property values, REPLACE works, no CAST needed
- v1.4.5 used LONGVARCHAR which broke REPLACE function
- v1.4.6 used CAST which broke on old schemas
- Bulk Loading Support: %NOINDEX INSERTs, disable_indexes(), rebuild_indexes()
- Fast Ingest: Skip index maintenance during bulk loads, rebuild after
- Composite Indexes: Added (s,key), (s,p), (p,o_id), (s,label) based on TrustGraph patterns
- 12 indexes total: Optimized for label filtering, property lookups, edge traversal
- Performance Indexes: Added indexes on rdf_labels, rdf_props, rdf_edges for fast graph traversal
- ensure_indexes(): New method to add indexes to existing databases
- Composite Index: Added (key, val) index on rdf_props for property value lookups
- Embedding API: Added get_embedding(), get_embeddings(), delete_embedding() methods
- Schema Prefix in Engine: All engine SQL now uses configurable schema prefix
- Schema Prefix Support: set_schema_prefix('Graph_KG') for qualified table names
- Pattern Operators Fixed: CONTAINS, STARTS WITH, ENDS WITH now work correctly
- IRIS Compatibility: Removed recursive CTEs and NULLS LAST (unsupported by IRIS)
- ORDER BY Fix: Properties in ORDER BY now properly join rdf_props table
- type(r) Verified: Relationship type function works in RETURN/WHERE clauses
Author: Thomas Dyar (thomas.dyar@intersystems.com)