
langchain-oceanbase


This package contains the LangChain integration with OceanBase. Current version: 0.3.3

OceanBase Database is a distributed relational database developed entirely by Ant Group. It runs on clusters of commodity servers; its Paxos-based distributed architecture provides high availability and linear scalability.

OceanBase also supports vector storage. Users can easily perform the following operations with SQL:

  • Create a table containing vector type fields;
  • Create a vector index table based on the HNSW algorithm;
  • Perform vector approximate nearest neighbor queries;
  • ...
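The operations above can be sketched as SQL statements. This is an illustrative sketch only: the exact DDL and function names (VECTOR column type, CREATE VECTOR INDEX, l2_distance, APPROXIMATE LIMIT) can vary by OceanBase version, so verify against the official docs.

```python
# Hedged sketch: illustrative SQL for the three operations listed above.
# Exact syntax may differ by OceanBase version; check the documentation.
DIM = 3  # toy dimension for illustration

# 1. A table containing a vector-type field
create_table = f"CREATE TABLE items (id BIGINT PRIMARY KEY, embedding VECTOR({DIM}))"

# 2. An HNSW vector index on that field
create_index = (
    "CREATE VECTOR INDEX idx_items_embedding ON items (embedding) "
    "WITH (distance=l2, type=hnsw)"
)

# 3. An approximate nearest-neighbor query
ann_query = (
    "SELECT id FROM items "
    "ORDER BY l2_distance(embedding, '[0.1, 0.2, 0.3]') "
    "APPROXIMATE LIMIT 5"
)

# Executing these requires a running server, e.g. cursor.execute(create_table)
```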

LangChain Integration


OceanbaseVectorStore is the official LangChain integration for OceanBase.

Support for ChatMessageHistory is provided as an additional integration and is not part of the official VectorStore API.

Official documentation: https://python.langchain.com/docs/integrations/vectorstores/oceanbase/

Features

  • Built-in Embedding: Built-in embedding function using all-MiniLM-L6-v2 model (384 dimensions) with no API keys required. Perfect for quick prototyping and local development.
    • No API Keys Required: Uses local ONNX models, no external API calls needed
    • Quick Start: Perfect for rapid prototyping and testing
    • LangChain Compatible: Fully compatible with LangChain's Embeddings interface
    • Batch Processing: Supports efficient batch embedding generation
    • Automatic Integration: Can be automatically used in OceanbaseVectorStore by setting embedding_function=None
    • Technical Specs: Model all-MiniLM-L6-v2, 384 dimensions, ONNX Runtime inference
  • Vector Storage: Store embeddings from any LangChain embedding model in OceanBase with automatic table creation and index management.
  • Embedded SeekDB (optional): Run a local embedded SeekDB through pyobvector (pass path= or pyseekdb_client= to OceanbaseVectorStore) without a running OceanBase server; requires pyobvector[pyseekdb] or a recent pyseekdb that installs pylibseekdb. See docs/vectorstores.md#embedded-seekdb-optional and examples/embedded_seekdb_vectorstore.py.
  • Similarity Search: Perform efficient similarity searches on vector data with multiple distance metrics (L2, cosine, inner product).
  • Hybrid Search: Combine vector search with sparse vector search and full-text search for improved results with configurable weights.
  • Maximal Marginal Relevance: Filter for diversity in search results to avoid redundant information.
  • Multiple Index Types: Support for HNSW, IVF, FLAT and other vector index types with automatic parameter optimization.
  • Sparse Embeddings: Native support for sparse vector embeddings with BM25-like functionality.
  • Advanced Filtering: Built-in support for metadata filtering and complex query conditions.
  • Async Support: Full support for async operations and high-concurrency scenarios.
  • LangGraph Checkpointer (0.3.3+): Persist LangGraph conversation checkpoints in OceanBase via OceanBaseCheckpointSaver; supports time-travel and multi-thread state. See Migration Guide and examples/langgraph_agent.py.
  • Custom Exceptions (0.3.3+): OceanBaseError, OceanBaseConnectionError, OceanBaseVectorDimensionError, OceanBaseIndexError, OceanBaseVersionError, OceanBaseConfigurationError with troubleshooting links in messages.
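The Maximal Marginal Relevance feature above can be illustrated with a minimal pure-Python sketch of the scoring rule (this is the general MMR technique, not the library's implementation): each pick maximizes lam * relevance - (1 - lam) * redundancy against the already-selected results.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query, docs, k=2, lam=0.5):
    """Pick k doc indices balancing relevance (lam) against redundancy (1 - lam)."""
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam near 1 the ranking favors pure relevance; near 0 it favors diversity. The vector store's max_marginal_relevance_search exposes the same trade-off through the standard LangChain lambda_mult parameter.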

Installation

pip install -U langchain-oceanbase

Requirements

  • Python >=3.11
  • langchain-core >=1.0.0
  • pyobvector >=0.2.0 (required for database client)
  • pyseekdb >=0.1.0 (required dependency; use >=1.2 on supported platforms for embedded SeekDB and the pylibseekdb runtime)

Tip: The current version (0.3.3) supports langchain-core >=1.0.0. See CHANGELOG.md for version history.

Platform Support

  • Linux: Full support (x86_64 and ARM64)
  • macOS/Windows: Supported - pyobvector works on all platforms

Built-in Embedding Dependencies

For built-in embedding functionality (no API keys required), pyseekdb is installed automatically as a dependency. It provides:

  • Local ONNX-based embedding inference
  • Default embedding model: all-MiniLM-L6-v2 (384 dimensions)
  • No external API calls needed
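The batch processing mentioned above amounts to embedding texts in fixed-size chunks rather than one at a time. A minimal sketch of the batching step (the helper and batch size are illustrative, not the library's internals; embed_documents is the standard LangChain Embeddings method):

```python
def batched(texts, batch_size=32):
    """Yield fixed-size slices so embeddings are generated batch by batch."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Illustrative use with an embedding function (call shape is an assumption):
# for chunk in batched(all_texts):
#     vectors.extend(embedding_function.embed_documents(chunk))
```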

We recommend using Docker to deploy OceanBase:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest

For AI Functions support, use OceanBase 4.4.1 or later:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610

See the OceanBase documentation for more ways to deploy an OceanBase cluster.

Usage


Quick Start

Using Built-in Embedding (No API Keys Required)

The simplest way to get started is using the built-in embedding function, which requires no API keys. Prerequisite: OceanBase must be running (e.g. docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest).

from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_core.documents import Document

# Connection configuration
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

# Use default embedding (set embedding_function=None)
vector_store = OceanbaseVectorStore(
    embedding_function=None,  # Automatically uses DefaultEmbeddingFunction
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,  # all-MiniLM-L6-v2 dimension
)

# Add documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence"),
    Document(page_content="Python is a popular programming language"),
    Document(page_content="OceanBase is a distributed relational database"),
]
ids = vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("artificial intelligence", k=2)
for doc in results:
    print(f"* {doc.page_content}")

You can verify this example without OceanBase (imports and constructor only) by running: poetry run python tests/run_readme_quickstart.py.
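The vidx_metric_type="l2" setting above selects Euclidean distance. As a reference, the metric the index computes can be sketched as:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Smaller distances mean closer vectors, so with the l2 metric similarity_search returns the documents whose embeddings have the smallest distance to the query embedding.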

Key Benefits of Built-in Embedding:

  • ✅ No API keys or external services required
  • ✅ Works offline with local ONNX models
  • ✅ Fast batch processing
  • ✅ Perfect for prototyping and testing
  • ✅ Model files (~80MB) downloaded automatically on first use

Additional Quick Start Guides

Troubleshooting

Connection Refused

Error: Can't connect to MySQL server on 'localhost' or ConnectionRefusedError

Cause: OceanBase is not running or not accessible on the specified host/port.

Solution:

  1. Check if OceanBase is running:
    docker ps | grep oceanbase
  2. Start OceanBase if not running:
    docker start oceanbase
  3. Verify the port is correct (default: 2881 for local, 3306 for cloud)
  4. Check firewall settings if connecting to remote server
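To rule out networking quickly before involving the database driver, you can probe the port from Python with only the standard library (a small sketch; host and port values are the defaults from the Docker command above):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Expect True when the container is up:
# port_open("127.0.0.1", 2881)
```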

Vector Dimension Mismatch

Error: Vector dimension mismatch or OceanBaseVectorDimensionError

Cause: The embedding model's output dimension doesn't match the table's vector dimension.

Solution:

  1. Check your embedding model's output dimension (e.g., all-MiniLM-L6-v2 outputs 384 dimensions)
  2. Set the correct embedding_dim parameter when initializing OceanbaseVectorStore
  3. If the embedding model changed, recreate the table with drop_old=True:
    vector_store = OceanbaseVectorStore(
        embedding_function=new_embedding,
        embedding_dim=new_dim,
        drop_old=True,  # Recreate table with new dimension
        ...
    )
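One way to catch a mismatch before the table is created is to probe the model once and compare. embed_query is the standard LangChain Embeddings method; check_embedding_dim itself is a hypothetical helper:

```python
def check_embedding_dim(embedding, expected_dim):
    """Probe the embedding model once and fail fast on a dimension mismatch."""
    probe = embedding.embed_query("dimension probe")
    if len(probe) != expected_dim:
        raise ValueError(
            f"model outputs {len(probe)} dims, but the table expects {expected_dim}"
        )
    return len(probe)
```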

Index Creation Failed

Error: Failed to create index or OceanBaseIndexError

Cause: Insufficient memory, incompatible OceanBase version, or invalid index parameters.

Solution:

  1. Check available memory on your OceanBase server
  2. Verify OceanBase version supports the index type:
    • HNSW: OceanBase 4.3.0+
    • IVF variants: OceanBase 4.3.0+
  3. Try a simpler index type for small datasets:
    vector_store = OceanbaseVectorStore(
        index_type="FLAT",  # No index, exact search
        ...
    )
  4. For HNSW, reduce M parameter if memory is limited:
    vector_store = OceanbaseVectorStore(
        index_type="HNSW",
        vidx_algo_params={"M": 8, "efConstruction": 100},
        ...
    )

AI Functions Not Supported

Error: AI functions are not supported or OceanBaseVersionError

Cause: OceanBase version is older than 4.4.1, which is required for AI functions.

Solution:

  1. Upgrade to OceanBase 4.4.1 or later:
    docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 \
        -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
  2. Alternatively, use SeekDB which also supports AI functions
  3. Check current version:
    SELECT version();

Slow Queries

Cause: Missing vector index, wrong index type, or suboptimal search parameters.

Solution:

  1. Ensure a vector index is created (check with SHOW INDEX FROM table_name)
  2. Use appropriate index type:
    • HNSW: Best for large datasets with high recall requirements
    • IVF_FLAT: Good balance of speed and accuracy
    • FLAT: Best accuracy but slowest (no index)
  3. Tune search parameters for HNSW:
    # Higher efSearch = better accuracy but slower
    vector_store.hnsw_ef_search = 128  # Default is 64
  4. For IVF indexes, adjust nprobe parameter
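When tuning efSearch or nprobe, it helps to measure latency before and after each change. A small helper for that (works with any zero-argument callable; the commented calls are illustrative):

```python
import time

def time_query(fn, repeats=5):
    """Median wall-clock latency (seconds) of calling fn() repeats times."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return sorted(samples)[len(samples) // 2]

# Example: compare efSearch settings (calls are illustrative):
# vector_store.hnsw_ef_search = 64
# base = time_query(lambda: vector_store.similarity_search("query", k=10))
```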

Sparse Vector / Full-text Search Not Working

Error: Sparse vector support not enabled or Full-text search support not enabled

Cause: The vector store was not initialized with sparse/fulltext support.

Solution:

# Enable sparse vector support
vector_store = OceanbaseVectorStore(
    include_sparse=True,
    ...
)

# Enable both sparse and full-text search
vector_store = OceanbaseVectorStore(
    include_sparse=True,
    include_fulltext=True,
    ...
)

Note: Full-text search requires include_sparse=True to be set as well.
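Conceptually, a sparse embedding is an index-to-weight map, and hybrid search fuses dense and sparse scores with configurable weights. A minimal sketch of those two ideas (not the library's scoring code; the alpha weighting scheme is one common choice):

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    if len(b) < len(a):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[i] for i, w in a.items() if i in b)

def hybrid_score(dense_score, sparse_score, alpha=0.5):
    """Weighted fusion of dense and sparse scores (configurable weights)."""
    return alpha * dense_score + (1 - alpha) * sparse_score
```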

Import Errors

Error: ModuleNotFoundError: No module named 'pyobvector'

Cause: Required dependencies are not installed.

Solution:

pip install -U langchain-oceanbase pyobvector

For AI functions support:

pip install -U langchain-oceanbase pyobvector langgraph-checkpoint

Quickstart

A short quickstart to run the local dev environment and example scripts.

Prerequisites:

  • Git
  • Docker & Docker Compose
  • Python 3.11+ (matches the package requirement)
  • (Optional) OpenAI API key for embeddings / LLM examples
  1. Clone the repo
git clone https://github.com/oceanbase/langchain-oceanbase.git
cd langchain-oceanbase
  2. Start the local database
# start OceanBase
make docker-up

# or start SeekDB (lightweight alternative)
make docker-up-seek
  3. Set environment variables (create a .env file or export them)
OB_HOST=127.0.0.1
OB_PORT=3306
OB_USER=root
OB_PASSWORD=changeme
OB_DB=langchain_ob_demo
OPENAI_API_KEY=sk-...
  4. Install example dependencies (examples use these packages)
pip install openai mysql-connector-python numpy
  5. Run an example
python examples/quickstart.py
python examples/rag_demo.py
python examples/hybrid_search_demo.py

Files of interest

  • docker-compose.yml — OceanBase CE service for local development
  • docker-compose.seekdb.yml — SeekDB lightweight alternative
  • Makefile — convenience targets: make docker-up, make docker-down, make docker-logs, plus format/lint/typecheck/test helpers
  • CONTRIBUTING.md — developer setup, running tests, code style, PR process
  • examples/quickstart.py, rag_demo.py, hybrid_search_demo.py, and examples/README.md

Running tests and linters

  • Unit tests (no database required):
make test
# or: poetry run pytest tests/unit_tests/
  • Integration tests (requires OceanBase/SeekDB, e.g. make docker-up):
make docker-up
make integration_tests
# or: poetry run pytest tests/integration_tests/
  • Lint / formatting:
make format   # code formatting (ruff format + import sort)
make lint    # ruff check + mypy

Contributing

See CONTRIBUTING.md for detailed developer setup and the PR process. When submitting a PR, please:

  • Base your branch on main
  • Reference the issue (e.g., Closes #43) in the PR body
  • Run linters and tests locally
