| title | summary |
|---|---|
| Integrate TiDB Vector Search with Django ORM | Learn how to integrate TiDB Vector Search with Django ORM to store embeddings and perform semantic search. |

This tutorial walks you through how to use the Django ORM to interact with TiDB Vector Search, store embeddings, and perform vector search queries.
Note:
- The vector search feature is in beta. It might be changed without prior notice. If you find a bug, you can report an issue on GitHub.
- The vector search feature is available on TiDB Self-Managed, {{{ .starter }}}, {{{ .essential }}}, and TiDB Cloud Dedicated. For TiDB Self-Managed and TiDB Cloud Dedicated, the TiDB version must be v8.4.0 or later (v8.5.0 or later is recommended).
To complete this tutorial, you need:
- Python 3.8 or higher installed.
- Git installed.
- A TiDB cluster.
If you don't have a TiDB cluster, you can create one as follows:
- (Recommended) Follow Creating a {{{ .starter }}} cluster to create your own TiDB Cloud cluster.
- Follow Deploy a local test TiDB cluster or Deploy a production TiDB cluster to create a local cluster.
You can quickly learn how to integrate TiDB Vector Search with Django ORM by following the steps below.
Clone the tidb-vector-python repository to your local machine:
git clone https://github.com/pingcap/tidb-vector-python.git

Create a virtual environment for your project:

cd tidb-vector-python/examples/orm-django-quickstart
python3 -m venv .venv
source .venv/bin/activate

Install the required dependencies for the demo project:

pip install -r requirements.txt

Alternatively, you can install the following packages for your project:

pip install Django django-tidb mysqlclient numpy python-dotenv

If you encounter installation issues with mysqlclient, refer to the mysqlclient official documentation.
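To confirm that the installation succeeded before continuing, you can check that each package is importable. The following is a minimal sketch: the helper name `missing_packages` is hypothetical, and the mapping reflects the fact that some packages are imported under a different name than the one used with pip (`mysqlclient` as `MySQLdb`, `python-dotenv` as `dotenv`).

```python
import importlib.util

# pip package name -> module name used in import statements.
REQUIRED_MODULES = {
    "Django": "django",
    "django-tidb": "django_tidb",
    "mysqlclient": "MySQLdb",
    "numpy": "numpy",
    "python-dotenv": "dotenv",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run the script inside the activated virtual environment. Any name it reports still needs to be installed.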
django-tidb is a TiDB dialect for Django, which enhances the Django ORM to support TiDB-specific features (for example, Vector Search) and resolves compatibility issues between TiDB and Django.
To install django-tidb, choose a version that matches your Django version. For example, if you are using django==4.2.*, install django-tidb==4.2.*. The minor version does not need to be the same. It is recommended to use the latest minor version.
For more information, refer to the django-tidb repository.
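The version-matching rule above can be expressed as a small helper. This is only an illustrative sketch: `matching_django_tidb_spec` is a hypothetical name, not part of django-tidb. It keeps the first two components of the Django version and leaves the last one open.

```python
def matching_django_tidb_spec(django_version: str) -> str:
    """Build a pip requirement that pins django-tidb to the same first two
    version components as the given Django version (for example, 4.2)."""
    parts = django_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unexpected Django version: {django_version!r}")
    # The last component is left as a wildcard so that pip can pick the
    # latest release in that series.
    return f"django-tidb=={parts[0]}.{parts[1]}.*"


if __name__ == "__main__":
    print(matching_django_tidb_spec("4.2.11"))
```

In a project where Django is already installed, you could pass the result of `django.get_version()` to this helper and use the returned string with `pip install`.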
Configure the environment variables depending on the TiDB deployment option you've selected.
For a {{{ .starter }}} cluster, take the following steps to obtain the cluster connection string and configure environment variables:
- Navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
- Click Connect in the upper-right corner. A connection dialog is displayed.
- Ensure the configurations in the connection dialog match your operating environment.
  - Connection Type is set to Public.
  - Branch is set to main.
  - Connect With is set to General.
  - Operating System matches your environment.

  Tip:
  If your program is running in Windows Subsystem for Linux (WSL), switch to the corresponding Linux distribution.
- Copy the connection parameters from the connection dialog.

  Tip:
  If you have not set a password yet, click Generate Password to generate a random password.
- In the root directory of your Python project, create a .env file and paste the connection parameters to the corresponding environment variables.
  - TIDB_HOST: The host of the TiDB cluster.
  - TIDB_PORT: The port of the TiDB cluster.
  - TIDB_USERNAME: The username to connect to the TiDB cluster.
  - TIDB_PASSWORD: The password to connect to the TiDB cluster.
  - TIDB_DATABASE: The database name to connect to.
  - TIDB_CA_PATH: The path to the root certificate file.

  The following is an example for macOS:

  TIDB_HOST=gateway01.****.prod.aws.tidbcloud.com
  TIDB_PORT=4000
  TIDB_USERNAME=********.root
  TIDB_PASSWORD=********
  TIDB_DATABASE=test
  TIDB_CA_PATH=/etc/ssl/cert.pem

For a TiDB Self-Managed cluster, create a .env file in the root directory of your Python project. Copy the following content into the .env file, and modify the environment variable values according to the connection parameters of your TiDB cluster:
TIDB_HOST=127.0.0.1
TIDB_PORT=4000
TIDB_USERNAME=root
TIDB_PASSWORD=
TIDB_DATABASE=test

If you are running TiDB on your local machine, TIDB_HOST is 127.0.0.1 by default. The initial TIDB_PASSWORD is empty, so if you are starting the cluster for the first time, you can omit this field.

The following are descriptions for each parameter:

- TIDB_HOST: The host of the TiDB cluster.
- TIDB_PORT: The port of the TiDB cluster.
- TIDB_USERNAME: The username to connect to the TiDB cluster.
- TIDB_PASSWORD: The password to connect to the TiDB cluster.
- TIDB_DATABASE: The name of the database you want to connect to.

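Before running the migration, it can help to confirm that the environment variables are complete. The following is a minimal sketch, not part of the demo project: `check_tidb_env` is a hypothetical helper, and treating TIDB_PASSWORD and TIDB_CA_PATH as optional is an assumption based on the empty initial password and the Self-Managed example above.

```python
import os

# Variables that must be non-empty. TIDB_PASSWORD may legitimately be empty
# on a fresh local cluster, and TIDB_CA_PATH is only needed when TLS is used.
REQUIRED_VARS = ("TIDB_HOST", "TIDB_PORT", "TIDB_USERNAME", "TIDB_DATABASE")


def check_tidb_env(env=None):
    """Return a list of human-readable problems found in the settings."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    port = env.get("TIDB_PORT", "")
    if port and not port.isdigit():
        problems.append(f"TIDB_PORT is not a number: {port!r}")
    ca_path = env.get("TIDB_CA_PATH", "")
    if ca_path and not os.path.isfile(ca_path):
        problems.append(f"TIDB_CA_PATH does not point to a file: {ca_path}")
    return problems


if __name__ == "__main__":
    for problem in check_tidb_env():
        print(problem)
```

To check the values stored in the .env file, call `dotenv.load_dotenv()` first so that they are loaded into `os.environ`.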
Migrate the database schema:
python manage.py migrate

Run the Django development server:

python manage.py runserver

Open your browser and visit http://127.0.0.1:8000 to try the demo application. Here are the available API paths:
| API Path | Description |
|---|---|
| POST: /insert_documents | Insert documents with embeddings. |
| GET: /get_nearest_neighbors_documents | Get the 3-nearest neighbor documents. |
| GET: /get_documents_within_distance | Get documents within a certain distance. |
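The API paths above can also be called from a script instead of a browser. The sketch below only builds the requests with the Python standard library. It assumes the development server listens on http://127.0.0.1:8000 and that the endpoints need no request body or extra headers, which this tutorial does not specify. `build_request` and `send` are hypothetical helper names.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_request(method: str, path: str) -> urllib.request.Request:
    """Create a request object for one of the demo API paths."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


# One request per row of the API table, in the order you would call them.
DEMO_REQUESTS = [
    build_request("POST", "/insert_documents"),
    build_request("GET", "/get_nearest_neighbors_documents"),
    build_request("GET", "/get_documents_within_distance"),
]


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a request to the running demo server and return the body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the development server running, calling `send(request)` for each item in `DEMO_REQUESTS` inserts the documents first and then runs both queries.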
You can refer to the following sample code snippets to complete your own application development.
In the file sample_project/settings.py, add the following configurations:
import os

import dotenv

# Load the variables defined in the .env file into the process environment.
dotenv.load_dotenv()

DATABASES = {
    "default": {
        # https://github.com/pingcap/django-tidb
        "ENGINE": "django_tidb",
        "HOST": os.environ.get("TIDB_HOST", "127.0.0.1"),
        "PORT": int(os.environ.get("TIDB_PORT", 4000)),
        "USER": os.environ.get("TIDB_USERNAME", "root"),
        "PASSWORD": os.environ.get("TIDB_PASSWORD", ""),
        "NAME": os.environ.get("TIDB_DATABASE", "test"),
        "OPTIONS": {
            "charset": "utf8mb4",
        },
    }
}

# Enable TLS only when a CA certificate path is provided.
TIDB_CA_PATH = os.environ.get("TIDB_CA_PATH", "")
if TIDB_CA_PATH:
    DATABASES["default"]["OPTIONS"]["ssl_mode"] = "VERIFY_IDENTITY"
    DATABASES["default"]["OPTIONS"]["ssl"] = {
        "ca": TIDB_CA_PATH,
    }

You can create a .env file in the root directory of your project and set up the environment variables TIDB_HOST, TIDB_PORT, TIDB_USERNAME, TIDB_PASSWORD, TIDB_DATABASE, and TIDB_CA_PATH with the actual values of your TiDB cluster.
django-tidb provides a VectorField to store vector embeddings in a table.
Create a table with a column named embedding that stores a 3-dimensional vector.
from django.db import models
from django_tidb.fields.vector import VectorField

class Document(models.Model):
    content = models.TextField()
    embedding = VectorField(dimensions=3)

Store documents with their embeddings:

Document.objects.create(content="dog", embedding=[1, 2, 1])
Document.objects.create(content="fish", embedding=[1, 2, 4])
Document.objects.create(content="tree", embedding=[1, 0, 0])

TiDB Vector supports the following distance functions:

- L1Distance
- L2Distance
- CosineDistance
- NegativeInnerProduct

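To make the four distance functions concrete, the following sketch computes each one in plain Python for the "dog" embedding and the query vector [1, 2, 3]. It uses the standard mathematical definitions and does not call TiDB or django-tidb. The function names are illustrative only.

```python
import math


def l1_distance(a, b):
    """Manhattan distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a smaller value means more similar."""
    return -sum(x * y for x, y in zip(a, b))


dog = [1, 2, 1]
query = [1, 2, 3]
print(l1_distance(dog, query))                # 2
print(l2_distance(dog, query))                # 2.0
print(round(cosine_distance(dog, query), 4))  # 0.1271
print(negative_inner_product(dog, query))     # -8
```

For all four functions, a smaller result means the vectors are closer, which is why search results are ordered by ascending distance.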
Search for the top-3 documents that are semantically closest to the query vector [1, 2, 3] based on the cosine distance function.
from django_tidb.fields.vector import CosineDistance

results = Document.objects.annotate(
    distance=CosineDistance('embedding', [1, 2, 3])
).order_by('distance')[:3]

Search for the documents whose cosine distance from the query vector [1, 2, 3] is less than 0.2.

results = Document.objects.annotate(
    distance=CosineDistance('embedding', [1, 2, 3])
).filter(distance__lt=0.2).order_by('distance')[:3]
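As a sanity check for the two queries above, the following sketch reproduces their logic in plain Python for the three sample documents, without a database. It assumes the standard definition of cosine distance (one minus cosine similarity). The variable names are illustrative.

```python
import math

# The sample documents stored earlier in this tutorial.
documents = {
    "dog": [1, 2, 1],
    "fish": [1, 2, 4],
    "tree": [1, 0, 0],
}
query = [1, 2, 3]


def cosine_distance(a, b):
    """One minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


# Equivalent of annotate(distance=...).order_by('distance').
ranked = sorted(
    (cosine_distance(embedding, query), content)
    for content, embedding in documents.items()
)

# Equivalent of [:3]: the three nearest documents.
nearest = [content for _, content in ranked[:3]]

# Equivalent of filter(distance__lt=0.2) on the ordered results.
within_distance = [content for distance, content in ranked if distance < 0.2]

print(nearest)          # ['fish', 'dog', 'tree']
print(within_distance)  # ['fish', 'dog']
```

With these sample embeddings, "fish" is nearest to the query, and "tree" is excluded by the 0.2 threshold.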