Title: Auto Embedding Example
Summary: Automatically generate embeddings for your text data using built-in embedding models.

Auto Embedding Example

This example shows how to use the Auto Embedding feature with the pytidb client.

  1. Connect to TiDB using the pytidb client.
  2. Define a table with a VectorField configured for automatic embedding.
  3. Insert plain text data; the embeddings are generated automatically in the background.
  4. Run vector searches with natural-language queries; the query embeddings are generated transparently.
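
To make the flow in these steps concrete, here is a minimal in-memory sketch of the auto-embedding idea. It does not use pytidb or TiDB: `toy_embed`, `cosine_distance`, and `ToyAutoEmbeddingTable` are hypothetical names invented for illustration, and the hashed bag-of-words "model" is only a stand-in for a real embedding model. The point it shows is that the caller passes plain text for both inserts and queries, while the table derives the vectors itself.

```python
import hashlib
import math

DIM = 8  # toy dimensionality; real embedding models use far more dimensions


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; treat a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToyAutoEmbeddingTable:
    """In-memory illustration only; not the pytidb table API."""

    def __init__(self, embed):
        self._embed = embed
        self._rows = []  # each row is (id, text, vector)

    def insert(self, row_id: int, text: str) -> None:
        # The caller supplies only text; the vector is derived automatically.
        self._rows.append((row_id, text, self._embed(text)))

    def search(self, query: str, limit: int = 3):
        # The query text is embedded with the same function, then rows are
        # ranked by ascending distance (closest match first).
        query_vec = self._embed(query)
        scored = [
            (row_id, text, cosine_distance(query_vec, vec))
            for row_id, text, vec in self._rows
        ]
        scored.sort(key=lambda row: row[2])
        return scored[:limit]


table = ToyAutoEmbeddingTable(toy_embed)
table.insert(1, "TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads.")
table.insert(2, "PyTiDB is a Python library for developers to connect to TiDB.")
table.insert(3, "LlamaIndex is a Python library for building AI-powered applications.")

results = table.search("distributed database", limit=3)
for row_id, text, distance in results:
    print(f"id: {row_id}, distance: {distance:.4f}, text: {text}")
```

Because the toy embedding only hashes words, its rankings carry no semantic meaning; only the shape of the workflow (text in, vectors derived, nearest rows out) mirrors the steps above.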

Prerequisites

Before you begin, ensure you have the following:

  • Python (>=3.10): Install Python 3.10 or a later version.
  • A TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.

How to run

Step 1. Clone the pytidb repository

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/

Step 2. Install the required packages

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt

Step 3. Set environment variables

  1. In the TiDB Cloud console, navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
  2. Click Connect in the upper-right corner. A connection dialog is displayed, with connection parameters listed.
  3. Set environment variables according to the connection parameters as follows:
cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# The TiDB Cloud free embedding model is used by default and does not require an API key
EMBEDDING_PROVIDER=tidbcloud_free
EOF
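
Before running the demo, it can help to confirm that the `.env` contents are complete and that no template placeholders such as `{password}` were left in place. The following is a small standard-library sketch of such a check. The helper names (`parse_env_text`, `missing_keys`, `unfilled_placeholders`) are hypothetical and are not part of pytidb or the example's `main.py`; the key list simply mirrors the variables shown above.

```python
REQUIRED_KEYS = (
    "TIDB_HOST",
    "TIDB_PORT",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def unfilled_placeholders(values: dict[str, str]) -> list[str]:
    """Return keys whose values still contain a {placeholder} from the template."""
    return [key for key, value in values.items() if "{" in value and "}" in value]


# The template from this step, used here as sample input.
sample = """\
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# comment lines are ignored
EMBEDDING_PROVIDER=tidbcloud_free
"""

env = parse_env_text(sample)
print("missing:", missing_keys(env))
print("still templated:", unfilled_placeholders(env))
```

To check a real file, read it with `pathlib.Path(".env").read_text()` and pass the result to `parse_env_text`.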

Step 4. Run the demo

python main.py

Expected output:

=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined

=== Define table schema ===
Table created

=== Truncate table ===
Table truncated

=== Insert sample data ===
Inserted 3 chunks

=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
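
The search results above are listed from the smallest distance to the largest, so the first row is the closest match. The output does not name the distance metric; assuming it is cosine distance (an assumption, not something the output confirms), a similarity score can be approximated as `1 - distance`. The sketch below parses the printed rows and applies that conversion. The `ROW` pattern and `parse_rows` helper are hypothetical and exist only for this illustration.

```python
import re

# Result rows copied from the expected output above.
OUTPUT = """\
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
"""

# The text itself contains commas, so the greedy group anchors on the trailing ", distance: ".
ROW = re.compile(r"^id: (?P<id>\d+), text: (?P<text>.*), distance: (?P<distance>[0-9.]+)$")


def parse_rows(output: str) -> list[dict]:
    """Turn printed result lines into dictionaries; lines that do not match are skipped."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(
                {
                    "id": int(match["id"]),
                    "text": match["text"],
                    "distance": float(match["distance"]),
                }
            )
    return rows


rows = parse_rows(OUTPUT)
for row in rows:
    # Assumption: cosine distance, so similarity is 1 - distance.
    similarity = 1.0 - row["distance"]
    print(f"id={row['id']} similarity={similarity:.4f}")
```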

Related resources