title	Auto Embedding Example
summary	Use the built-in embedding model to automatically generate embeddings for your text data.

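Auto Embedding Example

This example shows how to use the Auto Embedding feature with the pytidb client.

Connect to TiDB with the pytidb client.
Define a table with a VectorField configured for Auto Embedding.
Insert plain text data: the embeddings are populated automatically in the background.
Run a vector search with a natural language query: the query embedding is generated transparently.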
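The four steps above can be sketched roughly as follows. This is a minimal sketch modeled on the pytidb README, not a copy of the repository's `main.py`. The helper `connection_kwargs`, the table name `chunks`, and the query text are illustrative assumptions, and the `if_exists` argument assumes a recent pytidb release. The sketch also assumes the `TIDB_*` variables are already exported into the process environment.

```python
import os


def connection_kwargs(env=None):
    """Collect TiDB connection settings from TIDB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("TIDB_HOST", "localhost"),
        "port": int(env.get("TIDB_PORT", "4000")),
        "username": env.get("TIDB_USERNAME", "root"),
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env.get("TIDB_DATABASE", "test"),
    }


def run_demo():
    # Imported lazily so the helper above can be used without pytidb installed.
    from pytidb import TiDBClient
    from pytidb.embeddings import EmbeddingFunction
    from pytidb.schema import Field, TableModel

    # 1. Connect to TiDB.
    db = TiDBClient.connect(**connection_kwargs())

    # 2. Define a table whose vector field is filled from the "text" column.
    embed_func = EmbeddingFunction(
        model_name="tidbcloud_free/amazon/titan-embed-text-v2"
    )

    class Chunk(TableModel, table=True):
        __tablename__ = "chunks"  # assumed table name
        id: int | None = Field(default=None, primary_key=True)
        text: str = Field()
        text_vec: list[float] = embed_func.VectorField(source_field="text")

    # Assumption: recent pytidb releases accept if_exists="overwrite".
    table = db.create_table(schema=Chunk, if_exists="overwrite")
    table.truncate()

    # 3. Insert plain text; no vectors are supplied by the caller.
    table.bulk_insert([
        Chunk(text="TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads."),
        Chunk(text="PyTiDB is a Python library for developers to connect to TiDB."),
        Chunk(text="LlamaIndex is a Python library for building AI-powered applications."),
    ])

    # 4. Search with a natural language query (the query text is made up).
    for row in table.search("HTAP database").limit(3).to_list():
        print(row)


if __name__ == "__main__":
    run_demo()
```

Keeping the pytidb imports inside `run_demo` is only a convenience for this sketch; a real script would normally import them at the top of the file.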

Prerequisites

Before you begin, make sure you have the following:

Python (>=3.10): Install Python 3.10 or later.
TiDB Cloud Starter cluster: You can create a free TiDB cluster on TiDB Cloud.
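If you are unsure which interpreter a virtual environment will pick up, a quick check along these lines can confirm the Python requirement. The function name `meets_minimum` is a hypothetical helper, not part of pytidb.

```python
import sys

MINIMUM = (3, 10)  # minimum version stated in the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the major.minor version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    print(f"Python {MINIMUM[0]}.{MINIMUM[1]} or later is required, "
          f"found {sys.version_info.major}.{sys.version_info.minor}")
```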

How to run

Step 1. Clone the `pytidb` repository

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/auto_embedding/

Step 2. Install the required dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt

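Step 3. Set the environment variables

In the TiDB Cloud console, go to the Clusters page, then click the name of your target cluster to open its overview page.
Click Connect in the upper-right corner. A connection dialog appears and shows the connection parameters.
Set the environment variables according to the connection parameters as follows: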

cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# Uses the free TiDB Cloud embedding model by default; no API key is required
EMBEDDING_PROVIDER=tidbcloud_free
EOF
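A common mistake at this step is leaving a `{placeholder}` from the template in the `.env` file. The following stdlib-only sketch parses the file contents and reports which keys still look unfilled. The helpers `parse_env` and `unfilled_placeholders` are hypothetical names for illustration, and the parser handles only simple `KEY=VALUE` lines, not quoting or variable expansion.

```python
import re


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def unfilled_placeholders(settings):
    """Return the sorted keys whose values still contain a {placeholder}."""
    return sorted(k for k, v in settings.items() if re.search(r"\{[^}]*\}", v))


# The template from this step, before any values are filled in.
sample = """\
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=test

# Uses the free TiDB Cloud embedding model by default
EMBEDDING_PROVIDER=tidbcloud_free
"""

print(unfilled_placeholders(parse_env(sample)))
# prints ['TIDB_HOST', 'TIDB_PASSWORD', 'TIDB_USERNAME']
```

To check a real file, pass the result of `open(".env").read()` instead of `sample`. A password that legitimately contains braces would be flagged as well, so treat the result as a hint rather than a verdict.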

Step 4. Run the example

python main.py

Expected output:

=== Define embedding function ===
Embedding function (model id: tidbcloud_free/amazon/titan-embed-text-v2) defined

=== Define table schema ===
Table created

=== Truncate table ===
Table truncated

=== Insert sample data ===
Inserted 3 chunks

=== Perform vector search ===
id: 1, text: TiDB is a distributed database that supports OLTP, OLAP, HTAP and AI workloads., distance: 0.30373281240458805
id: 2, text: PyTiDB is a Python library for developers to connect to TiDB., distance: 0.422506501973434
id: 3, text: LlamaIndex is a Python library for building AI-powered applications., distance: 0.5267239638442787
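The search results above are listed by ascending `distance`, so the first row is the closest match to the query. The sketch below illustrates that ordering with a hand-written cosine distance. It assumes the distance is cosine distance, which the output does not state, and it uses made-up 2-dimensional vectors where real embeddings have far more dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query_vec, docs):
    """Sort (id, vector) pairs by ascending distance to the query vector."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in docs]
    return sorted(scored, key=lambda pair: pair[1])


# Toy vectors, invented for illustration only.
query = [1.0, 0.0]
docs = [(3, [0.0, 1.0]), (1, [0.9, 0.1]), (2, [0.5, 0.5])]

for doc_id, distance in rank(query, docs):
    print(f"id: {doc_id}, distance: {distance:.4f}")
```

With Auto Embedding, both the stored vectors and the query vector are produced by the embedding model, so only the text needs to be supplied; the ranking itself follows the same smaller-is-closer rule.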
