Skip to content

Commit b014ea1

Browse files
author
github-actions
committed
update MD by dispatch event pingcap/docs release-cloud
1 parent 2ff77be commit b014ea1

File tree

3 files changed

+321
-0
lines changed

3 files changed

+321
-0
lines changed

markdown-pages/en/tidbcloud/master/TOC.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -237,6 +237,8 @@
237237
- AI Frameworks
238238
- [LlamaIndex](/tidb-cloud/vector-search-integrate-with-llamaindex.md)
239239
- [Langchain](/tidb-cloud/vector-search-integrate-with-langchain.md)
240+
- AI Services
241+
- [Amazon Bedrock](/tidb-cloud/vector-search-integrate-with-amazon-bedrock.md)
240242
- Embedding Models/Services
241243
- [Jina AI](/tidb-cloud/vector-search-integrate-with-jinaai-embedding.md)
242244
- ORM Libraries
Lines changed: 313 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,313 @@
1+
---
2+
title: Integrate TiDB Vector Search with Amazon Bedrock
3+
summary: Learn how to integrate TiDB Vector Search with Amazon Bedrock to build a Retrieval-Augmented Generation (RAG) Q&A bot.
4+
---
5+
6+
# Integrate TiDB Vector Search with Amazon Bedrock
7+
8+
This tutorial demonstrates how to integrate the [vector search](/tidb-cloud/vector-search-overview.md) feature of TiDB with [Amazon Bedrock](https://aws.amazon.com/bedrock/) to build a Retrieval-Augmented Generation (RAG) Q&A bot.
9+
10+
> **Note**
11+
>
12+
> TiDB Vector Search is available for TiDB Self-Managed (TiDB >= v8.4), TiDB Cloud Starter, and TiDB Cloud Essential.
13+
14+
> **Tip**
15+
>
16+
> You can view the complete [sample code](https://github.com/aws-samples/aws-generativeai-partner-samples/blob/main/tidb/samples/tidb-bedrock-boto3-rag.ipynb) in Notebook format.
17+
18+
## Prerequisites
19+
20+
To complete this tutorial, you need:
21+
22+
- [Python 3.11 or later](https://www.python.org/downloads/) installed
23+
- [Pip](https://pypi.org/project/pip/) installed
24+
- [AWS CLI](https://aws.amazon.com/cli/) installed
25+
26+
Ensure your AWS CLI profile is configured to a supported [Amazon Bedrock](https://aws.amazon.com/bedrock/) region for this tutorial. You can find the list of supported regions at [Amazon Bedrock Regions](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html). To switch to a supported region, run the following command:
27+
28+
```shell
29+
aws configure set region <your-region>
30+
```
31+
32+
- A TiDB Cloud Starter cluster or a TiDB Cloud Essential cluster
33+
34+
Follow [creating a TiDB Cloud cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one.
35+
36+
- An AWS account with the [required permissions for Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html) and access to the following models:
37+
38+
- **Amazon Titan Embeddings** (`amazon.titan-embed-text-v2:0`), used for generating text embeddings
39+
- **Meta Llama 3** (`us.meta.llama3-2-3b-instruct-v1:0`), used for text generation
40+
41+
If you don't have access, follow the instructions in [Request access to an Amazon Bedrock foundation model](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html#getting-started-model-access).
42+
43+
## Get started
44+
45+
This section provides step-by-step instructions for integrating TiDB Vector Search with Amazon Bedrock to build a RAG-based Q&A bot.
46+
47+
### Step 1. Set the environment variables
48+
49+
Get the TiDB connection information from the [TiDB Cloud console](https://console.tidb.io/signup?provider_source=alicloud) and set the environment variables in your development environment as follows:
50+
51+
1. Navigate to the [**Clusters**](https://console.tidb.io/project/clusters) page, and then click the name of your target cluster to go to its overview page.
52+
53+
2. Click **Connect** in the upper-right corner. A connection dialog is displayed.
54+
55+
3. Ensure the configurations in the connection dialog match your operating environment.
56+
57+
- **Connection Type** is set to `Public`
58+
- **Branch** is set to `main`
59+
- **Connect With** is set to `General`
60+
- **Operating System** matches your environment.
61+
62+
> **Tip:**
63+
>
64+
> If your program is running in Windows Subsystem for Linux (WSL), switch to the corresponding Linux distribution.
65+
66+
4. Click **Generate Password** to create a random password.
67+
68+
> **Tip:**
69+
>
70+
> If you have created a password before, you can either use the original password or click **Reset Password** to generate a new one.
71+
72+
5. Run the following commands in your terminal to set the environment variables. You need to replace the placeholders in the commands with the corresponding connection parameters obtained from the connection dialog.
73+
74+
```shell
75+
export TIDB_HOST=<your-tidb-host>
76+
export TIDB_PORT=4000
77+
export TIDB_USER=<your-tidb-user>
78+
export TIDB_PASSWORD=<your-tidb-password>
79+
export TIDB_DB_NAME=test
80+
```
81+
82+
### Step 2. Set up the Python virtual environment
83+
84+
1. Create a Python file named `demo.py`:
85+
86+
```shell
87+
touch demo.py
88+
```
89+
90+
2. Create and activate a virtual environment to manage dependencies:
91+
92+
```shell
93+
python3 -m venv env
94+
source env/bin/activate # On Windows, use env\Scripts\activate
95+
```
96+
97+
3. Install the required dependencies:
98+
99+
```shell
100+
pip install SQLAlchemy==2.0.30 PyMySQL==1.1.0 tidb-vector==0.0.9 pydantic==2.7.1 boto3
101+
```
102+
103+
### Step 3. Import required libraries
104+
105+
Add the following code to the beginning of `demo.py` to import the required libraries:
106+
107+
```python
108+
import os
109+
import json
110+
import boto3
111+
from sqlalchemy import Column, Integer, Text, create_engine
112+
from sqlalchemy.orm import declarative_base, Session
113+
from tidb_vector.sqlalchemy import VectorType
114+
```
115+
116+
### Step 4. Configure the database connection
117+
118+
In `demo.py`, add the following code to configure the database connection:
119+
120+
```python
121+
# ---- Configuration Setup ----
122+
# Set environment variables: TIDB_HOST, TIDB_PORT, TIDB_USER, TIDB_PASSWORD, TIDB_DB_NAME
123+
TIDB_HOST = os.environ.get("TIDB_HOST")
124+
TIDB_PORT = os.environ.get("TIDB_PORT")
125+
TIDB_USER = os.environ.get("TIDB_USER")
126+
TIDB_PASSWORD = os.environ.get("TIDB_PASSWORD")
127+
TIDB_DB_NAME = os.environ.get("TIDB_DB_NAME")
128+
129+
# ---- Database Setup ----
130+
def get_db_url():
131+
"""Build the database connection URL."""
132+
return f"mysql+pymysql://{TIDB_USER}:{TIDB_PASSWORD}@{TIDB_HOST}:{TIDB_PORT}/{TIDB_DB_NAME}?ssl_verify_cert=True&ssl_verify_identity=True"
133+
134+
# Create engine
135+
engine = create_engine(get_db_url(), pool_recycle=300)
136+
Base = declarative_base()
137+
```
138+
139+
### Step 5. Invoke the Amazon Titan Text Embeddings V2 model using the Bedrock runtime client
140+
141+
The Amazon Bedrock runtime client provides you with an `invoke_model` API that accepts the following parameters:
142+
143+
- `modelId`: the model ID of the foundation model available in Amazon Bedrock.
144+
- `accept`: the type of the input request.
145+
- `contentType`: the content type of the input.
146+
- `body`: a JSON string payload consisting of the prompt and the configurations.
147+
148+
In `demo.py`, add the following code to invoke the `invoke_model` API to generate text embeddings using Amazon Titan Text Embeddings and get responses from Meta Llama 3:
149+
150+
```python
151+
# Bedrock Runtime Client Setup
152+
bedrock_runtime = boto3.client('bedrock-runtime')
153+
154+
# ---- Model Invocation ----
155+
embedding_model_name = "amazon.titan-embed-text-v2:0"
156+
dim_of_embedding_model = 512
157+
llm_name = "us.meta.llama3-2-3b-instruct-v1:0"
158+
159+
160+
def embedding(content):
161+
"""Invoke Amazon Bedrock to get text embeddings."""
162+
payload = {
163+
"modelId": embedding_model_name,
164+
"contentType": "application/json",
165+
"accept": "*/*",
166+
"body": {
167+
"inputText": content,
168+
"dimensions": dim_of_embedding_model,
169+
"normalize": True,
170+
}
171+
}
172+
173+
body_bytes = json.dumps(payload['body']).encode('utf-8')
174+
175+
response = bedrock_runtime.invoke_model(
176+
body=body_bytes,
177+
contentType=payload['contentType'],
178+
accept=payload['accept'],
179+
modelId=payload['modelId']
180+
)
181+
182+
result_body = json.loads(response.get("body").read())
183+
return result_body.get("embedding")
184+
185+
186+
def generate_result(query: str, info_str: str):
187+
"""Generate answer using Meta Llama 3 model."""
188+
prompt = f"""
189+
ONLY use the content below to generate an answer:
190+
{info_str}
191+
192+
----
193+
Please carefully think about the question: {query}
194+
"""
195+
196+
payload = {
197+
"modelId": llm_name,
198+
"contentType": "application/json",
199+
"accept": "application/json",
200+
"body": {
201+
"prompt": prompt,
202+
"temperature": 0
203+
}
204+
}
205+
206+
body_bytes = json.dumps(payload['body']).encode('utf-8')
207+
208+
response = bedrock_runtime.invoke_model(
209+
body=body_bytes,
210+
contentType=payload['contentType'],
211+
accept=payload['accept'],
212+
modelId=payload['modelId']
213+
)
214+
215+
result_body = json.loads(response.get("body").read())
216+
completion = result_body["generation"]
217+
return completion
218+
```
219+
220+
### Step 6. Create a vector table
221+
222+
In `demo.py`, add the following code to create a vector table to store text and vector embeddings:
223+
224+
```python
225+
# ---- TiDB Setup and Vector Index Creation ----
226+
class Entity(Base):
227+
"""Define the Entity table with a vector index."""
228+
__tablename__ = "entity"
229+
id = Column(Integer, primary_key=True)
230+
content = Column(Text)
231+
content_vec = Column(VectorType(dim=dim_of_embedding_model), comment="hnsw(distance=l2)")
232+
233+
# Create the table in TiDB
234+
Base.metadata.create_all(engine)
235+
```
236+
237+
### Step 7. Save the vector data to TiDB Cloud
238+
239+
In `demo.py`, add the following code to save the vector data to your TiDB Cloud cluster:
240+
241+
```python
242+
# ---- Saving Vectors to TiDB ----
243+
def save_entities_with_embedding(session, contents):
244+
"""Save multiple entities with their embeddings to the TiDB database."""
245+
for content in contents:
246+
entity = Entity(content=content, content_vec=embedding(content))
247+
session.add(entity)
248+
session.commit()
249+
```
250+
251+
### Step 8. Run the application
252+
253+
1. In `demo.py`, add the following code to establish a database session, save embeddings to TiDB, ask an example question (such as "What is TiDB?"), and generate results from the model:
254+
255+
```python
256+
if __name__ == "__main__":
257+
# Establish a database session
258+
with Session(engine) as session:
259+
# Example data
260+
contents = [
261+
"TiDB is a distributed SQL database compatible with MySQL.",
262+
"TiDB supports Hybrid Transactional and Analytical Processing (HTAP).",
263+
"TiDB can scale horizontally and provides high availability.",
264+
"Amazon Bedrock allows seamless integration with foundation models.",
265+
"Meta Llama 3 is a powerful model for text generation."
266+
]
267+
268+
# Save embeddings to TiDB
269+
save_entities_with_embedding(session, contents)
270+
271+
# Example query
272+
query = "What is TiDB?"
273+
info_str = " ".join(contents)
274+
275+
# Generate result from Meta Llama 3
276+
result = generate_result(query, info_str)
277+
print(f"Generated answer: {result}")
278+
```
279+
280+
2. Save all changes to `demo.py` and run the script:
281+
282+
```shell
283+
python3 demo.py
284+
```
285+
286+
The expected output is similar to the following:
287+
288+
```
289+
Generated answer: What is the main purpose of TiDB?
290+
What are the key features of TiDB?
291+
What are the key benefits of TiDB?
292+
293+
----
294+
Based on the provided text, here is the answer to the question:
295+
What is TiDB?
296+
TiDB is a distributed SQL database compatible with MySQL.
297+
298+
## Step 1: Understand the question
299+
The question asks for the definition of TiDB.
300+
301+
## Step 2: Identify the key information
302+
The key information provided in the text is that TiDB is a distributed SQL database compatible with MySQL.
303+
304+
## Step 3: Provide the answer
305+
Based on the provided text, TiDB is a distributed SQL database compatible with MySQL.
306+
307+
The final answer is: TiDB is a distributed SQL database compatible with MySQL.
308+
```
309+
310+
## See also
311+
312+
- [Vector Data Types](/tidb-cloud/vector-search-data-types.md)
313+
- [Vector Search Index](/tidb-cloud/vector-search-index.md)

markdown-pages/en/tidbcloud/master/tidb-cloud/vector-search-integration-overview.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,12 @@ TiDB provides official support for the following AI frameworks, enabling you to
2222

2323
Moreover, you can also use TiDB for various purposes, such as document storage and knowledge graph storage for AI applications.
2424

25+
## AI services
26+
27+
TiDB Vector Search supports integration with the following AI service, enabling you to easily build Retrieval-Augmented Generation (RAG) based applications.
28+
29+
- [Amazon Bedrock](/tidb-cloud/vector-search-integrate-with-amazon-bedrock.md)
30+
2531
## Embedding models and services
2632

2733
TiDB Vector Search supports storing vectors of up to 16383 dimensions, which accommodates most embedding models.

0 commit comments

Comments
 (0)