Commit 42b4ee0

docs: add more caution for the OpenAI api (#10901)
* docs: add more caution for the OpenAI api
* fix link
* fix link
1 parent f60eb15 commit 42b4ee0

6 files changed (+85, -18 lines)

docs/doc/01-guides/index.md

Lines changed: 2 additions & 2 deletions
@@ -86,8 +86,8 @@ These tutorials are intended to help you get started with Databend:

 * [Generating SQL with AI](../15-sql-functions/61-ai-functions/01-ai-to-sql.md)
 * [Creating Embedding Vectors](../15-sql-functions/61-ai-functions/02-ai-embedding-vector.md)
-* [Computing Text Similarities](../15-sql-functions/61-ai-functions/03-ai-cosine-distance.md)
-* [Text Completion with AI](../15-sql-functions/61-ai-functions/04-ai-text-completion.md)
+* [Text Completion with AI](../15-sql-functions/61-ai-functions/03-ai-text-completion.md)
+* [Computing Text Similarities](../15-sql-functions/61-ai-functions/04-ai-cosine-distance.md)

 ## Backup & Restore

docs/doc/15-sql-functions/61-ai-functions/01-ai-to-sql.md

Lines changed: 11 additions & 1 deletion
@@ -2,14 +2,24 @@
 title: 'AI_TO_SQL'
 ---

-Converts natural language instructions into SQL queries with the latest [Codex](https://platform.openai.com/docs/models/codex) model `code-davinci-002`.
+Converts natural language instructions into SQL queries with the latest model `text-davinci-003`.

 Databend offers an efficient solution for constructing SQL queries by incorporating OLAP and AI. Through this function, instructions written in a natural language can be converted into SQL query statements that align with the table schema. For example, the function can be provided with a sentence like "Get all items that cost 10 dollars or less" as an input and generate the corresponding SQL query "SELECT * FROM items WHERE price <= 10" as output.

+The main implementation is in [`ai_to_sql.rs`](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/query/service/src/table_functions/openai/ai_to_sql.rs).
+
 :::note
 The SQL query statements generated adhere to the PostgreSQL standards, so they might require manual revisions to align with the syntax of Databend.
 :::

+:::caution
+Databend relies on OpenAI for `AI_TO_SQL`, but it sends only the table schema to OpenAI, not the data.
+
+This function only works when the Databend configuration includes `openai_api_key`; otherwise, it is inactive.
+
+This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. By using it, you acknowledge that we send your table schema to OpenAI.
+:::
+
 ## Syntax

 ```sql
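
To make the AI_TO_SQL caution concrete, here is a hypothetical Python sketch of a schema-only prompt builder. The `Table` class, the `build_schema_prompt` helper, and the prompt layout are all made up for illustration; Databend's actual logic is in the linked `ai_to_sql.rs`. The sketch demonstrates the property the caution describes: the prompt is assembled from table and column names plus the user's instruction, and row values are never read.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table model: a name, column names, and local row data."""
    name: str
    columns: list[str]
    rows: list[tuple] = field(default_factory=list)  # stays local, never rendered


def build_schema_prompt(tables: list[Table], instruction: str) -> str:
    """Build a text-to-SQL prompt from the schema only (layout is an assumption).

    Only table.name and table.columns are read; table.rows is ignored,
    so no data values can end up in the prompt.
    """
    lines = ["Tables:"]
    for table in tables:
        lines.append(f"- {table.name}({', '.join(table.columns)})")
    lines.append(f"Instruction: {instruction}")
    lines.append("SQL:")
    return "\n".join(lines)


if __name__ == "__main__":
    items = Table("items", ["id", "name", "price"], rows=[(1, "secret-widget", 9.5)])
    print(build_schema_prompt([items], "Get all items that cost 10 dollars or less"))
```

Running it prints the schema line `- items(id, name, price)` and the instruction, but neither `secret-widget` nor `9.5`.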

docs/doc/15-sql-functions/61-ai-functions/02-ai-embedding-vector.md

Lines changed: 10 additions & 0 deletions
@@ -5,6 +5,16 @@ description: 'Creating embeddings using the ai_embedding_vector function in Data

 This document provides an overview of the ai_embedding_vector function in Databend and demonstrates how to create document embeddings using this function.

+The main implementation is in [`embedding.rs`](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/embedding.rs).
+
+:::caution
+Databend relies on OpenAI for `AI_EMBEDDING_VECTOR` and sends the text being embedded (for example, the values of the column you pass to the function) to OpenAI.
+
+This function only works when the Databend configuration includes `openai_api_key`; otherwise, it is inactive.
+
+This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. By using it, you acknowledge that we send the text you embed to OpenAI.
+:::
+
 ## Overview of ai_embedding_vector


docs/doc/15-sql-functions/61-ai-functions/04-ai-text-completion.md renamed to docs/doc/15-sql-functions/61-ai-functions/03-ai-text-completion.md

Lines changed: 13 additions & 2 deletions
@@ -3,7 +3,18 @@ title: 'AI_TEXT_COMPLETION'
 description: 'Generating text completions using the ai_text_completion function in Databend'
 ---

-This document provides an overview of the ai_text_completion function in Databend and demonstrates how to generate text completions using this function.
+This document provides an overview of the `ai_text_completion` function in Databend and demonstrates how to generate text completions using this function.
+
+The main implementation is in [`completion.rs`](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/completion.rs).
+
+:::caution
+Databend relies on OpenAI for `AI_TEXT_COMPLETION` and sends your prompt text to OpenAI.
+
+This function only works when the Databend configuration includes `openai_api_key`; otherwise, it is inactive.
+
+This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. By using it, you acknowledge that we send your prompts to OpenAI.
+:::
+

 ## Overview of ai_text_completion

@@ -27,4 +38,4 @@ Result:
 +--------------------------------------------------------------------------------------------------------------------+
 ```

-In this example, we provide the prompt "What is artificial intelligence?" to the ai_text_completion function, and it returns a generated completion that briefly describes artificial intelligence.
+In this example, we provide the prompt "What is artificial intelligence?" to the `ai_text_completion` function, and it returns a generated completion that briefly describes artificial intelligence.

docs/doc/15-sql-functions/61-ai-functions/03-ai-cosine-distance.md renamed to docs/doc/15-sql-functions/61-ai-functions/04-ai-cosine-distance.md

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@ description: 'Measuring document similarity using the cosine_distance function i

 This document provides an overview of the `cosine_distance` function in Databend and demonstrates how to measure document similarity using this function.

+:::info
+The `cosine_distance` function performs vector computations within Databend and does not rely on the OpenAI API.
+:::
+
 ## Overview of cosine_distance

 The `cosine_distance` function in Databend is a built-in function that calculates the cosine distance between two vectors. It is commonly used in natural language processing tasks, such as document similarity and recommendation systems.
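
To sanity-check distances outside Databend, the sketch below computes cosine distance in plain Python. It assumes the common definition, 1 minus cosine similarity. Under that definition, vectors pointing the same way score 0, orthogonal vectors score 1, and opposite vectors score 2. This diff does not state Databend's exact formula, so treat the function as an approximation of `cosine_distance`, not its implementation.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction: 0.0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 1.0
    print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite: 2.0
```

Because the dot product is divided by both norms, only direction matters: scaling a vector by a positive factor leaves its distance to other vectors unchanged.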

docs/doc/15-sql-functions/61-ai-functions/index.md

Lines changed: 45 additions & 13 deletions
@@ -1,34 +1,66 @@
 ---
 title: 'AI Functions'
-description: 'SQL-based Knowledge Base Search and Completion using Databend'
+description: 'Using SQL-based AI Functions for Knowledge Base Search and Text Completion'
 ---

-This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context.
+This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context. We will guide you through a simple example that shows how to create and store embeddings, find related documents, and generate completions using various AI functions.

-We will guide you through a simple example that shows how to create and store embeddings using the `ai_embedding_vector` function, find related documents with the `cosine_distance` function, and generate completions using the `ai_text_completion` function.
+:::caution
+
+Databend relies on OpenAI for embeddings and text completions, which means the text you pass to these functions is sent to OpenAI. Exercise caution when using them.
+
+These functions only work when the Databend configuration includes `openai_api_key`; otherwise, they are inactive.
+
+These functions are available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. By using them, you acknowledge that we send your data to OpenAI.
+
+:::

 ## Introduction to embeddings

-Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation.
+Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation systems.

 ## How do embeddings work?

-Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space. This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships.
-Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They are widely used in various natural language processing tasks, such as document similarity, clustering, and recommendation systems.
+Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space.
+
+This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships.
+
+To illustrate how embeddings work, let's consider a simple example. Suppose we have the following sentences:
+1. `"The cat sat on the mat."`
+2. `"The dog sat on the rug."`
+3. `"The quick brown fox jumped over the lazy dog."`
+
+When creating embeddings for these sentences, the model maps each sentence to a high-dimensional vector, placing similar sentences closer together in the vector space.
+
+For instance, the embeddings of sentences 1 and 2 will be closer to each other because they share a similar structure and meaning (both involve an animal sitting on something). On the other hand, the embedding of sentence 3 will be farther from the embeddings of sentences 1 and 2 because it has a different structure and meaning.
+
+The embeddings could look like this (simplified for illustration purposes):
+
+1. `[0.2, 0.3, 0.1, 0.7, 0.4]`
+2. `[0.25, 0.29, 0.11, 0.71, 0.38]`
+3. `[-0.1, 0.5, 0.6, -0.3, 0.8]`
+
+In this simplified example, you can see that the embeddings of sentences 1 and 2 are closer to each other in the vector space, while the embedding of sentence 3 is farther away. This illustrates how embeddings can capture semantic relationships and be used to compare and analyze text data.
+
+
+## What is a Vector Database?
+
+A vector database is a specialized database designed to store, manage, and search high-dimensional vector data efficiently. These databases are optimized for similarity search operations, such as finding the nearest neighbors of a given vector. They are particularly useful in scenarios where the data has high dimensionality, like embeddings in natural language processing tasks, image feature vectors, and more.
+
+Typically, embedding vectors are stored in specialized vector databases such as Milvus, Pinecone, Qdrant, or Weaviate. Databend can also store embedding vectors using the `ARRAY(FLOAT32)` data type and perform similarity computations with the `cosine_distance` function in SQL. To create embeddings for a text document using Databend, you can use the built-in `ai_embedding_vector` function directly in your SQL query.

 ## Databend AI Functions

 Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are:

 - [ai_embedding_vector](./02-ai-embedding-vector.md): Generates embeddings for text documents.
-- [cosine_distance](./03-ai-cosine-distance.md): Calculates the cosine distance between two embeddings.
-- [ai_text_completion](./04-ai-text-completion.md): Generates text completions based on a given prompt.
-These functions are powered by open-source natural language processing models and can be used directly within SQL queries.
+- [ai_text_completion](./03-ai-text-completion.md): Generates text completions based on a given prompt.
+- [cosine_distance](./04-ai-cosine-distance.md): Calculates the cosine distance between two embeddings.

 ## Creating and storing embeddings using Databend

-To create embeddings for a text document using Databend, you can use the built-in ai_embedding_vector function directly in your SQL query. Here's an example:

+Here's an example:
 ```sql
 CREATE TABLE documents (
 doc_id INT,
@@ -52,11 +84,11 @@ SELECT doc_id, text_content, ai_embedding_vector(text_content)
 FROM documents;
 ```

-This SQL script creates a documents table, inserts the example documents, and then generates embeddings using the ai_embedding_vector function. The embeddings are stored in the embeddings table with the ARRAY(FLOAT32) column type.
+This SQL script creates a `documents` table, inserts the example documents, and then generates embeddings using the `ai_embedding_vector` function. The embeddings are stored in the `embeddings` table with the `ARRAY(FLOAT32)` column type.

-## Searching for related documents using cosine distance
+## Searching for similar documents using cosine distance

-Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. First, generate an embedding for the question using the ai_embedding_vector function:
+Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. Generate an embedding for the question with the `ai_embedding_vector` function and compare it against the stored embeddings with `cosine_distance`:
 ```sql
 SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance
 FROM embeddings
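
The query above is cut off in this view. As a rough model of what such a search does, the sketch below applies cosine distance to the simplified vectors from the "How do embeddings work?" section. Sentence 1 plays the role of the question, and sentences 2 and 3 are ranked nearest-first. It is a brute-force scan in plain Python that assumes cosine distance is 1 minus cosine similarity. It does not describe how Databend executes the query, and the vectors are the illustrative values, not real model output.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def rank_by_distance(query: list[float], docs: dict[int, list[float]]) -> list[tuple[int, float]]:
    """Score every stored vector against the query and sort nearest-first."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Simplified embeddings from the example (illustrative values only).
question = [0.2, 0.3, 0.1, 0.7, 0.4]        # sentence 1
documents = {
    2: [0.25, 0.29, 0.11, 0.71, 0.38],      # "The dog sat on the rug."
    3: [-0.1, 0.5, 0.6, -0.3, 0.8],         # "The quick brown fox jumped over the lazy dog."
}

for doc_id, distance in rank_by_distance(question, documents):
    print(doc_id, round(distance, 4))
# Expected output:
# 2 0.0019
# 3 0.7095
```

Sentence 2 lands far closer to the question than sentence 3, matching the prose. A full scan like this does one distance computation per stored row, which is the cost that dedicated vector databases try to reduce with approximate nearest-neighbor indexes.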
