Commit c7329bf

docs: add ai function (#10896)
* docs: add ai function
* add AI functions to readme
1 parent 0e2dc47 commit c7329bf

6 files changed

+248
-3
lines changed

README.md

Lines changed: 9 additions & 0 deletions
@@ -171,6 +171,15 @@ docker run --net=host datafuselabs/databend
171171
- [How to Drop a View](https://databend.rs/doc/sql-commands/ddl/view/ddl-drop-view)
172172
- [How to Alter a View](https://databend.rs/doc/sql-commands/ddl/view/ddl-alter-view)
173173

174+
175+
## AI Functions
176+
177+
- [Generating SQL with AI](https://databend.rs/doc/sql-functions/ai-functions/ai-to-sql)
178+
- [Creating Embedding Vectors](https://databend.rs/doc/sql-functions/ai-functions/ai-embedding-vector)
179+
- [Computing Text Similarities](https://databend.rs/doc/sql-functions/ai-functions/cosine-distance)
180+
- [Text Completion with AI](https://databend.rs/doc/sql-functions/ai-functions/ai-text-completion)
181+
182+
174183
### Managing User-Defined Functions
175184

176185
- [How to Create a User-Defined Function](https://databend.rs/doc/sql-commands/ddl/udf/ddl-create-function)

docs/doc/01-guides/index.md

Lines changed: 8 additions & 0 deletions
@@ -81,6 +81,14 @@ These tutorials are intended to help you get started with Databend:
8181
* [How to Drop a User-Defined Function](../14-sql-commands/00-ddl/50-udf/ddl-drop-function.md)
8282
* [How to Alter a User-Defined Function](../14-sql-commands/00-ddl/50-udf/ddl-alter-function.md)
8383

84+
85+
## AI Functions
86+
87+
* [Generating SQL with AI](../15-sql-functions/61-ai-functions/01-ai-to-sql.md)
88+
* [Creating Embedding Vectors](../15-sql-functions/61-ai-functions/02-ai-embedding-vector.md)
89+
* [Computing Text Similarities](../15-sql-functions/61-ai-functions/03-ai-cosine-distance.md)
90+
* [Text Completion with AI](../15-sql-functions/61-ai-functions/04-ai-text-completion.md)
91+
8492
## Backup & Restore
8593

8694
* [How to Back Up Meta Data](../10-deploy/06-metasrv/30-metasrv-backup-restore.md)
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
---
2+
title: 'AI_EMBEDDING_VECTOR'
3+
description: 'Creating embeddings using the ai_embedding_vector function in Databend'
4+
---
5+
6+
This document provides an overview of the `ai_embedding_vector` function in Databend and demonstrates how to create document embeddings using this function.
7+
8+
## Overview of ai_embedding_vector
9+
10+
11+
The `ai_embedding_vector` function in Databend is a built-in function that generates vector embeddings for text data. It is useful for natural language processing tasks, such as document similarity, clustering, and recommendation systems.
12+
13+
The function takes a text input and returns a high-dimensional vector that represents the input text's semantic meaning and context. The embeddings are produced by models pre-trained on large text corpora, which capture the relationships between words and phrases in a continuous vector space.
14+
15+
## Creating embeddings using ai_embedding_vector
16+
17+
To create embeddings for a text document using the `ai_embedding_vector` function, follow the example below.
18+
1. Create a table to store the documents:
19+
```sql
20+
CREATE TABLE documents (
21+
doc_id INT,
22+
text_content TEXT
23+
);
24+
```
25+
26+
2. Insert example documents into the table:
27+
```sql
28+
INSERT INTO documents (doc_id, text_content)
29+
VALUES
30+
(1, 'Artificial intelligence is a fascinating field.'),
31+
(2, 'Machine learning is a subset of AI.'),
32+
(3, 'I love going to the beach on weekends.');
33+
```
34+
35+
3. Create a table to store the embeddings:
36+
```sql
37+
CREATE TABLE embeddings (
38+
doc_id INT,
39+
text_content TEXT,
40+
embedding ARRAY(FLOAT32)
41+
);
42+
```
43+
44+
4. Generate embeddings for the text content and store them in the embeddings table:
45+
```sql
46+
INSERT INTO embeddings (doc_id, text_content, embedding)
47+
SELECT doc_id, text_content, ai_embedding_vector(text_content)
48+
FROM documents;
49+
50+
```
51+
After running these SQL statements, the `embeddings` table will contain the generated embedding for each document in the `documents` table. Each embedding is stored as an array of `FLOAT32` values in the `embedding` column, which has the `ARRAY(FLOAT32)` column type.
52+
53+
You can now use these embeddings for various natural language processing tasks, such as finding similar documents or clustering documents based on their content.
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
---
2+
title: 'COSINE_DISTANCE'
3+
description: 'Measuring document similarity using the cosine_distance function in Databend'
4+
---
5+
6+
This document provides an overview of the `cosine_distance` function in Databend and demonstrates how to measure document similarity using this function.
7+
8+
## Overview of cosine_distance
9+
10+
The `cosine_distance` function in Databend is a built-in function that calculates the cosine distance between two vectors. It is commonly used in natural language processing tasks, such as document similarity and recommendation systems.
11+
12+
Cosine distance measures how dissimilar two vectors are, based on the cosine of the angle between them. The function takes two input vectors and, for typical text embeddings, returns a value between 0 and 1: 0 means the vectors point in the same direction (most similar), and 1 means they are orthogonal (completely dissimilar).
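
To make the two reference values concrete, here is a minimal plain-Python sketch of the textbook definition, one minus the cosine similarity. It is an illustration only, not Databend's implementation; the Python function merely shares the name of the SQL function, and the input vectors are made up. Under this definition, vectors pointing in opposite directions give 2, a case that text embeddings rarely produce.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Made-up 2-d vectors covering the three reference cases.
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))   # same direction: approximately 0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Only the direction of the vectors matters: scaling either input by a positive constant leaves the distance unchanged.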
13+
14+
## Measuring similarity using cosine_distance
15+
16+
To measure document similarity using the `cosine_distance` function, follow the example below. This example assumes that you have already created document embeddings using the `ai_embedding_vector` function and stored them in a table with the `ARRAY(FLOAT32)` column type.
17+
18+
1. Create a table to store the documents and their embeddings:
19+
```sql
20+
CREATE TABLE documents (
21+
doc_id INT,
22+
text_content TEXT,
23+
embedding ARRAY(FLOAT32)
24+
);
25+
26+
```
27+
28+
2. Insert example documents and their embeddings into the table:
29+
```sql
30+
INSERT INTO documents (doc_id, text_content, embedding)
31+
VALUES
32+
(1, 'Artificial intelligence is a fascinating field.', ai_embedding_vector('Artificial intelligence is a fascinating field.')),
33+
(2, 'Machine learning is a subset of AI.', ai_embedding_vector('Machine learning is a subset of AI.')),
34+
(3, 'I love going to the beach on weekends.', ai_embedding_vector('I love going to the beach on weekends.'));
35+
```
36+
37+
3. Measure the similarity between a query document and the stored documents using the `cosine_distance` function:
38+
```sql
39+
SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance
40+
FROM documents
41+
ORDER BY distance ASC
42+
LIMIT 5;
43+
```
44+
This SQL query calculates the cosine distance between the query document's embedding and the embeddings of the stored documents. The results are ordered by ascending distance, with the smallest distance indicating the highest similarity.
45+
46+
Result:
47+
```sql
48+
+--------+-------------------------------------------------+------------+
49+
| doc_id | text_content                                    | distance   |
50+
+--------+-------------------------------------------------+------------+
51+
|      1 | Artificial intelligence is a fascinating field. | 0.10928339 |
52+
|      2 | Machine learning is a subset of AI.             | 0.13584924 |
53+
|      3 | I love going to the beach on weekends.          | 0.30774158 |
54+
+--------+-------------------------------------------------+------------+
55+
```
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
---
2+
title: 'AI_TEXT_COMPLETION'
3+
description: 'Generating text completions using the ai_text_completion function in Databend'
4+
---
5+
6+
This document provides an overview of the `ai_text_completion` function in Databend and demonstrates how to generate text completions using this function.
7+
8+
## Overview of ai_text_completion
9+
10+
The `ai_text_completion` function in Databend is a built-in function that generates text completions based on a given prompt. It is useful for natural language processing tasks, such as question answering, text generation, and autocompletion systems.
11+
12+
The function takes a text prompt as input and returns a generated completion for the prompt. The completions are generated by models pre-trained on large text corpora.
13+
14+
## Generating text completions using ai_text_completion
15+
16+
Here is a simple example using the `ai_text_completion` function in Databend to generate a text completion:
17+
```sql
18+
SELECT ai_text_completion('What is artificial intelligence?') AS completion;
19+
```
20+
21+
Result:
22+
```sql
23+
+--------------------------------------------------------------------------------------------------------------------+
24+
| completion |
25+
+--------------------------------------------------------------------------------------------------------------------+
26+
| Artificial intelligence (AI) is the field of study focused on creating machines and software capable of thinking, learning, and solving problems in a way that mimics human intelligence. This includes areas such as machine learning, natural language processing, computer vision, and robotics. |
27+
+--------------------------------------------------------------------------------------------------------------------+
28+
```
29+
30+
In this example, we provide the prompt "What is artificial intelligence?" to the `ai_text_completion` function, and it returns a generated completion that briefly describes artificial intelligence.
Lines changed: 93 additions & 3 deletions
@@ -1,8 +1,98 @@
11
---
22
title: 'AI Functions'
3-
description: 'Learn how to use AI functions in Databend with the help of the OpenAI engine.'
3+
description: 'SQL-based Knowledge Base Search and Completion using Databend'
44
---
55

6-
AI functions refer to the various capabilities within Databend that are powered by the [OpenAI](https://openai.com/) engine, and are designed to make it easier for users to interact with databases using natural language.
6+
This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context.
77

8-
- [AI_TO_SQL](01-ai-to-sql.md): Converts natural language instructions into SQL queries with the latest [Codex](https://openai.com/blog/openai-codex) model `code-davinci-002`.
8+
We will guide you through a simple example that shows how to create and store embeddings using the `ai_embedding_vector` function, find related documents with the `cosine_distance` function, and generate completions using the `ai_text_completion` function.
9+
10+
## Introduction to embeddings
11+
12+
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation.
13+
14+
## How do embeddings work?
15+
16+
Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space. This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships.
18+
19+
## Databend AI Functions
20+
21+
Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are:
22+
23+
- [ai_embedding_vector](./02-ai-embedding-vector.md): Generates embeddings for text documents.
24+
- [cosine_distance](./03-ai-cosine-distance.md): Calculates the cosine distance between two embeddings.
25+
- [ai_text_completion](./04-ai-text-completion.md): Generates text completions based on a given prompt.
26+
These functions are powered by the [OpenAI](https://openai.com/) engine and can be used directly within SQL queries.
27+
28+
## Creating and storing embeddings using Databend
29+
30+
To create embeddings for a text document using Databend, you can use the built-in `ai_embedding_vector` function directly in your SQL query. Here's an example:
31+
32+
```sql
33+
CREATE TABLE documents (
34+
doc_id INT,
35+
text_content TEXT
36+
);
37+
38+
INSERT INTO documents (doc_id, text_content)
39+
VALUES
40+
(1, 'Artificial intelligence is a fascinating field.'),
41+
(2, 'Machine learning is a subset of AI.'),
42+
(3, 'I love going to the beach on weekends.');
43+
44+
CREATE TABLE embeddings (
45+
doc_id INT,
46+
text_content TEXT,
47+
embedding ARRAY(FLOAT32)
48+
);
49+
50+
INSERT INTO embeddings (doc_id, text_content, embedding)
51+
SELECT doc_id, text_content, ai_embedding_vector(text_content)
52+
FROM documents;
53+
```
54+
55+
This SQL script creates a `documents` table, inserts the example documents, and then generates embeddings using the `ai_embedding_vector` function. The embeddings are stored in the `embeddings` table with the `ARRAY(FLOAT32)` column type.
56+
57+
## Searching for related documents using cosine distance
58+
59+
Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. The query below generates an embedding for the question using the `ai_embedding_vector` function and compares it with each stored embedding using `cosine_distance`:
60+
```sql
61+
SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance
62+
FROM embeddings
63+
ORDER BY distance ASC
64+
LIMIT 5;
65+
```
66+
This query returns up to 5 of the documents most similar to the input question, ordered by their cosine distance, with the smallest distance indicating the highest similarity. Because the table holds only three documents, all three are returned.
67+
68+
Result:
69+
```sql
70+
+--------+-------------------------------------------------+------------+
71+
| doc_id | text_content                                    | distance   |
72+
+--------+-------------------------------------------------+------------+
73+
|      1 | Artificial intelligence is a fascinating field. | 0.10928339 |
74+
|      2 | Machine learning is a subset of AI.             | 0.13584924 |
75+
|      3 | I love going to the beach on weekends.          | 0.30774158 |
76+
+--------+-------------------------------------------------+------------+
77+
```
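
The `ORDER BY distance ASC LIMIT 5` pattern amounts to a nearest-neighbour search over the stored vectors. The plain-Python sketch below mirrors that logic outside the database, assuming the textbook definition of cosine distance (one minus the cosine similarity). The 3-dimensional vectors are made up for illustration and are not real embeddings, and `nearest_documents` is a hypothetical helper, not a Databend API.

```python
import heapq
import math


def cosine_distance(a, b):
    """Textbook cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_documents(query_vec, rows, limit=5):
    """Return up to `limit` (doc_id, distance) pairs, smallest distance first.

    `rows` is an iterable of (doc_id, embedding) pairs.
    """
    scored = ((doc_id, cosine_distance(emb, query_vec)) for doc_id, emb in rows)
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])


# Made-up toy vectors standing in for stored embeddings.
rows = [
    (1, [0.9, 0.1, 0.0]),
    (2, [0.7, 0.3, 0.1]),
    (3, [0.0, 0.2, 0.9]),
]
query = [1.0, 0.1, 0.0]

for doc_id, distance in nearest_documents(query, rows):
    print(doc_id, round(distance, 4))
```

As in the SQL query, asking for 5 results from a set of 3 documents simply returns all 3, closest first.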
78+
79+
## Generating text completions with Databend
80+
81+
Databend also supports a text completion function, `ai_text_completion`. For example, from the above output, we choose the document with the smallest cosine distance: "Artificial intelligence is a fascinating field." We can use this as context and provide the original question to the `ai_text_completion` function to generate a completion:
82+
83+
```sql
84+
SELECT ai_text_completion('Artificial intelligence is a fascinating field. What is a subfield of artificial intelligence?') AS completion;
85+
```
86+
87+
Result:
88+
```sql
89+
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
90+
| completion |
91+
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
92+
|
93+
| A subfield of artificial intelligence is machine learning, which is the study of algorithms that allow computers to learn from data and improve their performance over time. Other subfields include natural language processing, computer vision, robotics, and deep learning. |
94+
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
95+
```
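
The prompt used above is simply the retrieved document followed by the original question. The sketch below shows one way a client program might assemble that prompt and the corresponding SQL statement. `build_prompt` and `sql_string_literal` are hypothetical helpers written for this illustration, not part of Databend. Doubling single quotes is the standard SQL way to escape them, but escaping rules for other characters vary by dialect, so parameter binding is preferable where your driver supports it.

```python
def build_prompt(context: str, question: str) -> str:
    """Join the retrieved context and the question into a single prompt."""
    return f"{context.strip()} {question.strip()}"


def sql_string_literal(text: str) -> str:
    """Quote text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


# Context is the closest document from the search step; the question is unchanged.
context = "Artificial intelligence is a fascinating field."
question = "What is a subfield of artificial intelligence?"

prompt = build_prompt(context, question)
query = f"SELECT ai_text_completion({sql_string_literal(prompt)}) AS completion;"
print(query)
```

The printed statement matches the query shown in the example above.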
96+
97+
98+
You can try these functions on [Databend Cloud](https://databend.com), where you can sign up for a free trial and start using them right away. Databend's AI functions are designed to be easy to use, even for users who are not familiar with machine learning or natural language processing, so you can add AI capabilities to your SQL queries quickly.
