docs/doc/15-sql-functions/61-ai-functions/01-ai-to-sql.md
+11 -1 (11 additions, 1 deletion)
@@ -2,14 +2,24 @@
2
2
title: 'AI_TO_SQL'
3
3
---
4
4
5
-
Converts natural language instructions into SQL queries with the latest [Codex](https://platform.openai.com/docs/models/codex) model `code-davinci-002`.
5
+
Converts natural language instructions into SQL queries with the latest model `text-davinci-003`.
6
6
7
7
Databend offers an efficient solution for constructing SQL queries by incorporating OLAP and AI. Through this function, instructions written in a natural language can be converted into SQL query statements that align with the table schema. For example, the function can be provided with a sentence like "Get all items that cost 10 dollars or less" as an input and generate the corresponding SQL query "SELECT * FROM items WHERE price <= 10" as output.
8
8
9
+
The main code implementation can be found [here](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/query/service/src/table_functions/openai/ai_to_sql.rs).
10
+
9
11
:::note
10
12
The SQL query statements generated adhere to the PostgreSQL standards, so they might require manual revisions to align with the syntax of Databend.
11
13
:::
12
14
15
+
:::caution
16
+
Databend relies on OpenAI for `AI_TO_SQL` but only sends the table schema to OpenAI, not the data.
17
+
18
+
It only works when the Databend configuration includes the `openai_api_key`; otherwise, it is inactive.
19
+
20
+
This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. If you use it, you acknowledge that your table schema will be sent to OpenAI by us.
docs/doc/15-sql-functions/61-ai-functions/02-ai-embedding-vector.md
+10 (10 additions, 0 deletions)
@@ -5,6 +5,16 @@ description: 'Creating embeddings using the ai_embedding_vector function in Data
5
5
6
6
This document provides an overview of the ai_embedding_vector function in Databend and demonstrates how to create document embeddings using this function.
7
7
8
+
The main code implementation can be found [here](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/embedding.rs).
9
+
10
+
:::caution
11
+
Databend relies on OpenAI for `AI_EMBEDDING_VECTOR` and sends the embedding column data to OpenAI.
12
+
13
+
It only works when the Databend configuration includes the `openai_api_key`; otherwise, it is inactive.
14
+
15
+
This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. If you use it, you acknowledge that your embedding column data will be sent to OpenAI by us.
docs/doc/15-sql-functions/61-ai-functions/03-ai-text-completion.md
+13 -2 (13 additions, 2 deletions)
@@ -3,7 +3,18 @@ title: 'AI_TEXT_COMPLETION'
3
3
description: 'Generating text completions using the ai_text_completion function in Databend'
4
4
---
5
5
6
-
This document provides an overview of the ai_text_completion function in Databend and demonstrates how to generate text completions using this function.
6
+
This document provides an overview of the `ai_text_completion` function in Databend and demonstrates how to generate text completions using this function.
7
+
8
+
The main code implementation can be found [here](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/completion.rs).
9
+
10
+
:::caution
11
+
Databend relies on OpenAI for `AI_TEXT_COMPLETION` and sends the completion prompt data to OpenAI.
12
+
13
+
It only works when the Databend configuration includes the `openai_api_key`; otherwise, it is inactive.
14
+
15
+
This function is available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. If you use it, you acknowledge that your completion prompt data will be sent to OpenAI by us.
In this example, we provide the prompt "What is artificial intelligence?" to the ai_text_completion function, and it returns a generated completion that briefly describes artificial intelligence.
41
+
In this example, we provide the prompt "What is artificial intelligence?" to the `ai_text_completion` function, and it returns a generated completion that briefly describes artificial intelligence.
docs/doc/15-sql-functions/61-ai-functions/04-ai-cosine-distance.md
+4 (4 additions, 0 deletions)
@@ -5,6 +5,10 @@ description: 'Measuring document similarity using the cosine_distance function i
5
5
6
6
This document provides an overview of the `cosine_distance` function in Databend and demonstrates how to measure document similarity using this function.
7
7
8
+
:::info
9
+
The `cosine_distance` function performs vector computations within Databend and does not rely on the OpenAI API.
10
+
:::
11
+
8
12
## Overview of cosine_distance
9
13
10
14
The `cosine_distance` function in Databend is a built-in function that calculates the cosine distance between two vectors. It is commonly used in natural language processing tasks, such as document similarity and recommendation systems.
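For reference, cosine distance is conventionally defined as one minus the cosine similarity of the two vectors. The formula below is the standard textbook definition; that Databend's `cosine_distance` follows exactly this form is an assumption here, not something stated in this diff. The symbols $a$, $b$, and $n$ (the vector dimension) are introduced only for illustration.

$$
\operatorname{cosine\_distance}(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

Under this definition, vectors pointing in the same direction have a distance of 0, orthogonal vectors have a distance of 1, and opposite vectors have a distance of 2, so a smaller value means more similar documents.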
docs/doc/15-sql-functions/61-ai-functions/index.md
+45 -13 (45 additions, 13 deletions)
@@ -1,34 +1,66 @@
1
1
---
2
2
title: 'AI Functions'
3
-
description: 'SQL-based Knowledge Base Search and Completion using Databend'
3
+
description: 'Using SQL-based AI Functions for Knowledge Base Search and Text Completion'
4
4
---
5
5
6
-
This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context.
6
+
This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context. We will guide you through a simple example that shows how to create and store embeddings, find related documents, and generate completions using various AI functions.
7
7
8
-
We will guide you through a simple example that shows how to create and store embeddings using the `ai_embedding_vector` function, find related documents with the `cosine_distance` function, and generate completions using the `ai_text_completion` function.
8
+
:::caution
9
+
10
+
Databend relies on OpenAI for embeddings and text completions, which means your data will be sent to OpenAI. Exercise caution when using these functions.
11
+
12
+
They only work when the Databend configuration includes the `openai_api_key`; otherwise, they are inactive.
13
+
14
+
These functions are available by default on [Databend Cloud](https://databend.com) using our own OpenAI key. If you use them, you acknowledge that your data will be sent to OpenAI by us.
15
+
16
+
:::
9
17
10
18
## Introduction to embeddings
11
19
12
-
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation.
20
+
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation systems.
13
21
14
22
## How do embeddings work?
15
23
16
-
Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space. This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships.
17
-
Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They are widely used in various natural language processing tasks, such as document similarity, clustering, and recommendation systems.
24
+
Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space.
25
+
26
+
This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships.
27
+
28
+
To illustrate how embeddings work, let's consider a simple example. Suppose we have the following sentences:
29
+
1. `"The cat sat on the mat."`
30
+
2. `"The dog sat on the rug."`
31
+
3. `"The quick brown fox jumped over the lazy dog."`
32
+
33
+
When creating embeddings for these sentences, the model will convert the text into high-dimensional vectors in such a way that similar sentences are closer together in the vector space.
34
+
35
+
For instance, the embeddings of sentences 1 and 2 will be closer to each other because they share a similar structure and meaning (both involve an animal sitting on something). On the other hand, the embedding of sentence 3 will be farther from the embeddings of sentences 1 and 2 because it has a different structure and meaning.
36
+
37
+
The embeddings could look like this (simplified for illustration purposes):
38
+
39
+
1. `[0.2, 0.3, 0.1, 0.7, 0.4]`
40
+
2. `[0.25, 0.29, 0.11, 0.71, 0.38]`
41
+
3. `[-0.1, 0.5, 0.6, -0.3, 0.8]`
42
+
43
+
In this simplified example, you can see that the embeddings of sentences 1 and 2 are closer to each other in the vector space, while the embedding of sentence 3 is farther away. This illustrates how embeddings can capture semantic relationships and be used to compare and analyze text data.
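To make "closer" concrete, the following is a minimal Python sketch that computes the cosine distance between the three illustrative vectors. It is not Databend code: the helper name `cosine_distance` mirrors the SQL function only for readability, and the definition used (one minus cosine similarity) is the conventional one, assumed rather than taken from the Databend implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors.

    Illustrative helper only; assumed to match the conventional definition.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# The simplified embeddings of sentences 1, 2, and 3 from the text.
s1 = [0.2, 0.3, 0.1, 0.7, 0.4]
s2 = [0.25, 0.29, 0.11, 0.71, 0.38]
s3 = [-0.1, 0.5, 0.6, -0.3, 0.8]

d12 = cosine_distance(s1, s2)
d13 = cosine_distance(s1, s3)
d23 = cosine_distance(s2, s3)

print(f"distance(1, 2) = {d12:.4f}")
print(f"distance(1, 3) = {d13:.4f}")
print(f"distance(2, 3) = {d23:.4f}")
```

With these made-up numbers, the distance between sentences 1 and 2 comes out close to zero (roughly 0.002), while the distances from sentence 3 to either of the others are around 0.7, which matches the intuition described above.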
44
+
45
+
46
+
## What is a Vector Database?
47
+
48
+
A vector database is a specialized database designed to store, manage, and search high-dimensional vector data efficiently. These databases are optimized for similarity search operations, such as finding the nearest neighbors of a given vector. They are particularly useful in scenarios where the data has high dimensionality, like embeddings in natural language processing tasks, image feature vectors, and more.
49
+
50
+
Typically, embedding vectors are stored in specialized vector databases such as Milvus, Pinecone, Qdrant, or Weaviate. Databend can also store embedding vectors using the `ARRAY(FLOAT32)` data type and perform similarity computations with the `cosine_distance` function in SQL. To create embeddings for a text document using Databend, you can use the built-in `ai_embedding_vector` function directly in your SQL query.
18
51
19
52
## Databend AI Functions
20
53
21
54
Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are:
22
55
23
56
- [ai_embedding_vector](./02-ai-embedding-vector.md): Generates embeddings for text documents.
24
-
- [cosine_distance](./03-ai-cosine-distance.md): Calculates the cosine distance between two embeddings.
25
-
- [ai_text_completion](./04-ai-text-completion.md): Generates text completions based on a given prompt.
26
-
These functions are powered by open-source natural language processing models and can be used directly within SQL queries.
57
+
- [ai_text_completion](./03-ai-text-completion.md): Generates text completions based on a given prompt.
58
+
- [cosine_distance](./04-ai-cosine-distance.md): Calculates the cosine distance between two embeddings.
27
59
28
60
## Creating and storing embeddings using Databend
29
61
30
-
To create embeddings for a text document using Databend, you can use the built-in ai_embedding_vector function directly in your SQL query. Here's an example:
This SQL script creates a documents table, inserts the example documents, and then generates embeddings using the ai_embedding_vector function. The embeddings are stored in the embeddings table with the ARRAY(FLOAT32) column type.
87
+
This SQL script creates a `documents` table, inserts the example documents, and then generates embeddings using the `ai_embedding_vector` function. The embeddings are stored in the `embeddings` table with the `ARRAY(FLOAT32)` column type.
56
88
57
-
## Searching for related documents using cosine distance
89
+
## Searching for similar documents using cosine distance
58
90
59
-
Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. First, generate an embedding for the question using the ai_embedding_vector function:
91
+
Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most related document from the stored embeddings. First, generate an embedding for the question using the `ai_embedding_vector` function:
60
92
```sql
61
93
SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance