1 | 1 | --- |
2 | 2 | title: 'AI Functions' |
3 | | -description: 'Learn how to use AI functions in Databend with the help of the OpenAI engine.' |
| 3 | +description: 'SQL-based Knowledge Base Search and Completion using Databend' |
4 | 4 | --- |
5 | 5 |
|
6 | | -AI functions refer to the various capabilities within Databend that are powered by the [OpenAI](https://openai.com/) engine, and are designed to make it easier for users to interact with databases using natural language. |
| 6 | +This document demonstrates how to leverage Databend's built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context. |
7 | 7 |
|
8 | | -- [AI_TO_SQL](01-ai-to-sql.md): Converts natural language instructions into SQL queries with the latest [Codex](https://openai.com/blog/openai-codex) model `code-davinci-002`. |
| 8 | +We will guide you through a simple example that shows how to create and store embeddings using the `ai_embedding_vector` function, find related documents with the `cosine_distance` function, and generate completions using the `ai_text_completion` function. |
| 9 | + |
| 10 | +## Introduction to embeddings |
| 11 | + |
| 12 | +Embeddings are vector representations of text data that capture the semantic meaning and context of the original text. They can be used to compare and analyze text in various natural language processing tasks, such as document similarity, clustering, and recommendation. |
| 13 | + |
| 14 | +## How do embeddings work? |
| 15 | + |
| 16 | +Embeddings work by converting text into high-dimensional vectors in such a way that similar texts are closer together in the vector space. This is achieved by training a model on a large corpus of text, which learns to represent the words and phrases in a continuous space that captures their semantic relationships. |
| 18 | + |
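Closeness in the vector space is commonly measured with cosine distance. The formula below assumes that Databend's `cosine_distance` follows the standard definition, which is one minus cosine similarity. For two embedding vectors $\mathbf{a}$ and $\mathbf{b}$, the distance is computed as shown. The worked case uses toy 2-dimensional vectors chosen for illustration; they are not real embeddings.

```latex
d_{\cos}(\mathbf{a}, \mathbf{b}) = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}

% Worked example with toy vectors a = (1, 0) and b = (1, 1):
% a \cdot b = 1, \lVert a \rVert = 1, \lVert b \rVert = \sqrt{2}
d_{\cos}\big((1, 0), (1, 1)\big) = 1 - \frac{1}{\sqrt{2}} \approx 0.2929
```

Under this definition, the distance ranges from 0 (vectors pointing in the same direction) to 2 (vectors pointing in opposite directions). Smaller values therefore indicate more similar texts.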
| 19 | +## Databend AI Functions |
| 20 | + |
| 21 | +Databend provides built-in AI functions for various natural language processing tasks. The main functions covered in this document are: |
| 22 | + |
| 23 | +- [ai_embedding_vector](./02-ai-embedding-vector.md): Generates embeddings for text documents. |
| 24 | +- [cosine_distance](./03-ai-cosine-distance.md): Calculates the cosine distance between two embeddings. |
| 25 | +- [ai_text_completion](./04-ai-text-completion.md): Generates text completions based on a given prompt. |
|  | + |
| 26 | +These functions are powered by [OpenAI](https://openai.com/) models and can be called directly within SQL queries. |
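You can check how `cosine_distance` behaves before working with real embeddings by calling it on small literal arrays. This is a sketch that assumes array literals can be cast to `ARRAY(FLOAT32)` with `::`. The vectors are toy values, not model output. If the function uses the standard definition (one minus cosine similarity), the first pair should give 0 because both vectors point in the same direction. The second pair is orthogonal and should give 1.

```sql
-- Toy 2-dimensional vectors, cast to the same type used for stored embeddings.
SELECT
    cosine_distance([1.0, 0.0]::ARRAY(FLOAT32), [2.0, 0.0]::ARRAY(FLOAT32)) AS same_direction,
    cosine_distance([1.0, 0.0]::ARRAY(FLOAT32), [0.0, 1.0]::ARRAY(FLOAT32)) AS orthogonal;
```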
| 27 | + |
| 28 | +## Creating and storing embeddings using Databend |
| 29 | + |
| 30 | +To create embeddings for text documents, call the built-in `ai_embedding_vector` function directly in a SQL query. For example: |
| 31 | + |
| 32 | +```sql |
| 33 | +CREATE TABLE documents ( |
| 34 | + doc_id INT, |
| 35 | + text_content TEXT |
| 36 | +); |
| 37 | + |
| 38 | +INSERT INTO documents (doc_id, text_content) |
| 39 | +VALUES |
| 40 | + (1, 'Artificial intelligence is a fascinating field.'), |
| 41 | + (2, 'Machine learning is a subset of AI.'), |
| 42 | + (3, 'I love going to the beach on weekends.'); |
| 43 | + |
| 44 | +CREATE TABLE embeddings ( |
| 45 | + doc_id INT, |
| 46 | + text_content TEXT, |
| 47 | + embedding ARRAY(FLOAT32) |
| 48 | +); |
| 49 | + |
| 50 | +INSERT INTO embeddings (doc_id, text_content, embedding) |
| 51 | +SELECT doc_id, text_content, ai_embedding_vector(text_content) |
| 52 | +FROM documents; |
| 53 | +``` |
| 54 | + |
| 55 | +This SQL script creates a `documents` table, inserts three example documents, and then generates an embedding for each one with `ai_embedding_vector`. The embeddings are stored in the `embedding` column of the `embeddings` table, which has the type `ARRAY(FLOAT32)`. |
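When new rows are added to `documents` later, you may not want to send every document to the embedding model again. The statement below sketches an incremental refresh. It assumes that `doc_id` uniquely identifies a document, which the example tables do not enforce. It also assumes your Databend version supports `NOT IN` with a subquery.

```sql
-- Embed only the documents that do not yet have a row in the embeddings table,
-- so existing documents are not re-embedded.
INSERT INTO embeddings (doc_id, text_content, embedding)
SELECT d.doc_id, d.text_content, ai_embedding_vector(d.text_content)
FROM documents AS d
WHERE d.doc_id NOT IN (SELECT doc_id FROM embeddings);
```

If you run this immediately after the script above, it inserts nothing, because all three example documents already have embeddings.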
| 56 | + |
| 57 | +## Searching for related documents using cosine distance |
| 58 | + |
| 59 | +Suppose you have a question, "What is a subfield of artificial intelligence?", and you want to find the most closely related stored document. The query below embeds the question with `ai_embedding_vector` and compares the result with each stored embedding using `cosine_distance`: |
| 60 | +```sql |
| 61 | +SELECT doc_id, text_content, cosine_distance(embedding, ai_embedding_vector('What is a subfield of artificial intelligence?')) AS distance |
| 62 | +FROM embeddings |
| 63 | +ORDER BY distance ASC |
| 64 | +LIMIT 5; |
| 65 | +``` |
| 66 | +This query returns up to five documents that are most similar to the question, sorted by cosine distance in ascending order. A smaller distance means higher similarity. Because the table holds only three documents, all three are returned. |
| 67 | + |
| 68 | +Result: |
| 69 | +```text |
| 70 | ++--------+-------------------------------------------------+------------+ |
| 71 | +| doc_id | text_content | distance | |
| 72 | ++--------+-------------------------------------------------+------------+ |
| 73 | +| 1 | Artificial intelligence is a fascinating field. | 0.10928339 | |
| 74 | +| 2 | Machine learning is a subset of AI. | 0.13584924 | |
| 75 | +| 3 | I love going to the beach on weekends. | 0.30774158 | |
| 76 | ++--------+-------------------------------------------------+------------+ |
| 77 | +``` |
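The query above calls `ai_embedding_vector` inside the `cosine_distance` expression. If you prefer to embed the question in a separate, explicit step, you can compute it once in a common table expression and cross join it with the stored embeddings. This sketch assumes your Databend version supports `WITH` clauses and comma-style cross joins. The ranking logic does not change; the only difference is where the question gets embedded.

```sql
-- Embed the question once, as a single-row CTE.
WITH question AS (
    SELECT ai_embedding_vector('What is a subfield of artificial intelligence?') AS q_embedding
)
-- Cross join the single question row with every stored embedding and rank by distance.
SELECT e.doc_id,
       e.text_content,
       cosine_distance(e.embedding, question.q_embedding) AS distance
FROM embeddings AS e, question
ORDER BY distance ASC
LIMIT 5;
```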
| 78 | + |
| 79 | +## Generating text completions with Databend |
| 80 | + |
| 81 | +Databend also provides a text completion function, `ai_text_completion`. In the output above, the document with the smallest cosine distance is "Artificial intelligence is a fascinating field." Use that document as context and append the original question to build the prompt for `ai_text_completion`: |
| 82 | + |
| 83 | +```sql |
| 84 | +SELECT ai_text_completion('Artificial intelligence is a fascinating field. What is a subfield of artificial intelligence?') AS completion; |
| 85 | +``` |
| 86 | + |
| 87 | +Result: |
| 88 | +```text |
| 89 | ++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
| 90 | +| completion | |
| 91 | ++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
| 92 | +| |
| 93 | +| A subfield of artificial intelligence is machine learning, which is the study of algorithms that allow computers to learn from data and improve their performance over time. Other subfields include natural language processing, computer vision, robotics, and deep learning. | |
| 94 | ++-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
| 95 | +``` |
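You can also combine the search and the completion in one statement, so you don't have to copy the best-matching document into the prompt by hand. This sketch assumes that `CONCAT` and a `FROM`-clause subquery with `ORDER BY ... LIMIT 1` are available in your Databend version. When document 1 is the closest match, as in the result above, the statement builds exactly the same prompt text as the manual example. The completion is model-generated, so its wording may differ from run to run.

```sql
-- Pick the stored document closest to the question, then use its text
-- as context in the completion prompt.
SELECT ai_text_completion(
           CONCAT(best.text_content, ' ', 'What is a subfield of artificial intelligence?')
       ) AS completion
FROM (
    SELECT text_content
    FROM embeddings
    ORDER BY cosine_distance(
                 embedding,
                 ai_embedding_vector('What is a subfield of artificial intelligence?')
             ) ASC
    LIMIT 1
) AS best;
```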
| 96 | + |
| 97 | + |
| 98 | +You can try these functions on [Databend Cloud](https://databend.com) by signing up for a free trial. Because they are ordinary SQL functions, you can use them without prior experience in machine learning or natural language processing and combine them with the rest of your SQL queries. |