Commit c4bb853

Merge pull request #279880 from wmwxwa/patch-24
Update vector-search-overview.md
2 parents f7b20dd + fd676bc commit c4bb853

3 files changed, with 4 additions and 4 deletions

articles/cosmos-db/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
  href: gen-ai/distance-functions.md
  - name: kNN vs ANN
  href: gen-ai/knn-vs-ann.md
- - name: Unified AI database
+ - name: Generative AI
  items:
  - name: AI agent
  href: ai-agents.md

articles/cosmos-db/gen-ai/vector-embeddings.md

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ ms.date: 07/01/2024

  # What are vector embeddings?

- Vectors, also known as embeddings or vector embeddings, are mathematical representations of data in a high-dimensional space. They represent various types of information — text, images, audio — a format that machine learning models can process. When an AI model receives text input, it first tokenizes the text into tokens. Each token is then converted into its corresponding embedding. The model processes these embeddings through multiple layers, capturing complex patterns and relationships within the text. The output embeddings can then be converted back into tokens if needed, generating readable text.
+ Vectors, also known as embeddings or vector embeddings, are mathematical representations of data in a high-dimensional space. They represent various types of information — text, images, audio — a format that machine learning models can process. When an AI model receives text input, it first tokenizes the text into tokens. Each token is then converted into its corresponding embedding. This conversion process can be done using an embedding generation model, such as [Azure OpenAI Embeddings](../../ai-services/openai/how-to/embeddings.md) or [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure). The model processes these embeddings through multiple layers, capturing complex patterns and relationships within the text. The output embeddings can then be converted back into tokens if needed, generating readable text.

  Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. These high-dimensional representations capture semantic meaning, making it easier to perform tasks like searching, clustering, and classifying.

@@ -23,7 +23,7 @@ Each box containing floating-point numbers corresponds to a dimension, and each

  Between the two vectors in the above example, some dimensions are similar while other dimensions are different, which are due to the similarities and differences in the meaning of the two phrases.

- This image shows the spatial closeness of vectors that are similar, constrasting vectors that are drastically different:
+ This image shows the spatial closeness of vectors that are similar, contrasting vectors that are drastically different:

  :::image type="content" source="../media/gen-ai/concepts/vector-closeness.png" lightbox="../media/gen-ai/concepts/vector-closeness.png" alt-text="Screenshot of vector closeness.":::
  Image source: [OpenAI](https://openai.com/index/introducing-text-and-code-embeddings/)
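
Reviewer note (not part of the commit): the changed article says that similar inputs should have similar vector representations. The sketch below illustrates that idea with cosine similarity. The three 4-dimensional vectors and their labels are made up for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, hand-written so that the first two point in
# nearly the same direction and the third points elsewhere.
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.85, 0.75, 0.2, 0.05]
invoice = [0.0, 0.1, 0.9, 0.8]

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)

print(f"cat vs kitten:  {sim_close:.3f}")
print(f"cat vs invoice: {sim_far:.3f}")
```

Under these assumed values, the first pair scores much closer to 1 than the second, which is the behavior the paragraph describes.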

articles/cosmos-db/gen-ai/vector-search-overview.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ ms.date: 07/01/2024

  # What is vector search?

- Vector search is a method that helps you find similar items based on their data characteristics rather than by exact matches on a property field. This technique is useful in applications such as searching for similar text, finding related images, making recommendations, or even detecting anomalies. It works by taking the [vector embeddings](vector-embeddings.md) of your data that you created by using an embedding generation model, such as [Azure OpenAI Embeddings](../../ai-services/openai/how-to/embeddings.md) or [Hugging Face on Azure](https://azure.microsoft.com/solutions/hugging-face-on-azure). It then measures the [distance](distance-functions.md) between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically. Some well-known vector search algorithms include Hierarchical Navigable Small World (HNSW), Inverted File (IVF), and the state-of-the-art DiskANN.
+ Vector search is a method that helps you find similar items based on their data characteristics rather than by exact matches on a property field. This technique is useful in applications such as searching for similar text, finding related images, making recommendations, or even detecting anomalies. It works by taking the [vector embeddings](vector-embeddings.md) of your data and query, and then measuring the [distance](distance-functions.md) between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically. Some well-known vector search algorithms include Hierarchical Navigable Small World (HNSW), Inverted File (IVF), and the state-of-the-art DiskANN.

  This [interactive visualization](https://openai.com/index/introducing-text-and-code-embeddings/#_1Vr7cWWEATucFxVXbW465e) shows some examples of closeness and distance between vectors.
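
Reviewer note (not part of the commit): the revised paragraph describes vector search as measuring the distance between each data vector and the query vector, then returning the closest ones. A minimal sketch of that idea is an exact, brute-force scan, shown below. This is not HNSW, IVF, or DiskANN, which avoid comparing the query against every stored vector. The document IDs, vectors, and the `nearest_neighbors` helper are hypothetical names chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, items, k=2):
    """Return the ids of the k items whose vectors are closest to query.

    items is a list of (item_id, vector) pairs. Every vector is compared
    to the query, so the cost grows linearly with the number of items.
    """
    ranked = sorted(items, key=lambda item: euclidean_distance(query, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical stored vectors; real ones would come from an embedding model.
documents = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.8, 0.2, 0.1]),
    ("doc-c", [0.0, 0.9, 0.7]),
]
query_vector = [0.9, 0.1, 0.05]

top_matches = nearest_neighbors(query_vector, documents, k=2)
print(top_matches)
```

Euclidean distance is used here only as one possible choice; the linked distance-functions article covers the alternatives.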