## What are vector embeddings?
### Concept
Vector embeddings are a fundamental concept in machine learning and natural language processing that enable the representation of data, such as words, documents, or images, as numerical vectors in a high-dimensional vector space. The primary idea behind vector embeddings is to capture the underlying relationships and semantics of the data by mapping each item to a point in this vector space. In simpler terms, that means converting your text or images into a sequence of numbers that represents the data, and then comparing the different number sequences. This allows complex data to be manipulated and analyzed mathematically, making it easier to perform tasks like similarity comparison, recommendation, and classification.

How a machine learning model classifies data and produces the vector differs from model to model, and it's typically not possible to determine exactly what semantic meaning each vector dimension represents. But because a model processes every input in the same way, similar words, documents, or images have vectors that are also similar. For example, the words `basketball` and `baseball` will have embedding vectors much closer to each other than to the embedding of a word like `rainforest`.
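
To make the idea of "closer" concrete, the following minimal Python sketch uses hand-picked, three-dimensional toy vectors. The values in the hypothetical `toy_embeddings` dictionary were not produced by any real model (real embeddings typically have hundreds or thousands of dimensions); the sketch only shows how the distance between vectors can reflect similarity.

```python
import math

# Hypothetical 3-dimensional "embeddings" chosen by hand for illustration only.
# These values are NOT output from any real embedding model.
toy_embeddings = {
    "basketball": [0.90, 0.80, 0.10],
    "baseball":   [0.85, 0.75, 0.15],
    "rainforest": [0.10, 0.20, 0.95],
}

def distance(word_a: str, word_b: str) -> float:
    """Euclidean distance between the toy vectors for two words."""
    return math.dist(toy_embeddings[word_a], toy_embeddings[word_b])

if __name__ == "__main__":
    # The sports terms end up much closer to each other than to "rainforest".
    print(f"basketball vs baseball:   {distance('basketball', 'baseball'):.3f}")
    print(f"basketball vs rainforest: {distance('basketball', 'rainforest'):.3f}")
```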
### Vector comparison
Vectors can be compared using a variety of metrics. The most popular way to compare vectors is to use [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity), which measures the cosine of the angle between two vectors in a multi-dimensional space. The closer the vectors, the smaller the angle and the higher the cosine similarity. Other common comparison metrics include [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) and [inner product](https://en.wikipedia.org/wiki/Inner_product_space).
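
The three metrics can be sketched in plain Python as follows. This is an illustrative implementation for small, dense vectors, not the code Redis uses internally, and the function names are hypothetical.

```python
import math
from typing import Sequence

def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    """Raise if the two vectors don't have the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; higher means more similar."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, in the range [-1, 1]."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return inner_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between a and b; lower means more similar."""
    _check_dims(a, b)
    return math.dist(a, b)

if __name__ == "__main__":
    a = [1.0, 0.0]
    b = [1.0, 1.0]
    print(cosine_similarity(a, b))   # approximately 0.707 (45-degree angle)
    print(euclidean_distance(a, b))  # 1.0
    print(inner_product(a, b))       # 1.0
```

For cosine similarity and inner product, higher values indicate more similar vectors, while for Euclidean distance lower values do. For unit-length vectors, the inner product equals the cosine similarity.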
### Generating embeddings
Many machine learning models support embeddings APIs. For an example of how to create vector embeddings using Azure OpenAI Service, see [Learn how to generate embeddings with Azure OpenAI](../ai-services/openai/how-to/embeddings.md).
## What is a vector database?
A vector database is a database that can store, manage, retrieve, and compare vectors. Vector databases must be able to efficiently store high-dimensional vectors and retrieve them with minimal latency and high throughput. Non-relational datastores are most commonly used as vector databases, although relational databases can also be used, for example PostgreSQL with the [pgvector](https://github.com/pgvector/pgvector) extension.

Vector databases need to index data for fast search and retrieval. Common indexing methods include K-Nearest Neighbors (KNN), an exhaustive method that provides the highest precision at a higher computational cost, and Approximate Nearest Neighbors (ANN), which trades some precision for greater speed and lower processing overhead.
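
As an illustration of the exhaustive approach, the following Python sketch performs exact KNN search by comparing a query against every stored vector using Euclidean distance. The `knn_search` function and the in-memory dictionary are hypothetical stand-ins for a real vector index. ANN algorithms avoid this full scan and are more involved, so they aren't shown here.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

def knn_search(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbor search by brute force.

    Every stored vector is compared with the query, so the result is exact,
    but the work grows linearly with the number of stored vectors.
    Returns up to k (key, distance) pairs, closest first.
    """
    if k <= 0:
        return []
    # Score every stored vector against the query (the exhaustive step).
    scored = ((key, math.dist(query, vec)) for key, vec in vectors.items())
    # Keep only the k smallest distances, sorted closest first.
    return heapq.nsmallest(k, scored, key=lambda item: item[1])

if __name__ == "__main__":
    index = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
    print(knn_search([0.9, 0.9], index, k=2))  # "b" first, then "a"
```

Because every stored vector is compared, the cost of each query grows with the size of the dataset. That is the cost ANN indexes reduce, at the expense of occasionally missing a true nearest neighbor.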

Finally, vector databases execute vector searches by using the chosen vector comparison method to return the most similar vectors. Some vector databases can also perform _hybrid_ searches by first narrowing results based on characteristics or metadata before conducting the vector search. This can make vector searches more targeted and customizable.
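
A hybrid search can be sketched as a metadata filter followed by the same exhaustive vector comparison. The `Record` type, the `category` metadata field, and the `hybrid_search` function below are hypothetical; they only illustrate the two-step idea and don't reflect any particular database's query syntax.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

@dataclass
class Record:
    """Hypothetical stored item: a key, its vector, and some metadata."""
    key: str
    vector: Sequence[float]
    metadata: Dict[str, str] = field(default_factory=dict)

def hybrid_search(
    query: Sequence[float],
    records: List[Record],
    k: int,
    predicate: Callable[[Dict[str, str]], bool],
) -> List[Tuple[str, float]]:
    """Filter records by metadata, then rank the survivors by Euclidean distance."""
    if k <= 0:
        return []
    # Step 1: narrow the candidate set using metadata only.
    candidates = [r for r in records if predicate(r.metadata)]
    # Step 2: exact vector comparison on the remaining candidates, closest first.
    scored = sorted(
        ((r.key, math.dist(query, r.vector)) for r in candidates),
        key=lambda item: item[1],
    )
    return scored[:k]

if __name__ == "__main__":
    records = [
        Record("doc1", [0.1, 0.9], {"category": "sports"}),
        Record("doc2", [0.2, 0.8], {"category": "travel"}),
        Record("doc3", [0.9, 0.1], {"category": "sports"}),
    ]
    results = hybrid_search(
        [0.2, 0.8], records, k=1,
        predicate=lambda meta: meta.get("category") == "sports",
    )
    print(results)  # doc1 is the closest "sports" record
```

In this example, `doc2` is the closest vector overall, but the metadata filter excludes it, so `doc1` is returned.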