Commit 24b48dd

Merge pull request #2131 from redis/DOC-5734-csharp-vecset-emb-examples
DOC-5734 C# vector set embedding example
2 parents 58d2570 + 477519a commit 24b48dd

2 files changed, +452 -0 lines changed
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Index and query embeddings with Redis vector sets
linkTitle: Vector set embeddings
title: Vector set embeddings
weight: 40
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
bannerChildren: true
---

A Redis [vector set]({{< relref "/develop/data-types/vector-sets" >}}) lets
you store a set of unique keys, each with its own associated vector.
You can then retrieve keys from the set according to the similarity between
their stored vectors and a query vector that you specify.

You can use vector sets to store any type of numeric vector but they are
particularly optimized to work with text embedding vectors (see
[Redis for AI]({{< relref "/develop/ai" >}}) to learn more about text
embeddings). The example below shows how to use the
[`Microsoft.ML`](https://dotnet.microsoft.com/en-us/apps/ai/ml-dotnet)
library to generate vector embeddings and then
store and retrieve them using a vector set with `StackExchange.Redis`.
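
To make the idea of retrieving keys by similarity more concrete, here is a minimal
in-memory sketch in plain C# (top-level statements, so .NET 6 or later is assumed).
It does not use Redis and does not reflect how Redis stores or searches vectors
internally. The `store` dictionary, the toy two-dimensional vectors, and the
`CosineSimilarity` helper are all hypothetical names introduced for illustration,
and cosine similarity is assumed as the similarity measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy data: each unique key maps to exactly one vector.
var store = new Dictionary<string, float[]>
{
    ["a"] = new float[] { 1f, 0f },
    ["b"] = new float[] { 0f, 1f },
    ["c"] = new float[] { 1f, 1f },
};

var query = new float[] { 1f, 0.1f };

// Rank every key by cosine similarity to the query, most similar first.
var ranked = store
    .OrderByDescending(kv => CosineSimilarity(kv.Value, query))
    .Select(kv => kv.Key)
    .ToList();

Console.WriteLine(string.Join(", ", ranked)); // a, c, b

// Cosine similarity: dot product divided by the product of the vector lengths.
static double CosineSimilarity(float[] x, float[] y)
{
    double dot = 0, nx = 0, ny = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        nx += x[i] * x[i];
        ny += y[i] * y[i];
    }
    return dot / (Math.Sqrt(nx) * Math.Sqrt(ny));
}
```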

## Initialize

Start by installing `StackExchange.Redis` with the following
command (version 2.9.17 or later is required for vector sets):

```bash
dotnet add package StackExchange.Redis --version 2.9.17
```

Also, install `Microsoft.ML`:

```bash
dotnet add package Microsoft.ML
```

In a new C# file, import the required classes. Note that the `#pragma`
directive suppresses warnings about the experimental status of the vector set API:

{{< clients-example set="home_vecsets" step="import" lang_filter="C#" >}}
{{< /clients-example >}}

## Access the model

Use the `GetPredictionEngine()` helper function declared in the example below to load the model that creates the embeddings:

{{< clients-example set="home_vecsets" step="model" lang_filter="C#" >}}
{{< /clients-example >}}

The `GetPredictionEngine()` function uses two classes, `TextData` and `TransformedTextData`,
to specify the input and output types of the `PredictionEngine`. These have very simple
definitions and are required because the model expects the input and output to be
passed in named object fields:

{{< clients-example set="home_vecsets" step="data_classes" lang_filter="C#" >}}
{{< /clients-example >}}

Note that you must declare these classes at the end of the source file
if you are using a console app with top-level statements (that is, without a main class).

The `GetEmbedding()` function declared below can then use this model to
generate an embedding from a section of text and return it as a `float[]` array,
which is the format required by the vector set API:

{{< clients-example set="home_vecsets" step="get_embedding" lang_filter="C#" >}}
{{< /clients-example >}}
78+
79+
## Create the data
80+
81+
The example data is contained a `Dictionary` object with some brief
82+
descriptions of famous people:
83+
84+
{{< clients-example set="home_vecsets" step="data" lang_filter="C#" >}}
85+
{{< /clients-example >}}
86+
87+
## Add the data to a vector set
88+
89+
The next step is to connect to Redis and add the data to a new vector set.
90+
91+
The code below iterates through `peopleData` and adds corresponding
92+
elements to a vector set called `famousPeople`.
93+
94+
Use the `GetEmbedding()` function declared above to generate the
95+
embedding as a `byte` array that you can pass to the
96+
[`VectorSetAdd()`]({{< relref "/commands/vadd" >}}) command to set the embedding.
97+
98+
The call to `VectorSetAdd()` also adds the `born` and `died` values from the
99+
original dictionary as attribute data. You can access this during a query
100+
or by using the [`VectorSetGetAttributesJson()`]({{< relref "/commands/vgetattr" >}}) method.
101+
102+
{{< clients-example set="home_vecsets" step="add_data" lang_filter="C#" >}}
103+
{{< /clients-example >}}
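
The attribute data stored with each element is JSON text. As a rough,
library-independent sketch, the snippet below builds such a JSON string with
`System.Text.Json` and reads one value back. The `attrs` dictionary and the
`born` and `died` values shown are illustrative assumptions rather than values
taken from the example data, and the step that passes the string to
`VectorSetAdd()` is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative attribute values (assumed, not taken from the example data set).
var attrs = new Dictionary<string, int>
{
    ["born"] = 1867,
    ["died"] = 1934,
};

// Serialize the attributes to a JSON string.
string attrsJson = JsonSerializer.Serialize(attrs);
Console.WriteLine(attrsJson);

// Parse the JSON again to read one attribute back.
using JsonDocument parsed = JsonDocument.Parse(attrsJson);
int died = parsed.RootElement.GetProperty("died").GetInt32();
Console.WriteLine($"died: {died}");
```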

## Query the vector set

You can now query the data in the set. The basic approach is to use the
`GetEmbedding()` function to generate another embedding vector for the query text.
(This is the same method used to add the elements to the set.) Then, pass
the query vector to [`VectorSetSimilaritySearch()`]({{< relref "/commands/vsim" >}}) to
return elements of the set, ranked in order of similarity to the query.

Start with a simple query for "actors":

{{< clients-example set="home_vecsets" step="basic_query" lang_filter="C#" >}}
{{< /clients-example >}}

This returns the following list of elements (formatted slightly for clarity):

```
'actors': ['Masako Natsume', 'Chaim Topol', 'Linus Pauling',
'Marie Fredriksson', 'Maryam Mirzakhani', 'Marie Curie',
'Freddie Mercury', 'Paul Erdos']
```

The first two people in the list are the two actors, as expected, but none of the
people from Linus Pauling onward was especially well-known for acting (and there certainly
isn't any information about that in the short description text).
As it stands, the search attempts to rank all the elements in the set, based
on the information contained in the embedding model.
You can use the `Count` property of `VectorSetSimilaritySearchRequest` to limit the
list of elements to just the most relevant few items:

{{< clients-example set="home_vecsets" step="limited_query" lang_filter="C#" >}}
{{< /clients-example >}}

The reason for using text embeddings rather than simple text search
is that the embeddings represent semantic information. This allows a query
to find elements with a similar meaning even if the text is
different. For example, the word "entertainer" doesn't appear in any of the
descriptions but if you use it as a query, the actors and musicians are ranked
highest in the results list:

{{< clients-example set="home_vecsets" step="entertainer_query" lang_filter="C#" >}}
{{< /clients-example >}}

Similarly, if you use "science" as a query, you get the following results:

```
'science': ['Marie Curie', 'Linus Pauling', 'Maryam Mirzakhani',
'Paul Erdos', 'Marie Fredriksson', 'Freddie Mercury', 'Masako Natsume',
'Chaim Topol']
```

The scientists are ranked highest but they are then followed by the
mathematicians. This seems reasonable given the connection between mathematics
and science.

You can also use
[filter expressions]({{< relref "/develop/data-types/vector-sets/filtered-search" >}})
with `VectorSetSimilaritySearch()` to restrict the search further. For example,
repeat the "science" query, but this time limit the results to people
who died before the year 2000:

{{< clients-example set="home_vecsets" step="filtered_query" lang_filter="C#" >}}
{{< /clients-example >}}

Note that the boolean filter expression is applied to items in the list
before the vector distance calculation is performed. Items that don't
pass the filter test are removed from the results completely, rather
than just reduced in rank. This can help to improve the performance of the
search because there is no need to calculate the vector distance for
elements that have already been filtered out of the search.
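
The filter-then-rank behavior described above can be pictured with a small
in-memory sketch. This is only a conceptual illustration in plain C#, not the way
Redis evaluates filters internally. The element names, years, toy vectors, and
the `Dot` helper are all hypothetical, and a simple dot product stands in for the
real similarity measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical elements: a name, a year of death, and a toy 2-D vector.
var elements = new List<(string Name, int Died, float[] Vector)>
{
    ("early", 1934, new float[] { 1f, 0f }),
    ("late", 2017, new float[] { 1f, 0.1f }),
    ("other", 1996, new float[] { 0f, 1f }),
};
var query = new float[] { 1f, 0f };

int scoreCalls = 0; // counts how many similarity scores are computed

var results = elements
    .Where(e => e.Died < 2000)                          // apply the filter first
    .Select(e => (e.Name, Score: Dot(e.Vector, query))) // score only the survivors
    .OrderByDescending(r => r.Score)
    .Select(r => r.Name)
    .ToList();

Console.WriteLine(string.Join(", ", results)); // early, other
Console.WriteLine(scoreCalls);                 // 2

// Dot product as a stand-in similarity score (an assumption for this sketch).
float Dot(float[] x, float[] y)
{
    scoreCalls++;
    float sum = 0f;
    for (int i = 0; i < x.Length; i++) sum += x[i] * y[i];
    return sum;
}
```

The element that fails the filter never reaches `Dot`, which is why only two
scores are computed for three stored elements.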

## More information

See the [vector sets]({{< relref "/develop/data-types/vector-sets" >}})
docs for more information and code examples. See the
[Redis for AI]({{< relref "/develop/ai" >}}) section for more details
about text embeddings and other AI techniques you can use with Redis.

You may also be interested in
[vector search]({{< relref "/develop/clients/dotnet/vecsearch" >}}).
This is a feature of the
[Redis query engine]({{< relref "/develop/ai/search-and-query" >}})
that lets you retrieve
[JSON]({{< relref "/develop/data-types/json" >}}) and
[hash]({{< relref "/develop/data-types/hashes" >}}) documents based on
vector data stored in their fields.
