@@ -47,12 +47,8 @@ pip install sentence-transformers
 
 In a new Python file, import the required classes:
 
-```python
-from sentence_transformers import SentenceTransformer
-
-import redis
-import numpy as np
-```
+{{< clients-example set="home_vecsets" step="import" >}}
+{{< /clients-example >}}
 
 The first of these imports is the
 `SentenceTransformer` class, which generates an embedding from a section of text.
@@ -65,77 +61,16 @@ tokens (see
 at the [Hugging Face](https://huggingface.co/) docs to learn more about the way tokens
 are related to the original text).
 
-```python
-model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
-```
+{{< clients-example set="home_vecsets" step="model" >}}
+{{< /clients-example >}}
 
 ## Create the data
 
 The example data is contained in a dictionary with some brief
 descriptions of famous people:
 
-```python
-peopleData = {
-    "Marie Curie": {
-        "born": 1867, "died": 1934,
-        "description": """
-        Polish-French chemist and physicist. The only person ever to win
-        two Nobel prizes for two different sciences.
-        """
-    },
-    "Linus Pauling": {
-        "born": 1901, "died": 1994,
-        "description": """
-        American chemist and peace activist. One of only two people to win two
-        Nobel prizes in different fields (chemistry and peace).
-        """
-    },
-    "Freddie Mercury": {
-        "born": 1946, "died": 1991,
-        "description": """
-        British musician, best known as the lead singer of the rock band
-        Queen.
-        """
-    },
-    "Marie Fredriksson": {
-        "born": 1958, "died": 2019,
-        "description": """
-        Swedish multi-instrumentalist, mainly known as the lead singer and
-        keyboardist of the band Roxette.
-        """
-    },
-    "Paul Erdos": {
-        "born": 1913, "died": 1996,
-        "description": """
-        Hungarian mathematician, known for his eccentric personality almost
-        as much as his contributions to many different fields of mathematics.
-        """
-    },
-    "Maryam Mirzakhani": {
-        "born": 1977, "died": 2017,
-        "description": """
-        Iranian mathematician. The first woman ever to win the Fields medal
-        for her contributions to mathematics.
-        """
-    },
-    "Masako Natsume": {
-        "born": 1957, "died": 1985,
-        "description": """
-        Japanese actress. She was very famous in Japan but was primarily
-        known elsewhere in the world for her portrayal of Tripitaka in the
-        TV series Monkey.
-        """
-    },
-    "Chaim Topol": {
-        "born": 1935, "died": 2023,
-        "description": """
-        Israeli actor and singer, usually credited simply as 'Topol'. He was
-        best known for his many appearances as Tevye in the musical Fiddler
-        on the Roof.
-        """
-    }
-}
-```
+{{< clients-example set="home_vecsets" step="data" >}}
+{{< /clients-example >}}
 
 ## Add the data to a vector set
 
@@ -164,22 +99,8 @@ The call to `vadd()` also adds the `born` and `died` values from the
 original dictionary as attribute data. You can access this during a query
 or by using the [`vgetattr()`]({{< relref "/commands/vgetattr" >}}) method.
 
-```py
-r = redis.Redis(decode_responses=True)
-
-for name, details in peopleData.items():
-    emb = model.encode(details["description"]).astype(np.float32).tobytes()
-
-    r.vset().vadd(
-        "famousPeople",
-        emb,
-        name,
-        attributes={
-            "born": details["born"],
-            "died": details["died"]
-        }
-    )
-```
+{{< clients-example set="home_vecsets" step="add_data" >}}
+{{< /clients-example >}}
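
The removed example passed each embedding to `vadd()` as raw 32-bit float bytes, produced with `.astype(np.float32).tobytes()`. As a rough illustration of what that byte string contains, the following standard-library sketch packs a short vector the same way (native byte order, four bytes per component). The helper names `to_float32_bytes` and `from_float32_bytes` are hypothetical and are not part of redis-py or NumPy.

```python
from array import array


def to_float32_bytes(vector):
    """Pack a sequence of floats as consecutive 32-bit floats (native byte order)."""
    return array("f", vector).tobytes()


def from_float32_bytes(blob):
    """Inverse of to_float32_bytes: unpack the bytes into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


# Values chosen to be exactly representable as 32-bit floats.
example = [0.5, -1.25, 2.0]
blob = to_float32_bytes(example)
restored = from_float32_bytes(blob)
```

A real embedding from the model would be packed the same way, only with many more components.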
 
 ## Query the vector set
 
@@ -191,16 +112,8 @@ of the set, ranked in order of similarity to the query.
 
 Start with a simple query for "actors":
 
-```py
-query_value = "actors"
-
-actors_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-)
-
-print(f"'actors': {actors_results}")
-```
+{{< clients-example set="home_vecsets" step="basic_query" >}}
+{{< /clients-example >}}
 
 This returns the following list of elements (formatted slightly for clarity):
 
@@ -218,18 +131,8 @@ on the information contained in the embedding model.
 You can use the `count` parameter of `vsim()` to limit the list of elements
 to just the most relevant few items:
 
-```py
-query_value = "actors"
-
-two_actors_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-    count=2
-)
-
-print(f"'actors (2)': {two_actors_results}")
-# >>> 'actors (2)': ['Masako Natsume', 'Chaim Topol']
-```
+{{< clients-example set="home_vecsets" step="limited_query" >}}
+{{< /clients-example >}}
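
To make the effect of `count` concrete, here is a minimal pure-Python sketch of a ranked similarity search that keeps only the top few results. It is a toy model under stated assumptions: the two-dimensional vectors and element names are made up, the function `most_similar` is hypothetical, and the exhaustive cosine-similarity scan is only an illustration, not a description of how Redis evaluates `vsim()` internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, count=None):
    """Rank element names by similarity to the query, most similar first.

    If count is given, return only that many names from the top of the ranking.
    """
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked if count is None else ranked[:count]


# Made-up 2-D "embeddings": the first axis loosely stands for "acting".
toy_vectors = {
    "actor_a": [0.9, 0.1],
    "actor_b": [0.8, 0.3],
    "chemist": [0.1, 0.9],
}

all_results = most_similar([1.0, 0.0], toy_vectors)
top_two = most_similar([1.0, 0.0], toy_vectors, count=2)
```

Without `count`, every element comes back in ranked order; with `count=2`, only the two closest elements are returned.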
 
 The reason for using text embeddings rather than simple text search
 is that the embeddings represent semantic information. This allows a query
@@ -238,19 +141,8 @@ different. For example, the word "entertainer" doesn't appear in any of the
 descriptions but if you use it as a query, the actors and musicians are ranked
 highest in the results list:
 
-```py
-query_value = "entertainer"
-
-entertainer_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes()
-)
-
-print(f"'entertainer': {entertainer_results}")
-# >>> 'entertainer': ['Chaim Topol', 'Freddie Mercury',
-# >>> 'Marie Fredriksson', 'Masako Natsume', 'Linus Pauling',
-# 'Paul Erdos', 'Maryam Mirzakhani', 'Marie Curie']
-```
+{{< clients-example set="home_vecsets" step="entertainer_query" >}}
+{{< /clients-example >}}
 
 Similarly, if you use "science" as a query, you get the following results:
 
@@ -270,19 +162,8 @@ with `vsim()` to restrict the search further. For example,
 repeat the "science" query, but this time limit the results to people
 who died before the year 2000:
 
-```py
-query_value = "science"
-
-science2000_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-    filter=".died < 2000"
-)
-
-print(f"'science2000': {science2000_results}")
-# >>> 'science2000': ['Marie Curie', 'Linus Pauling',
-# 'Paul Erdos', 'Freddie Mercury', 'Masako Natsume']
-```
+{{< clients-example set="home_vecsets" step="filtered_query" >}}
+{{< /clients-example >}}
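
The filtered query can be pictured with a small pure-Python sketch in which the filter `.died < 2000` is modeled as a predicate over each element's attributes, applied before the remaining elements are ranked by similarity. This is an illustration only: the `born` and `died` values come from the example data, but the two-dimensional vectors are invented, `filtered_search` is a hypothetical helper, and nothing here reflects the internals of Redis.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(query, items, predicate):
    """Drop items whose attributes fail the predicate, then rank the rest."""
    candidates = {
        name: item
        for name, item in items.items()
        if predicate(item["attributes"])
    }
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]["vector"]),
        reverse=True,
    )


# Invented 2-D vectors: the first axis loosely stands for "science".
toy_items = {
    "Marie Curie": {
        "vector": [0.9, 0.2],
        "attributes": {"born": 1867, "died": 1934},
    },
    "Maryam Mirzakhani": {
        "vector": [0.8, 0.1],
        "attributes": {"born": 1977, "died": 2017},
    },
    "Freddie Mercury": {
        "vector": [0.1, 0.9],
        "attributes": {"born": 1946, "died": 1991},
    },
}

# Python stand-in for the filter expression ".died < 2000".
results = filtered_search([1.0, 0.0], toy_items, lambda attrs: attrs["died"] < 2000)
```

Maryam Mirzakhani is excluded despite having a vector close to the query, because her `died` attribute fails the filter; the remaining two elements are still ranked by similarity.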
 
 Note that the boolean filter expression is applied to items in the list
 before the vector distance calculation is performed. Items that don't