Commit 875e49d

DOC-5537 Python vector set examples
1 parent b35e164 commit 875e49d

2 files changed

+171
-135
lines changed

content/develop/clients/redis-py/vecsets.md

Lines changed: 16 additions & 135 deletions
@@ -47,12 +47,8 @@ pip install sentence-transformers
 
 In a new Python file, import the required classes:
 
-```python
-from sentence_transformers import SentenceTransformer
-
-import redis
-import numpy as np
-```
+{{< clients-example set="home_vecsets" step="import" >}}
+{{< /clients-example >}}
 
 The first of these imports is the
 `SentenceTransformer` class, which generates an embedding from a section of text.
@@ -65,77 +61,16 @@ tokens (see
 at the [Hugging Face](https://huggingface.co/) docs to learn more about the way tokens
 are related to the original text).
 
-```python
-model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
-```
+{{< clients-example set="home_vecsets" step="model" >}}
+{{< /clients-example >}}
 
 ## Create the data
 
 The example data is contained in a dictionary with some brief
 descriptions of famous people:
 
-```python
-peopleData = {
-    "Marie Curie": {
-        "born": 1867, "died": 1934,
-        "description": """
-        Polish-French chemist and physicist. The only person ever to win
-        two Nobel prizes for two different sciences.
-        """
-    },
-    "Linus Pauling": {
-        "born": 1901, "died": 1994,
-        "description": """
-        American chemist and peace activist. One of only two people to win two
-        Nobel prizes in different fields (chemistry and peace).
-        """
-    },
-    "Freddie Mercury": {
-        "born": 1946, "died": 1991,
-        "description": """
-        British musician, best known as the lead singer of the rock band
-        Queen.
-        """
-    },
-    "Marie Fredriksson": {
-        "born": 1958, "died": 2019,
-        "description": """
-        Swedish multi-instrumentalist, mainly known as the lead singer and
-        keyboardist of the band Roxette.
-        """
-    },
-    "Paul Erdos": {
-        "born": 1913, "died": 1996,
-        "description": """
-        Hungarian mathematician, known for his eccentric personality almost
-        as much as his contributions to many different fields of mathematics.
-        """
-    },
-    "Maryam Mirzakhani": {
-        "born": 1977, "died": 2017,
-        "description": """
-        Iranian mathematician. The first woman ever to win the Fields medal
-        for her contributions to mathematics.
-        """
-    },
-    "Masako Natsume": {
-        "born": 1957, "died": 1985,
-        "description": """
-        Japanese actress. She was very famous in Japan but was primarily
-        known elsewhere in the world for her portrayal of Tripitaka in the
-        TV series Monkey.
-        """
-    },
-    "Chaim Topol": {
-        "born": 1935, "died": 2023,
-        "description": """
-        Israeli actor and singer, usually credited simply as 'Topol'. He was
-        best known for his many appearances as Tevye in the musical Fiddler
-        on the Roof.
-        """
-    }
-}
-```
+{{< clients-example set="home_vecsets" step="data" >}}
+{{< /clients-example >}}
 
 ## Add the data to a vector set
 
@@ -164,22 +99,8 @@ The call to `vadd()` also adds the `born` and `died` values from the
 original dictionary as attribute data. You can access this during a query
 or by using the [`vgetattr()`]({{< relref "/commands/vgetattr" >}}) method.
 
-```py
-r = redis.Redis(decode_responses=True)
-
-for name, details in peopleData.items():
-    emb = model.encode(details["description"]).astype(np.float32).tobytes()
-
-    r.vset().vadd(
-        "famousPeople",
-        emb,
-        name,
-        attributes={
-            "born": details["born"],
-            "died": details["died"]
-        }
-    )
-```
+{{< clients-example set="home_vecsets" step="add_data" >}}
+{{< /clients-example >}}
 
 ## Query the vector set
 
@@ -191,16 +112,8 @@ of the set, ranked in order of similarity to the query.
 
 Start with a simple query for "actors":
 
-```py
-query_value = "actors"
-
-actors_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-)
-
-print(f"'actors': {actors_results}")
-```
+{{< clients-example set="home_vecsets" step="basic_query" >}}
+{{< /clients-example >}}
 
 This returns the following list of elements (formatted slightly for clarity):
 
@@ -218,18 +131,8 @@ on the information contained in the embedding model.
 You can use the `count` parameter of `vsim()` to limit the list of elements
 to just the most relevant few items:
 
-```py
-query_value = "actors"
-
-two_actors_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-    count=2
-)
-
-print(f"'actors (2)': {two_actors_results}")
-# >>> 'actors (2)': ['Masako Natsume', 'Chaim Topol']
-```
+{{< clients-example set="home_vecsets" step="limited_query" >}}
+{{< /clients-example >}}
 
 The reason for using text embeddings rather than simple text search
 is that the embeddings represent semantic information. This allows a query
@@ -238,19 +141,8 @@ different. For example, the word "entertainer" doesn't appear in any of the
 descriptions but if you use it as a query, the actors and musicians are ranked
 highest in the results list:
 
-```py
-query_value = "entertainer"
-
-entertainer_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes()
-)
-
-print(f"'entertainer': {entertainer_results}")
-# >>> 'entertainer': ['Chaim Topol', 'Freddie Mercury',
-# >>> 'Marie Fredriksson', 'Masako Natsume', 'Linus Pauling',
-# 'Paul Erdos', 'Maryam Mirzakhani', 'Marie Curie']
-```
+{{< clients-example set="home_vecsets" step="entertainer_query" >}}
+{{< /clients-example >}}
 
 Similarly, if you use "science" as a query, you get the following results:
 
@@ -270,19 +162,8 @@ with `vsim()` to restrict the search further. For example,
 repeat the "science" query, but this time limit the results to people
 who died before the year 2000:
 
-```py
-query_value = "science"
-
-science2000_results = r.vset().vsim(
-    "famousPeople",
-    model.encode(query_value).astype(np.float32).tobytes(),
-    filter=".died < 2000"
-)
-
-print(f"'science2000': {science2000_results}")
-# >>> 'science2000': ['Marie Curie', 'Linus Pauling',
-# 'Paul Erdos', 'Freddie Mercury', 'Masako Natsume']
-```
+{{< clients-example set="home_vecsets" step="filtered_query" >}}
+{{< /clients-example >}}
 
 Note that the boolean filter expression is applied to items in the list
 before the vector distance calculation is performed. Items that don't
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+# EXAMPLE: home_vecsets
+# STEP_START import
+from sentence_transformers import SentenceTransformer
+
+import redis
+import numpy as np
+# STEP_END
+
+# STEP_START model
+model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+# STEP_END
+
+# STEP_START data
+peopleData = {
+    "Marie Curie": {
+        "born": 1867, "died": 1934,
+        "description": """
+        Polish-French chemist and physicist. The only person ever to win
+        two Nobel prizes for two different sciences.
+        """
+    },
+    "Linus Pauling": {
+        "born": 1901, "died": 1994,
+        "description": """
+        American chemist and peace activist. One of only two people to win two
+        Nobel prizes in different fields (chemistry and peace).
+        """
+    },
+    "Freddie Mercury": {
+        "born": 1946, "died": 1991,
+        "description": """
+        British musician, best known as the lead singer of the rock band
+        Queen.
+        """
+    },
+    "Marie Fredriksson": {
+        "born": 1958, "died": 2019,
+        "description": """
+        Swedish multi-instrumentalist, mainly known as the lead singer and
+        keyboardist of the band Roxette.
+        """
+    },
+    "Paul Erdos": {
+        "born": 1913, "died": 1996,
+        "description": """
+        Hungarian mathematician, known for his eccentric personality almost
+        as much as his contributions to many different fields of mathematics.
+        """
+    },
+    "Maryam Mirzakhani": {
+        "born": 1977, "died": 2017,
+        "description": """
+        Iranian mathematician. The first woman ever to win the Fields medal
+        for her contributions to mathematics.
+        """
+    },
+    "Masako Natsume": {
+        "born": 1957, "died": 1985,
+        "description": """
+        Japanese actress. She was very famous in Japan but was primarily
+        known elsewhere in the world for her portrayal of Tripitaka in the
+        TV series Monkey.
+        """
+    },
+    "Chaim Topol": {
+        "born": 1935, "died": 2023,
+        "description": """
+        Israeli actor and singer, usually credited simply as 'Topol'. He was
+        best known for his many appearances as Tevye in the musical Fiddler
+        on the Roof.
+        """
+    }
+}
+# STEP_END
+
+# STEP_START add_data
+r = redis.Redis(decode_responses=True)
+
+for name, details in peopleData.items():
+    emb = model.encode(details["description"]).astype(np.float32).tobytes()
+
+    r.vset().vadd(
+        "famousPeople",
+        emb,
+        name,
+        attributes={
+            "born": details["born"],
+            "died": details["died"]
+        }
+    )
+# STEP_END
+
+# STEP_START basic_query
+query_value = "actors"
+
+actors_results = r.vset().vsim(
+    "famousPeople",
+    model.encode(query_value).astype(np.float32).tobytes(),
+)
+
+print(f"'actors': {actors_results}")
+# STEP_END
+
+# STEP_START limited_query
+query_value = "actors"
+
+two_actors_results = r.vset().vsim(
+    "famousPeople",
+    model.encode(query_value).astype(np.float32).tobytes(),
+    count=2
+)
+
+print(f"'actors (2)': {two_actors_results}")
+# >>> 'actors (2)': ['Masako Natsume', 'Chaim Topol']
+# STEP_END
+
+# STEP_START entertainer_query
+query_value = "entertainer"
+
+entertainer_results = r.vset().vsim(
+    "famousPeople",
+    model.encode(query_value).astype(np.float32).tobytes()
+)
+
+print(f"'entertainer': {entertainer_results}")
+# >>> 'entertainer': ['Chaim Topol', 'Freddie Mercury',
+# 'Marie Fredriksson', 'Masako Natsume', 'Linus Pauling',
+# 'Paul Erdos', 'Maryam Mirzakhani', 'Marie Curie']
+# STEP_END
+
+query_value = "science"
+
+science_results = r.vset().vsim(
+    "famousPeople",
+    model.encode(query_value).astype(np.float32).tobytes()
+)
+
+print(f"'science': {science_results}")
+# >>> 'science': ['Marie Curie', 'Linus Pauling',
+# 'Maryam Mirzakhani', 'Paul Erdos', 'Marie Fredriksson',
+# 'Freddie Mercury', 'Masako Natsume', 'Chaim Topol']
+
+# STEP_START filtered_query
+query_value = "science"
+
+science2000_results = r.vset().vsim(
+    "famousPeople",
+    model.encode(query_value).astype(np.float32).tobytes(),
+    filter=".died < 2000"
+)
+
+print(f"'science2000': {science2000_results}")
+# >>> 'science2000': ['Marie Curie', 'Linus Pauling',
+# 'Paul Erdos', 'Freddie Mercury', 'Masako Natsume']
+# STEP_END
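
As a sanity check on the `filtered_query` step, the sketch below applies the same `died < 2000` condition to the `born`/`died` attributes from `peopleData` in plain Python. It does not call Redis or evaluate the real filter expression. The `attributes` dictionary is a hand-copied subset of the example data, and `died_before` is a hypothetical helper that exists only for this illustration.

```python
# Attribute values copied from the peopleData dictionary in the example file.
attributes = {
    "Marie Curie": {"born": 1867, "died": 1934},
    "Linus Pauling": {"born": 1901, "died": 1994},
    "Freddie Mercury": {"born": 1946, "died": 1991},
    "Marie Fredriksson": {"born": 1958, "died": 2019},
    "Paul Erdos": {"born": 1913, "died": 1996},
    "Maryam Mirzakhani": {"born": 1977, "died": 2017},
    "Masako Natsume": {"born": 1957, "died": 1985},
    "Chaim Topol": {"born": 1935, "died": 2023},
}


def died_before(attrs, year):
    """Hypothetical helper: names whose 'died' attribute is below `year`.

    A plain-Python stand-in for the condition written as ".died < 2000"
    in the example; it is not how Redis evaluates filter expressions.
    """
    return {name for name, values in attrs.items() if values["died"] < year}


print(sorted(died_before(attributes, 2000)))
# ['Freddie Mercury', 'Linus Pauling', 'Marie Curie', 'Masako Natsume', 'Paul Erdos']
```

The five names that pass are the same five listed in the `science2000` comment, although here they are printed alphabetically rather than ranked by similarity to the query.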
