Commit b35e164

DOC-5537 added local Python vec query examples
1 parent b93b1c3 commit b35e164

2 files changed: +172 -138 lines changed


content/develop/clients/redis-py/vecsearch.md

Lines changed: 18 additions & 138 deletions
@@ -56,16 +56,8 @@ pip install sentence-transformers

In a new Python source file, start by importing the required classes:

-```python
-from sentence_transformers import SentenceTransformer
-from redis.commands.search.query import Query
-from redis.commands.search.field import TextField, TagField, VectorField
-from redis.commands.search.indexDefinition import IndexDefinition, IndexType
-from redis.commands.json.path import Path
-
-import numpy as np
-import redis
-```
+{{< clients-example set="home_query_vec" step="import" >}}
+{{< /clients-example >}}

The first of these imports is the
`SentenceTransformer` class, which generates an embedding from a section of text.
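
As a quick sanity check (not shown in the diff), the `DIM 384` setting used in the schemas below follows directly from this model; a minimal sketch, assuming the same `all-MiniLM-L6-v2` model:

```python
# Minimal sketch (not from the commit): confirm the embedding shape and dtype
# that the index schemas below assume (FLOAT32 components, DIM 384).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = model.encode("That is a happy person").astype(np.float32)

print(emb.shape)           # (384,)
print(emb.dtype)           # float32
print(len(emb.tobytes()))  # 1536 bytes: 384 components x 4 bytes each
```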
@@ -78,9 +70,8 @@ tokens (see
at the [Hugging Face](https://huggingface.co/) docs to learn more about the way tokens
are related to the original text).

-```python
-model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
-```
+{{< clients-example set="home_query_vec" step="model" >}}
+{{< /clients-example >}}

## Create the index

@@ -89,14 +80,8 @@ name `vector_idx`. (The `dropindex()` call throws an exception if
the index doesn't already exist, which is why you need the
`try: except:` block.)

-```python
-r = redis.Redis(decode_responses=True)
-
-try:
-    r.ft("vector_idx").dropindex(True)
-except redis.exceptions.ResponseError:
-    pass
-```
+{{< clients-example set="home_query_vec" step="connect" >}}
+{{< /clients-example >}}

Next, create the index.
The schema in the example below specifies hash objects for storage and includes
@@ -110,24 +95,8 @@ indexing, the
vector distance metric, `Float32` values to represent the vector's components,
and 384 dimensions, as required by the `all-MiniLM-L6-v2` embedding model.

-```python
-schema = (
-    TextField("content"),
-    TagField("genre"),
-    VectorField("embedding", "HNSW", {
-        "TYPE": "FLOAT32",
-        "DIM": 384,
-        "DISTANCE_METRIC":"L2"
-    })
-)
-
-r.ft("vector_idx").create_index(
-    schema,
-    definition=IndexDefinition(
-        prefix=["doc:"], index_type=IndexType.HASH
-    )
-)
-```
+{{< clients-example set="home_query_vec" step="create_index" >}}
+{{< /clients-example >}}

## Add data

@@ -144,31 +113,8 @@ Use the binary string representation when you are indexing hashes
or running a query (but use a list of `float` for
[JSON documents](#differences-with-json-documents)).

-```python
-content = "That is a very happy person"
-
-r.hset("doc:0", mapping={
-    "content": content,
-    "genre": "persons",
-    "embedding": model.encode(content).astype(np.float32).tobytes(),
-})
-
-content = "That is a happy dog"
-
-r.hset("doc:1", mapping={
-    "content": content,
-    "genre": "pets",
-    "embedding": model.encode(content).astype(np.float32).tobytes(),
-})
-
-content = "Today is a sunny day"
-
-r.hset("doc:2", mapping={
-    "content": content,
-    "genre": "weather",
-    "embedding": model.encode(content).astype(np.float32).tobytes(),
-})
-```
+{{< clients-example set="home_query_vec" step="add_data" >}}
+{{< /clients-example >}}

## Run a query

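The hunk above notes that hash documents take the binary `tobytes()` form while JSON documents take a `tolist()` list. A minimal sketch of that difference (not from the commit, reusing the `model` object from the example):

```python
# Minimal sketch (not from the commit): the same embedding, encoded two ways.
vec = model.encode("Today is a sunny day").astype(np.float32)

hash_value = vec.tobytes()  # bytes: use with hset() and as the $vec query parameter
json_value = vec.tolist()   # list of floats: use with json().set()

print(type(hash_value), len(hash_value))  # <class 'bytes'> 1536
print(type(json_value), len(json_value))  # <class 'list'> 384
```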
@@ -184,21 +130,8 @@ the indexing, and passes it as a parameter when the query executes
[Vector search]({{< relref "/develop/ai/search-and-query/query/vector-search" >}})
for more information about using query parameters with embeddings).

-```python
-q = Query(
-    "*=>[KNN 3 @embedding $vec AS vector_distance]"
-).return_field("score").dialect(2)
-
-query_text = "That is a happy person"
-
-res = r.ft("vector_idx").search(
-    q, query_params={
-        "vec": model.encode(query_text).astype(np.float32).tobytes()
-    }
-)
-
-print(res)
-```
+{{< clients-example set="home_query_vec" step="query" >}}
+{{< /clients-example >}}

The code is now ready to run, but note that it may take a while to complete when
you run it for the first time (which happens because the `sentence-transformers` library must download the
@@ -250,27 +183,8 @@ every query. Also, you must specify `IndexType.JSON` when you create the index.
The code below shows these differences, but the index is otherwise very similar to
the one created previously for hashes:

-```py
-schema = (
-    TextField("$.content", as_name="content"),
-    TagField("$.genre", as_name="genre"),
-    VectorField(
-        "$.embedding", "HNSW", {
-            "TYPE": "FLOAT32",
-            "DIM": 384,
-            "DISTANCE_METRIC": "L2"
-        },
-        as_name="embedding"
-    )
-)
-
-r.ft("vector_json_idx").create_index(
-    schema,
-    definition=IndexDefinition(
-        prefix=["jdoc:"], index_type=IndexType.JSON
-    )
-)
-```
+{{< clients-example set="home_query_vec" step="json_index" >}}
+{{< /clients-example >}}

Use [`json().set()`]({{< relref "/commands/json.set" >}}) to add the data
instead of [`hset()`]({{< relref "/commands/hset" >}}). The dictionaries
@@ -283,31 +197,8 @@ specified using lists instead of binary strings. Generate the list
using the `tolist()` method instead of `tobytes()` as you would with a
hash.

-```py
-content = "That is a very happy person"
-
-r.json().set("jdoc:0", Path.root_path(), {
-    "content": content,
-    "genre": "persons",
-    "embedding": model.encode(content).astype(np.float32).tolist(),
-})
-
-content = "That is a happy dog"
-
-r.json().set("jdoc:1", Path.root_path(), {
-    "content": content,
-    "genre": "pets",
-    "embedding": model.encode(content).astype(np.float32).tolist(),
-})
-
-content = "Today is a sunny day"
-
-r.json().set("jdoc:2", Path.root_path(), {
-    "content": content,
-    "genre": "weather",
-    "embedding": model.encode(content).astype(np.float32).tolist(),
-})
-```
+{{< clients-example set="home_query_vec" step="json_data" >}}
+{{< /clients-example >}}

The query is almost identical to the one for the hash documents. This
demonstrates how the right choice of aliases for the JSON paths can
@@ -316,19 +207,8 @@ is that the vector parameter for the query is still specified as a
binary string (using the `tobytes()` method), even though the data for
the `embedding` field of the JSON was specified as a list.

-```py
-q = Query(
-    "*=>[KNN 3 @embedding $vec AS vector_distance]"
-).return_field("vector_distance").return_field("content").dialect(2)
-
-query_text = "That is a happy person"
-
-res = r.ft("vector_json_idx").search(
-    q, query_params={
-        "vec": model.encode(query_text).astype(np.float32).tobytes()
-    }
-)
-```
+{{< clients-example set="home_query_vec" step="json_query" >}}
+{{< /clients-example >}}

Apart from the `jdoc:` prefixes for the keys, the result from the JSON
query is the same as for hash:
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
+# EXAMPLE: home_query_vec
+# STEP_START import
+from sentence_transformers import SentenceTransformer
+from redis.commands.search.query import Query
+from redis.commands.search.field import TextField, TagField, VectorField
+from redis.commands.search.indexDefinition import IndexDefinition, IndexType
+from redis.commands.json.path import Path
+
+import numpy as np
+import redis
+# STEP_END
+
+# STEP_START model
+model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+# STEP_END
+
+# STEP_START connect
+r = redis.Redis(decode_responses=True)
+
+try:
+    r.ft("vector_idx").dropindex(True)
+except redis.exceptions.ResponseError:
+    pass
+# STEP_END
+
+# STEP_START create_index
+schema = (
+    TextField("content"),
+    TagField("genre"),
+    VectorField("embedding", "HNSW", {
+        "TYPE": "FLOAT32",
+        "DIM": 384,
+        "DISTANCE_METRIC":"L2"
+    })
+)
+
+r.ft("vector_idx").create_index(
+    schema,
+    definition=IndexDefinition(
+        prefix=["doc:"], index_type=IndexType.HASH
+    )
+)
+# STEP_END
+
+# STEP_START add_data
+content = "That is a very happy person"
+
+r.hset("doc:0", mapping={
+    "content": content,
+    "genre": "persons",
+    "embedding": model.encode(content).astype(np.float32).tobytes(),
+})
+
+content = "That is a happy dog"
+
+r.hset("doc:1", mapping={
+    "content": content,
+    "genre": "pets",
+    "embedding": model.encode(content).astype(np.float32).tobytes(),
+})
+
+content = "Today is a sunny day"
+
+r.hset("doc:2", mapping={
+    "content": content,
+    "genre": "weather",
+    "embedding": model.encode(content).astype(np.float32).tobytes(),
+})
+# STEP_END
+
+# STEP_START query
+q = Query(
+    "*=>[KNN 3 @embedding $vec AS vector_distance]"
+).return_field("vector_distance").return_field("content").dialect(2)
+
+query_text = "That is a happy person"
+
+res = r.ft("vector_idx").search(
+    q, query_params={
+        "vec": model.encode(query_text).astype(np.float32).tobytes()
+    }
+)
+
+print(res)
+# STEP_END
+
+try:
+    r.ft("vector_json_idx").dropindex(True)
+except redis.exceptions.ResponseError:
+    pass
+
+# STEP_START json_index
+schema = (
+    TextField("$.content", as_name="content"),
+    TagField("$.genre", as_name="genre"),
+    VectorField(
+        "$.embedding", "HNSW", {
+            "TYPE": "FLOAT32",
+            "DIM": 384,
+            "DISTANCE_METRIC": "L2"
+        },
+        as_name="embedding"
+    )
+)
+
+r.ft("vector_json_idx").create_index(
+    schema,
+    definition=IndexDefinition(
+        prefix=["jdoc:"], index_type=IndexType.JSON
+    )
+)
+# STEP_END
+
+# STEP_START json_data
+content = "That is a very happy person"
+
+r.json().set("jdoc:0", Path.root_path(), {
+    "content": content,
+    "genre": "persons",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "That is a happy dog"
+
+r.json().set("jdoc:1", Path.root_path(), {
+    "content": content,
+    "genre": "pets",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "Today is a sunny day"
+
+r.json().set("jdoc:2", Path.root_path(), {
+    "content": content,
+    "genre": "weather",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+# STEP_END
+
+# STEP_START json_query
+q = Query(
+    "*=>[KNN 3 @embedding $vec AS vector_distance]"
+).return_field("vector_distance").return_field("content").dialect(2)
+
+query_text = "That is a happy person"
+
+res = r.ft("vector_json_idx").search(
+    q, query_params={
+        "vec": model.encode(query_text).astype(np.float32).tobytes()
+    }
+)
+
+print(repr(res))
+# STEP_END
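
The example prints the raw result objects with `print(res)` and `print(repr(res))`. For anyone running it locally, one way to inspect the individual matches is sketched below; it assumes the standard redis-py `Result` object returned by `search()` and is not part of the commit:

```python
# Sketch (not from the commit): res.docs holds the matching documents; each one
# exposes its key as .id plus the fields requested with return_field().
for doc in res.docs:
    print(doc.id, doc.vector_distance, doc.content)
```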
