
Commit 1fe05d9

DOC-5556 added Lettuce doc query page and new example folders
1 parent df12352 commit 1fe05d9

10 files changed: +585 -8 lines changed

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Learn how to use the Redis query engine with JSON and hash documents.
linkTitle: Index and query documents
title: Index and query documents
weight: 2
---

This example shows how to create a
[search index]({{< relref "/develop/ai/search-and-query/indexing" >}})
for [JSON]({{< relref "/develop/data-types/json" >}}) documents and
run queries against the index. It then goes on to show the slight differences
in the equivalent code for [hash]({{< relref "/develop/data-types/hashes" >}})
documents.

## Initialize

Make sure that you have [Redis Open Source]({{< relref "/operate/oss_and_stack/" >}})
or another Redis server available. Also install the
[Lettuce]({{< relref "/develop/clients/lettuce" >}}) client library if you
haven't already done so.

Add the following dependencies. All of them are applicable to both JSON and hash,
except for the `JsonParser`, `JsonPath`, and `JsonObject` classes.

{{< clients-example lettuce_home_json import >}}
{{< /clients-example >}}

## Create data

Create some test data to add to the database:

{{< clients-example lettuce_home_json create_data >}}
{{< /clients-example >}}

## Add the index

Connect to your Redis database. The code below shows the most
basic connection but see
[Connect to the server]({{< relref "/develop/clients/lettuce/connect" >}})
to learn more about the available connection options.

{{< clients-example lettuce_home_json connect >}}
{{< /clients-example >}}

Create an index. In this example, only JSON documents with the key prefix `user:` are indexed. For more information, see [Query syntax]({{< relref "/develop/ai/search-and-query/query/" >}}).

{{< clients-example lettuce_home_json make_index >}}
{{< /clients-example >}}

## Add the data

Add the three sets of user data to the database as
[JSON]({{< relref "/develop/data-types/json" >}}) objects.
If you use keys with the `user:` prefix then Redis will index the
objects automatically as you add them:

{{< clients-example lettuce_home_json add_data >}}
{{< /clients-example >}}

## Query the data

You can now use the index to search the JSON objects. The
[query]({{< relref "/develop/ai/search-and-query/query" >}})
below searches for objects that have the text "Paul" in any field
and have an `age` value in the range 30 to 40:

{{< clients-example lettuce_home_json query1 >}}
{{< /clients-example >}}
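
The example code is pulled in by the shortcode above, so the query text is not visible on this page. As a rough illustration, the sketch below builds a query string for "the word Paul in any field, plus an `age` between 30 and 40" using standard Redis query syntax. The class and method names (`QueryStringSketch`, `numericRange`, `textAndAgeRange`) are hypothetical helpers rather than part of Lettuce, and the sketch assumes the index exposes a numeric field named `age`.

```java
public class QueryStringSketch {

    // Hypothetical helper: formats an inclusive numeric range clause such as "@age:[30 40]".
    static String numericRange(String field, int min, int max) {
        return "@" + field + ":[" + min + " " + max + "]";
    }

    // Joins a free-text term and an age range; a space between clauses means both must match.
    static String textAndAgeRange(String term, int minAge, int maxAge) {
        return term + " " + numericRange("age", minAge, maxAge);
    }

    public static void main(String[] args) {
        // Prints: Paul @age:[30 40]
        System.out.println(textAndAgeRange("Paul", 30, 40));
    }
}
```

The resulting string is what you would pass as the query argument of a search call. The actual example may build or pass the query differently.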

Specify query options to return only the `city` field:

{{< clients-example lettuce_home_json query2 >}}
{{< /clients-example >}}

Use an
[aggregation query]({{< relref "/develop/ai/search-and-query/query/aggregation" >}})
to count all users in each city.

{{< clients-example lettuce_home_json query3 >}}
{{< /clients-example >}}
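
Conceptually, this aggregation groups the indexed documents by `city` and counts the members of each group. The plain-Java sketch below does the same grouping in memory, without Redis, purely to show the shape of the result. The `User` class and the three sample users are made-up stand-ins, not the test data used by the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CityCountSketch {

    // Hypothetical stand-in for one indexed user document.
    static final class User {
        final String name;
        final String city;

        User(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    // Groups users by city and counts each group (the in-memory analogue of a
    // group-by on the city field with a count reducer). TreeMap keeps cities sorted.
    static Map<String, Long> countByCity(List<User> users) {
        return users.stream()
                .collect(Collectors.groupingBy(u -> u.city, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<User> users = List.of(
                new User("Alice", "London"),
                new User("Bob", "Tel Aviv"),
                new User("Carol", "Tel Aviv"));

        countByCity(users).forEach((city, count) -> System.out.println(city + " - " + count));
    }
}
```

With the query engine, Redis does this grouping on the server, so only the per-city counts travel back to the client.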

## Differences with hash documents

Indexing for hash documents is very similar to JSON indexing but you
need to specify some slightly different options.

When you create the schema for a hash index, you don't need to
add aliases for the fields, since you use the basic names to access
the fields anyway. Also, you must use `CreateArgs.TargetType.HASH` for the `On()`
option of `CreateArgs` when you create the index. The code below shows these
changes with a new index called `hash-idx:users`, which is otherwise the same as
the `idx:users` index used for JSON documents in the previous examples.

{{< clients-example lettuce_home_json make_hash_index >}}
{{< /clients-example >}}

Use [`hset()`]({{< relref "/commands/hset" >}}) to add the hash
documents instead of [`jsonSet()`]({{< relref "/commands/json.set" >}}).

{{< clients-example lettuce_home_json add_hash_data >}}
{{< /clients-example >}}
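
A hash document is a flat set of string fields, so a nested or typed object has to be flattened before it is stored. The sketch below shows one way to build such a field map in plain Java. The helper name `toHashFields`, the field names `name` and `email`, and the sample values are assumptions for illustration (only `age` and `city` are named on this page).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HashDocSketch {

    // Flattens one user into string fields, since hash values are plain strings.
    static Map<String, String> toHashFields(String name, String email, int age, String city) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("email", email);
        fields.put("age", Integer.toString(age)); // numbers are stored as strings
        fields.put("city", city);
        return fields;
    }

    public static void main(String[] args) {
        // Made-up values; with a live connection, a map like this could be passed to hset().
        Map<String, String> fields = toHashFields("Alice Example", "alice@example.com", 35, "London");
        System.out.println(fields);
    }
}
```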

The query commands work the same here for hash as they do for JSON (but
the name of the hash index is different). The results are returned in
a `List` of `SearchReply.SearchResult<String, String>` objects, as with JSON:

{{< clients-example lettuce_home_json query1_hash >}}
{{< /clients-example >}}

## More information

See the [Redis query engine]({{< relref "/develop/ai/search-and-query" >}}) docs
for a full description of all query features with examples.

content/develop/clients/redis-py/vecsets.md

Lines changed: 8 additions & 8 deletions
@@ -47,7 +47,7 @@ pip install sentence-transformers

In a new Python file, import the required classes:

-{{< clients-example set="home_vecsets" step="import" >}}
+{{< clients-example set="home_vecsets" step="import" lang_filter="Python" >}}
{{< /clients-example >}}

The first of these imports is the
@@ -61,15 +61,15 @@ tokens (see
at the [Hugging Face](https://huggingface.co/) docs to learn more about the way tokens
are related to the original text).

-{{< clients-example set="home_vecsets" step="model" >}}
+{{< clients-example set="home_vecsets" step="model" lang_filter="Python" >}}
{{< /clients-example >}}

## Create the data

The example data is contained in a dictionary with some brief
descriptions of famous people:

-{{< clients-example set="home_vecsets" step="data" >}}
+{{< clients-example set="home_vecsets" step="data" lang_filter="Python" >}}
{{< /clients-example >}}

## Add the data to a vector set
@@ -99,7 +99,7 @@ The call to `vadd()` also adds the `born` and `died` values from the
original dictionary as attribute data. You can access this during a query
or by using the [`vgetattr()`]({{< relref "/commands/vgetattr" >}}) method.

-{{< clients-example set="home_vecsets" step="add_data" >}}
+{{< clients-example set="home_vecsets" step="add_data" lang_filter="Python" >}}
{{< /clients-example >}}
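
In the Java version of this example (`HomeVecSets` in this commit), the attribute data is passed as a JSON string built from the `born` and `died` values. The sketch below isolates that step. The helper name `attributesJson` is hypothetical, and plain concatenation is used here in place of the `String.format` call in the example.

```java
public class AttributesSketch {

    // Builds the JSON attribute string for one element, e.g. {"born": 1867, "died": 1934}.
    static String attributesJson(int born, int died) {
        return "{\"born\": " + born + ", \"died\": " + died + "}";
    }

    public static void main(String[] args) {
        // Prints: {"born": 1867, "died": 1934}
        System.out.println(attributesJson(1867, 1934));
    }
}
```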

## Query the vector set
@@ -112,7 +112,7 @@ of the set, ranked in order of similarity to the query.

Start with a simple query for "actors":

-{{< clients-example set="home_vecsets" step="basic_query" >}}
+{{< clients-example set="home_vecsets" step="basic_query" lang_filter="Python" >}}
{{< /clients-example >}}

This returns the following list of elements (formatted slightly for clarity):
@@ -131,7 +131,7 @@ on the information contained in the embedding model.
You can use the `count` parameter of `vsim()` to limit the list of elements
to just the most relevant few items:

-{{< clients-example set="home_vecsets" step="limited_query" >}}
+{{< clients-example set="home_vecsets" step="limited_query" lang_filter="Python" >}}
{{< /clients-example >}}

The reason for using text embeddings rather than simple text search
@@ -141,7 +141,7 @@ different. For example, the word "entertainer" doesn't appear in any of the
descriptions but if you use it as a query, the actors and musicians are ranked
highest in the results list:

-{{< clients-example set="home_vecsets" step="entertainer_query" >}}
+{{< clients-example set="home_vecsets" step="entertainer_query" lang_filter="Python" >}}
{{< /clients-example >}}

Similarly, if you use "science" as a query, you get the following results:
@@ -162,7 +162,7 @@ with `vsim()` to restrict the search further. For example,
repeat the "science" query, but this time limit the results to people
who died before the year 2000:

-{{< clients-example set="home_vecsets" step="filtered_query" >}}
+{{< clients-example set="home_vecsets" step="filtered_query" lang_filter="Python" >}}
{{< /clients-example >}}
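
The Java version of this example (`HomeVecSets` in this commit) expresses the same restriction with the filter string `.died < 2000`. The plain-Java sketch below applies an equivalent predicate in memory to the `died` values from that example, to show which elements remain eligible. It does not rank anything by similarity, so the order shown is simply insertion order, and the helper names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DiedFilterSketch {

    // Year of death for each person, copied from the Java example data in this commit.
    static Map<String, Integer> diedByName() {
        Map<String, Integer> died = new LinkedHashMap<>();
        died.put("Marie Curie", 1934);
        died.put("Linus Pauling", 1994);
        died.put("Freddie Mercury", 1991);
        died.put("Marie Fredriksson", 2019);
        died.put("Paul Erdos", 1996);
        died.put("Maryam Mirzakhani", 2017);
        died.put("Masako Natsume", 1985);
        died.put("Chaim Topol", 2023);
        return died;
    }

    // Keeps only the names whose "died" value is below the cutoff year.
    static List<String> diedBefore(Map<String, Integer> died, int cutoffYear) {
        return died.entrySet().stream()
                .filter(e -> e.getValue() < cutoffYear)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints: [Marie Curie, Linus Pauling, Freddie Mercury, Paul Erdos, Masako Natsume]
        System.out.println(diedBefore(diedByName(), 2000));
    }
}
```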

Note that the boolean filter expression is applied to items in the list
Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
// EXAMPLE: home_vecsets
// STEP_START import
// Redis client (Lettuce) and vector set APIs
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import io.lettuce.core.VAddArgs;
import io.lettuce.core.VSimArgs;

// Tokenizer to generate vectors (kept consistent with HomeQueryVec.java)
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

// Data & utils
import java.util.*;
// STEP_END

public class HomeVecSets {
    // Keep the same tokenizer model style as HomeQueryVec.java
    private static final int DIM = 768; // fixed dimension to pad/truncate token ids

    // Helper: convert tokenizer ids to a fixed-length Double[] of size DIM
    private static Double[] idsToDoubleVector(long[] ids) {
        Double[] out = new Double[DIM];
        int n = Math.min(ids.length, DIM);
        for (int i = 0; i < n; i++) out[i] = (double) ids[i];
        for (int i = n; i < DIM; i++) out[i] = 0.0d; // pad
        return out;
    }

    // Simple container for people data
    private static class Person {
        final int born;
        final int died;
        final String description;
        Person(int born, int died, String description) {
            this.born = born; this.died = died; this.description = description;
        }
    }

    public static void main(String[] args) throws Exception {
        // STEP_START model
        // Tokenizer configured like HomeQueryVec.java (acts as a simple, deterministic vectorizer here)
        HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance(
            "sentence-transformers/all-mpnet-base-v2",
            Map.of("maxLength", String.valueOf(DIM), "modelMaxLength", String.valueOf(DIM))
        );
        // STEP_END

        // STEP_START data
        Map<String, Person> peopleData = new LinkedHashMap<>();
        peopleData.put("Marie Curie", new Person(
            1867, 1934,
            """
            Polish-French chemist and physicist. The only person ever to win
            two Nobel prizes for two different sciences.
            """.trim()
        ));
        peopleData.put("Linus Pauling", new Person(
            1901, 1994,
            """
            American chemist and peace activist. One of only two people to win two
            Nobel prizes in different fields (chemistry and peace).
            """.trim()
        ));
        peopleData.put("Freddie Mercury", new Person(
            1946, 1991,
            """
            British musician, best known as the lead singer of the rock band
            Queen.
            """.trim()
        ));
        peopleData.put("Marie Fredriksson", new Person(
            1958, 2019,
            """
            Swedish multi-instrumentalist, mainly known as the lead singer and
            keyboardist of the band Roxette.
            """.trim()
        ));
        peopleData.put("Paul Erdos", new Person(
            1913, 1996,
            """
            Hungarian mathematician, known for his eccentric personality almost
            as much as his contributions to many different fields of mathematics.
            """.trim()
        ));
        peopleData.put("Maryam Mirzakhani", new Person(
            1977, 2017,
            """
            Iranian mathematician. The first woman ever to win the Fields medal
            for her contributions to mathematics.
            """.trim()
        ));
        peopleData.put("Masako Natsume", new Person(
            1957, 1985,
            """
            Japanese actress. She was very famous in Japan but was primarily
            known elsewhere in the world for her portrayal of Tripitaka in the
            TV series Monkey.
            """.trim()
        ));
        peopleData.put("Chaim Topol", new Person(
            1935, 2023,
            """
            Israeli actor and singer, usually credited simply as 'Topol'. He was
            best known for his many appearances as Tevye in the musical Fiddler
            on the Roof.
            """.trim()
        ));
        // STEP_END

        // STEP_START add_data
        RedisClient client = RedisClient.create(RedisURI.Builder.redis("localhost", 6379).build());
        StatefulRedisConnection<String, String> conn = null;
        try {
            conn = client.connect();
            RedisCommands<String, String> cmd = conn.sync();

            for (Map.Entry<String, Person> e : peopleData.entrySet()) {
                String name = e.getKey();
                Person p = e.getValue();

                // Vector from description
                Double[] vec = idsToDoubleVector(tokenizer.encode(p.description).getIds());

                // Add with attributes using VADD (vector sets API)
                VAddArgs addArgs = new VAddArgs()
                    .attributes(String.format("{\"born\": %d, \"died\": %d}", p.born, p.died));

                // Create set and add element + vector in one call
                Boolean added = cmd.vadd("famousPeople", name, addArgs, vec);
                if (Boolean.FALSE.equals(added)) {
                    // If element exists, you could update attributes via vsetattr
                    cmd.vsetattr("famousPeople", name, String.format("{\"born\": %d, \"died\": %d}", p.born, p.died));
                }
            }
        } finally {
            if (conn != null) conn.close();
            client.shutdown();
        }
        // STEP_END

        // Reconnect for queries (explicitly, to mirror example flow)
        client = RedisClient.create(RedisURI.Builder.redis("localhost", 6379).build());
        try (StatefulRedisConnection<String, String> qconn = client.connect()) {
            RedisCommands<String, String> q = qconn.sync();

            // STEP_START basic_query
            String queryValue = "actors";
            List<String> actors = q.vsim("famousPeople", idsToDoubleVector(tokenizer.encode(queryValue).getIds()));
            System.out.println("'actors': " + String.join(", ", actors));
            // STEP_END

            // STEP_START limited_query
            queryValue = "actors";
            VSimArgs twoCount = new VSimArgs().count(2L);
            List<String> twoActors = q.vsim("famousPeople", twoCount, idsToDoubleVector(tokenizer.encode(queryValue).getIds()));
            System.out.println("'actors (2)': " + String.join(", ", twoActors));
            // >>> 'actors (2)': Masako Natsume, Chaim Topol
            // STEP_END

            // STEP_START entertainer_query
            queryValue = "entertainer";
            List<String> entertainers = q.vsim("famousPeople", idsToDoubleVector(tokenizer.encode(queryValue).getIds()));
            System.out.println("'entertainer': " + String.join(", ", entertainers));
            // >>> 'entertainer': Chaim Topol, Freddie Mercury, Marie Fredriksson, ...
            // STEP_END

            queryValue = "science";
            List<String> science = q.vsim("famousPeople", idsToDoubleVector(tokenizer.encode(queryValue).getIds()));
            System.out.println("'science': " + String.join(", ", science));

            // STEP_START filtered_query
            queryValue = "science";
            VSimArgs filtered = new VSimArgs().filter(".died < 2000");
            List<String> science2000 = q.vsim("famousPeople", filtered, idsToDoubleVector(tokenizer.encode(queryValue).getIds()));
            System.out.println("'science2000': " + String.join(", ", science2000));
            // STEP_END
        } finally {
            client.shutdown();
        }
    }
}
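
The `idsToDoubleVector` helper in the file above pads short token-id arrays with zeros and truncates long ones so that every vector has exactly `DIM` entries. The standalone sketch below shows that behaviour on small inputs. It takes the target length as a parameter (`dim`, an assumption made here so the output stays readable), and the name `toFixedLength` is hypothetical.

```java
import java.util.Arrays;

public class PadTruncateSketch {

    // Same idea as idsToDoubleVector, but with the target length passed in for demonstration.
    static Double[] toFixedLength(long[] ids, int dim) {
        Double[] out = new Double[dim];
        int n = Math.min(ids.length, dim);
        for (int i = 0; i < n; i++) {
            out[i] = (double) ids[i]; // copy as many ids as fit
        }
        for (int i = n; i < dim; i++) {
            out[i] = 0.0d; // pad the remainder with zeros
        }
        return out;
    }

    public static void main(String[] args) {
        // Shorter than dim: padded. Prints: [101.0, 2040.0, 0.0, 0.0]
        System.out.println(Arrays.toString(toFixedLength(new long[] {101L, 2040L}, 4)));
        // Longer than dim: truncated. Prints: [1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(toFixedLength(new long[] {1L, 2L, 3L, 4L, 5L}, 4)));
    }
}
```

A fixed length matters because all elements of a vector set must share the same dimension.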