
Commit 14c09c8

DOC-5557 fixes to text
1 parent: 6e7e920

File tree

1 file changed: +19 additions, −14 deletions


content/develop/clients/lettuce/vecsearch.md

Lines changed: 19 additions & 14 deletions
```diff
@@ -94,13 +94,15 @@ shown below does this for you:
 {{< clients-example set="home_query_vec" step="helper_method" lang_filter="Java-Async,Java-Reactive" >}}
 {{< /clients-example >}}
 
-## Create a embedding model instance
+## Create an embedding model instance
 
 The example below uses the
 [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
-model to generate the embeddings. The vectors that represent the
-embeddings have 384 components, regardless of the length of the input
-text.
+model to generate the embeddings. This model generates vectors with 384 dimensions, regardless of the length of the input text, but note that the input is truncated to 256
+tokens (see
+[Word piece tokenization](https://huggingface.co/learn/nlp-course/en/chapter6/6)
+in the [Hugging Face](https://huggingface.co/) docs to learn more about how tokens
+relate to the original text).
 
 The [`Predictor`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html)
 class implements the model to generate the embeddings. The code below
```
```diff
@@ -112,15 +114,18 @@ creates an instance of `Predictor` that uses the `all-MiniLM-L6-v2` model:
 ## Create the index
 
 As noted in [Define a helper method](#define-a-helper-method) above, you must
-pass the embeddings to the hash and query commands as a binary string. Lettuce lets
-you specify a `ByteBufferCodec` for the connection to Redis, which lets you construct
-the binary strings for Redis keys and values conveniently using
+pass the embeddings to the hash and query commands as a binary string.
+
+Lettuce has an option to specify a `ByteBufferCodec` for the connection to Redis.
+This lets you construct binary strings for Redis keys and values conveniently using
 the standard
 [`ByteBuffer`](https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html)
 class (see [Codecs](https://redis.github.io/lettuce/integration-extension/#codecs)
-in the Lettuce documentation for more information). However, it is more convenient
-to use the default `StringCodec` for commands that don't require binary strings.
-The code below shows how to declare both connections in the try-with-resources
+in the Lettuce documentation for more information). However, you will probably find
+it more convenient to use the default `StringCodec` for commands that don't require binary strings. It is therefore helpful to have two connections available, one using `ByteBufferCodec` and one using `StringCodec`.
+
+The code below shows how to declare one connection with the
+`ByteBufferCodec` and another without in the try-with-resources
 block. You also need two separate instances of `RedisAsyncCommands` to
 use the two connections:
 
```
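The `ByteBufferCodec` connection expects every key and value as a `ByteBuffer`, not a `String`. The conversion itself isn't shown in this diff, so the snippet below is a minimal sketch using only `java.nio` (no Lettuce types) of how a string key such as `doc:1` could be wrapped for that connection; the class name `KeyEncoding` and the `toByteBuffer()` helper are illustrative, not from the source file:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyEncoding {
    // Wrap a string as the ByteBuffer form that a ByteBufferCodec
    // connection expects for keys and values.
    static ByteBuffer toByteBuffer(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer key = toByteBuffer("doc:1");
        // "doc:1" is 5 ASCII characters, so 5 bytes remain in the buffer.
        System.out.println(key.remaining()); // prints 5
    }
}
```

Plain string keys like this pass through unchanged either way; the codec choice only really matters for the embedding bytes, which have no valid string representation.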
```diff
@@ -143,7 +148,7 @@ vector distance metric, `Float32` values to represent the vector's components,
 and 384 dimensions, as required by the `all-MiniLM-L6-v2` embedding model.
 
 The `CreateArgs` object specifies hash objects for storage and a
-prefix `doc:` that identifies the hash objects we want to index.
+prefix `doc:` that identifies the hash objects to index.
 
 {{< clients-example set="home_query_vec" step="create_index" lang_filter="Java-Async,Java-Reactive" >}}
 {{< /clients-example >}}
```
```diff
@@ -158,13 +163,13 @@ Use the `predict()` method of the `Predictor` object
 as shown below to create the embedding that represents the `content` field
 and use the `floatArrayToByteBuffer()` helper method to convert it to a binary string.
 Use the binary string representation when you are
-indexing hash objects (as we are here), but use an array of `float` for
+indexing hash objects, but use an array of `float` for
 JSON objects (see [Differences with JSON objects](#differences-with-json-documents)
 below).
 
 You must use instances of `Map<ByteBuffer, ByteBuffer>` to supply the data to `hset()`
 when using the `ByteBufferCodec` connection, which adds a little complexity. Note
-that the `predict()` method is in a `try`/`catch` block because it can throw
+that the `predict()` call is in a `try`/`catch` block because it will throw
 exceptions if it can't download the embedding model (you should add code to handle
 the exceptions in production code).
 
```
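The `floatArrayToByteBuffer()` helper is only referenced in this hunk, not defined, so the following is a hypothetical sketch of what such a helper could look like, assuming little-endian `Float32` packing to match the index definition (the class name and method body are illustrative, not taken from the source file):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VectorEncoding {
    // Hypothetical sketch of the floatArrayToByteBuffer() helper: packs
    // each component as a little-endian 32-bit float, matching the
    // Float32 type declared when the index was created.
    static ByteBuffer floatArrayToByteBuffer(float[] vector) {
        ByteBuffer buffer = ByteBuffer
                .allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buffer.putFloat(v);
        }
        buffer.rewind(); // reset position so the buffer is ready to read
        return buffer;
    }

    public static void main(String[] args) {
        // A 384-dimension embedding serializes to 384 * 4 = 1536 bytes.
        ByteBuffer buf = floatArrayToByteBuffer(new float[384]);
        System.out.println(buf.remaining()); // prints 1536
    }
}
```

A buffer produced this way can be used directly as a value in the `Map<ByteBuffer, ByteBuffer>` passed to `hset()` on the `ByteBufferCodec` connection.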
```diff
@@ -228,7 +233,7 @@ the one created previously for hashes:
 An important difference with JSON indexing is that the vectors are
 specified using arrays of `float` instead of binary strings. This means
 you don't need to use the `ByteBufferCodec` connection, and you can use
-`Arrays.toString()` to convert the `float` array to a suitable JSON string.
+[`Arrays.toString()`](https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html#toString-float:A-) to convert the `float` array to a suitable JSON string.
 
 Use [`jsonSet()`]({{< relref "/commands/json.set" >}}) to add the data
 instead of [`hset()`]({{< relref "/commands/hset" >}}). Use instances
```
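As a small standalone illustration of the `Arrays.toString()` approach mentioned in this hunk (not taken from the source file): the method happens to render a `float[]` in the same bracket-and-comma form as a JSON array, which is why it is suitable here.

```java
import java.util.Arrays;

public class JsonVector {
    public static void main(String[] args) {
        // Arrays.toString() renders a float[] as "[0.1, 0.2, 0.3]",
        // which is also valid JSON array syntax for the embedding field.
        float[] embedding = {0.1f, 0.2f, 0.3f};
        System.out.println(Arrays.toString(embedding)); // prints [0.1, 0.2, 0.3]
    }
}
```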
