@@ -31,6 +31,9 @@ of their meaning.
3131The example below uses the [HuggingFace](https://huggingface.co/) model
3232[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
3333to generate the vector embeddings to store and index with Redis Query Engine.
34+ The code is first demonstrated for hash documents with a
35+ separate section to explain the
36+ [differences with JSON documents](#differences-with-json-documents).
3437
3538## Initialize
3639
@@ -155,7 +158,8 @@ array of `float` values. Note that if you are using
155158[JSON]({{< relref "/develop/data-types/json" >}})
156159objects to store your documents instead of hashes, then you should store
157160the `float` array directly without first converting it to a binary
158- string.
161+ string (see [Differences with JSON documents](#differences-with-json-documents)
162+ below).
159163
160164``` php
161165$content = "That is a very happy person";
@@ -264,6 +268,134 @@ document)
264268is the result judged to be most similar in meaning to the query text
265269*"That is a happy person"*.
266270
271+ ## Differences with JSON documents
272+
273+ Indexing JSON documents is similar to hash indexing, but there are some
274+ important differences. JSON allows much richer data modeling with nested fields, so
275+ you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
276+ to identify each field you want to index. However, you can declare a short alias for each
277+ of these paths to avoid typing it in full for
278+ every query. Also, you must specify `JSON` with the `on()` option when you create the index.
279+
280+ The code below shows these differences, but the index is otherwise very similar to
281+ the one created previously for hashes:
282+
283+ ``` php
284+ $jsonSchema = [
285+ new TextField("$.content", "content"),
286+ new TagField("$.genre", "genre"),
287+ new VectorField(
288+ "$.embedding",
289+ "HNSW",
290+ [
291+ "TYPE", "FLOAT32",
292+ "DIM", 384,
293+ "DISTANCE_METRIC", "L2"
294+ ],
295+ "embedding",
296+ )
297+ ];
298+
299+ $client->ftcreate("vector_json_idx", $jsonSchema,
300+ (new CreateArguments())
301+ ->on('JSON')
302+ ->prefix(["jdoc:"])
303+ );
304+ ```
305+
306+ Use [`jsonset()`]({{< relref "/commands/json.set" >}}) to add the data
307+ instead of [`hmset()`]({{< relref "/commands/hset" >}}). The arrays
308+ that specify the fields have roughly the same structure as the ones used for
309+ `hmset()`, but you should use the standard library function
310+ [`json_encode()`](https://www.php.net/manual/en/function.json-encode.php)
311+ to generate a JSON string representation of the array.
312+
313+ An important difference with JSON indexing is that the vectors are
314+ specified using arrays instead of binary strings. Simply add the
315+ embedding as an array field without using the `pack()` function as you
316+ would with a hash.
317+
318+ ``` php
319+ $content = "That is a very happy person";
320+ $emb = $extractor($content, normalize: true, pooling: 'mean');
321+
322+ $client->jsonset("jdoc:0", "$",
323+ json_encode(
324+ [
325+ "content" => $content,
326+ "genre" => "persons",
327+ "embedding" => $emb[0]
328+ ],
329+ JSON_THROW_ON_ERROR
330+ )
331+ );
332+
333+ $content = "That is a happy dog";
334+ $emb = $extractor($content, normalize: true, pooling: 'mean');
335+
336+ $client->jsonset("jdoc:1", "$",
337+ json_encode(
338+ [
339+ "content" => $content,
340+ "genre" => "pets",
341+ "embedding" => $emb[0]
342+ ],
343+ JSON_THROW_ON_ERROR
344+ )
345+ );
346+
347+ $content = "Today is a sunny day";
348+ $emb = $extractor($content, normalize: true, pooling: 'mean');
349+
350+ $client->jsonset("jdoc:2", "$",
351+ json_encode(
352+ [
353+ "content" => $content,
354+ "genre" => "weather",
355+ "embedding" => $emb[0]
356+ ],
357+ JSON_THROW_ON_ERROR
358+ )
359+ );
360+ ```
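To make the storage difference concrete, the following sketch contrasts the two encodings for one small vector. The four-element `$embedding` array is a made-up stand-in for the 384-dimensional output of the model, and no Redis connection is involved: only the standard `pack()`, `unpack()`, and `json_encode()` functions are used.

```php
<?php
// Hypothetical 4-dimensional embedding (illustration only); the
// all-MiniLM-L6-v2 model used above produces 384 values per text.
$embedding = [0.25, -0.5, 1.0, 0.0];

// Hash documents and query parameters: pack the floats into a binary
// string of little-endian 32-bit values (4 bytes per element).
$binary = pack('g*', ...$embedding);

// JSON documents: leave the array as it is and let json_encode()
// serialize it along with the other fields.
$json = json_encode(["embedding" => $embedding], JSON_THROW_ON_ERROR);

echo "Binary length: " . strlen($binary) . " bytes" . PHP_EOL;
echo "JSON document: " . $json . PHP_EOL;

// unpack() reverses pack(); its result is indexed from 1, so
// array_values() is used to reindex it from 0.
$roundTrip = array_values(unpack('g*', $binary));
```

The binary form always occupies four bytes per dimension for `FLOAT32`, so each 384-dimensional embedding in this example takes 1536 bytes when it is stored in a hash or passed as a query parameter.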
361+
362+ The query is almost identical to the one for the hash documents. This
363+ demonstrates how the right choice of aliases for the JSON paths can
364+ save you from having to write complex queries. An important thing to notice
365+ is that the vector parameter for the query is still specified as a
366+ binary string (using the `pack()` function), even though the data for
367+ the `embedding` field of the JSON was specified as an array.
368+
369+ ``` php
370+ $queryText = "That is a happy person";
371+ $queryEmb = $extractor($queryText, normalize: true, pooling: 'mean');
372+
373+ $result = $client->ftsearch(
374+ "vector_json_idx",
375+ '*=>[KNN 3 @embedding $vec AS vector_distance]',
376+ (new SearchArguments())
377+ ->addReturn(1, "vector_distance")
378+ ->dialect("2")
379+ ->params([
380+ "vec", pack('g*', ...$queryEmb[0])
381+ ])
382+ ->sortBy("vector_distance")
383+ );
384+ ```
385+
386+ Apart from the `jdoc:` prefixes for the keys, the result from the JSON
387+ query is the same as for hash:
388+
389+ ```
390+ Number of results: 3
391+ Key: jdoc:0
392+ Field: vector_distance, Value: 3.76152896881
393+ Key: jdoc:1
394+ Field: vector_distance, Value: 18.6544265747
395+ Key: jdoc:2
396+ Field: vector_distance, Value: 44.6189727783
397+ ```
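A listing like the one above can be produced with a short loop over the reply. The sketch below assumes that `ftsearch()` returns the flat array `[count, key, fields, key, fields, ...]`, where each `fields` entry is a list of alternating names and values. Check this against the reply you actually receive. The helper name `formatSearchResult()` is hypothetical, and the sample array is built by hand from the output shown above rather than fetched from a server.

```php
<?php
// Hypothetical helper: turns a flat search reply into printable lines.
// Assumes the layout [count, key, [name, value, ...], key, [...], ...].
function formatSearchResult(array $result): array
{
    $lines = ["Number of results: " . $result[0]];

    // Keys and their field lists alternate after the leading count.
    for ($i = 1; $i + 1 < count($result); $i += 2) {
        $lines[] = "Key: " . $result[$i];

        $fields = $result[$i + 1];
        // Field names and values alternate within each field list.
        for ($j = 0; $j + 1 < count($fields); $j += 2) {
            $lines[] = "Field: {$fields[$j]}, Value: {$fields[$j + 1]}";
        }
    }

    return $lines;
}

// Sample reply built by hand from the output shown above.
$sample = [
    3,
    "jdoc:0", ["vector_distance", "3.76152896881"],
    "jdoc:1", ["vector_distance", "18.6544265747"],
    "jdoc:2", ["vector_distance", "44.6189727783"],
];
$lines = formatSearchResult($sample);

echo implode(PHP_EOL, $lines) . PHP_EOL;
```

Returning the lines instead of printing them inside the helper keeps the formatting logic easy to check without a running Redis server.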
398+
267399## Learn more
268400
269401See