@@ -32,7 +32,9 @@ In the example below, we use [Microsoft.ML](https://dotnet.microsoft.com/en-us/a
 to generate the vector embeddings to store and index with Redis Query Engine.
 We also show how to adapt the code to use
 [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=csharp)
-for the embeddings.
+for the embeddings. The code is first demonstrated for hash documents with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).
 
 ## Initialize
 
@@ -89,7 +91,6 @@ using Azure;
 using Azure.AI.OpenAI;
 ```
 
-
 ## Define a function to obtain the embedding model
 
 {{< note >}}Ignore this step if you are using an Azure OpenAI
@@ -154,7 +155,9 @@ array as a `byte` string. To simplify this, we declare a
 then encodes the returned `float` array as a `byte` string. If you are
 storing your documents as JSON objects instead of hashes, then you should
 use the `float` array for the embedding directly, without first converting
-it to a `byte` string.
+it to a `byte` string (see [Differences with JSON documents](#differences-with-json-documents)
+below).
+
 
 ```csharp
 static byte[] GetEmbedding(
@@ -414,6 +417,156 @@ As you would expect, the result for `doc:1` with the content text
 is the result that is most similar in meaning to the query text
 *"That is a happy person"*.
 
+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modeling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths to avoid typing it in full for
+every query. Also, you must specify `IndexDataType.JSON` with the `On()` option when you
+create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```cs
+var jsonSchema = new Schema()
+    .AddTextField(new FieldName("$.content", "content"))
+    .AddTagField(new FieldName("$.genre", "genre"))
+    .AddVectorField(
+        new FieldName("$.embedding", "embedding"),
+        VectorField.VectorAlgo.HNSW,
+        new Dictionary<string, object>()
+        {
+            ["TYPE"] = "FLOAT32",
+            ["DIM"] = "150",
+            ["DISTANCE_METRIC"] = "L2"
+        }
+    );
+
+
+db.FT().Create(
+    "vector_json_idx",
+    new FTCreateParams()
+        .On(IndexDataType.JSON)
+        .Prefix("jdoc:"),
+    jsonSchema
+);
+```
+
+An important difference with JSON indexing is that the vectors are
+specified using arrays of `float` instead of binary strings. This requires a modification
+to the `GetEmbedding()` function declared in
+[Define a function to generate an embedding](#define-a-function-to-generate-an-embedding)
+above:
+
+```cs
+static float[] GetFloatEmbedding(
+    PredictionEngine<TextData, TransformedTextData> model, string sentence
+)
+{
+    // Call the prediction API to convert the text into an embedding vector.
+    var data = new TextData()
+    {
+        Text = sentence
+    };
+
+    var prediction = model.Predict(data);
+
+    float[] floatArray = Array.ConvertAll(prediction.Features, x => (float)x);
+    return floatArray;
+}
+```
+
+You should make a similar modification to the `GetEmbeddingFromAzure()` function
+if you are using Azure OpenAI with JSON.
+
+Use [`JSON().Set()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`HashSet()`]({{< relref "/commands/hset" >}}):
+
+```cs
+var jSentence1 = "That is a very happy person";
+
+var jdoc1 = new {
+    content = jSentence1,
+    genre = "persons",
+    embedding = GetFloatEmbedding(predEngine, jSentence1),
+};
+
+db.JSON().Set("jdoc:1", "$", jdoc1);
+
+var jSentence2 = "That is a happy dog";
+
+var jdoc2 = new {
+    content = jSentence2,
+    genre = "pets",
+    embedding = GetFloatEmbedding(predEngine, jSentence2),
+};
+
+db.JSON().Set("jdoc:2", "$", jdoc2);
+
+var jSentence3 = "Today is a sunny day";
+
+var jdoc3 = new {
+    content = jSentence3,
+    genre = "weather",
+    embedding = GetFloatEmbedding(predEngine, jSentence3),
+};
+
+db.JSON().Set("jdoc:3", "$", jdoc3);
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. The only significant difference is
+that the `FieldName` objects created for the `ReturnFields()` option must
+include the JSON path for the field.
+
+An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `GetEmbedding()` method), even though the data for
+the `embedding` field of the JSON was specified as a `float` array.
+
+```cs
+var jRes = db.FT().Search("vector_json_idx",
+    new Query("*=>[KNN 3 @embedding $query_vec AS score]")
+    .AddParam("query_vec", GetEmbedding(predEngine, "That is a happy person"))
+    .ReturnFields(
+        new FieldName("$.content", "content"),
+        new FieldName("$.score", "score")
+    )
+    .SetSortBy("score")
+    .Dialect(2));
+
+foreach (var doc in jRes.Documents) {
+    var props = doc.GetProperties();
+    var propText = string.Join(
+        ", ",
+        props.Select(p => $"{p.Key}: '{p.Value}'")
+    );
+
+    Console.WriteLine(
+        $"ID: {doc.Id}, Properties: [\n  {propText}\n]"
+    );
+}
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for hash:
+
+```
+ID: jdoc:1, Properties: [
+  score: '4.30777168274', content: 'That is a very happy person'
+]
+ID: jdoc:2, Properties: [
+  score: '25.9752807617', content: 'That is a happy dog'
+]
+ID: jdoc:3, Properties: [
+  score: '68.8638000488', content: 'Today is a sunny day'
+]
+```
+
 ## Learn more
 
 See