@@ -66,19 +66,8 @@ go get github.com/henomis/lingoose/embedder/huggingface
 
 Add the following imports to your module's main program file:
 
-```go
-package main
-
-import (
-	"context"
-	"encoding/binary"
-	"fmt"
-	"math"
-
-	huggingfaceembedder "github.com/henomis/lingoose/embedder/huggingface"
-	"github.com/redis/go-redis/v9"
-)
-```
+{{< clients-example set="home_query_vec" step="import" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 You must also create a [HuggingFace account](https://huggingface.co/join)
 and add a new access token to use the embedding model. See the
@@ -96,18 +85,8 @@ must convert this array to a `byte` string before adding it as a hash field.
 The function shown below uses Go's [`binary`](https://pkg.go.dev/encoding/binary)
 package to produce the `byte` string:
 
-```go
-func floatsToBytes(fs []float32) []byte {
-	buf := make([]byte, len(fs)*4)
-
-	for i, f := range fs {
-		u := math.Float32bits(f)
-		binary.NativeEndian.PutUint32(buf[i*4:], u)
-	}
-
-	return buf
-}
-```
+{{< clients-example set="home_query_vec" step="helper" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 Note that if you are using [JSON]({{< relref "/develop/data-types/json" >}})
 objects to store your documents instead of hashes, then you should store
@@ -120,22 +99,8 @@ below).
 In the `main()` function, connect to Redis and delete any index previously
 created with the name `vector_idx`:
 
-```go
-ctx := context.Background()
-rdb := redis.NewClient(&redis.Options{
-	Addr: "localhost:6379",
-	Password: "", // no password docs
-	DB: 0, // use default DB
-	Protocol: 2,
-})
-
-rdb.FTDropIndexWithArgs(ctx,
-	"vector_idx",
-	&redis.FTDropIndexOptions{
-		DeleteDocs: true,
-	},
-)
-```
+{{< clients-example set="home_query_vec" step="connect" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 Next, create the index.
 The schema in the example below specifies hash objects for storage and includes
@@ -149,38 +114,8 @@ indexing, the
 vector distance metric, `Float32` values to represent the vector's components,
 and 384 dimensions, as required by the `all-MiniLM-L6-v2` embedding model.
 
-```go
-_, err := rdb.FTCreate(ctx,
-	"vector_idx",
-	&redis.FTCreateOptions{
-		OnHash: true,
-		Prefix: []any{"doc:"},
-	},
-	&redis.FieldSchema{
-		FieldName: "content",
-		FieldType: redis.SearchFieldTypeText,
-	},
-	&redis.FieldSchema{
-		FieldName: "genre",
-		FieldType: redis.SearchFieldTypeTag,
-	},
-	&redis.FieldSchema{
-		FieldName: "embedding",
-		FieldType: redis.SearchFieldTypeVector,
-		VectorArgs: &redis.FTVectorArgs{
-			HNSWOptions: &redis.FTHNSWOptions{
-				Dim: 384,
-				DistanceMetric: "L2",
-				Type: "FLOAT32",
-			},
-		},
-	},
-).Result()
-
-if err != nil {
-	panic(err)
-}
-```
+{{< clients-example set="home_query_vec" step="create_index" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 ## Create an embedder instance
 
@@ -190,11 +125,8 @@ instance that uses the `sentence-transformers/all-MiniLM-L6-v2`
 model, passing your HuggingFace access token to the `WithToken()`
 method.
 
-```go
-hf := huggingfaceembedder.New().
-	WithToken("<your-access-token>").
-	WithModel("sentence-transformers/all-MiniLM-L6-v2")
-```
+{{< clients-example set="home_query_vec" step="embedder" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 ## Add data
 
@@ -210,44 +142,8 @@ Use the `ToFloat32()` method of `Embedding` to produce the array of float
 values that we need, and use the `floatsToBytes()` function we defined
 above to convert this array to a `byte` string.
 
-```go
-sentences := []string{
-	"That is a very happy person",
-	"That is a happy dog",
-	"Today is a sunny day",
-}
-
-tags := []string{
-	"persons", "pets", "weather",
-}
-
-embeddings, err := hf.Embed(ctx, sentences)
-
-if err != nil {
-	panic(err)
-}
-
-for i, emb := range embeddings {
-	buffer := floatsToBytes(emb.ToFloat32())
-
-	if err != nil {
-		panic(err)
-	}
-
-	_, err = rdb.HSet(ctx,
-		fmt.Sprintf("doc:%v", i),
-		map[string]any{
-			"content": sentences[i],
-			"genre": tags[i],
-			"embedding": buffer,
-		},
-	).Result()
-
-	if err != nil {
-		panic(err)
-	}
-}
-```
+{{< clients-example set="home_query_vec" step="add_data" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 ## Run a query
 
@@ -263,47 +159,8 @@ the indexing, and passes it as a parameter when the query executes
 [Vector search]({{< relref "/develop/ai/search-and-query/query/vector-search" >}})
 for more information about using query parameters with embeddings).
 
-```go
-queryEmbedding, err := hf.Embed(ctx, []string{
-	"That is a happy person",
-})
-
-if err != nil {
-	panic(err)
-}
-
-buffer := floatsToBytes(queryEmbedding[0].ToFloat32())
-
-if err != nil {
-	panic(err)
-}
-
-results, err := rdb.FTSearchWithArgs(ctx,
-	"vector_idx",
-	"*=>[KNN 3 @embedding $vec AS vector_distance]",
-	&redis.FTSearchOptions{
-		Return: []redis.FTSearchReturn{
-			{FieldName: "vector_distance"},
-			{FieldName: "content"},
-		},
-		DialectVersion: 2,
-		Params: map[string]any{
-			"vec": buffer,
-		},
-	},
-).Result()
-
-if err != nil {
-	panic(err)
-}
-
-for _, doc := range results.Docs {
-	fmt.Printf(
-		"ID: %v, Distance:%v, Content:'%v'\n",
-		doc.ID, doc.Fields["vector_distance"], doc.Fields["content"],
-	)
-}
-```
+{{< clients-example set="home_query_vec" step="query" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 The code is now ready to run, but note that it may take a while to complete when
 you run it for the first time (which happens because `huggingfacetransformer`
@@ -334,37 +191,8 @@ every query. Also, you must set `OnJSON` to `true` when you create the index.
 The code below shows these differences, but the index is otherwise very similar to
 the one created previously for hashes:
 
-```go
-_, err = rdb.FTCreate(ctx,
-	"vector_json_idx",
-	&redis.FTCreateOptions{
-		OnJSON: true,
-		Prefix: []any{"jdoc:"},
-	},
-	&redis.FieldSchema{
-		FieldName: "$.content",
-		As: "content",
-		FieldType: redis.SearchFieldTypeText,
-	},
-	&redis.FieldSchema{
-		FieldName: "$.genre",
-		As: "genre",
-		FieldType: redis.SearchFieldTypeTag,
-	},
-	&redis.FieldSchema{
-		FieldName: "$.embedding",
-		As: "embedding",
-		FieldType: redis.SearchFieldTypeVector,
-		VectorArgs: &redis.FTVectorArgs{
-			HNSWOptions: &redis.FTHNSWOptions{
-				Dim: 384,
-				DistanceMetric: "L2",
-				Type: "FLOAT32",
-			},
-		},
-	},
-).Result()
-```
+{{< clients-example set="home_query_vec" step="json_index" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 Use [`JSONSet()`]({{< relref "/commands/json.set" >}}) to add the data
 instead of [`HSet()`]({{< relref "/commands/hset" >}}). The maps
@@ -375,23 +203,8 @@ specified using lists instead of binary strings. The loop below is similar
 to the one used previously to add the hash data, but it doesn't use the
 `floatsToBytes()` function to encode the `float32` array.
 
-```go
-for i, emb := range embeddings {
-	_, err = rdb.JSONSet(ctx,
-		fmt.Sprintf("jdoc:%v", i),
-		"$",
-		map[string]any{
-			"content": sentences[i],
-			"genre": tags[i],
-			"embedding": emb.ToFloat32(),
-		},
-	).Result()
-
-	if err != nil {
-		panic(err)
-	}
-}
-```
+{{< clients-example set="home_query_vec" step="json_data" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 The query is almost identical to the one for the hash documents. This
 demonstrates how the right choice of aliases for the JSON paths can
@@ -400,32 +213,8 @@ is that the vector parameter for the query is still specified as a
 binary string (using the `floatsToBytes()` method), even though the data for
 the `embedding` field of the JSON was specified as an array.
 
-```go
-jsonQueryEmbedding, err := hf.Embed(ctx, []string{
-	"That is a happy person",
-})
-
-if err != nil {
-	panic(err)
-}
-
-jsonBuffer := floatsToBytes(jsonQueryEmbedding[0].ToFloat32())
-
-jsonResults, err := rdb.FTSearchWithArgs(ctx,
-	"vector_json_idx",
-	"*=>[KNN 3 @embedding $vec AS vector_distance]",
-	&redis.FTSearchOptions{
-		Return: []redis.FTSearchReturn{
-			{FieldName: "vector_distance"},
-			{FieldName: "content"},
-		},
-		DialectVersion: 2,
-		Params: map[string]any{
-			"vec": jsonBuffer,
-		},
-	},
-).Result()
-```
+{{< clients-example set="home_query_vec" step="json_query" lang_filter="Go" >}}
+{{< /clients-example >}}
 
 Apart from the `jdoc:` prefixes for the keys, the result from the JSON
 query is the same as for hash:
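The `floatsToBytes()` helper removed from the page by this diff (and presumably now supplied by the `helper` step of the `home_query_vec` example set) can be checked in isolation, without Redis or HuggingFace. The sketch below is an illustration under stated assumptions, not the code from the example file: it substitutes `binary.LittleEndian` for the original `binary.NativeEndian` (which needs Go 1.21 or later), and `bytesToFloats` is a hypothetical inverse added only to verify the round trip.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// floatsToBytes packs each float32 into 4 bytes.
// Assumption: little-endian byte order, used here in place of the
// binary.NativeEndian call shown in the diff.
func floatsToBytes(fs []float32) []byte {
	buf := make([]byte, len(fs)*4)

	for i, f := range fs {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}

	return buf
}

// bytesToFloats is a hypothetical inverse of floatsToBytes; it is not
// part of the original document and exists only to check the encoding.
func bytesToFloats(b []byte) []float32 {
	fs := make([]float32, len(b)/4)

	for i := range fs {
		fs[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[i*4:]))
	}

	return fs
}

func main() {
	in := []float32{0.5, -1.25, 3}
	buf := floatsToBytes(in)

	// Three float32 values occupy 12 bytes and decode to the same values.
	fmt.Println(len(buf), bytesToFloats(buf)) // prints: 12 [0.5 -1.25 3]
}
```

A 384-dimensional embedding from `all-MiniLM-L6-v2` would therefore occupy 1536 bytes in the `embedding` hash field.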