@@ -29,9 +29,8 @@ or JSON fields, Redis can retrieve documents that closely match the query in ter
 of their meaning.
 
 In the example below, we use the
-[`huggingfaceembedder`](https://pkg.go.dev/github.com/henomis/lingoose/embedder/huggingface)
-package from the [`LinGoose`](https://pkg.go.dev/github.com/henomis/lingoose)
-framework to generate vector embeddings to store and index with
+[`Hugot`](https://pkg.go.dev/github.com/knights-analytics/hugot)
+library to generate vector embeddings to store and index with
 Redis Query Engine. The code is first demonstrated for hash documents with a
 separate section to explain the
 [differences with JSON documents](#differences-with-json-documents).
@@ -47,38 +46,23 @@ for more information.
 
 ## Initialize
 
-Start a new Go module with the following command:
-
-```bash
-go mod init vecexample
-```
-
-Then, in your module folder, install
-[`go-redis`]({{< relref "/develop/clients/go" >}})
-and the
-[`huggingfaceembedder`](https://pkg.go.dev/github.com/henomis/lingoose/embedder/huggingface)
-package:
+First, install [`go-redis`]({{< relref "/develop/clients/go" >}})
+if you haven't already done so. Then, install
+[`Hugot`](https://pkg.go.dev/github.com/knights-analytics/hugot)
+using the following command:
 
 ```bash
-go get github.com/redis/go-redis/v9
-go get github.com/henomis/lingoose/embedder/huggingface
+go get github.com/knights-analytics/hugot
 ```
 
 Add the following imports to your module's main program file:
 
 {{< clients-example set="home_query_vec" step="import" lang_filter="Go" >}}
 {{< /clients-example >}}
 
-You must also create a [HuggingFace account](https://huggingface.co/join)
-and add a new access token to use the embedding model. See the
-[HuggingFace](https://huggingface.co/docs/hub/en/security-tokens)
-docs to learn how to create and manage access tokens. Note that the
-account and the `all-MiniLM-L6-v2` model that we will use to produce
-the embeddings for this example are both available for free.
-
 ## Add a helper function
 
-The `huggingfaceembedder` model outputs the embeddings as a
+The `Hugot` model outputs the embeddings as a
 `[]float32` array. If you are storing your documents as
 [hash]({{< relref "/develop/data-types/hashes" >}}) objects, then you
 must convert this array to a `byte` string before adding it as a hash field.
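As an aside, the conversion helper is defined in the tutorial's `clients-example` snippet rather than in this diff, but a minimal stdlib-only sketch of one possible implementation (the name `floatsToBytes` matches the function the text refers to; the exact body here is illustrative) looks like this:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// floatsToBytes packs a []float32 vector into a little-endian byte
// slice (4 bytes per element), the binary layout that Redis expects
// for FLOAT32 vector fields stored in hash objects.
func floatsToBytes(fs []float32) []byte {
	buf := make([]byte, len(fs)*4)
	for i, f := range fs {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

func main() {
	v := []float32{1.0, 0.5}
	b := floatsToBytes(v)
	fmt.Println(len(b)) // 2 floats * 4 bytes = 8
}
```

The little-endian encoding must match the byte order the index expects; mixing byte orders between indexing and querying silently produces wrong distances.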
@@ -119,11 +103,10 @@ and 384 dimensions, as required by the `all-MiniLM-L6-v2` embedding model.
 
 ## Create an embedder instance
 
-You need an instance of the `huggingfaceembedder` class to
+You need an instance of the `FeatureExtractionPipeline` class to
 generate the embeddings. Use the code below to create an
 instance that uses the `sentence-transformers/all-MiniLM-L6-v2`
-model, passing your HuggingFace access token to the `WithToken()`
-method.
+model:
 
 {{< clients-example set="home_query_vec" step="embedder" lang_filter="Go" >}}
 {{< /clients-example >}}
@@ -134,12 +117,12 @@ You can now supply the data objects, which will be indexed automatically
 when you add them with [`HSet()`]({{< relref "/commands/hset" >}}), as long as
 you use the `doc:` prefix specified in the index definition.
 
-Use the `Embed()` method of `huggingfacetransformer`
+Use the `RunPipeline()` method of `FeatureExtractionPipeline`
 as shown below to create the embeddings that represent the `content` fields.
 This method takes an array of strings and outputs a corresponding
-array of `Embedding` objects.
-Use the `ToFloat32()` method of `Embedding` to produce the array of float
-values that we need, and use the `floatsToBytes()` function we defined
+`FeatureExtractionOutput` object.
+The `Embeddings` field of `FeatureExtractionOutput` holds one float array
+per input string. Use the `floatsToBytes()` function defined
 above to convert this array to a `byte` string.
 
 {{< clients-example set="home_query_vec" step="add_data" lang_filter="Go" >}}
@@ -153,7 +136,7 @@ text. Redis calculates the similarity between the query vector and each
 embedding vector in the index as it runs the query. It then ranks the
 results in order of this numeric similarity value.
 
-The code below creates the query embedding using `Embed()`, as with
+The code below creates the query embedding using `RunPipeline()`, as with
 the indexing, and passes it as a parameter when the query executes
 (see
 [Vector search]({{< relref "/develop/ai/search-and-query/query/vector-search" >}})
@@ -163,14 +146,14 @@ for more information about using query parameters with embeddings).
 {{< /clients-example >}}
 
 The code is now ready to run, but note that it may take a while to complete when
-you run it for the first time (which happens because `huggingfacetransformer`
+you run it for the first time (which happens because `Hugot`
 must download the `all-MiniLM-L6-v2` model data before it can
 generate the embeddings). When you run the code, it outputs the following text:
 
 ```
-ID: doc:0, Distance:0.114169843495, Content:'That is a very happy person'
-ID: doc:1, Distance:0.610845327377, Content:'That is a happy dog'
-ID: doc:2, Distance:1.48624765873, Content:'Today is a sunny day'
+ID: doc:0, Distance:2.96992516518, Content:'That is a very happy person'
+ID: doc:1, Distance:17.3678302765, Content:'That is a happy dog'
+ID: doc:2, Distance:43.7771987915, Content:'Today is a sunny day'
 ```
 
 The results are ordered according to the value of the `vector_distance`
@@ -220,9 +203,9 @@ Apart from the `jdoc:` prefixes for the keys, the result from the JSON
 query is the same as for hash:
 
 ```
-ID: jdoc:0, Distance:0.114169843495, Content:'That is a very happy person'
-ID: jdoc:1, Distance:0.610845327377, Content:'That is a happy dog'
-ID: jdoc:2, Distance:1.48624765873, Content:'Today is a sunny day'
+ID: jdoc:0, Distance:2.96992516518, Content:'That is a very happy person'
+ID: jdoc:1, Distance:17.3678302765, Content:'That is a happy dog'
+ID: jdoc:2, Distance:43.7771987915, Content:'Today is a sunny day'
 ```
 
 ## Learn more