@@ -42,7 +42,7 @@ Dump files can be imported for both link:{neo4j-docs-base-uri}/aura/auradb/impor
 The embeddings on this are generated using link:https://platform.openai.com/docs/guides/embeddings[OpenAI] (model `text-embedding-ada-002`), producing 1536-dimensional vectors.

 [[single-embedding]]
-== Generate a single embedding and store it
+== Generate a single embedding

 Use the `genai.vector.encode()` function to generate a vector embedding for a single value.

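The query below is a minimal Cypher sketch of a single `genai.vector.encode()` call. It makes three assumptions: the GenAI plugin is available, the OpenAI provider is used as in the dataset description above, and `$token` is a query parameter holding an OpenAI API key (the parameter name is hypothetical). The input string is arbitrary. With `text-embedding-ada-002`, `size(embedding)` should match the 1536 dimensions mentioned above.

```cypher
// Sketch: one genai.vector.encode() call sends one request to the provider.
// Assumes the OpenAI provider and a hypothetical $token parameter holding the API key.
WITH 'Godfather, The' AS text
WITH genai.vector.encode(text, 'OpenAI', {token: $token}) AS embedding // LIST<FLOAT>
RETURN size(embedding) AS dimensions, embedding[0..4] AS firstValues
```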
@@ -66,15 +66,64 @@ This function sends one API request every time it is called, which may result in
 If you want to generate many embeddings at once, use xref:genai-integrations.adoc#multiple-embeddings[].
 ====

-Use the `db.create.setNodeVectorProperty` procedure to store an embedding to a node property.
To convert this value to a xref:values-and-types/vector.adoc[`VECTOR` value], use the xref:functions/vector.adoc#functions-vector[`vector()`] function.
The embeddings are stored as properties on nodes or relationships with the type `LIST<INTEGER | FLOAT>`.

-.Create an embedding from a single property and store it
+.Create an embedding from a single property and store it as a `LIST<FLOAT>` property value
 ====

-.Create an embedding property for the Godfather
+.Create a `LIST<FLOAT>` embedding property for the Godfather
 [source,cypher,role=test-skip]
 ----
 MATCH (m:Movie {title:'Godfather, The'})
@@ -104,7 +153,7 @@ RETURN m.embedding AS embedding

 <1> Concatenate the `title` and `plot` of the `Movie` into a single `STRING`.
 <2> Create a 1536 dimensional embedding from the `titleAndPlot`.
-<3> Store the `propertyVector` as a new `embedding` property on The Godfather node.
+<3> Store the `propertyVector` as a new `LIST<FLOAT>` `embedding` property on The Godfather node.

 .Result
 [source, "queryresult"]
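The examples above now state that the embedding is stored as a `LIST<FLOAT>` property value. A type predicate expression can confirm the stored type. This sketch assumes the Godfather node was already updated by the preceding example.

```cypher
// Sketch: check the type and length of the stored embedding.
// Assumes the Godfather node already has an `embedding` property.
MATCH (m:Movie {title: 'Godfather, The'})
WHERE m.embedding IS NOT NULL
RETURN m.embedding IS :: LIST<FLOAT> AS isFloatList, // type predicate expression
       size(m.embedding) AS dimensions
```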
@@ -118,12 +167,13 @@ RETURN m.embedding AS embedding

 [NOTE]
 This result only shows the first 4 of the 1536 numbers in the embedding.
+
 ====

 [[multiple-embeddings]]
-== Generating a batch of embeddings and store them
+== Generate a batch of embeddings

-Use the `genai.vector.encodeBatch` procedure to generate many vector embeddings with a single API request.
+Use the `genai.vector.encodeBatch()` procedure to generate many vector embeddings with a single API request.
 This procedure takes a list of resources as an input, and returns the same number of result rows, instead of a single one.

 [IMPORTANT]
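The following minimal sketch shows the row-per-resource behavior described above. Three input strings produce three rows from a single `genai.vector.encodeBatch()` request, and each row carries the `index` of its input. The strings and the `$token` parameter are illustrative assumptions.

```cypher
// Sketch: one request for three resources; one result row per resource.
// Assumes the OpenAI provider and a hypothetical $token parameter.
WITH ['Godfather, The', 'Toy Story', 'Heat'] AS resources
CALL genai.vector.encodeBatch(resources, 'OpenAI', {token: $token})
  YIELD index, resource, vector
RETURN index, resource, size(vector) AS dimensions
ORDER BY index
```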
@@ -132,7 +182,7 @@ This procedure attempts to generate embeddings for all supplied resources in a s
 Therefore, it is recommended to see the respective provider's documentation for details on, for example, the maximum number of embeddings that can be generated per request.
 ====

-.Signature for `genai.vector.encodeBatch` label:procedure[]
+.Signature for `genai.vector.encodeBatch()` label:procedure[]
+=== Store multiple embeddings as vector properties
+
+`genai.vector.encodeBatch()` returns a `LIST<FLOAT>` `vector` value.
+To convert this value to a xref:values-and-types/vector.adoc[`VECTOR` value], use the xref:functions/vector.adoc#functions-vector[`vector()`] function.
+The full function signature can be seen xref:genai-integrations.adoc#store-single-embedding-vector[above].
+
+.Create embeddings from a limited number of properties and store them as `VECTOR` properties
+====
+
+[source, cypher, role=test-skip]
+----
+MATCH (m:Movie WHERE m.plot IS NOT NULL)
+WITH m
+LIMIT 20
+WITH collect(m) AS moviesList // <1>
+WITH moviesList, [movie IN moviesList | movie.title || ': ' || movie.plot] AS batch // <2>
+<1> xref:functions/aggregating.adoc#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
+<2> Use a xref:expressions/list-expressions.adoc#list-comprehension[list comprehension] (`[]`) to extract the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
+<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
+====
+
+.Create embeddings from a large number of properties and store them as `VECTOR` properties
+====
+[source, cypher, role=test-skip]
+----
+MATCH (m:Movie WHERE m.plot IS NOT NULL)
+WITH collect(m) AS moviesList, // <1>
+     count(*) AS total,
+     100 AS batchSize // <2>
+UNWIND range(0, total-1, batchSize) AS batchStart // <3>
+CALL (moviesList, batchStart, batchSize) { // <4>
+    WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title || ': ' || movie.plot] AS batch // <5>
+<1> xref:functions/aggregating.adoc#functions-collect[Collect] all returned `Movie` nodes into a `LIST<NODE>`.
+<2> `batchSize` defines the number of nodes in `moviesList` to be processed at once.
+Because vector embeddings can be very large, a larger batch size may require significantly more memory on the Neo4j server.
+Too large a batch size may also exceed the provider's per-request limits.
+<3> Process `Movie` nodes in increments of `batchSize`.
+The upper bound is `total-1` because `range()` includes both endpoints.
+<4> A xref:subqueries/subqueries-in-transactions.adoc[`CALL` subquery] executes a separate transaction for each batch.
+Note that this `CALL` subquery uses a xref:subqueries/call-subquery.adoc#variable-scope-clause[variable scope clause].
+<5> `batch` is a list of strings, each being the concatenation of the `title` and `plot` of one movie.
+<6> The procedure sets `vector` as the value of the `embedding` property on the node at position `batchStart + index` in `moviesList`.
+<7> Set the number of batches processed at once to `1`.
+For more information on concurrency in transactions, see xref:subqueries/subqueries-in-transactions.adoc#concurrent-transactions[`CALL` subqueries -> Concurrent transactions].
+
+[NOTE]
+This example may not scale to larger datasets, as `collect(m)` requires the whole result set to be loaded in memory.
+For an alternative method more suitable to processing large amounts of data, see link:https://neo4j.com/docs/genai/tutorials/embeddings-vector-indexes/[GenAI documentation - Embeddings & Vector Indexes Tutorial -> Create embeddings with cloud AI providers].
+
+====
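The batching arithmetic in the large-dataset example can be checked without calling any provider. In this sketch, a hypothetical list of 250 integers stands in for `moviesList`.

* `range(0, total-1, batchSize)` yields the start offsets `0`, `100`, and `200`, because `range()` includes both endpoints.
* The slice `[batchStart .. batchStart + batchSize]` excludes its upper bound and is truncated at the end of the list.
* As a result, the last batch holds only the remaining 50 elements.

```cypher
// Sketch: no provider call; `items` is a hypothetical stand-in for moviesList.
WITH range(1, 250) AS items, 100 AS batchSize
WITH items, size(items) AS total, batchSize
UNWIND range(0, total - 1, batchSize) AS batchStart // 0, 100, 200
WITH batchStart, items[batchStart .. batchStart + batchSize] AS batch // end-exclusive slice
RETURN batchStart, size(batch) AS batchLength, batch[0] AS first, batch[-1] AS last
```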
+
+[[store-multiple-embeddings-list-float]]
+=== Store multiple embeddings as `LIST<FLOAT>` properties
+
+Use the `db.create.setNodeVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value in a node property.
+Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value in a relationship property.
+The full procedure signatures can be seen xref:genai-integrations.adoc#store-single-embedding-list-float[above].
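The relationship procedure follows the same pattern as the node procedure. In this minimal sketch, the `STRING` `role` property on `ACTED_IN` relationships is a hypothetical assumption, and `$token` is assumed to hold an OpenAI API key. The query calls `genai.vector.encode()` once per relationship, so for many relationships the batch approach shown in this section is preferable.

```cypher
// Sketch: store a LIST<FLOAT> embedding on relationships.
// Assumes a hypothetical STRING `role` property on ACTED_IN and a hypothetical $token parameter.
MATCH (:Movie {title: 'Godfather, The'})<-[r:ACTED_IN]-()
WHERE r.role IS NOT NULL
WITH r, genai.vector.encode(r.role, 'OpenAI', {token: $token}) AS vector // one request per relationship
CALL db.create.setRelationshipVectorProperty(r, 'embedding', vector)
RETURN count(r) AS updatedRelationships
```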
+
+.Create embeddings from a limited number of properties and store them as `LIST<FLOAT>` properties
 <1> xref:functions/aggregating.adoc#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
 <2> Use a xref:expressions/list-expressions.adoc#list-comprehension[list comprehension] (`[]`) to extract the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
-<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch`, and stores that vector as a property named `embedding` on the corresponding node.
+<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
 ====

-.Create embeddings from a large number of properties and store them
+.Create embeddings from a large number of properties and store them as `LIST<FLOAT>` values
 ====
 [source, cypher, role=test-skip]
 ----
@@ -211,7 +331,7 @@ For an alternative method more suitable to processing large amounts of data, see
 == GenAI providers

 The following GenAI providers are supported for generating vector embeddings.
-Each provider has its own configuration map that can be passed to `genai.vector.encode` or `genai.vector.encodeBatch`.
+Each provider has its own configuration map that can be passed to `genai.vector.encode()` or `genai.vector.encodeBatch()`.
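A configuration map is an ordinary Cypher `MAP`. It can therefore be built once and passed to both `genai.vector.encode()` and `genai.vector.encodeBatch()`. This sketch assumes the OpenAI provider accepts `token` and `model` keys; check the provider table for the exact keys each provider supports. `$token` is a hypothetical parameter.

```cypher
// Sketch: reuse one configuration map for a single and a batched call.
// Assumes OpenAI's `token` and `model` keys; $token is a hypothetical parameter.
WITH {token: $token, model: 'text-embedding-ada-002'} AS config
WITH config, genai.vector.encode('Godfather, The', 'OpenAI', config) AS single
CALL genai.vector.encodeBatch(['Toy Story', 'Heat'], 'OpenAI', config)
  YIELD index, vector
RETURN size(single) AS singleDimensions, index, size(vector) AS batchDimensions
```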
modules/ROOT/pages/indexes/semantic-indexes/vector-indexes.adoc
Lines changed: 3 additions & 2 deletions
@@ -40,7 +40,8 @@ An embedding is a numerical representation of a data object, such as a text, ima
 Each word or token in a text is typically represented as high-dimensional vector where each dimension represents a certain aspect of the word’s meaning.

 The embedding for a particular data object can be created by both proprietary (such as https://cloud.google.com/vertex-ai[Vertex AI] or https://openai.com/[OpenAI]) and open source (such as https://github.com/UKPLab/sentence-transformers[sentence-transformers]) embedding generators, which can produce vector embeddings with dimensions such as 256, 768, 1536, and 3072.
-In Neo4j, vector embeddings are stored as `LIST<INTEGER | FLOAT>` properties on a node or relationship.
+Vector embeddings are stored as `LIST<INTEGER | FLOAT>` properties on a node or relationship.
+As of Neo4j 2025.xx, they can also be more efficiently stored as xref:values-and-types/vector.adoc[`VECTOR` types].

 [NOTE]
 ====
@@ -126,7 +127,7 @@ For more information about the values accepted by different index providers, see
 ==== `vector.dimensions`
 The dimensions of the vectors to be indexed.
 For more information, see xref:indexes/semantic-indexes/vector-indexes.adoc#embeddings[].
-This setting can be omitted, and any `LIST<INTEGER | FLOAT>` can be indexed and queried, separated by their dimensions, _though only vectors of the same dimension can be compared._
+This setting can be omitted, and any `LIST<INTEGER | FLOAT>` or, as of Neo4j 2025.xx, `VECTOR` value can be indexed and queried, separated by their dimensions, _though only vectors of the same dimension can be compared._
 Setting this value adds additional checks that ensure only vectors with the configured dimensions are indexed, and querying the index with a vector of a different dimensions returns an error.
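To make the dimension check concrete, the sketch below creates a vector index that fixes `vector.dimensions` at 1536, matching the OpenAI embeddings discussed in the GenAI integrations page, and then queries it. The index name `moviePlots` and the `$queryVector` parameter are hypothetical. Per the text above, a query vector of any other length returns an error.

```cypher
// Sketch: hypothetical index name; assumes Movie.embedding holds 1536-dimensional vectors.
CREATE VECTOR INDEX moviePlots IF NOT EXISTS
FOR (m:Movie)
ON m.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};

// $queryVector is a hypothetical parameter and must also have 1536 dimensions.
CALL db.index.vector.queryNodes('moviePlots', 5, $queryVector)
YIELD node, score
RETURN node.title AS title, score;
```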