Commit 106472f

progress on functions, genai procedures etc
1 parent 4edc094 commit 106472f

5 files changed

+434
-77
lines changed

modules/ROOT/pages/functions/vector.adoc

Lines changed: 124 additions & 4 deletions
@@ -129,19 +129,139 @@ This returns the two nearest neighbors.
129129
130130
======
131131

132+
132133
[role=label--new-2025.xx]
133134
[[functions-vector_dimension_count]]
134135
== vector_dimension_count()
135136

136-
* `(vector :: VECTOR) :: INTEGER`
137-
* Calculates the size of a vector.
137+
.Details
138+
|===
139+
| *Syntax* 3+| `vector_dimension_count(vector)`
140+
| *Description* 3+| Calculates the dimension of a `VECTOR`.
141+
.2+| *Arguments* | *Name* | *Type* | *Description*
142+
| `vector` | `VECTOR` | The vector to calculate the dimension of.
143+
| *Returns* 3+| `INTEGER`
144+
|===
145+
146+
.Considerations
147+
|===
148+
149+
| If `vector` is not a xref:values-and-types/vector.adoc[`VECTOR`] value, an error will be thrown.
150+
| Alias to the xref:functions/scalar.adoc#functions-size[`size()`] function.
151+
152+
|===
153+
154+
.vector_dimension_count()
155+
=====
156+
157+
.Calculate the size of a `VECTOR`
158+
[source, cypher]
159+
----
160+
RETURN vector_dimension_count(vector([1, 2, 3], 3, INTEGER)) AS size
161+
----
162+
163+
164+
.Result
165+
[role="queryresult",options="header,footer",cols="1*<m"]
166+
|===
167+
168+
| size
169+
| 3
170+
171+
1+d|Rows: 1
172+
173+
|===
174+
=====
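
The considerations above describe `vector_dimension_count()` as an alias of `size()`.
The following is a minimal sketch, assuming `size()` accepts a `VECTOR` argument as that table states; under that assumption both columns should hold the same value.

[source, cypher]
----
// Sketch: compare vector_dimension_count() with its documented alias size().
// Assumption: size() accepts a VECTOR value, as stated in the considerations.
WITH vector([1, 2, 3], 3, INTEGER) AS v
RETURN vector_dimension_count(v) AS dimensionCount,
       size(v) AS sizeOfVector
----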
175+
138176

139177
[role=label--new-2025.xx]
140178
[[functions-vector_distance]]
141179
== vector_distance()
142180

143-
* `(vector1 :: VECTOR, vector2 :: VECTOR, vectorDistanceMetric :: [EUCLIDEAN, EUCLIDEAN_SQUARED, MANHATTAN, COSINE, DOT, HAMMING]) :: FLOAT`
144-
* Returns a `FLOAT` representing the distance between the two vector values based on the selected `vectorDistanceMetric` algorithm.
181+
.Details
182+
|===
183+
| *Syntax* 3+| `vector_distance(vector1, vector2, vectorDistanceMetric)`
184+
| *Description* 3+| Returns a `FLOAT` representing the distance between the two vector values based on the selected `vectorDistanceMetric` algorithm.
185+
.4+| *Arguments* | *Name* | *Type* | *Description*
186+
| `vector1` | `VECTOR` | The first vector.
187+
| `vector2` | `VECTOR` | The second vector.
188+
| `vectorDistanceMetric` | `[EUCLIDEAN, EUCLIDEAN_SQUARED, MANHATTAN, COSINE, DOT, HAMMING]` | The vector distance algorithm to calculate the distance by.
189+
| *Returns* 3+| `FLOAT`
190+
|===
191+
192+
.`vectorDistanceMetric` algorithms
193+
[cols="1,3", options="header"]
194+
|===
195+
| Distance Type | Formula
196+
197+
| `EUCLIDEAN`
198+
| √( (A₁ - B₁)² + (A₂ - B₂)² + ... + (Aᴰ - Bᴰ)² )
199+
200+
| `EUCLIDEAN_SQUARED`
201+
| (A₁ - B₁)² + (A₂ - B₂)² + ... + (Aᴰ - Bᴰ)²
202+
203+
| `MANHATTAN`
204+
| \|A₁ - B₁\| + \|A₂ - B₂\| + ... + \|Aᴰ - Bᴰ\|
205+
206+
| `COSINE`
207+
| 1 - ( (A₁×B₁ + A₂×B₂ + ... + Aᴰ×Bᴰ) / ( √(A₁² + A₂² + ... + Aᴰ²) × √(B₁² + B₂² + ... + Bᴰ²) ) )
208+
209+
| `DOT`
210+
| - (A₁×B₁ + A₂×B₂ + ... + Aᴰ×Bᴰ)
211+
212+
| `HAMMING`
213+
| Number of dimensions in which `vector1` and `vector2` differ.
214+
|===
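
The `EUCLIDEAN` formula can be cross-checked with ordinary list arithmetic.
The following is a minimal sketch that uses only `reduce()`, `range()`, `size()`, and `sqrt()` on plain lists; it is not the implementation behind `vector_distance()`, and the names `a`, `b`, and `acc` are illustrative.

[source, cypher]
----
// Hand-computed Euclidean distance over two plain LIST<FLOAT> values.
WITH [1.0, 5.0, 3.0, 6.7] AS a, [5.0, 2.5, 3.1, 9.0] AS b
RETURN sqrt(
  // Sum the squared per-dimension differences, then take the square root.
  reduce(acc = 0.0, i IN range(0, size(a) - 1) | acc + (a[i] - b[i]) ^ 2)
) AS euclidean
----

The squared differences are 16, 6.25, 0.01, and 5.29, so the result should be close to √27.55 ≈ 5.2488, in line with the `EUCLIDEAN` example in this section.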
215+
216+
.Considerations
217+
|===
218+
219+
| The smaller the returned number, the more similar the `VECTOR` values.
220+
The larger the number, the more distant the vectors.
221+
This contrasts with the similarity functions, where a result closer to `1` indicates a higher degree of similarity.
222+
223+
|===
224+
225+
226+
.vector_distance()
227+
=====
228+
229+
.Calculate the distance between two `VECTOR` values using the `COSINE` vector distance algorithm
230+
[source, cypher]
231+
----
232+
RETURN vector_distance(vector([1, 2, 3], 3, INT), vector([1, 2, 4], 3, INT), COSINE) AS distance
233+
----
234+
235+
.Result
236+
[role="queryresult",options="header,footer",cols="1*<m"]
237+
|===
238+
239+
| distance
240+
| 0.008539795875549316
241+
242+
1+d|Rows: 1
243+
244+
|===
245+
246+
.Calculate the distance between two `VECTOR` values using the `EUCLIDEAN` vector distance algorithm
247+
[source, cypher]
248+
----
249+
RETURN vector_distance(vector([1.0, 5.0, 3.0, 6.7], 4, FLOAT), vector([5.0, 2.5, 3.1, 9.0], 4, FLOAT), EUCLIDEAN) AS distance
250+
----
251+
252+
.Result
253+
[role="queryresult",options="header,footer",cols="1*<m"]
254+
|===
255+
256+
| distance
257+
| 5.248809388804284
258+
259+
1+d|Rows: 1
260+
261+
|===
262+
263+
=====
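
Because every metric takes the same two vectors, the metrics can be compared side by side.
The sketch below reuses the integer vectors from the `COSINE` example and assumes each metric name is passed as a bare keyword, as in the examples above.

[source, cypher]
----
// Sketch: evaluate several distance metrics on the same pair of vectors.
WITH vector([1, 2, 3], 3, INT) AS v1, vector([1, 2, 4], 3, INT) AS v2
RETURN vector_distance(v1, v2, EUCLIDEAN) AS euclidean,
       vector_distance(v1, v2, EUCLIDEAN_SQUARED) AS euclideanSquared,
       vector_distance(v1, v2, MANHATTAN) AS manhattan,
       vector_distance(v1, v2, HAMMING) AS hamming,
       vector_distance(v1, v2, DOT) AS dot
----

The vectors differ only in the third dimension, by 1.
Working through the formulas in the metrics table by hand, the first four values should therefore equal 1 and `DOT` should equal -17; these figures are derived from the formulas, not captured from a running database.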
264+
145265

146266

147267
[role=label--new-2025.xx]

modules/ROOT/pages/genai-integrations.adoc

Lines changed: 133 additions & 13 deletions
@@ -42,7 +42,7 @@ Dump files can be imported for both link:{neo4j-docs-base-uri}/aura/auradb/impor
4242
The embeddings on this are generated using link:https://platform.openai.com/docs/guides/embeddings[OpenAI] (model `text-embedding-ada-002`), producing 1536-dimensional vectors.
4343

4444
[[single-embedding]]
45-
== Generate a single embedding and store it
45+
== Generate a single embedding
4646

4747
Use the `genai.vector.encode()` function to generate a vector embedding for a single value.
4848

@@ -66,15 +66,64 @@ This function sends one API request every time it is called, which may result in
6666
If you want to generate many embeddings at once, use xref:genai-integrations.adoc#multiple-embeddings[].
6767
====
6868

69-
Use the `db.create.setNodeVectorProperty` procedure to store an embedding to a node property.
69+
[role=label--new-2025.xx label--enterprise-edition]
70+
[[store-single-embedding-vector]]
71+
=== Store a single embedding as a vector property
72+
73+
`genai.vector.encode()` returns a `LIST<FLOAT>`.
74+
To convert this value to a xref:values-and-types/vector.adoc[`VECTOR` value], use the xref:functions/vector.adoc#functions-vector[`vector()`] function.
75+
76+
.Signature for `vector()` label:function[]
77+
[source]
78+
----
79+
vector(vectorValue :: STRING | LIST<INTEGER | FLOAT>, dimension :: INTEGER, coordinateType :: [INTEGER64, INTEGER32, INTEGER16, INTEGER8, FLOAT64, FLOAT32]) :: VECTOR
80+
----
81+
82+
.Create an embedding from a single property and store it as a `VECTOR` property value
83+
====
84+
85+
.Create a `VECTOR` embedding property for the Godfather
86+
[source,cypher,role=test-skip]
87+
----
88+
MATCH (m:Movie {title:'Godfather, The'})
89+
WHERE m.plot IS NOT NULL AND m.title IS NOT NULL
90+
WITH m, m.title || ' ' || m.plot AS titleAndPlot // <1>
91+
WITH m, genai.vector.encode(titleAndPlot, 'OpenAI', { token: $token }) AS propertyVector // <2>
92+
SET m.embedding = vector(propertyVector, 1536, FLOAT32) // <3>
93+
RETURN m.embedding AS embedding
94+
----
95+
96+
<1> Concatenate the `title` and `plot` of the `Movie` into a single `STRING`.
97+
<2> Create a 1536-dimensional embedding from the `titleAndPlot`.
98+
<3> Store the `propertyVector` as a new `VECTOR` `embedding` property on The Godfather node.
99+
100+
.Result
101+
[source, "queryresult"]
102+
----
103+
+----------------------------------------------------------------------------------------------------+
104+
| embedding |
105+
+----------------------------------------------------------------------------------------------------+
106+
| [0.005239539314061403, -0.039358530193567276, -0.0005175105179660022, -0.038706034421920776, ... ] |
107+
+----------------------------------------------------------------------------------------------------+
108+
----
109+
110+
[NOTE]
111+
This result only shows the first 4 of the 1536 numbers in the embedding.
112+
113+
====
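
To confirm what was stored, the dimension of the property can be read back.
The following is a minimal sketch, assuming the query above has already run and `m.embedding` holds a `VECTOR` value; with the 1536-dimensional model used here, it should report 1536.

[source, cypher]
----
// Sketch: read back the dimension of the stored VECTOR property.
MATCH (m:Movie {title: 'Godfather, The'})
RETURN vector_dimension_count(m.embedding) AS dimensions
----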
114+
115+
[[store-single-embedding-list-float]]
116+
=== Store a single embedding as a list of floats property
117+
118+
Use the `db.create.setNodeVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a node property.
70119

71120
.Signature for `db.create.setNodeVectorProperty` label:procedure[]
72121
[source,syntax]
73122
----
74123
db.create.setNodeVectorProperty(node :: NODE, key :: STRING, vector :: ANY)
75124
----
76125

77-
Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding to a relationship property.
126+
Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a relationship property.
78127

79128
.Signature for `db.create.setRelationshipVectorProperty` label:procedure[]
80129
[source,syntax]
@@ -88,10 +137,10 @@ db.create.setRelationshipVectorProperty(relationship :: RELATIONSHIP, key :: STR
88137

89138
The embeddings are stored as properties on nodes or relationships with the type `LIST<INTEGER | FLOAT>`.
90139

91-
.Create an embedding from a single property and store it
140+
.Create an embedding from a single property and store it as a `LIST<FLOAT>` property value
92141
====
93142
94-
.Create an embedding property for the Godfather
143+
.Create a `LIST<FLOAT>` embedding property for the Godfather
95144
[source,cypher,role=test-skip]
96145
----
97146
MATCH (m:Movie {title:'Godfather, The'})
@@ -104,7 +153,7 @@ RETURN m.embedding AS embedding
104153
105154
<1> Concatenate the `title` and `plot` of the `Movie` into a single `STRING`.
106155
<2> Create a 1536-dimensional embedding from the `titleAndPlot`.
107-
<3> Store the `propertyVector` as a new `embedding` property on The Godfather node.
156+
<3> Store the `propertyVector` as a new `LIST<FLOAT>` `embedding` property on The Godfather node.
108157
109158
.Result
110159
[source, "queryresult"]
@@ -118,12 +167,13 @@ RETURN m.embedding AS embedding
118167
119168
[NOTE]
120169
This result only shows the first 4 of the 1536 numbers in the embedding.
170+
121171
====
122172

123173
[[multiple-embeddings]]
124-
== Generating a batch of embeddings and store them
174+
== Generate a batch of embeddings
125175

126-
Use the `genai.vector.encodeBatch` procedure to generate many vector embeddings with a single API request.
176+
Use the `genai.vector.encodeBatch()` procedure to generate many vector embeddings with a single API request.
127177
This procedure takes a list of resources as an input, and returns the same number of result rows, instead of a single one.
128178

129179
[IMPORTANT]
@@ -132,7 +182,7 @@ This procedure attempts to generate embeddings for all supplied resources in a s
132182
Therefore, it is recommended to see the respective provider's documentation for details on, for example, the maximum number of embeddings that can be generated per request.
133183
====
134184

135-
.Signature for `genai.vector.encodeBatch` label:procedure[]
185+
.Signature for `genai.vector.encodeBatch()` label:procedure[]
136186
[source,syntax]
137187
----
138188
genai.vector.encodeBatch(resources :: LIST<STRING>, provider :: STRING, configuration :: MAP = {}) :: (index :: INTEGER, resource :: STRING, vector :: LIST<FLOAT>)
@@ -152,7 +202,77 @@ Each returned row contains the following columns:
152202
* The `resource` (a `STRING`) is the name of the input resource.
153203
* The `vector` (a `LIST<FLOAT>`) is the generated vector embedding for this resource.
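
As a minimal sketch of the row-per-resource behavior, the call below encodes two literal strings and returns one row for each.
The input strings are invented for illustration, and the call assumes a valid OpenAI token is supplied as `$token`, as in the other examples on this page.

[source, cypher]
----
// Sketch: one API request, one result row per input resource.
CALL genai.vector.encodeBatch(['The Godfather', 'Toy Story'], 'OpenAI', { token: $token })
YIELD index, resource, vector
// Return the length of each embedding instead of the full list.
RETURN index, resource, size(vector) AS dimensions
----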
154204

155-
.Create embeddings from a limited number of properties and store them
205+
[[store-multiple-embedding-vector]]
206+
[role=label--new-2025.xx label--enterprise-edition]
207+
=== Store multiple embeddings as vector properties
208+
209+
`genai.vector.encodeBatch()` returns a `LIST<FLOAT>` `vector` value.
210+
To convert this value to a xref:values-and-types/vector.adoc[`VECTOR` value], use the xref:functions/vector.adoc#functions-vector[`vector()`] function.
211+
The full function signature can be seen xref:genai-integrations.adoc#store-single-embedding-vector[above].
212+
213+
.Create embeddings from a limited number of properties and store them as `VECTOR` properties
214+
====
215+
216+
[source, cypher, role=test-skip]
217+
----
218+
MATCH (m:Movie WHERE m.plot IS NOT NULL)
219+
WITH m
220+
LIMIT 20
221+
WITH collect(m) AS moviesList // <1>
222+
WITH moviesList, [movie IN moviesList | movie.title || ': ' || movie.plot] AS batch // <2>
223+
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $token }) YIELD index, vector
224+
WITH moviesList, index, vector
225+
CALL db.create.setNodeVectorProperty(moviesList[index], 'embedding', vector) // <3>
226+
----
227+
228+
<1> xref:functions/aggregating.adoc#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
229+
<2> Use a xref:expressions/list-expressions.adoc#list-comprehension[list comprehension] (`[]`) to extract the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
230+
<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
231+
====
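
`db.create.setNodeVectorProperty` is described on this page as storing a `LIST<FLOAT>` value.
To store a `VECTOR` value instead, each returned list can be wrapped in `vector()` before it is assigned.
The following is a minimal sketch under that assumption, reusing the dimension and coordinate type from the single-embedding example (`1536`, `FLOAT32`); the names `movie` and `embeddingList` are illustrative.

[source, cypher]
----
MATCH (m:Movie WHERE m.plot IS NOT NULL)
WITH m
LIMIT 20
WITH collect(m) AS moviesList
WITH moviesList, [movie IN moviesList | movie.title || ': ' || movie.plot] AS batch
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $token }) YIELD index, vector AS embeddingList
// Pick the node that corresponds to this row, then convert the list and store it.
WITH moviesList[index] AS movie, embeddingList
SET movie.embedding = vector(embeddingList, 1536, FLOAT32)
----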
232+
233+
.Create embeddings from a large number of properties and store them as `VECTOR` properties
234+
====
235+
[source, cypher, role=test-skip]
236+
----
237+
MATCH (m:Movie WHERE m.plot IS NOT NULL)
238+
WITH collect(m) AS moviesList, // <1>
239+
count(*) AS total,
240+
100 AS batchSize // <2>
241+
UNWIND range(0, total-1, batchSize) AS batchStart // <3>
242+
CALL (moviesList, batchStart, batchSize) { // <4>
243+
WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title || ': ' || movie.plot] AS batch // <5>
244+
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $token }) YIELD index, vector
245+
CALL db.create.setNodeVectorProperty(moviesList[batchStart + index], 'embedding', vector) // <6>
246+
} IN CONCURRENT TRANSACTIONS OF 1 ROW // <7>
247+
----
248+
249+
<1> xref:functions/aggregating.adoc#functions-collect[Collect] all returned `Movie` nodes into a `LIST<NODE>`.
250+
<2> `batchSize` defines the number of nodes in `moviesList` to be processed at once.
251+
Because vector embeddings can be very large, a larger batch size may require significantly more memory on the Neo4j server.
252+
Too large a batch size may also exceed the provider's threshold.
253+
<3> Process `Movie` nodes in increments of `batchSize`.
254+
The end range `total-1` is due to `range` being inclusive on both ends.
255+
<4> A xref:subqueries/subqueries-in-transactions.adoc[`CALL` subquery] executes a separate transaction for each batch.
256+
Note that this `CALL` subquery uses a xref:subqueries/call-subquery.adoc#variable-scope-clause[variable scope clause].
257+
<5> `batch` is a list of strings, each being the concatenation of `title` and `plot` of one movie.
258+
<6> The procedure sets `vector` as value for the property named `embedding` for the node at position `batchStart + index` in the `moviesList`.
259+
<7> Set the number of batches to be processed at once to `1`.
260+
For more information on concurrency in transactions, see xref:subqueries/subqueries-in-transactions.adoc#concurrent-transactions[`CALL` subqueries -> Concurrent transactions].
261+
262+
[NOTE]
263+
This example may not scale to larger datasets, as `collect(m)` requires the whole result set to be loaded in memory.
264+
For an alternative method more suitable to processing large amounts of data, see link:https://neo4j.com/docs/genai/tutorials/embeddings-vector-indexes/[GenAI documentation - Embeddings & Vector Indexes Tutorial -> Create embeddings with cloud AI providers].
265+
266+
====
267+
268+
[[store-multiple-embeddings-list-float]]
269+
=== Store multiple embeddings as list of floats properties
270+
271+
Use the `db.create.setNodeVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a node property.
272+
Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a relationship property.
273+
The full procedure signatures can be seen xref:genai-integrations.adoc#store-single-embedding-list-float[above].
274+
275+
.Create embeddings from a limited number of properties and store them as `LIST<FLOAT>` properties
156276
====
157277
158278
[source, cypher, role=test-skip]
@@ -169,10 +289,10 @@ CALL db.create.setNodeVectorProperty(moviesList[index], 'embedding', vector) //
169289
170290
<1> xref:functions/aggregating.adoc#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
171291
<2> Use a xref:expressions/list-expressions.adoc#list-comprehension[list comprehension] (`[]`) to extract the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
172-
<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch`, and stores that vector as a property named `embedding` on the corresponding node.
292+
<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
173293
====
174294

175-
.Create embeddings from a large number of properties and store them
295+
.Create embeddings from a large number of properties and store them as `LIST<FLOAT>` values
176296
====
177297
[source, cypher, role=test-skip]
178298
----
@@ -211,7 +331,7 @@ For an alternative method more suitable to processing large amounts of data, see
211331
== GenAI providers
212332

213333
The following GenAI providers are supported for generating vector embeddings.
214-
Each provider has its own configuration map that can be passed to `genai.vector.encode` or `genai.vector.encodeBatch`.
334+
Each provider has its own configuration map that can be passed to `genai.vector.encode()` or `genai.vector.encodeBatch()`.
215335

216336
[[vertex-ai]]
217337
=== Vertex AI

modules/ROOT/pages/indexes/semantic-indexes/vector-indexes.adoc

Lines changed: 3 additions & 2 deletions
@@ -40,7 +40,8 @@ An embedding is a numerical representation of a data object, such as a text, ima
4040
Each word or token in a text is typically represented as a high-dimensional vector where each dimension represents a certain aspect of the word’s meaning.
4141

4242
The embedding for a particular data object can be created by both proprietary (such as https://cloud.google.com/vertex-ai[Vertex AI] or https://openai.com/[OpenAI]) and open source (such as https://github.com/UKPLab/sentence-transformers[sentence-transformers]) embedding generators, which can produce vector embeddings with dimensions such as 256, 768, 1536, and 3072.
43-
In Neo4j, vector embeddings are stored as `LIST<INTEGER | FLOAT>` properties on a node or relationship.
43+
Vector embeddings are stored as `LIST<INTEGER | FLOAT>` properties on a node or relationship.
44+
As of Neo4j 2025.xx, they can also be stored more efficiently as xref:values-and-types/vector.adoc[`VECTOR` values].
4445

4546
[NOTE]
4647
====
@@ -126,7 +127,7 @@ For more information about the values accepted by different index providers, see
126127
==== `vector.dimensions`
127128
The dimensions of the vectors to be indexed.
128129
For more information, see xref:indexes/semantic-indexes/vector-indexes.adoc#embeddings[].
129-
This setting can be omitted, and any `LIST<INTEGER | FLOAT>` can be indexed and queried, separated by their dimensions, _though only vectors of the same dimension can be compared._
130+
This setting can be omitted, and any `LIST<INTEGER | FLOAT>` or, as of Neo4j 2025.xx, `VECTOR` value can be indexed and queried, separated by their dimensions, _though only vectors of the same dimension can be compared._
130131
Setting this value adds checks that ensure only vectors with the configured dimensions are indexed, and querying the index with a vector of a different dimension returns an error.
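
As a sketch of setting this option explicitly, the statement below creates an index over 1536-dimensional embeddings.
The index name `moviePlots`, the `Movie` label, the `embedding` property, and the choice of `cosine` are illustrative assumptions.

[source, cypher]
----
// Sketch: create a vector index that only accepts 1536-dimensional vectors.
CREATE VECTOR INDEX moviePlots IF NOT EXISTS
FOR (m:Movie)
ON m.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
} }
----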
131132

132133
[NOTE]
