
Commit 3a83594

gcxml, Jonathan-Zollinger, rderbier, shivaji-kharse, ryanfoxtyler
authored
DQL V24.0 (#662)
Adding documentation for DQL enhancements in order to support the vector type and similarity search.

---------

Co-authored-by: Jonathan Zollinger <[email protected]>
Co-authored-by: Raphael <[email protected]>
Co-authored-by: ShivajiKharse <[email protected]>
Co-authored-by: Ryan Fox-Tyler <[email protected]>
1 parent 953335f commit 3a83594


5 files changed (+67, -2 lines)


.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
11
# CODEOWNERS info: https://help.github.com/en/articles/about-code-owners
22
# Owners are automatically requested for review for PRs that changes code
33
# that they own.
4-
* @rderbier @MichelDiz @damonfeldman @rarvikar @Rajakavitha1
4+
* @dgraph-io/committers @rderbier

README.md

Lines changed: 2 additions & 1 deletion
@@ -106,5 +106,6 @@ Pass custom Go-GRPC example to the runnable by passing a `customExampleGoGRPC` t
106106
**Note:** Runnable doesn't support passing a multiline string as an argument to a shortcode. Therefore, you have to create the whole custom example in a single line string by replacing newlines with `\n`.
107107

108108
## History
109+
v24.0:
110+
=======
109111
Add Hypermode banner by updating the hugo-docs repository with the topbar template.
110-
v24.0:

content/dql/dql-schema.md

Lines changed: 14 additions & 0 deletions
@@ -16,6 +16,9 @@ revenue: float .
1616
running_time: int .
1717
starring: [uid] .
1818
director: [uid] .
19+
description: string .
20+
21+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
1922
2023
type Person {
2124
name
@@ -28,6 +31,8 @@ type Film {
2831
running_time
2932
starring
3033
director
34+
description
35+
description_vector
3136
}
3237
```
3338

@@ -112,6 +117,15 @@ For all triples with a predicate of scalar types the object is a literal.
112117
are RFC 3339 compatible, which is different from ISO 8601 (as defined in the RDF spec). You should
113118
convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}
114119

120+
### Vector Type
121+
122+
The `float32vector` type denotes a vector of floating-point numbers, i.e. an ordered array of `float32` values. A node type can contain more than one vector predicate.
123+
124+
Vectors are normally used to store embeddings produced by an ML model from other information, such as text. When a `float32vector` is [indexed]({{<relref "dql/predicate-indexing.md">}}), the DQL [similar_to]({{<relref "query-language/functions#vector-similarity-search">}}) function can be used for similarity search.
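Because a `float32vector` is an ordered array of `float32`, each component is held at single precision, so a value like `0.9` cannot be stored exactly. The minimal Python sketch below parses a bracketed literal in the `"[0.9, 0.8, 0, 0]"` form used by the `similar_to` examples and rounds each component to `float32`. It assumes that literal format; `parse_vector_literal` and `to_float32` are hypothetical helpers for illustration, not Dgraph or client APIs.

```python
import json
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def parse_vector_literal(literal: str) -> list[float]:
    """Parse a bracketed literal such as "[0.9, 0.8, 0, 0]" into float32 values.

    Hypothetical helper for illustration; it is not part of Dgraph or its clients.
    Assumes the literal is a JSON-style, comma-separated list of numbers.
    """
    values = json.loads(literal)
    if not isinstance(values, list):
        raise ValueError("expected a bracketed list of numbers")
    # Keep the original order: a vector is an ordered array, not a set.
    return [to_float32(float(v)) for v in values]


vec = parse_vector_literal("[0.9, 0.8, 0, 0]")
print(len(vec))  # 4 components, in the order given
print(vec[0])    # close to, but not exactly, 0.9: 0.9 has no exact float32 form
```

The round trip through `struct` with the `"f"` format is one simple way to observe single-precision rounding from Python without extra dependencies.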
125+
126+
127+
128+
115129
### UID Type
116130

117131
The `uid` type denotes a relationship; internally each node is identified by its UID, which is a `uint64`.

content/dql/predicate-indexing.md

Lines changed: 35 additions & 0 deletions
@@ -9,6 +9,15 @@ weight = 4
99

1010
Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index.
1111

12+
Indices are defined in the [Dgraph types schema]({{<relref "dql/dql-schema.md" >}}) using the `@index` directive.
13+
14+
Here are some examples:
15+
```
16+
name: string @index(term) .
17+
release_date: datetime @index(year) .
18+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
19+
```
20+
1221
When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.
1322

1423
All scalar types can be indexed.
@@ -17,6 +26,8 @@ Types `int`, `float`, `bool` and `geo` have only a default index each: with toke
1726

1827
Types `string` and `dateTime` have a number of indices.
1928

29+
Type `float32vector` supports the `hnsw` index.
30+
2031
## String Indices
2132
The indices available for strings are as follows.
2233

@@ -34,6 +45,30 @@ transaction conflict rate. Use only the minimum number of and simplest indexes
3445
that your application needs.
3546
{{% /notice %}}
3647

48+
## Vector Indices
49+
50+
The indices available for `float32vector` are as follows.
51+
52+
| Dgraph function | Required index / tokenizer | Notes |
53+
| :----------------------- | :------------ | :--- |
54+
| `similar_to` | `hnsw` | The HNSW index supports the parameters `metric` and `exponent`. |
55+
56+
57+
58+
59+
The `hnsw` (**Hierarchical Navigable Small World**) index supports the following parameters:
60+
- `metric`: the metric used to compute vector similarity. One of `cosine`, `euclidean`, or `dotproduct`. Default is `euclidean`.
61+
62+
- `exponent`: an integer, represented as a string, that roughly gives the number of vectors expected in the index as a power of 10. It is used to set reasonable defaults for the HNSW internal tuning parameters. Default is `"4"` (10^4 vectors).
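To make the three `metric` options concrete, here is a small Python sketch of each distance, written so that a smaller value always means "more similar". These are textbook definitions used for illustration, not Dgraph's internal implementation: cosine distance is taken as 1 minus cosine similarity, and for `dotproduct` the sketch assumes a negated dot product so that larger products rank closer. The hypothetical `expected_index_size` helper only spells out the `exponent` arithmetic (10^exponent vectors).

```python
import math


def _check_dims(a: list[float], b: list[float]) -> None:
    """Vectors compared by any metric must have the same number of components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 when both vectors point the same way."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance between the two vectors."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dotproduct_distance(a: list[float], b: list[float]) -> float:
    """Negated dot product, so a larger dot product gives a shorter distance.

    Assumption for illustration only: Dgraph's dotproduct scoring may differ.
    """
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


# Maps the `metric` values accepted by the hnsw index to the sketches above.
METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "dotproduct": dotproduct_distance,
}


def expected_index_size(exponent: str = "4") -> int:
    """Hypothetical helper: `exponent` is an integer string; size is 10**exponent."""
    return 10 ** int(exponent)


for name, fn in METRICS.items():
    print(name, fn([1.0, 0.0], [2.0, 0.0]))
print(expected_index_size("6"))
```

The example pair shows why the choice matters: `[1, 0]` and `[2, 0]` point the same way (cosine distance 0) yet are one unit apart under `euclidean`, so which metric fits depends on whether vector magnitude carries meaning for your embeddings.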
63+
64+
65+
Here are some examples:
66+
```
67+
simple_vector: float32vector @index(hnsw) .
68+
description_vector: float32vector @index(hnsw(metric:"cosine")) .
69+
large_vector: float32vector @index(hnsw(metric:"euclidean",exponent:"6")) .
70+
```
71+
3772
## DateTime Indices
3873

3974
The indices available for `dateTime` are as follows.

content/query-language/functions.md

Lines changed: 15 additions & 0 deletions
@@ -177,6 +177,21 @@ Same query with a Levenshtein distance of 3.
177177
}
178178
{{< /runnable >}}
179179

180+
## Vector Similarity Search
181+
182+
Syntax Examples: `similar_to(predicate, 3, "[0.9, 0.8, 0, 0]")`
183+
184+
Alternatively, the vector can be passed as a query variable: `similar_to(predicate, 3, $vec)`
185+
186+
This function finds the nodes whose `predicate` value is closest to the provided vector. Closeness is measured with the distance metric specified in the predicate's index (`cosine`, `euclidean`, or `dotproduct`); a shorter distance indicates greater similarity.
187+
The second parameter, `3`, specifies that the top 3 matches are returned.
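Conceptually, `similar_to(predicate, 3, vec)` ranks nodes by their distance to `vec` and keeps the 3 closest. The Python sketch below shows that result semantics as an exact brute-force search over a hypothetical in-memory map of UIDs to vectors, using Euclidean distance; `top_k_similar` and the sample data are illustrative only. The `hnsw` index is an approximate nearest-neighbor structure, so on real data Dgraph's results may differ from an exact scan like this one.

```python
import heapq
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(nodes: dict[str, list[float]], query: list[float], k: int,
                  distance=euclidean_distance) -> list[str]:
    """Return the UIDs of the k nodes whose vectors are closest to `query`.

    Exact brute-force scan over every vector, closest first. An HNSW index
    avoids the full scan and returns an approximate answer instead.
    """
    return heapq.nsmallest(k, nodes, key=lambda uid: distance(nodes[uid], query))


# Hypothetical sample data: UID -> 4-dimensional vector.
nodes = {
    "0x1": [0.9, 0.8, 0.0, 0.0],
    "0x2": [0.1, 0.2, 0.9, 0.7],
    "0x3": [0.8, 0.9, 0.1, 0.0],
    "0x4": [0.0, 0.0, 1.0, 1.0],
}

# Analogous in spirit to similar_to(predicate, 3, "[0.9, 0.8, 0, 0]").
print(top_k_similar(nodes, [0.9, 0.8, 0.0, 0.0], 3))  # ['0x1', '0x3', '0x2']
```

A full scan costs one distance computation per stored vector for every query, which is the cost an HNSW index is designed to avoid on large datasets.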
188+
189+
Schema Types: `float32vector`
190+
191+
Index Required: `hnsw`
192+
193+
194+
180195
## Full-Text Search
181196

182197
Syntax Examples: `alloftext(predicate, "space-separated text")` and `anyoftext(predicate, "space-separated text")`
