Commit 3f2abbd

Add ngram updates (#189)
1 parent ac361fe commit 3f2abbd

5 files changed: 80 additions & 32 deletions

dgraph/concepts/index-tokenize.mdx

Lines changed: 3 additions & 3 deletions
@@ -27,6 +27,6 @@ property. E.g. if a Book Node has a Title attribute, and you add a "term" index,
 each word (term) in the text will be indexed. The word "Tokenizer" derives its
 name from tokenizing operations to create this index type.

-Similary if the Book has a publicationDateTime you can add a day or year index.
-The "tokenizer" here extracts the value to be indexed, which may be the day or
-hour of the dateTime, or only the year.
+Similarly, if the Book has a publicationDateTime you can add a day or year
+index. The "tokenizer" here extracts the value to be indexed, which may be the
+day or hour of the dateTime, or only the year.

dgraph/dql/functions.mdx

Lines changed: 28 additions & 3 deletions
@@ -80,8 +80,8 @@ Schema Types: `string`

 Index Required: `term`

-Matches strings that have any of the specified terms in any order; case
-insensitive.
+Matches strings that have any of the specified terms in any order (case
+insensitive).

 #### Usage at root

@@ -117,6 +117,31 @@ Steven Spielberg.
 }
 ```

+## N-gram search
+
+Syntax Examples: `ngram(predicate, "a string of text")`
+
+Schema Types: `string`
+
+Index Required: `ngram`
+
+The `ngram` index tokenizes a string into shingles (contiguous sequences of n
+words), with support for stop word removal and stemming. The `ngram` function
+matches strings that contain the given sequence of terms.
+
+#### Usage at root
+
+Query example: all nodes that have a `name` containing `quick`, `brown`, and
+`fox`.
+
+```json
+{
+me(func: ngram(name@en, "quick brown fox")) {
+name@en
+}
+}
+```
+
 ## Regular expressions

 Syntax Examples: `regexp(predicate, /regular-expression/)` or case insensitive
@@ -474,7 +499,7 @@ Query Example: Movies initially released in 1977, listed by genre.
 }
 ```

-## uid
+## UID

 Syntax Examples:

dgraph/dql/indexes.mdx

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ The indices available for strings are as follows.
 | `le`, `ge`, `lt`, `gt` | `exact` | Allows faster sorting. |
 | `allofterms`, `anyofterms` | `term` | Allows searching by a term in a sentence. |
 | `alloftext`, `anyoftext` | `fulltext` | Matching with language specific stemming and stopwords. |
+| `ngram` | `ngram` | Contiguous sequence matching (shingles) with stop word removal and stemming. |
 | `regexp` | `trigram` | Regular expression matching. Can also be used for equality checking. |

 <Warning>

dgraph/graphql/schema/dgraph-schema.mdx

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@ enum DgraphIndex {
 term
 fulltext
 trigram
+ngram
 regexp
 year
 month

dgraph/graphql/schema/directives/search.mdx

Lines changed: 47 additions & 26 deletions
@@ -85,15 +85,13 @@ contain the term "GraphQL".

 ```graphql
 queryAuthor(filter: { name: { eq: "Diggy" } } ) {
-posts(filter: { title: { anyofterms: "GraphQL" }}) {
-title
 }
 }
 ```

 Dgraph can build search types with the ability to search between a range. For
-example with the above Post type with datePublished field, a query can find
-publish dates within a range
+example, with the preceding Post type with the `datePublished` field, a query
+can find publish dates within a range.

 ```graphql
 query {
@@ -104,8 +102,8 @@ query {
 ```

 Dgraph can also build GraphQL search ability to find match a value from a list.
-For example with the above Author type with the name field, a query can return
-the Authors that match a list
+For example with the preceding Author type with the name field, a query can
+return the Authors that match a list

 ```graphql
 queryAuthor(filter: { name: { in: ["Diggy", "Jarvis"] } } ) {
@@ -115,13 +113,13 @@ queryAuthor(filter: { name: { in: ["Diggy", "Jarvis"] } } ) {

 There's different search possible for each type as explained below.

-### Int, Float and DateTime
+### Int, float and dateTime

 | argument | constructed filter |
 | -------- | ------------------------------------------------- |
 | none | `lt`, `le`, `eq`, `in`, `between`, `ge`, and `gt` |

-Search for fields of types `Int`, `Float` and `DateTime` is enabled by adding
+Search for fields of types `Int`, `Float` and `dateTime` is enabled by adding
 `@search` to the field with no arguments. For example, if a schema contains:

 ```graphql
@@ -187,7 +185,7 @@ queryAuthor(filter: { name: { eq: "Diggy" } } ) {
 }
 ```

-### DateTime
+### dateTime

 | argument | constructed filters |
 | --------------------------------- | ------------------------------------------------- |
@@ -198,14 +196,14 @@ the search index should be built: by year, month, day or hour. `@search`
 defaults to year, but once you understand your data and query patterns, you
 might want to changes that like `@search(by: [day])`.

-### Boolean
+### Boolean fields

 | argument | constructed filter |
 | -------- | ------------------ |
 | none | `true` and `false` |

-Booleans can only be tested for true or false. If `isPublished: Boolean @search`
-is in the schema, then the search allows
+Boolean fields can only be tested for `true` or `false`. If
+`isPublished: Boolean @search` is in the schema, then the search allows

 ```graphql
 filter: { isPublished: true }
@@ -229,6 +227,7 @@ you have the following options as arguments to `@search`.
 | `regexp` | `regexp` (regular expressions) |
 | `term` | `allofterms` and `anyofterms` |
 | `fulltext` | `alloftext` and `anyoftext` |
+| `ngram` | `ngram` |

 - _Schema rule_: `hash` and `exact` can't be used together.

@@ -250,7 +249,7 @@ query {
 }
 ```

-to find users with names lexicographically after "Diggy".
+to find users with names lexicographically after "Diggy."

 #### String regular expression search

@@ -283,12 +282,8 @@ query {
 }
 ```

-will match all posts with both "GraphQL and "tutorial" in the title, while
 `anyofterms: "GraphQL tutorial"` would match posts with either "GraphQL" or
-"tutorial".

-`fulltext` search is Google-stye text search with stop words, stemming. etc. So
-`alloftext: "run woman"` would match "run" as well as "running", etc. For
 example, to find posts that talk about fantastic GraphQL tutorials:

 ```graphql
@@ -297,6 +292,32 @@ query {
 }
 ```

+#### String ngram search
+
+The `ngram` index tokenizes a string into contiguous sequences of n words, with
+support for stop word removal and stemming. N-gram search matches if the indexed
+string contains the given sequence of terms.
+
+If the schema has
+
+```graphql
+type Post {
+title: String @search(by: [ngram])
+...
+}
+```
+
+then
+
+```graphql
+query {
+queryPost(filter: { title: { ngram: "quick brown fox" } } ) { ... }
+}
+```
+
+will match all posts that contain the contiguous sequence "quick brown fox" in
+the title.
+
 #### Strings with multiple searches

 It is possible to add multiple string indexes to a field. For example to search
@@ -310,7 +331,7 @@ type Author {
 }
 ```

-### Enums
+### enums

 | argument | constructed searches |
 | -------- | --------------------------------------------------------------------- |
@@ -319,8 +340,8 @@ type Author {
 | `exact` | `lt`, `le`, `eq`, `in`, `between`, `ge`, and `gt` (lexicographically) |
 | `regexp` | `regexp` (regular expressions) |

-Enums are serialized in Dgraph as strings. `@search` with no arguments is the
-same as `@search(by: [hash])` and provides `eq` and `in` searches. Also
+enum fields are serialized in Dgraph as strings. `@search` with no arguments is
+the same as `@search(by: [hash])` and provides `eq` and `in` searches. Also
 available for enums are `exact` and `regexp`. For hash and exact search on
 enums, the literal enum value, without quotes `"..."`, is used, for regexp,
 strings are required. For example:
@@ -387,7 +408,7 @@ type Hotel {
 }
 ```

-#### near
+#### Near

 The `near` filter matches all entities where the location given by a field is
 within a distance `meters` from a coordinate.
@@ -408,7 +429,7 @@ queryHotel(filter: {
 }
 ```

-#### within
+#### Within

 The `within` filter matches all entities where the location given by a field is
 within a defined `polygon`.
@@ -441,7 +462,7 @@ queryHotel(filter: {
 }
 ```

-#### contains
+#### Contains

 The `contains` filter matches all entities where the `Polygon` or `MultiPolygon`
 field contains another given `point` or `polygon`.
@@ -489,7 +510,7 @@ A `contains` example using `polygon`:
 }
 ```

-#### intersects
+#### Intersects

 The `intersects` filter matches all entities where the `Polygon` or
 `MultiPolygon` field intersects another given `polygon` or `multiPolygon`.
@@ -579,8 +600,8 @@ Unions can be queried only as a field of a type. Union queries can't be ordered,
 but you can filter and paginate them.

 <Note>
-Union queries do not support the `order` argument. The results will be ordered
-by the `uid` of each node in ascending order.
+Union queries don't support the `order` argument. The results will be ordered
+by the UID of each node in ascending order.
 </Note>

 For example, the following schema will enable to query the `members` union field
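
Review note on the added N-gram text: the docs describe the `ngram` index as splitting a string into shingles (contiguous sequences of n words) and matching when the indexed string contains the given sequence. The sketch below illustrates that idea only. It is not Dgraph's implementation: the function names `word_shingles` and `contains_sequence` are hypothetical, and the stop word removal and stemming mentioned in the docs are deliberately left out.

```python
def word_shingles(text, n):
    """Return every contiguous n-word sequence (shingle) in text, lowercased.

    Illustrative only: splits on whitespace and skips the stop word removal
    and stemming that the documented index supports.
    """
    words = text.lower().split()
    # range() is empty when the text has fewer than n words.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def contains_sequence(indexed, query):
    """True when the query's words occur contiguously, in order, in indexed."""
    query_words = query.lower().split()
    if not query_words:
        return False
    return " ".join(query_words) in word_shingles(indexed, len(query_words))
```

Under these assumptions, `contains_sequence("The quick brown fox jumps", "quick brown fox")` is true, while the reordered query `"brown quick fox"` is not, which is the difference from `allofterms` that the "contiguous sequence" wording points at.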
