From 34e32fe1f5bbdbcad8681424b5abd95ff7f7aa4c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jens=20Pryce-=C3=85klundh?= <112686610+JPryce-Aklundh@users.noreply.github.com> Date: Thu, 9 Jan 2025 16:05:40 +0100 Subject: [PATCH 1/2] explain trigram indexing --- .../managing-indexes.adoc | 14 +++++++++++++- .../search-performance-indexes/using-indexes.adoc | 2 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc index 53b35afa8..cd1f2e1d4 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc @@ -216,7 +216,7 @@ As of Neo4j 5.17, an informational notification is instead returned. Creating a text index can be done with the `CREATE TEXT INDEX` command. Note that the index name must be unique. -Text indexes have no supported index configuration and, as of Neo4j 5.1, they have two index providers available, `text-2.0` (default) and `text-1.0` (deprecated). +Text indexes have no supported index configuration and, as of Neo4j 5.1, they have two index providers available, `text-2.0` (default -- see xref:indexes/search-performance-indexes/managing-indexes.adoc#text-indexes-trigram-indexes[Trigram indexing] below for more information) and `text-1.0` (deprecated). [[text-indexes-supported-predicates]] [discrete] @@ -286,6 +286,18 @@ See the section about xref:indexes/search-performance-indexes/using-indexes.adoc [TIP] Text indexes are only used for exact query matches. To perform approximate matches (including, for example, variations and typos), and to compute a similarity score between `STRING` values, use semantic xref:indexes/semantic-indexes/full-text-indexes.adoc[full-text indexes] instead. +[[text-indexes-trigram-indexes]] +==== Trigram indexing + +The default text index provider, `text-2.0`, uses trigram indexing. +This means that `STRING` values are indexed into overlapping trigrams, each containing three characters. +For example, the word `"developer"` would be indexed by the following trigrams: `["dev", "eve", "vel", "elo", "lop", "ope", "per"]`. + +This makes text indexes particularly suited for substring (`CONTAINS`) and suffix (`ENDS WITH`) searches. +For example, searches like `CONTAINS "vel"` or `ENDS WITH "per"` can be efficiently performed by directly looking up the relevant trigrams in the index. +By comparison, range indexes, which indexes `STRING` values alphabetically (see xref:indexes/search-performance-indexes/using-indexes.adoc#range-index-backed-order-by[Range index-backed `ORDER BY`] for more information) and are therefore more suited for prefix searches (`STARTS WITH`), would need to scan through all indexed values to check if `"vel"` existed anywhere within the text. +For more information, see xref:indexes/search-performance-indexes/using-indexes.adoc#text-indexes[The impact of indexes on query performance -> Text indexes]. + [discrete] [[text-indexes-examples]] ==== Examples diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc index 7a163a8f1..e8c4eee44 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/using-indexes.adoc @@ -232,7 +232,7 @@ Total database accesses: 7, total allocated memory: 312 This is because range indexes store `STRING` values alphabetically. This means that, while they are very efficient for retrieving exact matches of a `STRING`, or for prefix matching, they are less efficient for suffix and contains searches, where they have to scan all relevant properties to filter any matches. -Text indexes do not store `STRING` properties alphabetically, and are instead optimized for suffix and contains searches. +Text indexes do not store `STRING` properties alphabetically, and are instead optimized for suffix and contains searches (for more information, see xref:indexes/search-performance-indexes/managing-indexes.adoc#text-indexes-trigram-indexes[Create a text index -> Trigram indexing]). That said, if no range index had been present on the name property, the previous query would still have been able to utilize the text index. It would have done so less efficiently than a range index, but it still would have been useful. From a6c78379595cb045b75c6afed72e57e691b02187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jens=20Pryce-=C3=85klundh?= <112686610+JPryce-Aklundh@users.noreply.github.com> Date: Fri, 10 Jan 2025 11:20:22 +0100 Subject: [PATCH 2/2] post-review correction --- .../search-performance-indexes/managing-indexes.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc index cd1f2e1d4..c523f2b77 100644 --- a/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc +++ b/modules/ROOT/pages/indexes/search-performance-indexes/managing-indexes.adoc @@ -290,12 +290,12 @@ Text indexes are only used for exact query matches. To perform approximate match ==== Trigram indexing The default text index provider, `text-2.0`, uses trigram indexing. -This means that `STRING` values are indexed into overlapping trigrams, each containing three characters. +This means that `STRING` values are indexed into overlapping trigrams, each containing three Unicode code points. For example, the word `"developer"` would be indexed by the following trigrams: `["dev", "eve", "vel", "elo", "lop", "ope", "per"]`. -This makes text indexes particularly suited for substring (`CONTAINS`) and suffix (`ENDS WITH`) searches. +This makes text indexes particularly suitable for substring (`CONTAINS`) and suffix (`ENDS WITH`) searches, as well as prefix searches (`STARTS WITH`). For example, searches like `CONTAINS "vel"` or `ENDS WITH "per"` can be efficiently performed by directly looking up the relevant trigrams in the index. -By comparison, range indexes, which indexes `STRING` values alphabetically (see xref:indexes/search-performance-indexes/using-indexes.adoc#range-index-backed-order-by[Range index-backed `ORDER BY`] for more information) and are therefore more suited for prefix searches (`STARTS WITH`), would need to scan through all indexed values to check if `"vel"` existed anywhere within the text. +By comparison, range indexes, which indexes `STRING` values lexicographically (see xref:indexes/search-performance-indexes/using-indexes.adoc#range-index-backed-order-by[Range index-backed `ORDER BY`] for more information) and are therefore more suited for prefix searches, would need to scan through all indexed values to check if `"vel"` existed anywhere within the text. For more information, see xref:indexes/search-performance-indexes/using-indexes.adoc#text-indexes[The impact of indexes on query performance -> Text indexes]. [discrete]