Commit fb2dd95

Correct collation Analyzer breaking change
1 parent f98878b commit fb2dd95

2 files changed, 32 additions, 26 deletions

site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md

Lines changed: 16 additions & 13 deletions
@@ -174,19 +174,22 @@ instead in most cases, and in some cases AQL can be sufficient.
 
 The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
 you adhere to the alphabetic order of a language in range queries. For example,
-using the Swedish locale (`sv`), `å` goes after `z` and not near `a`, which
-can impact queries.
-
-Sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>` did not produce meaningful results in
-previous versions. Sorting the letters `å`, `a`, `b`, `z` resulted in the order
-`b` `a` `å` `z` using an English locale (`en`) and `b` `å` `a` `z` using a
-Swedish locale (`sv`).
-
-In v3.12, this now sorts properly to `a` `å` `b` `z` (English) and
-`a` `b` `z` `å` (Swedish). This change can make queries behave differently
-compared to v3.11 and older, and inverted indexes and Views using
-`collation` Analyzers need to be recreated to ensure that they work correctly.
+using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
+an English locale (`en`), `å` is preceded by `a`. This impacts queries with
+`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
+locale.
+
+ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
+Unicode character handling including text sorting. Because of changes in ICU,
+data produced by the `collation` Analyzer in previous versions is not compatible
+with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
+`collation` Analyzers** to ensure that they work correctly. Otherwise,
+range queries involving the `collation` Analyzers and indexes created in v3.11
+or older versions may behave in unpredicted ways.
+
+Note that sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>` is still not a supported feature and
+doesn't produce meaningful results.
 
 ## Control character escaping in audit log
 
site/content/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md

Lines changed: 16 additions & 13 deletions
@@ -174,19 +174,22 @@ instead in most cases, and in some cases AQL can be sufficient.
 
 The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
 you adhere to the alphabetic order of a language in range queries. For example,
-using the Swedish locale (`sv`), `å` goes after `z` and not near `a`, which
-can impact queries.
-
-Sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>` did not produce meaningful results in
-previous versions. Sorting the letters `å`, `a`, `b`, `z` resulted in the order
-`b` `a` `å` `z` using an English locale (`en`) and `b` `å` `a` `z` using a
-Swedish locale (`sv`).
-
-In v3.12, this now sorts properly to `a` `å` `b` `z` (English) and
-`a` `b` `z` `å` (Swedish). This change can make queries behave differently
-compared to v3.11 and older, and inverted indexes and Views using
-`collation` Analyzers need to be recreated to ensure that they work correctly.
+using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
+an English locale (`en`), `å` is preceded by `a`. This impacts queries with
+`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
+locale.
+
+ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
+Unicode character handling including text sorting. Because of changes in ICU,
+data produced by the `collation` Analyzer in previous versions is not compatible
+with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
+`collation` Analyzers** to ensure that they work correctly. Otherwise,
+range queries involving the `collation` Analyzers and indexes created in v3.11
+or older versions may behave in unpredicted ways.
+
+Note that sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>` is still not a supported feature and
+doesn't produce meaningful results.
 
 ## Control character escaping in audit log
 
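The range behavior described in the new release-note text (a filter like `doc.text < "c"` keeping `å` under an English locale but excluding it under a Swedish one) can be illustrated with a small toy model. This is a hypothetical sketch, not ArangoDB or ICU code: each "locale" is just a hand-written ordering of a few letters, and the names `ENGLISH_LIKE`, `SWEDISH_LIKE`, `collation_key`, and `less_than` are made up for illustration.

```python
# Toy model of locale-dependent collation. This is NOT how ArangoDB or ICU
# compute collation keys; each hypothetical "locale" below is simply an
# explicit ordering of a handful of letters.
ENGLISH_LIKE = ["a", "å", "b", "c", "z"]  # å sorts right after a
SWEDISH_LIKE = ["a", "b", "c", "z", "å"]  # å sorts after z


def collation_key(text: str, alphabet: list[str]) -> list[int]:
    """Map each character to its rank in the given toy alphabet.

    Only characters contained in the alphabet are supported;
    any other character raises ValueError.
    """
    return [alphabet.index(ch) for ch in text]


def less_than(values: list[str], upper: str, alphabet: list[str]) -> list[str]:
    """Keep values that sort before `upper`, like a `doc.text < upper` filter."""
    bound = collation_key(upper, alphabet)
    # Python compares lists of ranks element by element (lexicographically).
    return [v for v in values if collation_key(v, alphabet) < bound]


letters = ["å", "a", "b", "z"]

print(sorted(letters, key=lambda s: collation_key(s, ENGLISH_LIKE)))  # ['a', 'å', 'b', 'z']
print(sorted(letters, key=lambda s: collation_key(s, SWEDISH_LIKE)))  # ['a', 'b', 'z', 'å']
print(less_than(letters, "c", ENGLISH_LIKE))  # ['å', 'a', 'b']
print(less_than(letters, "c", SWEDISH_LIKE))  # ['a', 'b']
```

Under the Swedish-like ordering, `å` ranks after `z` and therefore fails the `< "c"` test, which is the effect the new text describes. In ArangoDB itself the ordering comes from the `collation` Analyzer and the bundled ICU library, not from a hand-written alphabet.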