Commit 275db8e

BTS-1854 | Add known issue about ICU difference between core and JS (#521)
* String comparisons and sorting order may differ between core (ICU 64) and JavaScript (ICU 73)
* Remove collation Analyzer breaking change, partially add the info to the Analyzer docs
1 parent 67adec3 commit 275db8e

File tree

8 files changed: +56 -44 lines changed

site/content/3.12/index-and-search/analyzers.md

Lines changed: 12 additions & 0 deletions
@@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific
 tokens. This makes comparisons follow the rules of the respective language,
 most notable in range queries against Views.

+For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and
+`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is
+`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`.
+This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding
+`å` when using a Swedish locale but including it when using an English locale.
+
+{{< info >}}
+Sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>)` is not a supported feature and
+doesn't produce meaningful results.
+{{< /info >}}
+
 The *properties* allowed for this Analyzer are an object with the following
 attributes:
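
The locale-dependent ordering described in the added Analyzer text can be illustrated outside of ArangoDB. The following is a minimal sketch in TypeScript that uses the standard `Intl.Collator` API as a stand-in for locale-aware comparison; it is not the `collation` Analyzer itself, and the helper names `sortWithLocale` and `lessThan` are hypothetical. Results depend on the ICU data available in the JavaScript runtime.

```typescript
// Sketch only: Intl.Collator stands in for locale-aware comparison.
// It is NOT the `collation` Analyzer; helper names are hypothetical.
const letters: string[] = ["z", "a", "å", "b"];

// Return a sorted copy of the values using the collation rules of a locale.
function sortWithLocale(values: string[], locale: string): string[] {
  const collator = new Intl.Collator(locale);
  return values.slice().sort(collator.compare);
}

// Emulate a range condition like `value < bound` under a given locale.
function lessThan(values: string[], bound: string, locale: string): string[] {
  const collator = new Intl.Collator(locale);
  return values.filter((v) => collator.compare(v, bound) < 0);
}

const swedish = sortWithLocale(letters, "sv"); // `å` sorts after `z`
const english = sortWithLocale(letters, "en"); // `å` sorts right after `a`

console.log(swedish, english);
console.log(lessThan(letters, "c", "sv"), lessThan(letters, "c", "en"));
```

With a Swedish locale, the emulated range condition leaves out `å`; with an English locale, it includes it, which mirrors the `doc.text < "c"` example in the diff.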

site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md

Lines changed: 12 additions & 20 deletions
@@ -170,26 +170,18 @@ onward and will be removed in a future version.
 You can use [Stream Transactions](../../develop/transactions/stream-transactions.md)
 instead in most cases, and in some cases AQL can be sufficient.

-## Breaking changes to the `collation` Analyzer
-
-The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
-you adhere to the alphabetic order of a language in range queries. For example,
-using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
-an English locale (`en`), `å` is preceded by `a`. This impacts queries with
-`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
-locale.
-
-ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
-Unicode character handling including text sorting. Because of changes in ICU,
-data produced by the `collation` Analyzer in previous versions is not compatible
-with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
-`collation` Analyzers** to ensure that they work correctly. Otherwise,
-range queries involving the `collation` Analyzers and indexes created in v3.11
-or older versions may behave in unpredicted ways.
-
-Note that sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>)` is still not a supported feature and
-doesn't produce meaningful results.
+## Incompatibilities with Unicode text between core and JavaScript
+
+ArangoDB 3.12 uses the ICU library for Unicode handling in version 64 for its core
+(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md).
+If you compare or sort string values with JavaScript and with the core, the values
+may not match between the two or have a different order. This is due to changes
+in the Unicode standard and the binary representation of strings for comparisons.
+
+You can be affected if you use JavaScript-based features like Foxx microservices
+or user-defined AQL functions (UDFs), compare or sort strings in them, and
+Unicode characters for which the standard has changed between the two ICU versions
+are involved.

 ## Control character escaping in audit log
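
One way to check whether a given data set is affected by the mismatch described in the added section is to compare an ordering produced by the core with an ordering of the same values produced in JavaScript. The following TypeScript sketch assumes you already have the core's ordering as an array (for example, the result of a sorting AQL query); the helper `orderMismatches`, the sample values, and the `en` locale are assumptions for illustration and not part of ArangoDB. The plain ASCII sample is not expected to differ; the commit does not name the characters that are affected.

```typescript
// Hypothetical helper: list the positions at which two orderings of the
// same values disagree. An empty result means the orders match.
function orderMismatches(coreOrder: string[], jsOrder: string[]): number[] {
  const mismatches: number[] = [];
  const length = Math.max(coreOrder.length, jsOrder.length);
  for (let i = 0; i < length; i++) {
    if (coreOrder[i] !== jsOrder[i]) {
      mismatches.push(i);
    }
  }
  return mismatches;
}

// Assumed input: values in the order returned by the core.
const fromCore: string[] = ["a", "b", "c"];

// The same values re-sorted in JavaScript with the runtime's ICU version.
// The locale "en" is an assumption; use the one that matches your setup.
const fromJs: string[] = fromCore.slice().sort(new Intl.Collator("en").compare);

const differing = orderMismatches(fromCore, fromJs);
console.log(differing.length === 0 ? "orders match" : "orders differ at", differing);
```

If such a check reports differences for your data, keeping all comparing and sorting of those strings on one side (either the core or JavaScript) avoids mixing the two ICU versions.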

site/content/3.12/release-notes/version-3.12/known-issues-in-3-12.md

Lines changed: 1 addition & 0 deletions
@@ -46,3 +46,4 @@ Note that this page does not list all open issues.
 | **Date Added:** 2024-03-21 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=<coll>&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed. <br> **Affected Versions:** 3.10.13, 3.11.7, 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) |
 | **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents. <br> **Affected Versions:** 3.12.0 <br> **Fixed in Versions:** 3.12.1 <br> **Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) |
 | **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) |
+| **Date Added:** 2024-04-24 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** ArangoDB uses the ICU library for Unicode handling in version 64 for its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md) since v3.12.0. If you compare or sort string values with JavaScript and with the core, the values may not match or have a different order. This is due to changes in the Unicode standard and the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) |

site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md

Lines changed: 3 additions & 2 deletions
@@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again.
 ### V8 and ICU library upgrades

 The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to
-12.1.165. As part of this upgrade, the bundled Unicode character handling library
-ICU has been upgraded as well, from version 64.2 to 73.1.
+12.1.165. As part of this upgrade, the Unicode character handling library
+ICU has been upgraded as well, from version 64.2 to 73.1 (but only for
+JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)).

 Note that ArangoDB's build of V8 has pointer compression disabled to allow for
 more than 4 GB of heap memory.

site/content/3.13/index-and-search/analyzers.md

Lines changed: 12 additions & 0 deletions
@@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific
 tokens. This makes comparisons follow the rules of the respective language,
 most notable in range queries against Views.

+For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and
+`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is
+`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`.
+This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding
+`å` when using a Swedish locale but including it when using an English locale.
+
+{{< info >}}
+Sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>)` is not a supported feature and
+doesn't produce meaningful results.
+{{< /info >}}
+
 The *properties* allowed for this Analyzer are an object with the following
 attributes:

site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md

Lines changed: 12 additions & 20 deletions
@@ -170,26 +170,18 @@ onward and will be removed in a future version.
 You can use [Stream Transactions](../../develop/transactions/stream-transactions.md)
 instead in most cases, and in some cases AQL can be sufficient.

-## Breaking changes to the `collation` Analyzer
-
-The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
-you adhere to the alphabetic order of a language in range queries. For example,
-using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
-an English locale (`en`), `å` is preceded by `a`. This impacts queries with
-`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
-locale.
-
-ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
-Unicode character handling including text sorting. Because of changes in ICU,
-data produced by the `collation` Analyzer in previous versions is not compatible
-with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
-`collation` Analyzers** to ensure that they work correctly. Otherwise,
-range queries involving the `collation` Analyzers and indexes created in v3.11
-or older versions may behave in unpredicted ways.
-
-Note that sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>)` is still not a supported feature and
-doesn't produce meaningful results.
+## Incompatibilities with Unicode text between core and JavaScript
+
+ArangoDB 3.12 uses the ICU library for Unicode handling in version 64 for its core
+(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md).
+If you compare or sort string values with JavaScript and with the core, the values
+may not match between the two or have a different order. This is due to changes
+in the Unicode standard and the binary representation of strings for comparisons.
+
+You can be affected if you use JavaScript-based features like Foxx microservices
+or user-defined AQL functions (UDFs), compare or sort strings in them, and
+Unicode characters for which the standard has changed between the two ICU versions
+are involved.

 ## Control character escaping in audit log

site/content/3.13/release-notes/version-3.12/known-issues-in-3-12.md

Lines changed: 1 addition & 0 deletions
@@ -46,3 +46,4 @@ Note that this page does not list all open issues.
 | **Date Added:** 2024-03-21 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=<coll>&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed. <br> **Affected Versions:** 3.10.13, 3.11.7, 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) |
 | **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents. <br> **Affected Versions:** 3.12.0 <br> **Fixed in Versions:** 3.12.1 <br> **Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) |
 | **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) |
+| **Date Added:** 2024-04-24 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** ArangoDB uses the ICU library for Unicode handling in version 64 for its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md) since v3.12.0. If you compare or sort string values with JavaScript and with the core, the values may not match or have a different order. This is due to changes in the Unicode standard and the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) |

site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md

Lines changed: 3 additions & 2 deletions
@@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again.
 ### V8 and ICU library upgrades

 The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to
-12.1.165. As part of this upgrade, the bundled Unicode character handling library
-ICU has been upgraded as well, from version 64.2 to 73.1.
+12.1.165. As part of this upgrade, the Unicode character handling library
+ICU has been upgraded as well, from version 64.2 to 73.1 (but only for
+JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)).

 Note that ArangoDB's build of V8 has pointer compression disabled to allow for
 more than 4 GB of heap memory.
