From 06ce87853e3da6dfe2b45a8f6b323ff347dd523f Mon Sep 17 00:00:00 2001 From: Simran Spiller Date: Wed, 24 Apr 2024 18:00:42 +0200 Subject: [PATCH 1/2] BTS-1854 | String comparisons and sorting order may differ between core (ICU 64) and JavaScript (ICU 73) --- .../incompatible-changes-in-3-12.md | 18 ++++++++++++++++++ .../version-3.12/known-issues-in-3-12.md | 1 + .../version-3.12/whats-new-in-3-12.md | 5 +++-- .../incompatible-changes-in-3-12.md | 18 ++++++++++++++++++ .../version-3.12/known-issues-in-3-12.md | 1 + .../version-3.12/whats-new-in-3-12.md | 5 +++-- 6 files changed, 44 insertions(+), 4 deletions(-) diff --git a/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md index b92182290d..9a4751a0c1 100644 --- a/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md @@ -170,6 +170,23 @@ onward and will be removed in a future version. You can use [Stream Transactions](../../develop/transactions/stream-transactions.md) instead in most cases, and in some cases AQL can be sufficient. +## Incompatibilities with Unicode text between core and JavaScript + +ArangoDB 3.12 uses the ICU library for Unicode handling in version 64 for its core +(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md). +If you compare or sort string values with JavaScript and with the core, the values +may not match between the two or have a different order. This is due to changes +in the Unicode standard and the binary representation of strings for comparisons. + +You can be affected if you use JavaScript-based features like Foxx microservices +or user-defined AQL functions (UDFs), compare or sort strings in them, and +Unicode characters for which the standard has changed between the two ICU versions +are involved. 
+ +{{< comment >}} +TODO: May become relevant later should we upgrade the core ICU. +If not, we still might want to incorporate some of this into the reference docs. + ## Breaking changes to the `collation` Analyzer The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets @@ -190,6 +207,7 @@ or older versions may behave in unpredicted ways. Note that sorting by the output of the `collation` Analyzer like `SORT TOKENS(, )` is still not a supported feature and doesn't produce meaningful results. +{{< /comment >}} ## Control character escaping in audit log diff --git a/site/content/3.12/release-notes/version-3.12/known-issues-in-3-12.md b/site/content/3.12/release-notes/version-3.12/known-issues-in-3-12.md index a0b4fd3486..1857ee5026 100644 --- a/site/content/3.12/release-notes/version-3.12/known-issues-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/known-issues-in-3-12.md @@ -46,3 +46,4 @@ Note that this page does not list all open issues. | **Date Added:** 2024-03-21
**Component:** arangod
**Deployment Mode:** All
**Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed.
**Affected Versions:** 3.10.13, 3.11.7, 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) | | **Date Added:** 2024-03-28
**Component:** arangod
**Deployment Mode:** Cluster
**Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents.
**Affected Versions:** 3.12.0
**Fixed in Versions:** 3.12.1
**Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) | | **Date Added:** 2024-03-28
**Component:** arangod
**Deployment Mode:** Cluster
**Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation.
**Affected Versions:** 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) | +| **Date Added:** 2024-04-24
**Component:** arangod
**Deployment Mode:** All
**Description:** Since v3.12.0, ArangoDB uses version 64 of the ICU library for Unicode handling in its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md). If you compare or sort string values both with JavaScript and with the core, the comparison results may not match and the sorting order may differ. This is due to changes in the Unicode standard and the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and the strings involve Unicode characters for which the standard has changed between the two ICU versions.
**Affected Versions:** 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) | diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md index 323dbbda6e..2528816e13 100644 --- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md @@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again. ### V8 and ICU library upgrades The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to -12.1.165. As part of this upgrade, the bundled Unicode character handling library -ICU has been upgraded as well, from version 64.2 to 73.1. +12.1.165. As part of this upgrade, the Unicode character handling library +ICU has been upgraded as well, from version 64.2 to 73.1 (but only for +JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)). Note that ArangoDB's build of V8 has pointer compression disabled to allow for more than 4 GB of heap memory. diff --git a/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md index b92182290d..9a4751a0c1 100644 --- a/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md +++ b/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md @@ -170,6 +170,23 @@ onward and will be removed in a future version. You can use [Stream Transactions](../../develop/transactions/stream-transactions.md) instead in most cases, and in some cases AQL can be sufficient. 
+## Incompatibilities with Unicode text between core and JavaScript + +ArangoDB 3.12 uses the ICU library for Unicode handling in version 64 for its core +(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md). +If you compare or sort string values with JavaScript and with the core, the values +may not match between the two or have a different order. This is due to changes +in the Unicode standard and the binary representation of strings for comparisons. + +You can be affected if you use JavaScript-based features like Foxx microservices +or user-defined AQL functions (UDFs), compare or sort strings in them, and +Unicode characters for which the standard has changed between the two ICU versions +are involved. + +{{< comment >}} +TODO: May become relevant later should we upgrade the core ICU. +If not, we still might want to incorporate some of this into the reference docs. + ## Breaking changes to the `collation` Analyzer The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets @@ -190,6 +207,7 @@ or older versions may behave in unpredicted ways. Note that sorting by the output of the `collation` Analyzer like `SORT TOKENS(, )` is still not a supported feature and doesn't produce meaningful results. +{{< /comment >}} ## Control character escaping in audit log diff --git a/site/content/3.13/release-notes/version-3.12/known-issues-in-3-12.md b/site/content/3.13/release-notes/version-3.12/known-issues-in-3-12.md index a0b4fd3486..1857ee5026 100644 --- a/site/content/3.13/release-notes/version-3.12/known-issues-in-3-12.md +++ b/site/content/3.13/release-notes/version-3.12/known-issues-in-3-12.md @@ -46,3 +46,4 @@ Note that this page does not list all open issues. | **Date Added:** 2024-03-21
**Component:** arangod
**Deployment Mode:** All
**Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed.
**Affected Versions:** 3.10.13, 3.11.7, 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) | | **Date Added:** 2024-03-28
**Component:** arangod
**Deployment Mode:** Cluster
**Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents.
**Affected Versions:** 3.12.0
**Fixed in Versions:** 3.12.1
**Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) | | **Date Added:** 2024-03-28
**Component:** arangod
**Deployment Mode:** Cluster
**Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation.
**Affected Versions:** 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) | +| **Date Added:** 2024-04-24
**Component:** arangod
**Deployment Mode:** All
**Description:** Since v3.12.0, ArangoDB uses version 64 of the ICU library for Unicode handling in its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md). If you compare or sort string values both with JavaScript and with the core, the comparison results may not match and the sorting order may differ. This is due to changes in the Unicode standard and the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and the strings involve Unicode characters for which the standard has changed between the two ICU versions.
**Affected Versions:** 3.12.x
**Fixed in Versions:** -
**Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) | diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md index 323dbbda6e..2528816e13 100644 --- a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md @@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again. ### V8 and ICU library upgrades The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to -12.1.165. As part of this upgrade, the bundled Unicode character handling library -ICU has been upgraded as well, from version 64.2 to 73.1. +12.1.165. As part of this upgrade, the Unicode character handling library +ICU has been upgraded as well, from version 64.2 to 73.1 (but only for +JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)). Note that ArangoDB's build of V8 has pointer compression disabled to allow for more than 4 GB of heap memory. 
From 400d391b5195a13051dceb1252e509d3f131febc Mon Sep 17 00:00:00 2001 From: Simran Spiller Date: Fri, 3 May 2024 16:02:58 +0200 Subject: [PATCH 2/2] Remove collation Analyzer breaking change, partially add the info to the Analyzer docs --- .../3.12/index-and-search/analyzers.md | 12 +++++++++ .../incompatible-changes-in-3-12.md | 26 ------------------- .../3.13/index-and-search/analyzers.md | 12 +++++++++ .../incompatible-changes-in-3-12.md | 26 ------------------- 4 files changed, 24 insertions(+), 52 deletions(-) diff --git a/site/content/3.12/index-and-search/analyzers.md b/site/content/3.12/index-and-search/analyzers.md index 9d0364f87c..75f15d543c 100644 --- a/site/content/3.12/index-and-search/analyzers.md +++ b/site/content/3.12/index-and-search/analyzers.md @@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific tokens. This makes comparisons follow the rules of the respective language, most notable in range queries against Views. +For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and +`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is +`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`. +This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding +`å` when using a Swedish locale but including it when using an English locale. + +{{< info >}} +Sorting by the output of the `collation` Analyzer like +`SORT TOKENS(, )` is not a supported feature and +doesn't produce meaningful results. 
+{{< /info >}} + The *properties* allowed for this Analyzer are an object with the following attributes: diff --git a/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md index 9a4751a0c1..8e2e7c3035 100644 --- a/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md +++ b/site/content/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md @@ -183,32 +183,6 @@ or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. -{{< comment >}} -TODO: May become relevant later should we upgrade the core ICU. -If not, we still might want to incorporate some of this into the reference docs. - -## Breaking changes to the `collation` Analyzer - -The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets -you adhere to the alphabetic order of a language in range queries. For example, -using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using -an English locale (`en`), `å` is preceded by `a`. This impacts queries with -`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish -locale. - -ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for -Unicode character handling including text sorting. Because of changes in ICU, -data produced by the `collation` Analyzer in previous versions is not compatible -with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use -`collation` Analyzers** to ensure that they work correctly. Otherwise, -range queries involving the `collation` Analyzers and indexes created in v3.11 -or older versions may behave in unpredicted ways. - -Note that sorting by the output of the `collation` Analyzer like -`SORT TOKENS(, )` is still not a supported feature and -doesn't produce meaningful results. 
-{{< /comment >}} - ## Control character escaping in audit log The audit log feature of the Enterprise Edition previously logged query strings diff --git a/site/content/3.13/index-and-search/analyzers.md b/site/content/3.13/index-and-search/analyzers.md index 51723ded13..77a6082208 100644 --- a/site/content/3.13/index-and-search/analyzers.md +++ b/site/content/3.13/index-and-search/analyzers.md @@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific tokens. This makes comparisons follow the rules of the respective language, most notable in range queries against Views. +For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and +`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is +`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`. +This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding +`å` when using a Swedish locale but including it when using an English locale. + +{{< info >}} +Sorting by the output of the `collation` Analyzer like +`SORT TOKENS(, )` is not a supported feature and +doesn't produce meaningful results. +{{< /info >}} + The *properties* allowed for this Analyzer are an object with the following attributes: diff --git a/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md index 9a4751a0c1..8e2e7c3035 100644 --- a/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md +++ b/site/content/3.13/release-notes/version-3.12/incompatible-changes-in-3-12.md @@ -183,32 +183,6 @@ or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. -{{< comment >}} -TODO: May become relevant later should we upgrade the core ICU. 
-If not, we still might want to incorporate some of this into the reference docs. - -## Breaking changes to the `collation` Analyzer - -The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets -you adhere to the alphabetic order of a language in range queries. For example, -using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using -an English locale (`en`), `å` is preceded by `a`. This impacts queries with -`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish -locale. - -ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for -Unicode character handling including text sorting. Because of changes in ICU, -data produced by the `collation` Analyzer in previous versions is not compatible -with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use -`collation` Analyzers** to ensure that they work correctly. Otherwise, -range queries involving the `collation` Analyzers and indexes created in v3.11 -or older versions may behave in unpredicted ways. - -Note that sorting by the output of the `collation` Analyzer like -`SORT TOKENS(, )` is still not a supported feature and -doesn't produce meaningful results. -{{< /comment >}} - ## Control character escaping in audit log The audit log feature of the Enterprise Edition previously logged query strings