UPDATE { logins: OLD.logins + 1 } IN users
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }
```

## Transactionality and Limitations

- On a single server, upserts are generally executed transactionally in an
  all-or-nothing fashion.

  For sharded collections in cluster deployments, the entire query and/or upsert
  operation may not be transactional, especially if it involves different shards,
  DB-Servers, or both.

- Queries may execute intermediate transaction commits if the running
  transaction (AQL query) hits the specified size thresholds. An intermediate
  commit writes the data that has been modified so far, and this data is not
  rolled back in case of a later abort/rollback of the transaction.

  Such **intermediate commits** can occur for `UPSERT` operations over all
  documents of a large collection, for instance. This has the side effect that
  the atomicity of such an operation can no longer be guaranteed, and ArangoDB
  cannot guarantee that "read your own writes" semantics work for upserts.

  This is only an issue if you write a query whose search condition would hit
  the same document multiple times, and only if you have large transactions.
  You can adjust the behavior of the RocksDB storage engine by increasing the
  `intermediateCommit` thresholds for data size and operation counts.

- The lookup and the insert/update/replace parts are executed one after
  another, so that other operations in other threads can happen in
  between. This means if multiple `UPSERT` queries run concurrently, they
  may all determine that the target document does not exist and then
  create it multiple times!

  Note that due to this gap between the lookup and insert/update/replace,
  even with a unique index, duplicate key errors or conflicts can occur.
  But if they occur, the application/client code can execute the same query
  again.

  To prevent this from happening, you should add a unique index to the lookup
  attribute(s). Note that in the cluster a unique index can only be created if
  it is equal to the shard key attribute of the collection or at least contains
  it as a part.
- An alternative to making an `UPSERT` statement work atomically is to use the
  `exclusive` option to limit write concurrency for this collection to 1, which
  helps avoid conflicts but is bad for throughput! A sketch of this option
  follows after this list.

- `UPSERT` operations do not observe their own writes correctly in cluster
  deployments. They only do so for OneShard databases with the
  `cluster-one-shard` optimizer rule active.

  If upserts in a query create new documents and would then semantically hit the
  same documents again, the operation may incorrectly use the `INSERT` branch to
  create more documents instead of the `UPDATE`/`REPLACE` branch to update the
  previously created documents. A sketch of this caveat follows after this list.

  If upserts find existing documents to update/replace, you can access the
  current document via the `OLD` pseudo-variable, but it may hold the initial
  version of the document from before the query, even if the document has been
  modified by `UPSERT` in the meantime.

- The lookup attribute(s) from the search expression should be indexed in order
  to improve `UPSERT` performance. Ideally, the search expression contains the
  shard key, as this allows the lookup to be restricted to a single shard
  (see the last sketch after this list).
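
Below is a minimal sketch of the `exclusive` option mentioned above. It reuses
the `users` collection and `logins` attribute from the earlier example; the
`name` lookup attribute is only assumed for illustration:

```aql
// Request an exclusive lock on the collection for this query, so that
// concurrent UPSERTs cannot both take the INSERT branch for the same
// lookup value. This avoids conflicts at the cost of write throughput.
UPSERT { name: "jane" }
INSERT { name: "jane", logins: 1 }
UPDATE { logins: OLD.logins + 1 } IN users
OPTIONS { exclusive: true }
```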
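
The following sketch shows how a search condition can hit the same, not yet
existing document more than once within a single query. On a single server this
typically inserts once and then updates; in a cluster deployment (without the
OneShard optimization), the second iteration may not observe the first
iteration's write. The `name` attribute is again just an assumption:

```aql
// First iteration: no match, so the INSERT branch creates the document.
// Second iteration: should take the UPDATE branch, but in a cluster it may
// not see the freshly inserted document and insert a second one instead.
// Likewise, OLD may still reflect the state from before the query started.
FOR i IN 1..2
  UPSERT { name: "jane" }
  INSERT { name: "jane", logins: 1 }
  UPDATE { logins: OLD.logins + 1 } IN users
```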
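
Finally, a sketch of a search expression that contains the shard key, so that
the lookup can be restricted to a single shard. This assumes, purely for
illustration, that the `users` collection is sharded by a `country` attribute:

```aql
// The search expression includes the assumed shard key `country`, so the
// lookup only needs to consider the single shard responsible for "ie".
UPSERT { country: "ie", name: "jane" }
INSERT { country: "ie", name: "jane", logins: 1 }
UPDATE { logins: OLD.logins + 1 } IN users
```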