@@ -232,7 +232,7 @@ can be mixed and written into the same .sst files.
232 232
233 233 When these options are enabled, the RocksDB compaction is more efficient since
234 234 a lot of different collections/shards/indexes are written to in parallel.
235- The disavantage of enabling these options is that there can be more .sst
235+ The disadvantage of enabling these options is that there can be more .sst
236 236 files than when the option is turned off, and the disk space used by
237 237 these .sst files can be higher.
238 238 In particular, on deployments with many collections/shards/indexes
@@ -241,5 +241,84 @@ of outgrowing the maximum number of file descriptors the ArangoDB process
241 241 can open. Thus, these options should only be enabled on deployments with a
242 242 limited number of collections/shards/indexes.
243 243
244+ ## Client tools
245+
246+ ### arangodump
247+
248+ #### Improved dump performance and size
249+
250+ From version 3.12 onward, _arangodump_ has extended parallelization capabilities
251+ to work not only at the collection level, but also at the shard level.
252+ In combination with the newly added support for the VelocyPack format that
253+ ArangoDB uses internally, database dumps can now be created and restored more
254+ quickly and occupy less disk space. This major performance boost makes dumps and
255+ restores up to several times faster, which is extremely useful when dealing
256+ with large shards.
257+
258+ - Whether the new parallel dump variant is used is controlled by the newly added
259+ `--use-parallel-dump` startup option. The default value is `true`.
260+
261+ - To achieve the best dump performance and the smallest data dumps in terms of
262+ size, you can additionally use the `--dump-vpack` option. The resulting dump data
263+ is then stored in the more compact but binary VelocyPack format instead of the
264+ text-based JSON format. The output files can be smaller even than
265+ compressed JSON. It can also lead to faster dumps because there is less data to
266+ transfer and no conversion from the server-internal format (VelocyPack) to JSON
267+ is needed. Note, however, that this option is **experimental** and disabled by
268+ default.
269+
270+ - Optionally, you can make _arangodump_ write multiple output files per
271+ collection/shard. The file splitting allows for better parallelization when
272+ writing the results to disk, which in case of non-split files must be serialized.
273+ You can enable it by setting the `--split-files` option to `true`. This option
274+ is disabled by default because dumps created with this option enabled cannot
275+ be restored into previous versions of ArangoDB.
276+
277+ - You can enable the new `--compress-transfer` startup option for compressing the
278+ dump data on the server for a faster transfer. This is helpful especially if
279+ the network is slow or its capacity is maxed out. The data is decompressed on
280+ the client side and recompressed if you enable the `--compress-output` option.
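
As a rough illustration of how the options above combine, the following Python sketch assembles an _arangodump_ command line. The helper name `build_arangodump_args`, the endpoint, and the output directory are assumptions made for this example only. Because `--dump-vpack` is experimental, the sketch leaves it off unless explicitly requested.

```python
import shlex


def build_arangodump_args(output_dir, endpoint="tcp://127.0.0.1:8529",
                          vpack=False, split_files=False,
                          compress_transfer=False):
    """Assemble an arangodump argument list (hypothetical helper)."""
    args = [
        "arangodump",
        f"--server.endpoint={endpoint}",      # assumed local endpoint
        f"--output-directory={output_dir}",   # assumed target directory
        "--use-parallel-dump=true",           # already the default, shown for clarity
    ]
    if vpack:
        # Experimental: binary VelocyPack output instead of JSON.
        args.append("--dump-vpack=true")
    if split_files:
        # Such dumps cannot be restored into previous ArangoDB versions.
        args.append("--split-files=true")
    if compress_transfer:
        # Compress dump data on the server before it is transferred.
        args.append("--compress-transfer=true")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_arangodump_args("dump", split_files=True,
                                           compress_transfer=True)))
```

Printing the command rather than executing it keeps the sketch free of side effects; the list can be passed to `subprocess.run` once the values have been checked.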
281+
282+ #### Resource usage limits and metrics
283+
284+ The following `arangod` startup options can be used to limit
285+ the resource usage of parallel _arangodump_ invocations:
286+
287+ - `--dump.max-memory-usage`: Maximum memory usage (in bytes) to be
288+ used by the server-side parts of all ongoing _arangodump_ invocations.
289+ This option can be used to limit the amount of memory for prefetching
290+ and keeping results on the server side when _arangodump_ is invoked
291+ with the `--parallel-dump` option. It does not have an effect for
292+ _arangodump_ invocations that do not use the `--parallel-dump` option.
293+ Note that the memory usage limit is not exact and that it can be
294+ slightly exceeded in some situations to guarantee progress.
295+ - `--dump.max-docs-per-batch`: Maximum number of documents per batch
296+ that can be used in a dump. If an _arangodump_ invocation requests
297+ higher values than configured here, the value is automatically
298+ capped to this value. This limit only applies to _arangodump_ invocations
299+ that use the `--parallel-dump` option.
300+ - `--dump.max-batch-size`: Maximum batch size value (in bytes) that
301+ can be used in a dump. If an _arangodump_ invocation requests larger
302+ batch sizes than configured here, the actual batch size is capped
303+ to this value. This limit only applies to _arangodump_ invocations that
304+ use the `--parallel-dump` option.
305+ - `--dump.max-parallelism`: Maximum parallelism (number of server-side
306+ threads) that can be used in a dump. If an _arangodump_ invocation requests
307+ a higher number of prefetch threads than configured here, the actual
308+ number of server-side prefetch threads is capped to this value.
309+ This limit only applies to _arangodump_ invocations that use the
310+ `--parallel-dump` option.
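
The capping behavior described for the three `--dump.max-*` batch and parallelism options can be modeled as a simple minimum of the requested and the configured value. The Python sketch below is only a model of that description, not ArangoDB's server code; the `DumpLimits` type, the `effective_settings` function, and all numbers are invented for illustration. The memory limit is left out because it is a soft limit that can be slightly exceeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpLimits:
    """Values mirroring the --dump.max-* options (hypothetical type)."""
    docs_per_batch: int   # documents per batch
    batch_size: int       # bytes per batch
    parallelism: int      # server-side prefetch threads


def effective_settings(requested: DumpLimits, configured: DumpLimits) -> DumpLimits:
    """Cap every requested value at the configured server-side maximum."""
    return DumpLimits(
        docs_per_batch=min(requested.docs_per_batch, configured.docs_per_batch),
        batch_size=min(requested.batch_size, configured.batch_size),
        parallelism=min(requested.parallelism, configured.parallelism),
    )


if __name__ == "__main__":
    server = DumpLimits(docs_per_batch=10_000, batch_size=16 * 1024 * 1024, parallelism=4)
    client = DumpLimits(docs_per_batch=50_000, batch_size=8 * 1024 * 1024, parallelism=16)
    # Requests above the server maximum are reduced; lower requests pass through.
    print(effective_settings(client, server))
```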
311+
312+ The following metrics have been added to observe the behavior of parallel
313+ _arangodump_ operations on the server:
314+
315+ - `arangodb_dump_memory_usage`: Current memory usage of all ongoing
316+ _arangodump_ operations on the server.
317+ - `arangodb_dump_ongoing`: Number of currently ongoing _arangodump_
318+ operations on the server.
319+ - `arangodb_dump_threads_blocked_total`: Number of times a server-side
320+ dump thread was blocked because it honored the server-side memory
321+ limit for dumps.
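
One way to watch these metrics is to filter them out of the text a monitoring job has already scraped from the server. The Python sketch below assumes the metrics arrive in the Prometheus text format, one `name value` or `name{labels} value` sample per line without a trailing timestamp. The function name `extract_dump_metrics` and the sample values (including the label) are made up for this example.

```python
DUMP_METRICS = (
    "arangodb_dump_memory_usage",
    "arangodb_dump_ongoing",
    "arangodb_dump_threads_blocked_total",
)


def extract_dump_metrics(text):
    """Return {metric name: value} for the dump metrics found in `text`."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        # Split at the last space: everything before it is the name
        # (plus optional labels), everything after it is the value.
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        if name in DUMP_METRICS:
            # If a metric appears several times, the last sample wins.
            found[name] = float(value)
    return found


# Made-up sample input, for illustration only.
SAMPLE = """\
# sample values below are invented
arangodb_dump_memory_usage 1048576
arangodb_dump_ongoing 2
arangodb_dump_threads_blocked_total{role="SINGLE"} 5
some_other_metric 42
"""

if __name__ == "__main__":
    for name, value in extract_dump_metrics(SAMPLE).items():
        print(name, value)
```

A steadily growing `arangodb_dump_threads_blocked_total` value would suggest that dump threads are regularly waiting on the server-side memory limit.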
322+
244 323 ## Internal changes
245 324