of outgrowing the maximum number of file descriptors the ArangoDB process
can open. Thus, these options should only be enabled on deployments with a
limited number of collections/shards/indexes.

## Client tools

### arangodump

#### Improved dump performance

ArangoDB 3.12 extends _arangodump_'s parallelization so that it works not only
at the collection level, but also at the shard level. In combination with the
new optimized format, database dumps are created and restored more quickly
and occupy less disk space. Dumps and restores can be up to several times
faster, which is particularly useful when dealing with large shards.

The new dump variant can be enabled via the `--parallel-dump` startup option.
The default value is `true`.
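For illustration, a parallel dump invocation might look like the following
sketch. The endpoint, database name, and output directory are placeholders
for your own deployment:

```sh
arangodump \
  --server.endpoint tcp://localhost:8529 \
  --server.database mydb \
  --output-directory dump \
  --parallel-dump true \
  --threads 8
```

The `--threads` option controls the client-side worker threads, while the
parallel dump additionally parallelizes work on the server side per shard.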

To achieve the best dump performance and the smallest data dumps in terms of
size, you can use the `--dump-vpack` option. The resulting dump data is stored
in the VelocyPack format instead of JSON. VelocyPack is more compact than
JSON, so the output file size can be reduced compared to JSON, even
when compression is enabled, and dumps can complete faster. Note, however,
that this option is experimental and disabled by default.

Optionally, you can make _arangodump_ write multiple output files per
collection/shard. Splitting the files allows better parallelization when
writing the results, which otherwise must be serialized into a single
output file per collection/shard.
You can enable it by setting the `--split-files` option to `true`. This option
is disabled by default because dumps created with this option enabled
cannot easily be restored into previous versions of ArangoDB.
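A sketch combining the options described above (endpoint and output directory
are placeholders; keep in mind that `--dump-vpack` is experimental and that
`--split-files` affects restorability into older versions):

```sh
arangodump \
  --server.endpoint tcp://localhost:8529 \
  --output-directory dump \
  --parallel-dump true \
  --dump-vpack true \
  --split-files true
```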

#### Resource usage limits

The following `arangod` startup options can be used to limit
the resource usage of parallel _arangodump_ invocations:

- `--dump.max-memory-usage`: Maximum memory usage (in bytes) to be
used by the server-side parts of all ongoing _arangodump_ invocations.
This option can be used to limit the amount of memory for prefetching
and keeping results on the server side when _arangodump_ is invoked
with the `--parallel-dump` option. It has no effect for
_arangodump_ invocations that do not use the `--parallel-dump` option.
Note that the memory usage limit is not exact and that it can be
slightly exceeded in some situations to guarantee progress.
- `--dump.max-docs-per-batch`: Maximum number of documents per batch
that can be used in a dump. If an _arangodump_ invocation requests
a higher value than configured here, the value is automatically
capped to this limit. It only applies to _arangodump_ invocations
that use the `--parallel-dump` option.
- `--dump.max-batch-size`: Maximum batch size value (in bytes) that
can be used in a dump. If an _arangodump_ invocation requests a larger
batch size than configured here, the actual batch size is capped
to this value. It only applies to _arangodump_ invocations that
use the `--parallel-dump` option.
- `--dump.max-parallelism`: Maximum parallelism (number of server-side
threads) that can be used in a dump. If an _arangodump_ invocation requests
a higher number of prefetch threads than configured here, the actual
number of server-side prefetch threads is capped to this value.
It only applies to _arangodump_ invocations that use the
`--parallel-dump` option.
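These startup options map to a `[dump]` section in the server configuration
file, following the usual mapping of `--section.option` to config file entries.
The following snippet is a sketch with purely illustrative values, not
recommended defaults:

```conf
[dump]
max-memory-usage = 536870912   # 512 MiB shared by all ongoing parallel dumps
max-docs-per-batch = 10000     # cap on requested documents per batch
max-batch-size = 16777216      # 16 MiB cap on requested batch size
max-parallelism = 8            # cap on server-side prefetch threads per dump
```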

The following metrics have been added to observe the behavior of parallel
_arangodump_ operations on the server:

- `arangodb_dump_memory_usage`: Current memory usage of all ongoing
_arangodump_ operations on the server.
- `arangodb_dump_ongoing`: Number of currently ongoing _arangodump_
operations on the server.
- `arangodb_dump_threads_blocked_total`: Number of times a server-side
dump thread was blocked because it honored the server-side memory
limit for dumps.
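These metrics can be inspected like any other server metrics via the metrics
API, for example (endpoint is a placeholder; add authentication as needed for
your deployment):

```sh
curl -s http://localhost:8529/_admin/metrics/v2 | grep arangodb_dump
```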

## Internal changes