
Conversation

@nerpaula
Contributor

Description

Added release notes - DOC-557, DOC-492

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:

@arangodb-docs-automation
Contributor

Deploy Preview Available Via
https://deploy-preview-295--docs-hugo.netlify.app

@nerpaula nerpaula self-assigned this Oct 18, 2023
@nerpaula nerpaula added this to the 3.12 milestone Oct 18, 2023
@nerpaula nerpaula requested a review from a team October 18, 2023 09:01
Comment on lines 233 to 234
new optimized format, database dumps are now created and restored quickly and
occupy minimal disk space. This major performance boost makes dumps five times
Contributor

I think "more quickly" would be better; it wasn't horribly slow before.

What are the changes to the format that make the dumps smaller on disk?

Contributor Author

@nerpaula Oct 19, 2023

What are the changes to the format that make the dumps smaller on disk?

@jsteemann maybe you can clarify this?

Contributor

I'm also all for "more quickly" rather than "quickly". The old dump variant wasn't too slow, as Simran already mentioned. Additionally, the new variant can make dump & restore quicker than the old variant, but it is not guaranteed that we are always 5 times faster than before. So this should be rephrased to "up to several times faster" instead of giving a precise number.

The potential speedup can be achieved by the following factors:

  • new dump variant, enabled via --use-parallel-dump true (which is now also the default). The new variant uses prefetching and parallelization on the server, so that the server proactively keeps producing more results for arangodump to fetch. When an arangodump request comes in, the server can then respond right away with a ready-to-use result.
  • optional: make arangodump write multiple output files per collection/shard. This can be enabled by setting the --split-files option to true. It is currently opt-in, because dumps created with this option enabled cannot easily be restored into previous versions of ArangoDB. File splitting allows better parallelization when writing the results to the output files; with non-split files, the writes must be serialized. That serialization can easily become a bottleneck, especially when the output files are gzip-compressed by arangodump.
  • dumping the data in VelocyPack format instead of JSON. By setting the --dump-vpack option, the resulting dump data is stored in VelocyPack format rather than JSON. VelocyPack is normally more compact than JSON, so this option can reduce the output file size compared to JSON, even when compression is enabled. It can also lead to faster dumps, because less data needs to be shipped around and written. This is currently experimental and opt-in, because only arangorestore from 3.12 or higher will be able to interpret and restore VelocyPack dumps, and because there aren't many other tools that can read VelocyPack data. From the user's side, it may be unwanted to produce dumps in a format that isn't widely supported by other tools. But the option should be mentioned for users who want the best dump performance and the smallest possible dumps.
  • compressing the dump data on the server for transfer. By setting the --compress-transfer option to true, dump data can be compressed on the server for faster transfer. This is helpful especially if the network is slow or its capacity is maxed out; it won't make a difference otherwise.
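Put together, the factors above could be combined in a single invocation along the lines of the following sketch. This is illustrative only: the endpoint and output directory are placeholder values, and the exact `--dump-vpack true` boolean syntax is an assumption based on the flags described above.

```shell
# Illustrative sketch, not a recommendation: the endpoint and output
# directory are placeholders, and --dump-vpack is experimental (its
# dumps can only be restored with arangorestore 3.12 or higher).
arangodump \
  --server.endpoint tcp://127.0.0.1:8529 \
  --use-parallel-dump true \
  --split-files true \
  --dump-vpack true \
  --compress-transfer true \
  --output-directory dump
```

Note that `--use-parallel-dump true` is the default and could be omitted; it is spelled out here only to make the combination explicit.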

Contributor

@jsteemann I haven't looked at the dump output but if --dump-vpack is used, can the arangovpack tool be used to convert (parts of) the dump from VPack to JSON, and does this work with reasonable speed and memory usage? If not, then we might want to create a ticket for that.

Contributor

Unfortunately the arangovpack tool cannot be used for that.
The reason is that arangodump with --dump-vpack produces individual batches of data, which are each valid VelocyPack, but these batches are all written to the same file, one after the other.
Currently, arangovpack would either handle only the VelocyPack from the first batch of data, or fail outright; I am not sure which, because I didn't test it.
But I am sure we would need to augment arangovpack first. It is probably a good idea to do that anyway, but it hasn't been done yet.

@nerpaula nerpaula requested a review from jsteemann October 19, 2023 12:46
@cla-bot cla-bot bot added the cla-signed label Oct 25, 2023
@nerpaula nerpaula requested a review from Simran-B October 25, 2023 14:56
Contributor

@Simran-B left a comment

No mention yet of the --compress-transfer option. In contrast to the existing --compress-output, the compression takes place on the server side, reducing the amount of data to transfer over the network but (slightly?) increasing the server load.
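For contrast, a hedged sketch of the two options (assuming both take boolean values like the other arangodump flags discussed in this thread; the output directory is a placeholder):

```shell
# Client-side compression: output files are gzip-compressed locally
# by arangodump, after the data has already crossed the network.
arangodump --compress-output true --output-directory dump

# Server-side compression: data is compressed on the server before
# transfer, reducing network traffic at the cost of some server load.
arangodump --compress-transfer true --output-directory dump
```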

@Simran-B
Contributor

Simran-B commented Nov 7, 2023

@Simran-B Simran-B merged commit ffc2c4f into main Nov 8, 2023
@Simran-B Simran-B deleted the DOC-557 branch November 8, 2023 15:10