DOC-557 | arangodump improved performance & resource usage limits #295
new optimized format, database dumps are now created and restored quickly and
occupy minimal disk space. This major performance boost makes dumps five times
I think "more quickly" would be better; it wasn't horribly slow before.
What are the changes to the format that make the dumps smaller on disk?
> What are the changes to the format that make the dumps smaller on disk?
@jsteemann maybe you can clarify this?
I am also all for "more quickly" rather than "quickly". The old dump variant wasn't too slow, as Simran already mentioned. Additionally, the new variant can make dump & restore quicker than the old variant, but it is not guaranteed that we are always 5 times faster than before. So this should be rephrased to "up to several times faster" instead of giving a precise number.
The potential speedup comes from the following factors:
- New dump variant, enabled via `--use-parallel-dump true` (which is now also the default). The new variant uses prefetching and parallelization on the server, so that the server proactively keeps producing more results for arangodump to fetch. When an arangodump request comes in, the server can then already respond with a ready-to-use result.
- Optional: make arangodump write multiple output files per collection/shard. This can be enabled by setting the `--split-files` option to `true`. This is currently opt-in, because dumps created with this option enabled cannot easily be restored into previous versions of ArangoDB. The file splitting allows better parallelization when writing the results into the output files; with non-split files, the writes must be serialized. The serialization can easily become a bottleneck, especially when output files are gzip-compressed by arangodump.
- Dumping the data in velocypack format instead of JSON. By setting the option `--dump-vpack`, the resulting dump data is stored in velocypack format, not JSON. The velocypack format is normally more compact than JSON, so this option can reduce the output file size compared to JSON, even when compression is enabled. It can also lead to faster dumps, because less data needs to be shipped around and written. This is currently experimental and opt-in, because only arangorestore from 3.12 or higher can interpret and restore vpack dumps, and because there aren't many other tools that can read vpack data. So from the user side it may be unwanted to produce dumps in a format that is not widely supported by other tools. But the option should be mentioned for users that want the best dump performance and the smallest possible dumps.
- Compressing the dump data on the server for transfer. By setting the option `--compress-transfer` to `true`, dump data can be compressed on the server for faster transfer. This is helpful especially if the network is slow or its capacity is maxed out. It won't make a difference otherwise.
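For illustration, the options listed above could be combined into a single invocation. The following is only a sketch: the helper name `build_arangodump_command` is hypothetical, and passing `true` as the value for `--dump-vpack` is an assumption modeled on the `--use-parallel-dump true` form used above. Connection and output settings are left out on purpose.

```python
import shlex


def build_arangodump_command(split_files=False, dump_vpack=False, compress_transfer=False):
    """Assemble an arangodump argument list from the options named in this thread.

    Hypothetical helper for illustration only; it does not run arangodump.
    """
    # The parallel dump variant is described above as the default;
    # it is spelled out here only to make the example explicit.
    args = ["arangodump", "--use-parallel-dump", "true"]
    if split_files:
        # Opt-in: multiple output files per collection/shard.
        args += ["--split-files", "true"]
    if dump_vpack:
        # Experimental, opt-in: velocypack output instead of JSON.
        # The explicit "true" value is an assumption.
        args += ["--dump-vpack", "true"]
    if compress_transfer:
        # Compress dump data on the server before it is sent over the network.
        args += ["--compress-transfer", "true"]
    return args


if __name__ == "__main__":
    command = build_arangodump_command(split_files=True, compress_transfer=True)
    print(shlex.join(command))
```

Keeping `--split-files` and `--dump-vpack` off unless requested mirrors their opt-in status described above, since both affect whether older versions or other tools can read the dump.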
@jsteemann I haven't looked at the dump output but if --dump-vpack is used, can the arangovpack tool be used to convert (parts of) the dump from VPack to JSON, and does this work with reasonable speed and memory usage? If not, then we might want to create a ticket for that.
Unfortunately the arangovpack tool cannot be used for that.
The reason is that arangodump with --dump-vpack produces individual batches of data, which are all valid velocypack, but these batches are all written to the same file, one after the other.
Currently arangovpack will either only handle the velocypack from the first batch of data, or even fail. I am not sure because I didn't test that.
But I am sure we would need to augment arangovpack first. It is probably a good idea to do it anyway, but it hasn't been done yet.
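To make the batching issue concrete, here is a rough analogy in Python that uses JSON instead of velocypack. It is not how arangovpack or arangodump are implemented, and the sample data layout is made up. It only shows the general difference between a reader that expects one value per file and a reader that loops over values written one after the other.

```python
import json


def read_first_value(text):
    """Decode only the first value and ignore whatever follows it."""
    decoder = json.JSONDecoder()
    value, _end = decoder.raw_decode(text)
    return value


def read_all_values(text):
    """Decode every value in a string of back-to-back values."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    # Two individually valid batches written to the same "file" (hypothetical data).
    data = '[{"_key": "a"}][{"_key": "b"}, {"_key": "c"}]'
    print(read_first_value(data))  # first batch only
    print(read_all_values(data))   # both batches
    try:
        json.loads(data)           # a strict single-value reader rejects the input
    except json.JSONDecodeError as err:
        print("single-value reader failed:", err.msg)
```

A tool that handles such dumps would need the looping behavior of `read_all_values`, which is the kind of change referred to above as augmenting arangovpack.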
site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
No mention yet of the --compress-transfer option. In contrast to the existing --compress-output option, the compression takes place on the server side, reducing the amount of data to transfer over the network but (slightly?) increasing the server load.
Co-authored-by: Simran <[email protected]>
Description
Added release notes - DOC-557, DOC-492
Upstream PRs