
Conversation

@nerpaula
Contributor

Description

Added release notes - DOC-557, DOC-492

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:

@arangodb-docs-automation
Contributor

Deploy Preview Available Via
https://deploy-preview-295--docs-hugo.netlify.app

@nerpaula nerpaula self-assigned this Oct 18, 2023
@nerpaula nerpaula added this to the 3.12 milestone Oct 18, 2023
@nerpaula nerpaula requested a review from a team October 18, 2023 09:01
Comment on lines 233 to 234
new optimized format, database dumps are now created and restored quickly and
occupy minimal disk space. This major performance boost makes dumps five times
Contributor

I think "more quickly" would be better; it wasn't horribly slow before.

What are the changes to the format that make the dumps smaller on disk?

Contributor Author

@nerpaula Oct 19, 2023

What are the changes to the format that make the dumps smaller on disk?

@jsteemann maybe you can clarify this?

Contributor

I'm also all for "more quickly" rather than "quickly". The old dump variant wasn't too slow, as Simran already mentioned. Additionally, the new variant can make dump & restore quicker than the old variant, but it is not guaranteed that we are always 5 times faster than before. So this should be rephrased to "up to several times faster" instead of giving a precise number.

The potential speedup can be achieved by the following factors:

  • new dump variant, enabled via --use-parallel-dump true (which is now also the default). The new variant uses prefetching and parallelization on the server, so that the server proactively keeps producing more results for arangodump to fetch. When an arangodump request comes in, the server can then respond right away with a ready-to-use result.
  • optional: make arangodump write multiple output files per collection/shard. This can be enabled by setting the --split-files option to true. It is currently opt-in, because dumps created with this option enabled cannot easily be restored into previous versions of ArangoDB. File splitting allows better parallelization when writing the results to the output files; with non-split files, the writes must be serialized. That serialization can easily become a bottleneck, especially when the output files are gzip-compressed by arangodump.
  • dumping the data in VelocyPack format instead of JSON. By setting the --dump-vpack option, the resulting dump data is stored in VelocyPack format rather than JSON. VelocyPack is normally more compact than JSON, so this option can reduce the output file size compared to JSON, even when compression is enabled. It can also lead to faster dumps, because less data needs to be shipped around and written. This is currently experimental and opt-in, because only arangorestore from 3.12 or higher will be able to interpret and restore VelocyPack dumps, and because there aren't many other tools that can read VelocyPack data. From the user's side, it may be unwanted to produce dumps in a format that isn't widely supported by other tools. But the option should be mentioned for users who want the best dump performance and the smallest possible dumps.
  • compressing the dump data on the server for transfer. By setting the --compress-transfer option to true, dump data can be compressed on the server for faster transfer. This is helpful especially if the network is slow or its capacity is maxed out; it won't make a difference otherwise.
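Put together, the factors above could be combined in a single invocation along the lines of the following sketch. This is illustrative only: the endpoint and output directory are placeholder values, and the exact `--dump-vpack true` boolean syntax is an assumption based on the flags described above.

```shell
# Illustrative sketch, not a recommendation: the endpoint and output
# directory are placeholders, and --dump-vpack is experimental (its
# dumps can only be restored with arangorestore 3.12 or higher).
arangodump \
  --server.endpoint tcp://127.0.0.1:8529 \
  --use-parallel-dump true \
  --split-files true \
  --dump-vpack true \
  --compress-transfer true \
  --output-directory dump
```

Note that `--use-parallel-dump true` is the default and could be omitted; it is spelled out here only to make the combination explicit.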

Contributor

@jsteemann I haven't looked at the dump output but if --dump-vpack is used, can the arangovpack tool be used to convert (parts of) the dump from VPack to JSON, and does this work with reasonable speed and memory usage? If not, then we might want to create a ticket for that.

Contributor

Unfortunately the arangovpack tool cannot be used for that.
The reason is that arangodump with --dump-vpack produces individual batches of data, which are each valid VelocyPack, but these batches are all written to the same file, one after the other.
Currently, arangovpack would either handle only the VelocyPack from the first batch of data, or fail outright; I am not sure which, because I didn't test it.
But I am sure we would need to augment arangovpack first. It is probably a good idea to do that anyway, but it hasn't been done yet.

@nerpaula nerpaula requested a review from jsteemann October 19, 2023 12:46
@cla-bot cla-bot bot added the cla-signed label Oct 25, 2023
@nerpaula nerpaula requested a review from Simran-B October 25, 2023 14:56
Contributor

@Simran-B left a comment

No mention yet of the --compress-transfer option. In contrast to the existing --compress-output, the compression takes place on the server side, reducing the amount of data to transfer over the network but (slightly?) increasing the server load.
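For contrast, a hedged sketch of the two options (assuming both take boolean values like the other arangodump flags discussed in this thread; the output directory is a placeholder):

```shell
# Client-side compression: output files are gzip-compressed locally
# by arangodump, after the data has already crossed the network.
arangodump --compress-output true --output-directory dump

# Server-side compression: data is compressed on the server before
# transfer, reducing network traffic at the cost of some server load.
arangodump --compress-transfer true --output-directory dump
```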

@Simran-B
Contributor

Simran-B commented Nov 7, 2023

@Simran-B Simran-B merged commit ffc2c4f into main Nov 8, 2023
@Simran-B Simran-B deleted the DOC-557 branch November 8, 2023 15:10