Conversation
GNU testsuite comparison:
Could you please share your example file? I don't get the same results.
Uhm, it is almost 5 GB large... maybe I can try with a smaller one? What do you suggest? Never mind, I found it back online: it is one of the models for DeepSeek, which are available here: https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main Probably a good option is selecting this one: https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main/model-00001-of-000163.safetensors?download=true It is roughly the same size.
Force-pushed 8f5d7b1 to c38288b
I re-ran the tests with the file linked above: For encoding: For decoding: The system is also under some load, so it might be slower than usual, but the results stay more or less consistent with what was reported before. Please let me know if there is any difference.
Force-pushed c38288b to 8e0969e
Force-pushed 8e0969e to 44147d1
Force-pushed 44147d1 to 05d7d9f
Force-pushed 05d7d9f to 1854b91
CodSpeed Performance Report: merging #8622 will not alter performance.
Force-pushed bcd2ec4 to 1854b91
Force-pushed 23bc39f to 1bc46e6
Force-pushed e9882d5 to f60b1b9
Force-pushed 88e3b2b to 9bdac31
Force-pushed 9bdac31 to 87978e0
Force-pushed 87978e0 to 76cb7e6
Force-pushed 76cb7e6 to 8fd5dc2
Force-pushed fe89a32 to 76cb42b
Issues with base58 not supporting streaming are solved by the latest commit. We should provide some information about whether an algorithm supports streaming. I modified the trait responsible for managing the chunk size, so now this works properly.
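The idea of advertising streaming support through the chunk-size mechanism can be sketched as follows. This is a minimal Python illustration, not the project's actual Rust trait: the names (`StreamCodec`, `chunk_size`, `encode_all`) are hypothetical, and the standard-library `base64` module stands in for the real encoder.

```python
import base64
import io


class StreamCodec:
    """Hypothetical codec interface: streaming codecs advertise a
    preferred chunk size; whole-input codecs (e.g. base58, whose
    output depends on the entire message) advertise None."""

    def chunk_size(self):
        return None

    def encode(self, data: bytes) -> bytes:
        raise NotImplementedError


class Base64Codec(StreamCodec):
    def chunk_size(self):
        # Multiple of 3, so every non-final chunk encodes without padding
        # and the concatenated output equals encoding the input in one go.
        return 3 * 1024

    def encode(self, data: bytes) -> bytes:
        return base64.b64encode(data)


def encode_all(codec: StreamCodec, reader, writer):
    size = codec.chunk_size()
    if size is None:
        # Non-streaming algorithm: buffer the entire input first.
        writer.write(codec.encode(reader.read()))
        return
    # Streaming algorithm: fixed memory, one chunk at a time.
    while chunk := reader.read(size):
        writer.write(codec.encode(chunk))
```

A non-streaming codec would simply return `None` from `chunk_size` and be handed the whole buffered input, which matches the behavior described above for base58.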
Force-pushed 76cb42b to bbc2458
Force-pushed bbc2458 to 22f8c10
Force-pushed 22f8c10 to 024c137
This removes the dependency on knowing ahead of time whether the final message has padding or not. It is the first step towards not loading the entire message ahead of time to encode/decode, and allows for streaming. Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
As per the title, this is the main feature of this patch set. First, by avoiding the search for the final padding, data can be read as it streams in, before the stream has finished producing it. This also enables the tool to work with much less memory: a fixed amount instead of one that depends on the file size. Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
We read linearly, so we do not need to seek within a file. Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
base58 does not support streaming when encoding. This patch allows base58 and other non-streaming algorithms to work with the new streaming mechanism. Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
Force-pushed a74141f to 1ed7176
Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
Force-pushed 1ed7176 to c3c77fd
Many jobs are still failing, could you please have a look? Thanks.
On the main branch, the encode and decode operations scan the file ahead of time to gather information about padding. However, padding only appears at the end, and the rest of the file can be encoded and decoded without regard to it.
The main issue with the file being read ahead of time is that the entire file needs to be available from the beginning. This conflicts with streaming use cases: imagine a web socket where the sender sends base64-encoded data; the receiver could only translate it at the very end, making real-time communication impossible.
Moreover, reading the entire file from the beginning means that it needs to stay in RAM the whole time. For smaller files this is not a problem, but when encoding a few gigabytes to base64 it can be an issue, as reading the file could easily saturate main memory.
This patch aims to solve the issue of the ahead-of-time reading. First, we do not check for padding, but let the decoder work for us: as said earlier, most of the encoded file carries no padding, and there is a 1/3 probability that there is none at the end either. The STANDARD_NO_PAD base64 decoder produces an error if padding is present; if so, we fall back to the STANDARD base64 decoder. This solves the problem of detecting padding ahead of time.
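The no-pad-first strategy can be illustrated with a small sketch. The real implementation uses the STANDARD_NO_PAD and STANDARD engines from a Rust base64 crate; the Python helpers below (`decode_no_pad`, `decode_chunk`) are illustrative approximations built on the standard library.

```python
import base64
import binascii


def decode_no_pad(chunk: bytes) -> bytes:
    """Analogue of STANDARD_NO_PAD: reject '=' padding outright,
    then decode the unpadded input."""
    if b"=" in chunk:
        raise binascii.Error("padding character in unpadded input")
    # b64decode requires padded input, so re-pad to a multiple of 4.
    return base64.b64decode(chunk + b"=" * (-len(chunk) % 4), validate=True)


def decode_chunk(chunk: bytes) -> bytes:
    # Try the no-pad decoder first; most chunks carry no padding.
    try:
        return decode_no_pad(chunk)
    except binascii.Error:
        # Only the final chunk can contain '=': fall back to the
        # padded decoder, mirroring the STANDARD fallback.
        return base64.b64decode(chunk, validate=True)
```

For example, `decode_chunk(b"aGln")` takes the no-pad path, while `decode_chunk(b"aGk=")` triggers the fallback; both decode successfully without any ahead-of-time padding check.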
Also, please note that the encoder does not need any ahead-of-time knowledge of padding, since it is the encoder itself that generates it.
For the benchmarking:
- coreutils base64 refers to this PR's version
- coreutils_main_branch base64 refers to the version that is on the main branch
- base64 refers to GNU Coreutils base64

As this is partially also a performance-related patch, I will paste the hyperfine analysis:
For encoding:
For decoding:
For memory consumption, I used ps and grep on the 3 implementation variants working on the same file to gather the memory used, and I will put the 3 values next to each other for comparison. I will report the entire line, since it contains no sensitive information.
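The figure ps reports in its RSS column can also be read directly from procfs; the helper below (`rss_kib`, a hypothetical name) is a small Linux-only sketch of the same measurement.

```python
def rss_kib(pid: str = "self") -> int:
    """Return the resident set size in KiB, read from
    /proc/<pid>/status (Linux only) -- the same figure the
    RSS column of ps reports."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:   12345 kB"
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")
```

Sampling this value before and after the large allocation would show the plateau described below.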
This approach is feasible because the memory footprint remains stable during program execution: after the file is loaded and the memory is allocated, no further large allocations take place (except, perhaps, inside the fast encoder/decoder in the base64_simd crate, as shown by the flamegraph tool I used, which also generates an SVG to explore; image at the end of this PR).
For encoding:
For decoding:
The issue we still have is that memory usage is double that of the GNU Coreutils implementation, but it also does not increase with the size of the file.
Malloc inside base64_simd:
