
Streaming base64 encode/decode #8622

Open
ThePseudo wants to merge 5 commits into uutils:main from ThePseudo:streamline_b64_decode


@ThePseudo ThePseudo commented Sep 12, 2025

On the main branch, the encode and decode operations read the file ahead of time to gather information about padding. However, padding only appears at the end, so the rest of the file can be encoded and decoded without regard to it.

The main problem with reading the file ahead of time is that the entire file must be available from the start. That rules out streaming use cases: imagine a web socket where the sender transmits base64-encoded data but the receiver can only decode it once everything has arrived, which makes real-time communication impossible.
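To illustrate the streaming case, here is a minimal sketch of how a receiver could handle base64 text that arrives in arbitrarily sized pieces. It is not code from this PR: `take_complete_quads` is a hypothetical helper name, and the sketch assumes that whitespace such as line breaks should simply be skipped. The idea is to decode only complete 4-character groups and carry the remainder over to the next piece.

```rust
/// Hypothetical helper (not from the PR): append `incoming` to `pending`,
/// then return the longest prefix whose length is a multiple of 4.
/// Whatever is left over stays in `pending` until more data arrives.
fn take_complete_quads(pending: &mut Vec<u8>, incoming: &[u8]) -> Vec<u8> {
    // Assumption: whitespace (e.g. line wrapping) carries no data.
    pending.extend(incoming.iter().copied().filter(|b| !b.is_ascii_whitespace()));

    // Only whole 4-character groups can be decoded independently.
    let ready_len = pending.len() - pending.len() % 4;

    // `split_off` keeps the ready prefix in `pending` and returns the tail;
    // swap them so the tail becomes the new pending buffer.
    let rest = pending.split_off(ready_len);
    std::mem::replace(pending, rest)
}
```

Each returned buffer can be handed to a decoder straight away, so the amount of buffered input never exceeds one incoming piece plus at most three leftover characters.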

Moreover, reading the entire file up front means it has to stay in RAM the whole time. For smaller files this is not a problem, but when base64-encoding a file of a few gigabytes it can be: reading the file could easily exhaust main memory.

This patch aims to remove the ahead-of-time read. Instead of checking for padding up front, we let the decoder find it: as noted above, padding can only appear at the end of the encoded file, and in one case out of three (when the length of the original data is a multiple of 3 bytes) there is none at all. The STANDARD_NO_PAD base64 decoder returns an error if padding is present; when that happens, we fall back to the STANDARD base64 decoder. This removes the need to know about padding ahead of time.
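The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses a small hand-written decoder instead of the base64_simd crate, the function names (`sextet`, `decode_no_pad`, `decode_padded`, `decode_chunk`) are hypothetical, and it does not check that unused trailing bits are zero.

```rust
/// Map one character of the standard base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None, // includes '=', so padding is an error here
    }
}

/// Unpadded decoder: fails on any byte outside the alphabet, including '='.
fn decode_no_pad(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3 + 2);
    for group in input.chunks(4) {
        if group.len() == 1 {
            return Err("truncated input".to_string());
        }
        let mut acc = 0u32;
        for &c in group {
            let v = sextet(c).ok_or_else(|| format!("invalid byte {:?}", c as char))?;
            acc = (acc << 6) | v;
        }
        // Left-align a short final group so the data sits in the top of 24 bits.
        acc <<= 6 * (4 - group.len() as u32);
        let bytes = acc.to_be_bytes(); // [0, b0, b1, b2]
        // 4 chars -> 3 bytes, 3 chars -> 2 bytes, 2 chars -> 1 byte.
        out.extend_from_slice(&bytes[1..group.len()]);
    }
    Ok(out)
}

/// Padded decoder: requires a length that is a multiple of 4 and strips
/// at most two trailing '=' before decoding the rest.
fn decode_padded(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.len() % 4 != 0 {
        return Err("padded input length must be a multiple of 4".to_string());
    }
    let pad = input.iter().rev().take(2).take_while(|&&c| c == b'=').count();
    decode_no_pad(&input[..input.len() - pad])
}

/// Try the unpadded decoder first; only if it fails, retry with the padded one.
fn decode_chunk(input: &[u8]) -> Result<Vec<u8>, String> {
    decode_no_pad(input).or_else(|_| decode_padded(input))
}
```

With this shape, every chunk except the last succeeds on the first attempt, so the cost of the retry is paid at most once per stream.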

Also, note that the encoder does not need any ahead-of-time knowledge of padding, since the encoder itself generates it.
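As an illustration of why the encoder needs no look-ahead, here is a minimal streaming encoder sketch using only the standard library. It is not the PR's code: `encode_group`, `encode_stream`, and the 3 KiB chunk size are assumptions chosen for the example. Because the chunk size is a multiple of 3, only the final chunk can produce padding.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode 1 to 3 input bytes into 4 output characters, padding with '='.
fn encode_group(group: &[u8]) -> [u8; 4] {
    let mut buf = [0u8; 3];
    buf[..group.len()].copy_from_slice(group);
    let n = u32::from(buf[0]) << 16 | u32::from(buf[1]) << 8 | u32::from(buf[2]);
    let mut out = [b'='; 4];
    // 1 byte -> 2 characters, 2 bytes -> 3 characters, 3 bytes -> 4 characters.
    for i in 0..=group.len() {
        out[i] = ALPHABET[(n >> (18 - 6 * i)) as usize & 63];
    }
    out
}

/// Encode `input` to `output` using a fixed-size buffer.
fn encode_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    // Assumed chunk size; any multiple of 3 keeps padding out of the middle.
    const CHUNK: usize = 3 * 1024;
    let mut buf = [0u8; CHUNK];
    loop {
        // Fill the buffer completely unless the input ends first.
        let mut filled = 0;
        while filled < CHUNK {
            let n = input.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        for group in buf[..filled].chunks(3) {
            output.write_all(&encode_group(group))?;
        }
        // A short chunk means end of input; it is the only one that may pad.
        if filled < CHUNK {
            return Ok(());
        }
    }
}
```

For example, `encode_stream(std::io::stdin().lock(), std::io::stdout().lock())` would encode standard input while holding only one chunk in memory at a time.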

For the benchmarking:
coreutils base64 refers to this PR version
coreutils_main_branch base64 refers to the version that is on the main branch
base64 refers to GNU Coreutils base64

Since this patch is also partly about performance, here is the hyperfine analysis:

For encoding:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      2.423 s ±  0.039 s    [User: 0.997 s, System: 1.424 s]
  Range (min … max):    2.393 s …  2.524 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.111 s ±  0.035 s    [User: 1.172 s, System: 2.937 s]
  Range (min … max):    4.052 s …  4.158 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.000 s ±  0.016 s    [User: 3.054 s, System: 0.941 s]
  Range (min … max):    3.976 s …  4.033 s    10 runs
 
Summary
  ./coreutils base64 model-00001-of-000163.safetensors ran
    1.65 ± 0.03 times faster than base64 model-00001-of-000163.safetensors
    1.70 ± 0.03 times faster than ./coreutils_main_branch base64 model-00001-of-000163.safetensors

For decoding:

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.442 s ±  0.060 s    [User: 7.622 s, System: 1.814 s]
  Range (min … max):    9.373 s …  9.580 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 -d base64.txt
  Time (mean ± σ):      9.504 s ±  0.201 s    [User: 5.766 s, System: 3.727 s]
  Range (min … max):    9.309 s …  9.882 s    10 runs
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.362 s ±  0.140 s    [User: 6.750 s, System: 1.605 s]
  Range (min … max):    8.155 s …  8.527 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.13 ± 0.02 times faster than ./coreutils base64 -d base64.txt
    1.14 ± 0.03 times faster than ./coreutils_main_branch base64 -d base64.txt

For memory consumption, I ran ps and grep against the three implementations while they worked on the same file, and I list the three results together for comparison. I report each line in full, since it contains nothing sensitive.

This approach is feasible because the memory footprint remains stable during execution: once the file is loaded or the buffers are allocated, no further large allocations take place, except possibly inside the fast_encoder/decoder in the base64_simd crate. That allocation shows up in the flamegraph (I used the flamegraph tool, which also generates an SVG to explore; the image is at the end of this PR description).

For encoding:

andrea    167746  100  0.0  15880  6616 pts/6    R+   10:08   0:01 ./coreutils base64 model-00001-of-000163.safetensors
andrea    168813  102  6.1 5127348 1894336 pts/6 R+   10:10   0:00 ./coreutils_main_branch base64 model-00001-of-000163.safetensors
andrea    169415  100  0.0   8392  2272 pts/6    R+   10:11   0:02 base64 model-00001-of-000163.safetensors

For decoding:

andrea    164864  100  0.0  15876  6288 pts/6    R+   10:01   0:01 ./coreutils base64 -d base64.txt
andrea    165735  125  0.7 6920844 233384 pts/6  R+   10:03   0:00 ./coreutils_main_branch base64 -d base64.txt
andrea    166374  100  0.0   8388  2208 pts/6    R+   10:05   0:03 base64 -d base64.txt

The remaining issue is that memory usage is still two to three times that of the GNU Coreutils implementation (15880 vs 8392 and 6616 vs 2272 in the encoding run above); however, it no longer grows with the size of the file.

Malloc inside base64_simd:
[image: flamegraph]

@ThePseudo ThePseudo marked this pull request as ready for review September 12, 2025 08:17
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@sylvestre
Contributor

Could you please share your example file? I don't get the same results

@ThePseudo
Author

ThePseudo commented Sep 12, 2025

Uhm it is almost 5 GB large... maybe I can try with a smaller one? What do you suggest?

Never mind, I found it online again. It is one of the DeepSeek models, which are available here: https://huggingface.co/deepseek-ai/DeepSeek-V3/tree/main

Probably a good option is selecting this one: https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main/model-00001-of-000163.safetensors?download=true

It is roughly the same size

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 8f5d7b1 to c38288b Compare September 15, 2025 07:18
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@ThePseudo
Author

I re-ran the tests with the file linked above:

For encoding:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      2.152 s ±  0.066 s    [User: 0.952 s, System: 1.199 s]
  Range (min … max):    2.092 s …  2.301 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      3.759 s ±  0.119 s    [User: 1.140 s, System: 2.619 s]
  Range (min … max):    3.616 s …  3.976 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      3.723 s ±  0.032 s    [User: 3.044 s, System: 0.679 s]
  Range (min … max):    3.687 s …  3.783 s    10 runs
 
Summary
  ./coreutils base64 model-00001-of-000163.safetensors ran
    1.73 ± 0.05 times faster than base64 model-00001-of-000163.safetensors
    1.75 ± 0.08 times faster than ./coreutils_main_branch base64 model-00001-of-000163.safetensors

For decoding:

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.167 s ±  0.101 s    [User: 7.637 s, System: 1.499 s]
  Range (min … max):    9.063 s …  9.347 s    10 runs
 
Benchmark 2: ./coreutils_main_branch base64 -d base64.txt
  Time (mean ± σ):      9.329 s ±  0.020 s    [User: 5.620 s, System: 3.669 s]
  Range (min … max):    9.301 s …  9.380 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.038 s ±  0.037 s    [User: 6.471 s, System: 1.536 s]
  Range (min … max):    7.991 s …  8.104 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.14 ± 0.01 times faster than ./coreutils base64 -d base64.txt
    1.16 ± 0.01 times faster than ./coreutils_main_branch base64 -d base64.txt

The system is also under some load, so it might be slower than usual, but the results are broadly consistent with what I reported before. Please let me know if you see any difference.

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from c38288b to 8e0969e Compare September 16, 2025 06:07
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 8e0969e to 44147d1 Compare September 17, 2025 07:11
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 44147d1 to 05d7d9f Compare September 17, 2025 09:24
@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 05d7d9f to 1854b91 Compare September 18, 2025 07:27
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@codspeed-hq

codspeed-hq bot commented Sep 20, 2025

CodSpeed Performance Report

Merging #8622 will not alter performance

Comparing ThePseudo:streamline_b64_decode (c3c77fd) with main (9225670)

Summary

✅ 123 untouched
⏩ 5 skipped [1]

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch 2 times, most recently from bcd2ec4 to 1854b91 Compare September 22, 2025 08:01
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch 2 times, most recently from 23bc39f to 1bc46e6 Compare September 22, 2025 12:35
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tail/overlay-headers is no longer failing!

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from e9882d5 to f60b1b9 Compare September 24, 2025 13:36
@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 88e3b2b to 9bdac31 Compare October 13, 2025 11:38
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 9bdac31 to 87978e0 Compare October 16, 2025 06:16
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 87978e0 to 76cb7e6 Compare October 17, 2025 09:45
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 76cb7e6 to 8fd5dc2 Compare October 21, 2025 08:26
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch 4 times, most recently from fe89a32 to 76cb42b Compare October 21, 2025 11:51
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@ThePseudo
Author

The latest commit fixes the issues caused by base58 not supporting streaming. Each algorithm now has to state whether it supports streaming, so I modified the trait responsible for managing the chunk size, and this now works properly.
The other algorithms support streaming, as does base58 decoding, so those paths are unchanged (except that there is now logic to support the non-streaming variant as well).
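One way such a capability flag could look is sketched below. The trait, type, and function names are hypothetical and do not mirror the actual trait in uutils; the sketch only assumes, as stated above, that base64 encoding can proceed block by block while base58 encoding needs the whole input.

```rust
use std::io::{self, Read};

/// Hypothetical trait (not the one in the PR): each encoding reports
/// whether its output can be produced chunk by chunk.
trait Encoding {
    /// Input bytes per independent block, or `None` if the encoder
    /// needs the whole input at once.
    fn block_size(&self) -> Option<usize>;
}

struct Base64;
struct Base58;

impl Encoding for Base64 {
    fn block_size(&self) -> Option<usize> {
        Some(3) // 3 input bytes map to 4 output characters
    }
}

impl Encoding for Base58 {
    fn block_size(&self) -> Option<usize> {
        None // assumed non-streaming when encoding, as described above
    }
}

/// Read the next piece of input to hand to the encoder: at most `max`
/// bytes rounded down to a whole number of blocks for streaming
/// encodings, or everything for non-streaming ones.
fn next_input<R: Read>(enc: &dyn Encoding, input: &mut R, max: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    match enc.block_size() {
        Some(block) => {
            // Always read at least one full block.
            let limit = (max / block).max(1) * block;
            input.by_ref().take(limit as u64).read_to_end(&mut buf)?;
        }
        None => {
            input.read_to_end(&mut buf)?;
        }
    }
    Ok(buf)
}
```

Keeping the decision in one place means the read loop stays the same for every algorithm; only the value returned by `block_size` changes.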

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 76cb42b to bbc2458 Compare October 23, 2025 06:45
@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from bbc2458 to 22f8c10 Compare November 5, 2025 07:15
@github-actions

github-actions bot commented Nov 5, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 22f8c10 to 024c137 Compare November 6, 2025 09:52
Andrea Calabrese added 4 commits November 11, 2025 08:46
This should remove the dependency we have on knowing whether the final
message has padding or not. This is the first step toward not loading
the entire message ahead of time to encode/decode, and toward allowing
streaming.

Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>

As per title, this is the main feature of this patch set. First, by
not looking for the final padding, we can read data as it streams in,
before the stream has finished producing it. This also lets the tool
work with much less memory, essentially making it a fixed amount
instead of depending on the file size.

Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>

We read linearly, so we do not need to seek within a file.

Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>

base58 does not support streaming when encoding. This patch allows
base58 and other non-streaming algorithms to work with the new streaming
mechanism.

Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch 2 times, most recently from a74141f to 1ed7176 Compare November 11, 2025 08:12
Signed-off-by: Andrea Calabrese <andrea.calabrese@amarulasolutions.com>
@ThePseudo ThePseudo force-pushed the streamline_b64_decode branch from 1ed7176 to c3c77fd Compare November 11, 2025 08:30
@github-actions

GNU testsuite comparison:

GNU test failed: tests/basenc/base64. tests/basenc/base64 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/basenc/basenc. tests/basenc/basenc is passing on 'main'. Maybe you have to rebase?

@sylvestre
Contributor

many jobs are still failing, could you please have a look? thanks


4 participants