Conversation

@cosmo0920 (Contributor) commented Sep 11, 2025

Without this change, the receiver just sees unknown zstd frames like:

```
forward: len=12984, head=54 d1 00 da cc 74 2e 4e 20 d6 2a ce 01 be f2 c2 | compressed(opt)=2
```

However, the zstd specification requires every frame to start with the magic bytes, like:

```
forward: len=19835, head=28 b5 2f fd a0 c2 14 03 00 a4 d0 00 4a f7 30 39 | compressed(opt)=2
```

So we need to prepend the magic bytes `28 b5 2f fd` to zstd-compressed payloads.

The head bytes can be dumped with:

```diff
diff --git a/lib/fluent/plugin/out_forward.rb b/lib/fluent/plugin/out_forward.rb
index 4c323bb0..f192d6e3 100644
--- a/lib/fluent/plugin/out_forward.rb
+++ b/lib/fluent/plugin/out_forward.rb
@@ -672,6 +672,9 @@ module Fluent::Plugin
        sock.write @sender.forward_header                    # array, size=3
        sock.write tag.to_msgpack                            # 1. tag: String (str)
        chunk.open(compressed: @compress) do |chunk_io|
+          head = chunk_io.read(8) || ''.b
+          @log.info "debug: forward entries head", hex: head.bytes.map { |b| "%02x" % b }.join(' ')
+          chunk_io.rewind
          entries = [0xdb, chunk_io.size].pack('CN')
          sock.write entries.force_encoding(Encoding::UTF_8) # 2. entries: String (str32)
          IO.copy_stream(chunk_io, sock)                     #    writeRawBody(packed_es)
```

Which issue(s) this PR fixes:
None

What this PR does / why we need it:

This is likely a known issue introduced by merging #4657.
With that patch, Fluent Bit was unable to decompress the zstd-compressed payloads sent by Fluentd.
On our side, we need to emit explicit zstd frames that start with the magic bytes: `28 b5 2f fd`.

However, the stream writer of zstd-ruby does not wrap its compressed payloads in such zstd frames.
So we always hit this kind of error when using the development version of Fluent Bit's in_forward with zstd-compressed payloads.
With gzip-compressed payloads there is no issue, so the behavioral difference between the gzip and zstd stream-writer classes appears to cause this issue.

The related Fluent Bit's PR is:
fluent/fluent-bit#10710

Docs Changes:

Release Note:

@cosmo0920 cosmo0920 requested review from ashie and daipom September 11, 2025 09:16
@daipom (Contributor) commented Sep 11, 2025

@cosmo0920
Thanks for this fix!
I’ve only been able to check a little so far, but does this mean that Fluentd’s zstd implementation isn’t compliant with RFC 8878 Zstandard Frames?
(Fluentd produces data without a Magic_Number?)

When I try a simple test with Zstd::StreamWriter, it includes the Magic_Number.
I wonder why.

```console
$ irb -rzstd-ruby -rstringio
irb(main):001> io = StringIO.new
=> #<StringIO:0x00007d7e3bab3c18>
irb(main):002> stream = Zstd::StreamWriter.new(io)
=> #<Zstd::StreamWriter:0x00007d7e3632d6b0 @io=#<StringIO:0x00007d7e3bab3c18>, @stream=#<Zstd::StreamingCompress:0x00007d7e3632a258>>
irb(main):003> stream.write("abc")
=> 12
irb(main):004> stream.finish
=> 3
irb(main):005> io.rewind
=> 0
irb(main):006> d=io.read
=> "(\xB5/\xFD\u0000X\u0018\u0000\u0000abc\u0001\u0000\u0000"
irb(main):008> d.unpack("H*")
=> ["28b52ffd0058180000616263010000"]
```

I’ll also try to check this with Fluentd’s behavior.

@cosmo0920 (Contributor, Author) commented Sep 11, 2025

How about the case of class instances that mix in Enumerable?

@cosmo0920 (Contributor, Author) commented Sep 11, 2025

I’ve only been able to check a little so far, but does this mean that Fluentd’s zstd implementation isn’t compliant with RFC 8878 Zstandard Frames?
(Fluentd produces data without a Magic_Number?)

Yup, at least out_forward does not attach the standardized zstd magic number when using zstd compression.

@cosmo0920 (Contributor, Author) commented Sep 11, 2025

(Fluentd produces data without a Magic_Number?)

From RFC 8878:

3.1.1. Zstandard Frames

The structure of a single Zstandard frame is as follows:

```
                +--------------------+------------+
                | Magic_Number       | 4 bytes    |
                +--------------------+------------+
                | Frame_Header       | 2-14 bytes |
                +--------------------+------------+
                | Data_Block         | n bytes    |
                +--------------------+------------+
                | [More Data_Blocks] |            |
                +--------------------+------------+
                | [Content_Checksum] | 4 bytes    |
                +--------------------+------------+

                    Table 1: The Structure of a
                       Single Zstandard Frame
```

Magic_Number: 4 bytes, little-endian format. Value: 0xFD2FB528.

Yes, Fluentd generates zstd-compressed data without this magic number.
This could be harmful for other implementations of the forward server.

Also, since the magic number is little-endian, the byte sequence 0x28 0xb5 0x2f 0xfd must appear at the head of zstd-compressed payloads in the forward protocol, so that receivers can distinguish plain, gzip-compressed, and zstd-compressed payloads.
For gzip compression, Fluentd already relies on the 0x1f 0x8b gzip magic number.

From https://datatracker.ietf.org/doc/html/rfc6713, the payload header must use this magic number: 0x1f, 0x8b

Additional information:
Magic number(s): first two bytes are 0x1f, 0x8b.
File extension(s): gz
Macintosh file type code(s): N/A
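
For illustration, here is a minimal sketch of how a receiver could use these magic numbers to distinguish payload encodings in the forward protocol (the method name and constants are hypothetical, not Fluentd API):

```ruby
# Hypothetical sniffer: guess the encoding of a forward-protocol payload
# from its leading magic bytes (RFC 8878 for zstd, RFC 6713 for gzip).
ZSTD_MAGIC = "\x28\xB5\x2F\xFD".b  # 0xFD2FB528, little-endian on the wire
GZIP_MAGIC = "\x1F\x8B".b

def guess_payload_encoding(payload)
  head = payload.b.byteslice(0, 4).to_s
  if head.start_with?(ZSTD_MAGIC)
    :zstd
  elsif head.start_with?(GZIP_MAGIC)
    :gzip
  else
    :plain # most likely raw msgpack entries
  end
end
```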

@daipom daipom added this to the v1.19.1 milestone Sep 12, 2025
@daipom (Contributor) commented Sep 12, 2025

Sorry, I didn’t have much time today.
I also tried to reproduce it with Fluentd, but I still couldn’t.
I can confirm the magic number.

  • Fluentd v1.19.0
  • Ubuntu 22.04
  • conf:
```
<source>
  @type sample
  tag test.foo
</source>

<match test.**>
  @type forward
  compress zstd
  <server>
    host localhost
    port 24224
  </server>
  <buffer>
    @type memory
    flush_mode interval
    flush_interval 2s
  </buffer>
</match>

<source>
  @type forward
  @label @SERVER
</source>

<label @SERVER>
  <match **>
    @type stdout
  </match>
</label>
```
  • patch:
```diff
diff --git a/lib/fluent/plugin/out_forward.rb b/lib/fluent/plugin/out_forward.rb
index 4c323bb0..977d99f6 100644
--- a/lib/fluent/plugin/out_forward.rb
+++ b/lib/fluent/plugin/out_forward.rb
@@ -672,6 +672,9 @@ module Fluent::Plugin
        sock.write @sender.forward_header                    # array, size=3
        sock.write tag.to_msgpack                            # 1. tag: String (str)
        chunk.open(compressed: @compress) do |chunk_io|
+          head = chunk_io.read(8) || ''.b
+          @log.warn "debug: forward entries head", hex: head.bytes.map { |b| "%02x" % b }.join(' ')
+          chunk_io.rewind
          entries = [0xdb, chunk_io.size].pack('CN')
          sock.write entries.force_encoding(Encoding::UTF_8) # 2. entries: String (str32)
          IO.copy_stream(chunk_io, sock)                     #    writeRawBody(packed_es)
```
  • Result:
```
2025-09-12 18:56:04 +0900 [info]: #0 fluentd worker is now running worker=0
2025-09-12 18:56:07 +0900 [warn]: #0 debug: forward entries head hex="28 b5 2f fd 00 58 d8 00"
2025-09-12 18:56:05.086068100 +0900 test.foo: {"message":"sample"}
2025-09-12 18:56:06.087594952 +0900 test.foo: {"message":"sample"}
2025-09-12 18:56:07.088746068 +0900 test.foo: {"message":"sample"}
```

@daipom (Contributor) commented Sep 12, 2025

I’ll check more patterns, including forwarding with Fluent Bit.

@cosmo0920 (Contributor, Author) commented Sep 12, 2025

Hi, I rechecked and found that nothing goes wrong in the non-concatenated case of zstd compression; the current implementation already works there.
But when zstd frames are concatenated, the concatenated stream is always corrupted and cannot be decompressed.
So we need to terminate the zstd compression buffers one by one and concatenate the resulting complete frames, as the C implementations do.
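
A minimal sketch of that one-frame-per-buffer approach (using `Zstd.compress`, which emits one complete frame per call; illustrative only, not this PR's patch):

```ruby
require 'zstd-ruby'

# Each Zstd.compress call produces a complete zstd frame (magic number,
# frame header, data blocks), so joining the results yields a stream of
# well-terminated frames that a receiver can decode frame by frame.
buffers = ['first chunk', 'second chunk', 'third chunk']
stream = buffers.map { |buf| Zstd.compress(buf) }.join
```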

This can be reproduced by preparing a file with a huge number of lines, feeding it into the tailing target (e.g. via `head - /path/to/tailing_target`), and using the in_tail plugin to ingest a large amount of events at once, which exercises this high-volume case.

To reproduce the issue, around 1,700 lines of file contents need to be ingested at once.
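
For reference, a sketch of generating such a tailing target (the path and line count are illustrative, not from the report):

```ruby
require 'json'

# Append ~2,000 JSON lines at once so that in_tail ingests enough events
# in a single flush to trigger zstd frame concatenation.
File.open('/tmp/tailing_target', 'a') do |f|
  2000.times { |i| f.puts({ 'message' => "sample-#{i}" }.to_json) }
end
```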

@cosmo0920 (Contributor, Author):

How's this PR going, mate?
Should we dive deeper into the dependent gem, zstd-ruby?

@daipom (Contributor) commented Sep 19, 2025

Sorry, I haven’t been able to make time over the past few days.
Thanks for the reproduction steps!
I'll try it and review this change.

@daipom (Contributor) commented Sep 19, 2025

I could reproduce this! Thanks!
As you said, the issue can be reproduced when concatenating a large amount of data.
It does not occur with the concatenation of small data.

```
<source>
  @type sample
  tag test.foo
  size 5000 # This is important.
</source>

<match test.**>
  @type forward
  compress zstd
  <server>
    host localhost
    port 24224
  </server>
  <buffer>
    @type memory
    flush_mode interval
    flush_interval 2s
  </buffer>
</match>

<source>
  @type forward
  @label @SERVER
</source>

<label @SERVER>
  <match **>
    @type stdout
  </match>
</label>
```
```
2025-09-19 18:40:21 +0900 [info]: #0 fluentd worker is now running worker=0
2025-09-19 18:40:24 +0900 [warn]: #0 debug: forward entries head hex="0c 0f 00 24 1a 65 a5 bb"
2025-09-19 18:40:24 +0900 [error]: #0 unexpected error on reading data host="127.0.0.1" port=34744 error_class=RuntimeError error="decompress error error code: Unknown frame descriptor"
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/zstd-ruby-1.5.7.0/lib/zstd-ruby/stream_reader.rb:14:in `decompress'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/zstd-ruby-1.5.7.0/lib/zstd-ruby/stream_reader.rb:14:in `read'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/compressable.rb:86:in `block in string_decompress_zstd'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/compressable.rb:84:in `loop'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/compressable.rb:84:in `string_decompress_zstd'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/compressable.rb:97:in `string_decompress'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/compressable.rb:57:in `decompress'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/event.rb:307:in `ensure_decompressed!'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/event.rb:289:in `each'
  2025-09-19 18:40:24 +0900 [error]: #0 /home/daipom/work/fluentd/fluentd/lib/fluent/plugin/in_forward.rb:373:in `check_and_skip_invalid_event'
```

@daipom daipom modified the milestones: v1.19.1, v1.20.0 Sep 22, 2025
@daipom daipom added the backport to v1.19 We will backport this fix to the LTS branch label Sep 22, 2025
@daipom (Contributor) commented Sep 26, 2025

Sorry for the delay.
It appears this is a bug in Zstd::StreamWriter.
I have reported it upstream: SpringMT/zstd-ruby#112 (comment).

As already reported here, it appears that Zstd::StreamingCompress has issues when handling large data.

@cosmo0920 (Contributor, Author) commented Sep 26, 2025

That's a really good point.
We need to handle this issue in the upstream gem's repo.
But for reference, we should keep this PR open until the patch SpringMT/zstd-ruby#114 is merged and released.

@daipom (Contributor) commented Sep 26, 2025

Thanks for making the patch for the upstream.
It’s amazing!

I agree.
It would be better if we could work around it on the Fluentd side for now.

@daipom (Contributor) commented Sep 26, 2025

Although I haven’t thoroughly tested it and there are likely other parts that will require fixes as well, I think this approach would be a good direction to take for now.

```diff
diff --git a/lib/fluent/plugin/compressable.rb b/lib/fluent/plugin/compressable.rb
index 2ec21229..a9344171 100644
--- a/lib/fluent/plugin/compressable.rb
+++ b/lib/fluent/plugin/compressable.rb
@@ -26,13 +26,13 @@ module Fluent
        io = output_io || StringIO.new
        if type == :gzip
          writer = Zlib::GzipWriter.new(io)
+          writer.write(data)
+          writer.finish
        elsif type == :zstd
-          writer = Zstd::StreamWriter.new(io)
+          io << Zstd.compress(data)
        else
          raise ArgumentError, "Unknown compression type: #{type}"
        end
-        writer.write(data)
-        writer.finish
        output_io || io.string
      end
```

It appears we don't need to use the Streaming feature.
It seems sufficient to just do a single compression and append it to the IO.
I have confirmed this resolves the error mentioned in #5094 (comment).
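
A quick way to see that single-shot compression emits the magic number (a sketch; `Zstd.compress` is the same call used in the patch above):

```ruby
require 'zstd-ruby'

# A complete zstd frame always starts with the RFC 8878 magic number.
compressed = Zstd.compress('abc')
compressed.byteslice(0, 4).unpack1('H*') # => "28b52ffd"
```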

@cosmo0920 (Contributor, Author) commented Sep 26, 2025

It appears we don't need to use the Streaming feature.
It seems sufficient to just do a single compression and append it to the IO.
I have confirmed this resolves the error mentioned in #5094 (comment).

Does this patch work when the `data` argument is an Enumerable?

@cosmo0920 (Contributor, Author) commented Sep 29, 2025

Hi,
I also worked on addressing the larger-compression cases in zstd-ruby:
SpringMT/zstd-ruby#116

Does that fix resolve this kind of issue without this PR's patch?
Implementing it in C would be better for handling zstd compression in environments that process large amounts of data.
So I wanted to confirm with the Fluentd core team.

@daipom (Contributor) commented Sep 29, 2025

It appears we don't need to use the Streaming feature.
It seems sufficient to just do a single compression and append it to the IO.
I have confirmed this resolves the error mentioned in #5094 (comment).

Does this patch work when the `data` argument is an Enumerable?

Does Fluent::Plugin::Compressable#compress() need to support Enumerable?
(Does the existing code support it? Looks like it does not support it.)

Note: Fluent::Plugin::Compressable#compress() is used here (EventStream#to_msgpack_stream).

```ruby
def to_compressed_msgpack_stream(time_int: false, packer: nil, type: :gzip)
  packed = to_msgpack_stream(time_int: time_int, packer: packer)
  compress(packed, type: type)
end
```

Enumerable is used here (Chunk#append).
(We need to consider Enumerable here.)

```ruby
def append(data, **kwargs)
  if kwargs[:compress] == :zstd
    io = StringIO.new
    stream = Zstd::StreamWriter.new(io)
    data.each do |d|
      stream.write(d)
    end
    stream.finish
    concat(io.string, data.size)
  else
    super
  end
end
```
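
For comparison, a hypothetical rewrite of this zstd branch using single compression per element (a sketch, not the merged fix; it assumes `Zstd.compress` and frame-by-frame decoding on the reader side):

```ruby
def append(data, **kwargs)
  if kwargs[:compress] == :zstd
    # Compress each element into its own complete frame; joining complete
    # frames preserves the frame boundaries that the stream reader expects.
    compressed = data.map { |d| Zstd.compress(d) }.join
    concat(compressed, data.size)
  else
    super
  end
end
```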

@daipom (Contributor) commented Sep 29, 2025

Hi, I also worked on addressing the larger-compression cases in zstd-ruby: SpringMT/zstd-ruby#116

Does that fix resolve this kind of issue without this PR's patch? Implementing it in C would be better for handling zstd compression in environments that process large amounts of data. So I wanted to confirm with the Fluentd core team.

Thanks so much!
Yes! The library fix should solve this issue. I will confirm it.

After this issue itself is resolved, it might still be worth considering the point that we may not need to use the Streaming feature of zstd-ruby.

@daipom (Contributor) commented Sep 29, 2025

I have confirmed this issue is resolved with zstd-ruby 1.5.7.1!

@daipom (Contributor) commented Sep 29, 2025

After this issue itself is resolved, it might still be worth considering the point that we may not need to use the Streaming feature of zstd-ruby.

It may be better to address this point in a separate PR.
I created an issue for it.

Perhaps we can close this PR now.

@cosmo0920 (Contributor, Author):

After this issue itself is resolved, it might still be worth considering the point that we may not need to use the Streaming feature of zstd-ruby.

It may be better to address this point in a separate PR. I created an issue for it.

Perhaps we can close this PR now.

Now we need to benchmark streaming zstd compression against normal (single-shot) zstd compression.
Do we really need to avoid using this feature, even though I fixed the experimental streaming-compression feature in zstd-ruby?
I mean, your company insists that "we fix OSS bugs upstream and we report bugs together with patches as much as possible".
Just avoiding the buggy feature does not seem to follow your company's credo.
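
For what it's worth, a rough benchmark sketch of the two approaches (payload size and iteration count are arbitrary assumptions):

```ruby
require 'benchmark'
require 'stringio'
require 'zstd-ruby'

payload = "sample log line\n" * 50_000

Benchmark.bmbm do |x|
  # Single-shot compression: one complete frame per call.
  x.report('Zstd.compress') do
    100.times { Zstd.compress(payload) }
  end
  # Streaming compression via Zstd::StreamWriter, as out_forward uses today.
  x.report('Zstd::StreamWriter') do
    100.times do
      io = StringIO.new
      writer = Zstd::StreamWriter.new(io)
      writer.write(payload)
      writer.finish
    end
  end
end
```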

@cosmo0920 (Contributor, Author):

Just closing this PR and updating the zstd dependency to the minimum required version would be one of the ideal solutions for mitigating this.
If that doesn't work, we need to reconsider and solve this issue another way.

@daipom (Contributor) commented Sep 29, 2025

After this issue itself is resolved, it might still be worth considering the point that we may not need to use the Streaming feature of zstd-ruby.

It may be better to address this point in a separate PR. I created an issue for it.

Perhaps we can close this PR now.

Now we need to benchmark streaming zstd compression against normal (single-shot) zstd compression. Do we really need to avoid using this feature, even though I fixed the experimental streaming-compression feature in zstd-ruby? I mean, your company insists that "we fix OSS bugs upstream and we report bugs together with patches as much as possible". Just avoiding the buggy feature does not seem to follow your company's credo.

Ah, there might have been a misunderstanding.

If the logic fundamentally requires the streaming feature, it wouldn’t be desirable to give it up just because the feature is still experimental.
However, my concern is that the logic does not fundamentally need the streaming feature.
This is because the process simply writes the given data and finishes.

@cosmo0920 (Contributor, Author) commented Sep 29, 2025

Now we need to benchmark streaming zstd compression against normal (single-shot) zstd compression. Do we really need to avoid using this feature, even though I fixed the experimental streaming-compression feature in zstd-ruby? I mean, your company insists that "we fix OSS bugs upstream and we report bugs together with patches as much as possible". Just avoiding the buggy feature does not seem to follow your company's credo.

Ah, there might have been a misunderstanding.

If the logic fundamentally requires the streaming feature, it wouldn’t be desirable to give it up just because the feature is still experimental. However, my concern is that the logic does not fundamentally need the streaming feature. This is because the process simply writes the given data and finishes.

Got it. The minimum required logic here is just terminating and creating each zstd frame with the correct data structure and leading magic bytes, with reasonable CPU and memory usage.
So merely avoiding the streaming feature could run into performance issues.
But, in reality, we sometimes don't have enough cycles to fix this kind of feature.
In that case, we could avoid using the streaming feature for zstd compression while still ensuring the boundaries of zstd frames.
Also, personally, I am concerned about memory consumption when using the streaming feature for zstd compression. There could be a performance trade-off here, roughly processing speed vs. memory consumption.

@daipom (Contributor) commented Sep 29, 2025

The minimum required logic here is just terminating and creating each zstd frame with the correct data structure and leading magic bytes, with reasonable CPU and memory usage.

Agree.

So merely avoiding the streaming feature could run into performance issues.

Since I don’t have a deep understanding of the library’s implementation, this is only a guess, but I expected single compression to perform better in both CPU and memory usage, as streaming compression seems likely to be more complex.
Of course, if streaming compression is appropriately applied in a situation where appends occur frequently, there may be aspects where streaming compression performs better.
However, at least with the current design of Fluentd, I assumed such benefits could not be expected.

This is only my speculation, and you know the implementation of the library better than I do.
If switching to single compression is likely to cause performance issues, we should be cautious about making the change.

@ashie (Member) commented Sep 30, 2025

Thanks for your effort!
In my understanding:

  • This issue is fixed in zstd-ruby v1.5.7.1, so this PR can be closed without merging.
    • It would be better to require zstd-ruby v1.5.7.1 or later (or v2.0.1 or later) in another PR; a sketch of such a constraint follows below.
  • It might be better to stop using the streaming API; we should discuss that in zstd compression: stop using the streaming feature #5111 and another PR if needed.
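
A sketch of such a dependency constraint (hypothetical; the exact file and version bound are up to the follow-up PR):

```ruby
# In fluentd.gemspec (sketch): require a zstd-ruby release that includes
# the streaming-compression fix.
spec.add_runtime_dependency 'zstd-ruby', '>= 1.5.7.1'
```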

@ashie ashie closed this Sep 30, 2025