Bound Syslog/CEF message size to prevent unbounded memory growth by utpilla · Pull Request #2452 · open-telemetry/otel-arrow

utpilla · 2026-03-28T00:57:19Z

Change Summary

Introduces a 16 KiB per-message size limit (MAX_MESSAGE_SIZE) for the syslog CEF receiver to prevent unbounded memory growth from malicious or misbehaving clients.

Before vs After

| Aspect | Before | After |
| --- | --- | --- |
| TCP message size | Unbounded (memory exhaustion risk) | Capped at 16 KiB per message |
| UDP receive buffer | 1 KiB stack array (silent truncation) | 16 KiB heap `Vec` |
| Truncation visibility | None | `received_logs_truncated` metric |
| TCP line buffer allocation | `Vec::new()` (multiple early reallocs) | `Vec::with_capacity(4096)` |
| Oversized message handling | N/A | Parsed and forwarded as partial data |

Problem

TCP: read_until(b'\n') would keep appending bytes to a Vec until a newline appeared. A client sending data without newlines could exhaust memory.
UDP: The receive buffer was a 1 KiB stack array, too small for most real-world syslog messages and silently truncating them with no visibility.

Changes

Add read_line_bounded() helper that wraps AsyncReadExt::take() + AsyncBufReadExt::read_until() to cap TCP reads at MAX_MESSAGE_SIZE (16 KiB) per message. Returns a BoundedReadResult enum (Complete, Truncated, or Eof) so callers can distinguish outcomes.
Upgrade UDP receive buffer from 1 KiB stack array to 16 KiB heap Vec, matching the TCP limit.
Add received_logs_truncated metric to track messages that exceeded the size limit. For TCP, truncation is detected precisely. For UDP, it uses a best-effort heuristic (datagram fills buffer).
Pre-allocate TCP line buffer with Vec::with_capacity(4096) to avoid early reallocations for typical messages.
Add doc comments explaining cancellation safety: buf may contain partial data from select! cancellation and must not be cleared between calls unless discarding the current message.
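
The bounded read described above can be sketched as follows. This is a synchronous, std-only stand-in for the PR's async helper, not the actual implementation: it uses `std::io::Read::take` and `std::io::BufRead::read_until` in place of the Tokio extension traits, and the handling of an unterminated final line is an assumption of the sketch. Bytes already sitting in `buf` count against the cap, in line with the note that `buf` may hold partial data from an earlier call.

```rust
use std::io::{self, BufRead, Read};

/// Outcome of one bounded read (variant names taken from the PR description).
#[derive(Debug, PartialEq, Eq)]
enum BoundedReadResult {
    Complete,  // buffer ends with the newline delimiter
    Truncated, // size limit reached before a newline was seen
    Eof,       // the reader had no more bytes to give
}

/// Synchronous sketch of a size-capped line read.
fn read_line_bounded<R: BufRead>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    max_size: usize,
) -> io::Result<BoundedReadResult> {
    // Bytes left in `buf` by an earlier call count against the cap.
    let remaining = max_size.saturating_sub(buf.len());
    if remaining == 0 {
        return Ok(BoundedReadResult::Truncated);
    }

    // `take` limits how many bytes `read_until` may append in this call.
    let n = (&mut *reader)
        .take(remaining as u64)
        .read_until(b'\n', buf)?;

    if n == 0 {
        Ok(BoundedReadResult::Eof)
    } else if buf.ends_with(b"\n") {
        Ok(BoundedReadResult::Complete)
    } else if buf.len() >= max_size {
        Ok(BoundedReadResult::Truncated)
    } else {
        // Assumption of this sketch: an unterminated final line at end of
        // input is reported as Complete; the next call then returns Eof.
        Ok(BoundedReadResult::Complete)
    }
}
```

With a limit of 4 bytes and the input `abcdefghij\nok\n`, successive calls (clearing `buf` after each) yield `abcd` and `efgh` as truncated reads, then `ij\n` and `ok\n` as complete lines, then `Eof`.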

Design decisions

Truncated messages are still forwarded: Preserves partial data (syslog header, timestamp, severity) rather than dropping it. The tail of an oversized message is also processed as its own message.
Metric semantics: received_logs_total counts each bounded read as a separate message. An oversized message that splits into head + tail counts as 2.
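
To make the counting concrete, here is a small model of the head + tail semantics over an in-memory byte stream. The `Counters` struct and `count_records` function are hypothetical names for illustration only; the real receiver reads from sockets and updates its metrics elsewhere.

```rust
/// Hypothetical counters mirroring the two metrics named in the PR.
#[derive(Debug, Default, PartialEq, Eq)]
struct Counters {
    received_logs_total: u64,
    received_logs_truncated: u64,
}

/// Splits a newline-delimited byte stream into records of at most `max_size`
/// bytes, counting every record and every record that was cut at the limit.
fn count_records(stream: &[u8], max_size: usize) -> (Vec<&[u8]>, Counters) {
    assert!(max_size > 0, "max_size must be non-zero");
    let mut records = Vec::new();
    let mut counters = Counters::default();
    let mut rest = stream;

    while !rest.is_empty() {
        // Look for a newline only within the next `max_size` bytes.
        let window = &rest[..rest.len().min(max_size)];
        let (len, truncated) = match window.iter().position(|&b| b == b'\n') {
            Some(i) => (i + 1, false),
            None => (window.len(), window.len() == max_size),
        };
        records.push(&rest[..len]);
        counters.received_logs_total += 1;
        if truncated {
            counters.received_logs_truncated += 1;
        }
        rest = &rest[len..];
    }
    (records, counters)
}
```

For the stream `0123456789\nok\n` with `max_size = 8`, this yields three records (`01234567`, `89\n`, `ok\n`), a total count of 3, and a truncated count of 1, which matches the "head + tail counts as 2" semantics plus one normal message.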

What issue does this PR close?

Towards #1149

How are these changes tested?

TCP truncation integration test: Sends an oversized message followed by a normal message, verifies all 3 records (truncated head, tail, normal) are received.
6 unit tests for read_line_bounded: Empty reader, complete line, truncation, exact-limit edge case, EOF with partial data, and multi-call behavior after truncation.
Telemetry test updates: Metric index adjustments to account for the new received_logs_truncated field.

Are there any user-facing changes?

No

codecov · 2026-03-28T01:00:32Z

Codecov Report

❌ Patch coverage is 98.90710% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.40%. Comparing base (18fcc9e) to head (2da4083).

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #2452    +/-   ##
========================================
  Coverage   88.40%   88.40%            
========================================
  Files         603      603            
  Lines      213389   213561   +172     
========================================
+ Hits       188639   188794   +155     
- Misses      24224    24241    +17     
  Partials      526      526

Components	Coverage Δ
otap-dataflow	`90.42% <98.90%> (-0.01%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.74% <ø> (ø)`
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`52.44% <ø> (ø)`
quiver	`91.94% <ø> (ø)`

cijothomas · 2026-03-28T01:20:29Z

rust/otap-dataflow/crates/core-nodes/src/receivers/syslog_cef_receiver/mod.rs

                                        let _ = peer_addr;

-                                        let mut line_bytes = Vec::new();
+                                        let mut line_bytes = Vec::with_capacity(4096);


Let's move this to a const?

utpilla · 2026-03-30

Fixed in 2da4083.

cijothomas · 2026-03-28T01:30:36Z

rust/otap-dataflow/crates/core-nodes/src/receivers/syslog_cef_receiver/mod.rs


                let socket = effect_handler.udp_socket(udp_config.listening_addr)?;
-                let mut buf = [0u8; 1024]; // ToDo: Find out the maximum allowed size for syslog messages
+                let mut buf = vec![0u8; MAX_MESSAGE_SIZE];


I'm wondering why we would allocate for the worst possible scenario. Could we be more conservative and accept that the Vec will regrow if we actually have such a big message? Maybe 1/2 or 1/4 of the max size as the initial size?

utpilla · 2026-03-30

The UDP receive buffer is fundamentally different from TCP's Vec. recv_from writes into a fixed &mut [u8] slice; it doesn't append or grow. If the buffer is smaller than the incoming datagram, the OS kernel silently discards the excess bytes with no way to recover them. So we must allocate the full MAX_MESSAGE_SIZE upfront to avoid permanent data loss. This is a one-time 16 KiB allocation for the lifetime of the receiver (one per core).
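
The fixed-slice behavior can be observed with the standard library's blocking `UdpSocket`. This is a loopback sketch under assumptions, not the receiver's code: the helper names are hypothetical, and it relies on the truncating behavior described above, which is what Linux does; some platforms may report an error for an oversized datagram instead.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

/// Sends one `payload_len`-byte datagram over loopback and receives it into a
/// `buf_len`-byte buffer, returning the byte count that `recv_from` reported.
fn recv_len_with_buffer(payload_len: usize, buf_len: usize) -> io::Result<usize> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    // Avoid blocking forever if the datagram never arrives.
    receiver.set_read_timeout(Some(Duration::from_secs(2)))?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;

    sender.send_to(&vec![b'x'; payload_len], receiver.local_addr()?)?;

    // The slice handed to recv_from has a fixed length; it cannot grow.
    let mut buf = vec![0u8; buf_len];
    let (n, _peer) = receiver.recv_from(&mut buf)?;
    Ok(n)
}

/// Best-effort heuristic from the PR description: a datagram that fills the
/// whole buffer may have been cut short.
fn maybe_truncated(n: usize, buf_len: usize) -> bool {
    n == buf_len
}
```

Under those assumptions, receiving a 100-byte datagram into a 16-byte buffer reports 16 bytes and the remaining 84 are gone, whereas a 10-byte datagram reports 10. The heuristic is only best-effort because a datagram of exactly `buf_len` bytes is indistinguishable from a truncated one.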

lalitb · 2026-03-28T02:52:00Z

rust/otap-dataflow/crates/core-nodes/src/receivers/syslog_cef_receiver/mod.rs

+    max_size: usize,
+) -> std::io::Result<BoundedReadResult> {
+    let n = (&mut *reader)
+        .take(max_size as u64)


I think this doesn’t fully enforce the size cap.

Because the read is inside tokio::select!, it can be interrupted if the batch timer fires before a newline arrives. When that happens, any bytes already appended to line_bytes stay in the buffer, and the next loop iteration starts reading the same message again.

I think read_line_bounded() should consider how much is already in buf and only allow reading the remaining space. If buf is already at the limit, it should return Truncated immediately.

utpilla · 2026-03-30

Great catch @lalitb! Thank you! Fixed in 35c5f1d.
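
The arithmetic behind this thread can be modeled without any I/O. The sketch below is illustrative only, with hypothetical function names: it compares a per-call limit that always grants a fresh `max_size` (as in the `.take(max_size as u64)` hunk above) with one that grants only the space left under the cap, across reads that are each interrupted before a newline arrives.

```rust
/// Per-call read limit that ignores the buffer: a fresh budget on every call.
fn naive_limit(_already_buffered: usize, max_size: usize) -> usize {
    max_size
}

/// Per-call limit as suggested in review: only the space left under the cap.
fn remaining_limit(already_buffered: usize, max_size: usize) -> usize {
    max_size.saturating_sub(already_buffered)
}

/// Simulates `calls` reads, each interrupted after at most `chunk`
/// newline-free bytes, and returns the final buffer length.
fn simulate(
    limit: fn(usize, usize) -> usize,
    max_size: usize,
    chunk: usize,
    calls: usize,
) -> usize {
    let mut buffered = 0;
    for _ in 0..calls {
        // Each call may append up to its limit, but is cut off after `chunk`.
        buffered += chunk.min(limit(buffered, max_size));
    }
    buffered
}
```

With `max_size = 16`, 10-byte chunks, and 5 interrupted calls, the naive limit lets the buffer reach 50 bytes, while the remaining-space limit stops at 16.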

lalitb · 2026-03-28T03:01:38Z

rust/otap-dataflow/crates/core-nodes/src/receivers/syslog_cef_receiver/mod.rs

-                                                                // Handle incomplete line (log, emit metrics, etc.)
+                                                        Ok(bounded_result) => {
+                                                            if matches!(bounded_result, BoundedReadResult::Truncated) {
+                                                                metrics.borrow_mut().received_logs_truncated.inc();


I see from the PR description that splitting an oversized TCP message into head + tail is intentional, but I’m not sure this is the right semantic for a single syslog message.

In this branch we parse and emit the truncated prefix, clear line_bytes, and then continue reading, which means the remaining bytes from the same original message get emitted as a second record. That makes one input message turn into multiple downstream records, and received_logs_total no longer maps cleanly to original messages.

Why do we want to preserve and emit the tail as a separate logical message instead of either keeping a single truncated record or discarding until the next newline after truncation?

utpilla · 2026-03-30

> Why do we want to preserve and emit the tail as a separate logical message instead of either keeping a single truncated record or discarding until the next newline after truncation?

The truncated head preserves the syslog header (timestamp, hostname, severity, facility) which is the most valuable structured metadata. The tail preserves part of the message body. Discarding either means losing data that could be useful for debugging.

The receiver's abstraction is "newline-delimited messages up to 16 KiB each." From its perspective, the head and tail are two separate protocol-level messages. The received_logs_truncated metric (incremented only once per oversized message, not per fragment) alerts operators that truncation is occurring, so they can investigate if needed.

Messages exceeding 16 KiB are extremely rare in practice. When one does occur, it's likely an unusual event worth investigating, which makes preserving partial data better than silently dropping it.

Add message size limit

433b93d

utpilla requested a review from a team as a code owner March 28, 2026 00:57

github-project-automation bot added this to OTel-Arrow Mar 28, 2026

github-actions bot added the rust Pull requests that update Rust code label Mar 28, 2026

cijothomas reviewed Mar 28, 2026

lalitb reviewed Mar 28, 2026

utpilla added 3 commits March 30, 2026 04:55

Fix clippy

2538127

Fix size cap enforcement

35c5f1d

Use const

2da4083

utpilla wants to merge 4 commits into open-telemetry:main from utpilla:utpilla/Add-Syslog-CEF-message-size-limit
