feat(consumer): Emit batch_write_bytes metric from RowBinary writer#7863
Merged
Conversation
The RowBinary writer was missing the insertions.batch_write_bytes counter that the JSONEachRow writer already emits. Adding it here so we can observe batch byte sizes across all eap_items consumers (s4s2/de use RowBinary) as part of Phase 0 of the byte-based batching rollout. Refs EAP-460.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines 127 to 133:

        }
    }

    counter!("insertions.batch_write_bytes", num_bytes as i64);
    counter!("insertions.batch_write_msgs", batch_len as i64);
    empty_batch.record_message_latency();
    empty_batch.emit_item_type_metrics();
Contributor
Bug: The insertions.batch_write_bytes metric for the RowBinary writer uses an in-memory size estimate (estimated_size()) instead of the actual serialized byte count, making it inconsistent with the JSONEachRow writer.
Severity: MEDIUM
Suggested Fix
The RowBinary writer should measure the size of the data after it has been serialized into the RowBinary format, not before. This can be achieved by getting the length of the serialized byte buffer, which will ensure the metric reflects the actual number of bytes written and is consistent with the JSONEachRow writer's implementation.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: rust_snuba/src/strategies/clickhouse/row_binary_writer.rs#L127-L133
Potential issue: The `insertions.batch_write_bytes` metric for the RowBinary writer is
calculated using `estimated_size()`, which measures the in-memory size of the data
structures. This is inconsistent with the JSONEachRow writer, which measures the actual
byte length of the serialized data. Because the RowBinary format uses variable-length
encoding, the in-memory size is not an accurate representation of the on-the-wire size.
This discrepancy will result in inaccurate and non-comparable metric data between the
two writers, which could lead to incorrect decisions about batching thresholds and
undermines the goal of standardizing this metric.
kylemumma
approved these changes
Apr 7, 2026
Summary
- Add an insertions.batch_write_bytes counter to the RowBinary ClickHouse writer, matching what the JSONEachRow writer already emits
- Lets us observe batch byte sizes for eap_items in s4s2/de (which use RowBinary) to pick a data-driven threshold for --max-batch-size-calculation=bytes
- Refs EAP-460
Test plan
- cargo check passes
- insertions.batch_write_bytes appears in DataDog for RowBinary consumers

🤖 Generated with Claude Code