Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions rust_snuba/src/strategies/clickhouse/row_binary_writer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ where
Box::pin(async move {
let (empty_message, insert_batch) = message.take();
let batch_len = insert_batch.len();
let num_bytes = insert_batch.num_bytes();
let (rows, empty_batch) = insert_batch.take();

let write_start = SystemTime::now();
Expand Down Expand Up @@ -126,6 +127,7 @@ where
}
}

counter!("insertions.batch_write_bytes", num_bytes as i64);
counter!("insertions.batch_write_msgs", batch_len as i64);
empty_batch.record_message_latency();
empty_batch.emit_item_type_metrics();
Comment on lines 127 to 133
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The insertions.batch_write_bytes metric for the RowBinary writer uses an in-memory size estimate (estimated_size()) instead of the actual serialized byte count, making it inconsistent with the JSONEachRow writer.
Severity: MEDIUM

Suggested Fix

The RowBinary writer should measure the size of the data after it has been serialized into the RowBinary format, not before. This can be achieved by getting the length of the serialized byte buffer, which will ensure the metric reflects the actual number of bytes written and is consistent with the JSONEachRow writer's implementation.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: rust_snuba/src/strategies/clickhouse/row_binary_writer.rs#L127-L133

Potential issue: The `insertions.batch_write_bytes` metric for the RowBinary writer is
calculated using `estimated_size()`, which measures the in-memory size of the data
structures. This is inconsistent with the JSONEachRow writer, which measures the actual
byte length of the serialized data. Because the RowBinary format uses variable-length
encoding, the in-memory size is not an accurate representation of the on-the-wire size.
This discrepancy will result in inaccurate and non-comparable metric data between the
two writers, which could lead to incorrect decisions about batching thresholds and
undermines the goal of standardizing this metric.

Did we get this right? 👍 / 👎 to inform future reviews.

Expand Down
Loading