
[Bug] Avro bytes fields rendered as Latin-1 codepoint strings instead of Base64 since v3.7.0 #2421

@haoyukongTrackunit

Description


Since v3.7.0, Avro bytes fields — including those inside Debezium logical types such as io.debezium.data.VariableScaleDecimal — are rendered in the Console UI as raw Latin-1/UTF-8 codepoint strings (e.g. "\r\u0003\u0012\u00aa\u00a3_\u00f2") instead of Base64 (e.g. "QwhcN9vr+A==").

This is a regression from v2.8.x behaviour and breaks the ability to visually verify decimal values encoded via Debezium's VariableScaleDecimal type directly in the Console UI.

Root Cause

The regression was introduced in PR #2351 (commit 14d05a16f), which changed Avro serialisation from Go's json.Marshal to avroSch.EncodeJSON. The Avro JSON spec encodes bytes as ISO-8859-1 codepoint strings rather than Base64. While spec-compliant, this is a breaking behaviour change for consumers who relied on the previous Base64 output for manual data verification.

Steps to Reproduce

  1. Configure a Kafka topic with Avro-encoded messages containing bytes fields, or Debezium CDC messages containing io.debezium.data.VariableScaleDecimal fields (e.g. a numeric/decimal PostgreSQL column via Debezium).
  2. Open the topic in Redpanda Console v3.7.0 or v3.7.1.
  3. Inspect a message — the bytes / VariableScaleDecimal.value field renders as a raw codepoint string instead of Base64.

Expected Behaviour

bytes fields should be rendered as Base64-encoded strings, consistent with v2.8.x behaviour.

v2.8.11 output (expected):

"cumulative_emissions": {
  "io.debezium.data.VariableScaleDecimal": {
    "scale": 15,
    "value": "cVLOXkWiqq=="
  }
}

Actual Behaviour

v3.7.1 output (actual):

"cumulative_emissions": {
  "scale": 13,
  "value": "\tn\\\u001a#\u00dbN"
}

Bytes above 0x7F are mapped to Unicode codepoints and re-encoded as multi-byte UTF-8 sequences on display, so the original byte values cannot be reliably recovered from the UI alone.

Impact

  1. Lossless verification broken — Base64 can be reliably decoded to the original byte sequence; Latin-1 codepoint strings mangle bytes above 0x7F and the original data cannot be recovered from the UI.
  2. Manual data validation impossible — teams using Console to spot-check CDC pipeline values (e.g. Databricks → PostgreSQL → Debezium → Redpanda) can no longer verify decimal field values without accessing raw Avro binaries or a separate toolchain.
  3. AI/tooling decode failure — even LLM-based tools cannot correctly decode the mangled codepoint representation because the original bytes are already lost in the re-encoding.
  4. Silent corruption risk — data that looks like text is actually mangled bytes, making it harder to detect real pipeline issues early.

Environment

  • Redpanda Console versions affected: v3.7.0, v3.7.1 (reproduced on both)
  • Last working version: v2.8.11
  • Schema type: Debezium PostgreSQL CDC, io.debezium.data.VariableScaleDecimal logical type
  • Schema Registry: Confluent-compatible (kafka-cp-schema-registry)

Notes

  • The change in PR #2351 (serde: use EncodeJSON and DecodeJSON for Avro serialization) was intentional and also fixed real bugs (NaN/±Inf serialisation errors, wrong time.Duration units, union wrappers on decode). A naive revert is not recommended.
  • Suggested fix: keep EncodeJSON but post-process []byte/fixed Avro fields back to Base64 in the UI-facing JSON payload.
  • v3.6.0 does not exhibit this specific bytes regression but has other known issues (schema subject resolution), so downgrading to v3.6.0 is not a viable workaround.
  • Verified by locally running Console v2.8.11, v3.6.0, and v3.7.1 side-by-side against the same production Kafka cluster and schema registry.
