
Commit bb01c23

DOC-995 Add metadata field and standardize Avro JSON content (#170)
1 parent 4682d8a commit bb01c23


2 files changed, +88 -30 lines changed


modules/components/pages/processors/schema_registry_decode.adoc

Lines changed: 56 additions & 15 deletions
@@ -22,9 +22,12 @@ Common::
2222
--
2323
2424
```yml
25-
# Common config fields, showing default values
25+
# Common configuration fields, showing default values
2626
label: ""
2727
schema_registry_decode:
28+
avro:
29+
raw_unions: false # No default (optional)
30+
preserve_logical_types: false
2831
url: "" # No default (required)
2932
```
3033
@@ -34,10 +37,12 @@ Advanced::
3437
--
3538
3639
```yml
37-
# All config fields, showing default values
40+
# All configuration fields, showing default values
3841
label: ""
3942
schema_registry_decode:
40-
avro_raw_json: false
43+
avro:
44+
raw_unions: false
45+
preserve_logical_types: false
4146
url: "" # No default (required)
4247
oauth:
4348
enabled: false
@@ -66,36 +71,72 @@ schema_registry_decode:
6671
--
6772
======
6873

69-
Decodes messages automatically from a schema stored within a https://docs.confluent.io/platform/current/schema-registry/index.html[Confluent Schema Registry service^] by extracting a schema ID from the message and obtaining the associated schema from the registry. If a message fails to match against the schema then it will remain unchanged and the error can be caught using xref:configuration:error_handling.adoc[error handling methods].
74+
Decodes messages automatically from a schema stored within a https://docs.confluent.io/platform/current/schema-registry/index.html[Confluent Schema Registry service^] by extracting a schema ID from the message and obtaining the associated schema from the registry. If a message fails to match against the schema, it remains unchanged and the error can be caught using xref:configuration:error_handling.adoc[error-handling methods].
7075

7176
Avro, Protobuf, and JSON schemas are supported. As of v4.22.0, all of them can expand from schema references.
7277

7378
== Avro JSON format
7479

75-
This processor creates documents formatted as https://avro.apache.org/docs/current/specification/_print/#json-encoding[Avro JSON^] when decoding with Avro schemas. In this format the value of a union is encoded in JSON as follows:
80+
By default, this processor creates documents formatted as https://avro.apache.org/docs/current/specification/[Avro JSON^] when decoding with Avro schemas. In this format, the value of a union is encoded in JSON as follows:
7681

77-
- if its type is `null`, then it is encoded as a JSON `null`;
78-
- otherwise it is encoded as a JSON object with one name/value pair whose name is the type's name and whose value is the recursively encoded value. For Avro's named types (record, fixed or enum) the user-specified name is used, for other types the type name is used.
82+
- If the union's type is `null`, it is encoded as a JSON `null`.
83+
- Otherwise, the union is encoded as a JSON object with one name/value pair. The name is the type's name, and the value is the recursively encoded value. The user-specified name is used for Avro's named types (record, fixed, or enum). For other types, the type name is used.
7984

80-
For example, the union schema `["null","string","Foo"]`, where `Foo` is a record name, would encode:
85+
For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would encode:
8186

82-
- `null` as `null`;
83-
- the string `"a"` as `\{"string": "a"}`; and
84-
- a `Foo` instance as `\{"Foo": {...}}`, where `{...}` indicates the JSON encoding of a `Foo` instance.
87+
- `null` as a JSON `null`
88+
- The string `"a"` as `{"string": "a"}`
89+
- A `Transaction` instance as `{"Transaction": {...}}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
8590

86-
However, it is possible to instead create documents in https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard/raw JSON format^] by setting the field <<avro_raw_json, `avro_raw_json`>> to `true`.
91+
Alternatively, you can create documents in https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard/raw JSON format^] by setting the field <<avro-raw_unions,`avro.raw_unions`>> to `true`.
8792

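As a minimal sketch, the following pipeline fragment decodes Avro messages into standard JSON instead of Avro JSON. The registry address `http://localhost:8081` is a placeholder assumption; replace it with the URL of your own Schema Registry.

```yml
pipeline:
  processors:
    - schema_registry_decode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
        avro:
          raw_unions: true # Emit "a" rather than {"string": "a"} for union values
```

With this setting, a value of the union `["null","string","Transaction"]` that holds the string `"a"` appears in the output document as `"a"` rather than `{"string": "a"}`.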
8893
== Protobuf format
8994

90-
This processor decodes protobuf messages to JSON documents, you can read more about JSON mapping of protobuf messages here: https://developers.google.com/protocol-buffers/docs/proto3#json
95+
This processor decodes Protobuf messages to JSON documents. For more information about the JSON mapping of Protobuf messages, see the https://developers.google.com/protocol-buffers/docs/proto3#json[Protocol Buffers documentation^].
9196

97+
== Metadata
98+
99+
This processor adds the following metadata to processed messages:
100+
101+
- `schema_id`: The ID of the schema in the schema registry associated with the message.
92102

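For example, you might copy the schema ID into the document body with a `mapping` processor placed after the decoder. This is a sketch that assumes the decoded documents are JSON objects; the registry address and the output field name `schema_id` are illustrative assumptions.

```yml
pipeline:
  processors:
    - schema_registry_decode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
    - mapping: |
        root = this
        # Read the schema_id metadata added by schema_registry_decode
        root.schema_id = @schema_id
```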
93103
== Fields
94104

95-
=== `avro_raw_json`
105+
=== `avro.raw_unions`
106+
107+
Whether Avro messages should be decoded into normal JSON (JSON that meets the expectations of regular internet JSON) rather than https://avro.apache.org/docs/current/specification/[Avro JSON^].
108+
109+
If set to `false`, Avro messages are decoded as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodec[Avro JSON^].
110+
111+
For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would be decoded as:
112+
113+
- A `null` as a JSON `null`
114+
- The string `"a"` as `{"string": "a"}`
115+
- A `Transaction` instance as `{"Transaction": {...}}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
116+
117+
If set to `true`, Avro messages are decoded as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard JSON^].
118+
119+
For example, the same union schema `["null","string","Transaction"]` is decoded as:
120+
121+
- A `null` as a JSON `null`
122+
- The string `"a"` as `"a"`
123+
- A `Transaction` instance as `{...}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
124+
125+
For more details on the difference between standard JSON and Avro JSON, see the https://github.com/linkedin/goavro/blob/5ec5a5ee7ec82e16e6e2b438d610e1cab2588393/union.go#L224-L249[comment in Goavro^] and the https://github.com/linkedin/goavro[underlying library used for Avro serialization^].
126+
127+
128+
*Type*: `bool`
129+
130+
*Default*: `false`
131+
132+
=== `avro.preserve_logical_types`
133+
134+
Choose whether to:
96135

97-
Whether Avro messages should be decoded into normal JSON ("json that meets the expectations of regular internet json") rather than https://avro.apache.org/docs/current/specification/_print/#json-encoding[Avro JSON^]. If `true` the schema returned from the subject should be decoded as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard json^] instead of as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodec[avro json^]. There is a https://github.com/linkedin/goavro/blob/5ec5a5ee7ec82e16e6e2b438d610e1cab2588393/union.go#L224-L249[comment in goavro^], the https://github.com/linkedin/goavro[underlining library used for avro serialization^], that explains in more detail the difference between the standard json and avro json.
136+
- Transform logical types into their primitive type (default). For example, decimals become raw bytes and timestamps become plain integers.
137+
- Preserve logical types.
98138

139+
Set to `true` to preserve logical types.
99140

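A minimal configuration sketch that keeps logical types intact instead of transforming them into their primitive types. The registry address is a placeholder assumption.

```yml
pipeline:
  processors:
    - schema_registry_decode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
        avro:
          preserve_logical_types: true # Do not reduce logical types to their primitive types
```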
100141
*Type*: `bool`
101142

modules/components/pages/processors/schema_registry_encode.adoc

Lines changed: 32 additions & 15 deletions
@@ -25,7 +25,7 @@ Common::
2525
--
2626
2727
```yml
28-
# Common config fields, showing default values
28+
# Common configuration fields, showing default values
2929
label: ""
3030
schema_registry_encode:
3131
url: "" # No default (required)
@@ -39,7 +39,7 @@ Advanced::
3939
--
4040
4141
```yml
42-
# All config fields, showing default values
42+
# All configuration fields, showing default values
4343
label: ""
4444
schema_registry_encode:
4545
url: "" # No default (required)
@@ -75,36 +75,36 @@ schema_registry_encode:
7575

7676
Encodes messages automatically from schemas obtained from a https://docs.confluent.io/platform/current/schema-registry/index.html[Confluent Schema Registry service^] by polling the service for the latest schema version for target subjects.
7777

78-
If a message fails to encode under the schema then it will remain unchanged and the error can be caught using xref:configuration:error_handling.adoc[error handling methods].
78+
If a message fails to encode under the schema, it remains unchanged and the error can be caught using xref:configuration:error_handling.adoc[error-handling methods].
7979

80-
Avro, Protobuf and Json schemas are supported, all are capable of expanding from schema references as of v4.22.0.
80+
Avro, Protobuf, and JSON schemas are supported. As of v4.22.0, all of them can expand from schema references.
8181

8282
== Avro JSON format
8383

84-
By default this processor expects documents formatted as https://avro.apache.org/docs/current/specification/_print/#json-encoding[Avro JSON^] when encoding with Avro schemas. In this format the value of a union is encoded in JSON as follows:
84+
By default, this processor expects documents formatted as https://avro.apache.org/docs/current/specification/[Avro JSON^] when encoding with Avro schemas. In this format, the value of a union is encoded in JSON as follows:
8585

86-
- if its type is `null`, then it is encoded as a JSON `null`;
87-
- otherwise it is encoded as a JSON object with one name/value pair whose name is the type's name and whose value is the recursively encoded value. For Avro's named types (record, fixed or enum) the user-specified name is used, for other types the type name is used.
86+
- If the union's type is `null`, it is encoded as a JSON `null`.
87+
- Otherwise, the union is encoded as a JSON object with one name/value pair. The name is the type's name, and the value is the recursively encoded value. The user-specified name is used for Avro's named types (record, fixed, or enum). For other types, the type name is used.
8888

89-
For example, the union schema `["null","string","Foo"]`, where `Foo` is a record name, would encode:
89+
For example, the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, would encode:
9090

91-
- `null` as `null`;
92-
- the string `"a"` as `\{"string": "a"}`; and
93-
- a `Foo` instance as `\{"Foo": {...}}`, where `{...}` indicates the JSON encoding of a `Foo` instance.
91+
- A `null` as a JSON `null`
92+
- The string `"a"` as `{"string": "a"}`
93+
- A `Transaction` instance as `{"Transaction": {...}}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
9494

95-
However, it is possible to instead consume documents in https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard/raw JSON format^] by setting the field <<avro_raw_json, `avro_raw_json`>> to `true`.
95+
Alternatively, you can consume documents in https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard/raw JSON format^] by setting the field <<avro_raw_json,`avro_raw_json`>> to `true`.
9696

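As a sketch, the following fragment configures the encoder to accept standard JSON documents. The registry address and the subject name `transactions-value` are placeholder assumptions, and the `subject` field (not shown in the excerpt above) is assumed to select the target subject.

```yml
pipeline:
  processors:
    - schema_registry_encode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
        subject: transactions-value # Hypothetical subject name
        avro_raw_json: true # Accept "a" rather than {"string": "a"} for union values
```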
9797
=== Known issues
9898

9999
Important: An outstanding issue in the https://github.com/linkedin/goavro[Avro serialization library^] that Redpanda Connect uses means that it https://github.com/linkedin/goavro/issues/252[doesn't encode logical types correctly^]. You can still encode logical types in line with the specification by setting `avro_raw_json` to `true`, but non-logical types are then no longer in line with the specification.
100100

101101
== Protobuf format
102102

103-
This processor encodes protobuf messages either from any format parsed within Redpanda Connect (encoded as JSON by default), or from raw JSON documents, you can read more about JSON mapping of protobuf messages here: https://developers.google.com/protocol-buffers/docs/proto3#json
103+
This processor encodes Protobuf messages either from any format parsed within Redpanda Connect (encoded as JSON by default), or from raw JSON documents. For more information about the JSON mapping of Protobuf messages, see the https://developers.google.com/protocol-buffers/docs/proto3#json[Protocol Buffers documentation^].
104104

105105
=== Multiple message support
106106

107-
When a target subject presents a protobuf schema that contains multiple messages it becomes ambiguous which message definition a given input data should be encoded against. In such scenarios Redpanda Connect will attempt to encode the data against each of them and select the first to successfully match against the data, this process currently *ignores all nested message definitions*. In order to speed up this exhaustive search the last known successful message will be attempted first for each subsequent input.
107+
When a target subject presents a Protobuf schema that contains multiple messages, it becomes ambiguous which message definition the input data should be encoded against. In such scenarios, Redpanda Connect attempts to encode the data against each of them and selects the first one that successfully matches the data. This process currently *ignores all nested message definitions*. To speed up this exhaustive search, the last known successful message is attempted first for each subsequent input.
108108

109109
We will consider alternative approaches in the future, so please https://redpanda.com/slack[get in touch^] with your thoughts and feedback.
110110

@@ -155,8 +155,25 @@ refresh_period: 1h
155155

156156
=== `avro_raw_json`
157157

158-
Whether messages encoded in Avro format should be parsed as normal JSON ("json that meets the expectations of regular internet json") rather than https://avro.apache.org/docs/current/specification/_print/#json-encoding[Avro JSON^]. If `true` the schema returned from the subject should be parsed as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard json^] instead of as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodec[avro json^]. There is a https://github.com/linkedin/goavro/blob/5ec5a5ee7ec82e16e6e2b438d610e1cab2588393/union.go#L224-L249[comment in goavro^], the https://github.com/linkedin/goavro[underlining library used for avro serialization^], that explains in more detail the difference between standard json and avro json.
158+
Whether Avro messages should be parsed as normal JSON (JSON that meets the expectations of regular internet JSON) rather than https://avro.apache.org/docs/current/specification/[Avro JSON^].
159159

160+
If set to `false`, input documents are parsed as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodec[Avro JSON^].
161+
162+
For example, with the union schema `["null","string","Transaction"]`, where `Transaction` is a record name, the processor expects input documents to represent:
163+
164+
- A `null` as a JSON `null`
165+
- The string `"a"` as `{"string": "a"}`
166+
- A `Transaction` instance as `{"Transaction": {...}}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
167+
168+
If set to `true`, input documents are parsed as https://pkg.go.dev/github.com/linkedin/goavro/v2#NewCodecForStandardJSONFull[standard JSON^].
169+
170+
For example, with the same union schema `["null","string","Transaction"]`, the processor expects input documents to represent:
171+
172+
- A `null` as a JSON `null`
173+
- The string `"a"` as `"a"`
174+
- A `Transaction` instance as `{...}`, where `{...}` indicates the JSON encoding of a `Transaction` instance
175+
176+
For more details on the difference between standard JSON and Avro JSON, see the https://github.com/linkedin/goavro/blob/5ec5a5ee7ec82e16e6e2b438d610e1cab2588393/union.go#L224-L249[comment in Goavro^] and the https://github.com/linkedin/goavro[underlying library used for Avro serialization^].
160177

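When a pipeline decodes Avro data and then re-encodes it, both processors should agree on how unions are represented in JSON. A sketch under placeholder assumptions (the registry address and the hypothetical subject `transactions-value`), pairing `raw_unions` on the decoder with `avro_raw_json` on the encoder:

```yml
pipeline:
  processors:
    - schema_registry_decode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
        avro:
          raw_unions: true # Produce standard JSON for unions
    # Any transformations here must keep documents valid against the target schema
    - schema_registry_encode:
        url: http://localhost:8081 # Placeholder registry address (assumption)
        subject: transactions-value # Hypothetical subject name
        avro_raw_json: true # Parse the same standard JSON representation
```

The logical type limitation described under Known issues still applies to this configuration.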
161178
*Type*: `bool`
162179
