The Cloud Storage sink connector provides multiple output format options: JSON, Avro, Bytes, or Parquet. The default format is JSON. The current implementation has some limitations for the different formats:
*The JSON writer will try to convert data with a `String` or `Bytes` schema to JSON-format data if convertible.
**The Protobuf schema is based on the Avro schema. It uses Avro as an intermediate format, so the conversion is best-effort and may not be exact.
\*** The ProtobufNative record holds the Protobuf descriptor and the message. When writing to the Avro format, the connector uses `avro-protobuf` to do the conversion.
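For reference, here is a minimal sketch of how the output format is selected in the sink configuration. It assumes the connector's `formatType` and `withMetadata` options; the provider and bucket values are placeholders.

[source,yaml]
----
# Sketch: excerpt of a Cloud Storage sink configuration.
# formatType and withMetadata are the connector options discussed above;
# the provider and bucket values are placeholders.
configs:
  provider: "aws-s3"      # placeholder cloud provider
  bucket: "my-bucket"     # placeholder bucket name
  formatType: "parquet"   # one of: json (default), avro, bytes, parquet
  withMetadata: true      # include message metadata in each record
----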
Supported `withMetadata` configurations for different writer formats:
*When using Parquet with the PROTOBUF_NATIVE format, the connector writes the messages in the `DynamicMessage` format. When `withMetadata` is set to `true`, the connector adds `__message_metadata__` to the messages with the `PulsarIOCSCProtobufMessageMetadata` format.
For example, suppose a message `User` has a schema like the following (a minimal sketch; field names and numbers are illustrative):
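[source,protobuf]
----
// Minimal illustrative schema; field names and numbers are placeholders.
syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}
----

With `withMetadata` set to `true`, the connector would then write the `DynamicMessage` with a schema along the following lines, assuming the metadata message carries the record's properties, schema version, and message ID:

[source,protobuf]
----
// Sketch of the resulting schema: the metadata message is attached to each
// record as the __message_metadata__ field. The field layout is an assumption.
syntax = "proto3";

message PulsarIOCSCProtobufMessageMetadata {
  map<string, string> properties = 1;  // message properties
  string schema_version = 2;           // schema version of the record
  string message_id = 3;               // Pulsar message ID
}

message User {
  string name = 1;
  int32 age = 2;
  PulsarIOCSCProtobufMessageMetadata __message_metadata__ = 3;
}
----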
== Dead-letter topics
To use a dead-letter topic, set `skipFailedMessages` to `false` in the cloud provider config. Then, using either pulsar-admin or curl, set `--max-redeliver-count` and `--dead-letter-topic`. For more information about dead-letter topics, see the https://pulsar.apache.org/docs/en/concepts-messaging/#dead-letter-topic[Pulsar documentation^]{external-link-icon}. If a message fails to be sent to the Cloud Storage sink and a dead-letter topic is configured, the connector sends the message to the assigned topic.
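As a sketch, assuming the sink already exists and that your pulsar-admin build exposes the two flags above on `sinks update` (the tenant, namespace, sink, and topic names are placeholders):

[source,bash]
----
# Sketch: enable a dead-letter topic for an existing sink.
# Assumes --max-redeliver-count and --dead-letter-topic are available on
# "sinks update"; all names below are placeholders.
./bin/pulsar-admin sinks update \
  --tenant my-tenant \
  --namespace my-namespace \
  --name my-cloud-storage-sink \
  --max-redeliver-count 3 \
  --dead-letter-topic persistent://my-tenant/my-namespace/my-sink-dlq
----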
With the Cloud Storage sink, there are two sets of parameters: first the Astra Streaming parameters, then the parameters specific to your chosen cloud store.
=== Astra Streaming
The suggested permission policies for AWS S3 are:
- s3:PutObject*
- s3:List*
If you do not want to provide a region in the configuration, you should enable the s3:GetBucketLocation permission policy as well.
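For illustration, an IAM policy granting the suggested permissions might look like the following sketch. The bucket name is a placeholder, and s3:GetBucketLocation is only needed if you omit the region from the configuration.

[source,json]
----
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ExampleCloudStorageSinkAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject*",
        "s3:List*",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
----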