Skip to content

Request batching does not track encoding overhead accurately #2106

@sjvanrossum

Description

@sjvanrossum

Steps to reproduce

  1. Set the byte size per batched request to >9.1 MB.
  2. Construct a batch of sufficiently large messages while respecting limits per message (max 10 MB data + attributes + ordering key, max 100 attributes, max 256 B per attribute key, max 1024 B per attribute value, max 1024 B per ordering key) to exceed either the serialized size limit of 10 MB per PublishRequest or 10 MiB request size limit per call to publish (pointless because of the first limit unless explicit size validation is enabled).
  3. Publish the batch.
  4. Observe thrown exception, either an early rejection (>10 MiB) or an internal error (>10 MB, <10 MiB) on request_size if explicit size validation isn't enabled for the topic.

External references such as API reference guides

The API reference does not provide adequate documentation on these limits, since it suggests that explicit size validation is used.

Any additional information below

Discovered this while reviewing size validation in PubsubIO for Apache Beam, see apache/beam#31800 for details.
Happy to open a PR with fixes when I get a chance, just wanted to highlight this ahead of time for discussion.

This shouldn't be an issue if a user recovers by lowering the byte size per batch (<9 MB is generally safe with current limits per message), but the client library should track the outgoing request size accurately on the user's behalf in my opinion.

Also note that allowing users to use the REST API for Pub/Sub as proposed in #2075 would complicate matters since the serialized size of publish requests sent as JSON can be smaller than the serialized size of the same requests after it's transcoded into protobuf. For this message type a protobuf message can be up to ~38 KB smaller or up to ~0.9 MB larger than its JSON counterpart.
An example of this would be a JSON publish request of 736 messages, each message with no data and 100 attributes, each attribute with no value and a key of 128 alphanumeric ASCII characters. The JSON request does not exceed 10 MB, but the protobuf transcoded request that's passed to Pub/Sub exceeds it by ~145 KB. That triggers the internal validation error without explicit size validation, but I guess this will become a non-issue when explicit size validation is used by default.

Metadata

Metadata

Assignees

Labels

api: pubsubIssues related to the googleapis/java-pubsub API.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions