Unable to produce to EventHub with `Error: NETWORK_EXCEPTION. Error Message: Disconnected from node`, caused by `max.request.size` #255

:wave: I'm filing this one mostly as feedback on whether the failure mode could be made a little more obvious or graceful for users. I also hope others find it useful if they go searching for the same errors. Recently, I found myself setting up KafkaMirrorMaker2 for EventHub-to-EventHub mirroring.

The same setup was already in use, and happened to have `max.request.size` set to `20971520` (20 MiB) for the producer. When I reused that setup for EventHub, I ran into errors on the Kafka producer that I was unable to pin down. They were along the lines of:

```
Got error produce response with correlation id 6397 on topic-partition <MYTOPIC>-7, retrying (2147481516 attempts left). Error: NETWORK_EXCEPTION. Error Message: Disconnected from node 0 (org.apache.kafka.clients.producer.internals.Sender)
Node 0 disconnected. (org.apache.kafka.clients.NetworkClient)
Cancelled in-flight PRODUCE request with correlation id 6391 due to node 0 being disconnected (elapsed time since creation: 45ms, elapsed time since send: 45ms, request timeout: 30000ms) (org.apache.kafka.clients.NetworkClient)
```
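
One detail in the `Cancelled in-flight PRODUCE request` message may help others recognize this case: the request was cancelled 45ms after it was sent, far short of the 30000ms request timeout, which suggests the remote side dropped the connection rather than the client timing out. Below is a rough Python sketch for pulling those two numbers out of such a line. The regex and the function name are illustrative, written against the message above, and should not be read as a documented or stable log format.

```python
import re

# Pattern written against the NetworkClient message quoted above.
# This is an assumption about the log text, not a stable format.
CANCELLED = re.compile(
    r"Cancelled in-flight PRODUCE request .*?"
    r"elapsed time since send: (?P<elapsed>\d+)ms, "
    r"request timeout: (?P<timeout>\d+)ms"
)


def disconnected_before_timeout(line):
    """Return True if a cancelled PRODUCE request ended before its timeout,
    False if it reached the timeout, or None if the line does not match."""
    match = CANCELLED.search(line)
    if match is None:
        return None
    return int(match["elapsed"]) < int(match["timeout"])


line = (
    "Cancelled in-flight PRODUCE request with correlation id 6391 due to node 0 "
    "being disconnected (elapsed time since creation: 45ms, elapsed time since "
    "send: 45ms, request timeout: 30000ms) (org.apache.kafka.clients.NetworkClient)"
)
print(disconnected_before_timeout(line))  # True
```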

Now, I eventually combed through plenty of resources on getting things set up, like:
- [Strimzi's blog on setting up Mirror Maker 2 on EventHub](https://strimzi.io/blog/2020/06/09/mirror-maker-2-eventhub/)
- [EventHub's troubleshooting doc for Kafka](https://learn.microsoft.com/en-us/azure/event-hubs/apache-kafka-troubleshooting-guide)
- [EventHub's recommended Kafka configuration](https://learn.microsoft.com/en-us/azure/event-hubs/apache-kafka-configurations)

Eventually, I got it figured out: I applied every configuration in the recommendations, and things got unwedged after setting `max.request.size` as recommended. The oversized requests were being exercised because the mirror source topic has plenty of data (for testing). In hindsight, the recommendations guide indicates this will happen:
> The service will close connections if requests larger than 1,046,528 bytes are sent. _This value *must* be changed and will cause issues in high-throughput produce scenarios._
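
To make the mismatch concrete, here is a minimal Python sketch that compares a producer's `max.request.size` against the 1,046,528-byte limit quoted above. The constant and function names are made up for illustration and are not part of any Kafka or Event Hubs client, and `1000000` is used only as an example of a value under the limit. Note that the Kafka client default of 1048576 bytes is itself slightly above the limit, which is why the sketch also flags an unset value.

```python
# Limit quoted from the Event Hubs Kafka configuration guide, in bytes.
EVENT_HUBS_MAX_REQUEST_BYTES = 1_046_528


def check_max_request_size(producer_config):
    """Return (ok, message) for a producer config given as a plain dict.

    Hypothetical helper for illustration; it only inspects the
    'max.request.size' key and does not talk to any broker.
    """
    raw = producer_config.get("max.request.size")
    if raw is None:
        # The Kafka client default (1048576) is above the quoted limit.
        return False, "max.request.size is not set; set it explicitly"
    size = int(raw)
    if size > EVENT_HUBS_MAX_REQUEST_BYTES:
        over = size - EVENT_HUBS_MAX_REQUEST_BYTES
        return False, f"max.request.size={size} exceeds the limit by {over} bytes"
    return True, f"max.request.size={size} is within the limit"


if __name__ == "__main__":
    # The 20 MiB value from the original setup versus an example value under the limit.
    for value in ("20971520", "1000000"):
        ok, message = check_max_request_size({"max.request.size": value})
        print("OK  " if ok else "FAIL", message)
```

A check like this could run wherever the producer or MirrorMaker 2 configuration is assembled, so an oversized value is caught before it shows up as unexplained disconnects.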

There are maybe a few things that can be improved here:
- There was no mention of this failure mode in the troubleshooting doc, which I went through first. Having specific error references there would help with debugging and with finding this from search engines
- Similarly, the recommended configuration notes do not mention the specific error, which made them easy to miss when debugging it
- I did not observe any error metrics for EventHub, such as user errors (using the Datadog Azure integration). I was unable to find any diagnostic logs, but that could be on me (I'm unaware of where exactly I can find them in my setup). Having some feedback would have helped pin down that it wasn't just me going crazy

Thanks!
