vivekchavan14
Contributor

Problem

Fixes #2697: the AutoBalancer metrics reporter throws OutOfOrderSequenceException and consumption data curves are missing after upgrading from v1.1.2 to v1.4.1.

Root Cause

The producer was configured with retries and acks=all but was missing enable.idempotence=true, which caused sequence number conflicts during retries.

Changes Made

  • Enable producer idempotence to prevent OutOfOrderSequenceException
  • Add an explicit delivery timeout (delivery.timeout.ms=120000)
  • Improve error handling with specific exception type logging
  • Implement graceful shutdown with producer flush
  • Add shutdown guards to prevent metrics sending during shutdown
  • Enhance tests for configuration validation
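
The configuration changes listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the class and method names are hypothetical, the settings are written as plain string keys in a `java.util.Properties` so the example runs without the Kafka client on the classpath, and the `request.timeout.ms` value of 30000 is taken from the commit message in this PR. The consistency check mirrors the Kafka producer rule that `delivery.timeout.ms` must be at least `linger.ms + request.timeout.ms`; the fallback of 0 for `linger.ms` is an assumption of the sketch.

```java
import java.util.Properties;

// Hypothetical helper, not AutoMQ code: collects the producer settings named in
// this PR as plain string keys so the sketch has no Kafka dependency.
final class ReporterProducerConfigSketch {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("acks", "all");                   // required for idempotence
        props.setProperty("retries", "5");                  // must be > 0 for idempotence
        props.setProperty("enable.idempotence", "true");    // set explicitly, as the PR proposes
        props.setProperty("delivery.timeout.ms", "120000");
        props.setProperty("request.timeout.ms", "30000");
        return props;
    }

    // The Kafka producer rejects an explicit delivery.timeout.ms that is smaller
    // than linger.ms + request.timeout.ms. Assumption: linger.ms falls back to 0
    // here when it is not set.
    static boolean timeoutsAreConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms", "0"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("enable.idempotence=" + props.getProperty("enable.idempotence"));
        System.out.println("timeouts consistent: " + timeoutsAreConsistent(props));
    }
}
```

In a real reporter these properties would be passed to the `KafkaProducer` constructor; whether the explicit `enable.idempotence` entry changes behavior depends on the client's default for that key.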

- Fix IllegalArgumentException: Illegal base64 character 20 in S3StreamKafkaMetricsManager
- Replace single newline removal with comprehensive whitespace cleanup using replaceAll("\\s", "")
- Add graceful error handling for both Base64 and certificate parsing failures
- Add comprehensive unit tests covering various whitespace scenarios and edge cases
- Improve logging with specific error messages for failed certificate parsing
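
A minimal sketch of the whitespace cleanup described in this commit message, assuming the input is a Base64 string that may contain spaces or line breaks. The class and method names are hypothetical and this is not the code in S3StreamKafkaMetricsManager. The "character 20" in the reported error is hexadecimal 0x20, a space, which the strict decoder returned by `Base64.getDecoder()` rejects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper illustrating the cleanup; names are not from the PR.
final class Base64CleanupSketch {

    // Strips every whitespace character before decoding and returns an empty
    // Optional instead of propagating the exception when the input is invalid.
    static Optional<byte[]> decodeLenient(String encoded) {
        if (encoded == null) {
            return Optional.empty();
        }
        String compact = encoded.replaceAll("\\s", "");
        try {
            return Optional.of(Base64.getDecoder().decode(compact));
        } catch (IllegalArgumentException e) {
            // Log the specific failure rather than a generic message.
            System.err.println("Base64 decoding failed: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The space and trailing newline would make the strict decoder throw.
        byte[] decoded = decodeLenient("aGVs bG8=\n").orElse(new byte[0]);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```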

Fixes AutoMQ#2615
…trics reporter robustness

- Enable producer idempotence to prevent OutOfOrderSequenceException during retries
- Add explicit timeout configurations (delivery.timeout.ms=120000, request.timeout.ms=30000)
- Improve error handling with specific exception type logging
- Implement graceful shutdown with producer flush and thread join
- Add shutdown guards to prevent metrics sending during shutdown
- Enhance tests for configuration validation and shutdown scenarios
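
The shutdown guard, thread join, and flush mentioned above could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: the class name, the `trySend` method, and the `Runnable` standing in for `producer.flush()` are all hypothetical, and the sketch joins the worker before flushing so that nothing is sent after the flush.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reporter skeleton; the Runnable stands in for producer.flush().
final class ReporterShutdownSketch {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private final AtomicInteger sent = new AtomicInteger();
    private final Runnable flush;
    private final Thread worker;

    ReporterShutdownSketch(Runnable flush) {
        this.flush = flush;
        this.worker = new Thread(this::runLoop, "metrics-reporter-sketch");
    }

    void start() {
        worker.start();
    }

    // Shutdown guard: refuse to send once close() has been called.
    boolean trySend() {
        if (shutdown.get()) {
            return false;
        }
        sent.incrementAndGet(); // placeholder for producer.send(...)
        return true;
    }

    private void runLoop() {
        while (!shutdown.get()) {
            trySend();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Safe to call more than once: only the first call joins and flushes.
    void close(long joinTimeoutMs) {
        if (!shutdown.compareAndSet(false, true)) {
            return;
        }
        try {
            worker.join(joinTimeoutMs); // wait for the loop to observe the flag
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush.run(); // flush whatever was sent before the guard took effect
    }

    boolean workerAlive() {
        return worker.isAlive();
    }

    public static void main(String[] args) {
        ReporterShutdownSketch reporter =
            new ReporterShutdownSketch(() -> System.out.println("flushed"));
        reporter.start();
        reporter.close(1000);
        System.out.println("send after close accepted: " + reporter.trySend());
    }
}
```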

Fixes AutoMQ#2697: Missing consumption data curves and OutOfOrderSequenceException errors

This resolves the issues reported after upgrading from v1.1.2 to v1.4.1 where:
- OutOfOrderSequenceException caused metric transmission failures
- Missing idempotence configuration led to sequence number conflicts during retries
- Poor shutdown handling caused InterruptExceptions
- Generic error handling made troubleshooting difficult

The fix maintains backward compatibility while significantly improving reliability.
@vivekchavan14 changed the title from "Vivekchavan14/fix autobalancer metrics reporter outofordersequence" to "vivekchavan14/fix autobalancer metrics reporter outofordersequence" on Jul 25, 2025
@superhx
Collaborator

superhx commented Jul 25, 2025

You are welcome to use AI to fix issues, but the proposed fixes need to be verified and confirmed by the developer. @vivekchavan14

@vivekchavan14
Contributor Author

Yup, I’ve reviewed the response, and it's not entirely AI-generated. I used AI to assist with the analysis, but all suggested fixes and insights have been manually verified and cross-checked before posting. Let me know if any part needs further clarification or adjustment.

The root cause was that the AutoBalancer metrics reporter was configured with:
- `retries = 5`
- `acks = all`
- **BUT missing `enable.idempotence = true`**
Contributor

`enable.idempotence` already defaults to true here, and there is no conflicting configuration, so idempotence should already be in effect and no additional change is required.

Successfully merging this pull request may close these issues.

[BUG] automq-kafka: consumption data curves are missing after upgrading
3 participants