kfake: further improvements #1260

Merged
twmb merged 7 commits into master from kfake
Feb 18, 2026

@twmb twmb commented Feb 18, 2026

- Claude-audit of kfake vs. Kafka, and then the fixes associated with what it found
- Incremental fetch sessions actually implemented
- More fixes, more tests

twmb and others added 7 commits February 18, 2026 02:07
- EndTxn v5+ retry: validate commit/abort direction matches the
  completed transaction, return INVALID_TXN_STATE on mismatch
- EndTxn v5+ empty abort: bump epoch so the client uses a fresh
  epoch for the next transaction
- InitProducerID: validate timeout against transaction.max.timeout.ms
  broker config (default 900000)
- TxnOffsetCommit: validate topics/partitions exist, return
  UNKNOWN_TOPIC_OR_PARTITION per-partition for missing ones
- Produce: reject client-sent control batches (INVALID_RECORD)
- Produce: reject non-transactional write during active transaction
  (INVALID_TXN_STATE)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
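
For reference, the two Produce checks in the commit above can be sketched roughly as follows. This is an illustration, not the code in this PR: `validateProduceBatch` and its `producerInTxn` parameter are hypothetical, and the sketch assumes the "active transaction" check is made per producer. The attribute bit positions are those of the Kafka record batch format.

```go
package main

import "fmt"

// Bit positions in the attributes field of a Kafka record batch (v2).
const (
	attrTransactional int16 = 1 << 4
	attrControl       int16 = 1 << 5
)

// validateProduceBatch is a hypothetical helper, not kfake's actual code.
// It returns the Kafka error name a broker would answer with, or "NONE".
// producerInTxn reports whether the batch's producer currently has an open
// transaction (an assumption about how that state is tracked).
func validateProduceBatch(attrs int16, producerInTxn bool) string {
	if attrs&attrControl != 0 {
		// Control batches (commit/abort markers) are written by the
		// broker itself and are never accepted from a client.
		return "INVALID_RECORD"
	}
	if producerInTxn && attrs&attrTransactional == 0 {
		// A non-transactional write while a transaction is open.
		return "INVALID_TXN_STATE"
	}
	return "NONE"
}

func main() {
	fmt.Println(validateProduceBatch(attrControl, false))
	fmt.Println(validateProduceBatch(0, true))
	fmt.Println(validateProduceBatch(attrTransactional, true))
}
```

The control-batch check runs first in this sketch, so a control batch is rejected with INVALID_RECORD regardless of transaction state; the ordering is a choice made here, not something the commit message specifies.
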
- addPartitionEpochs: panic if a partition already has a higher or
  equal epoch, catching double-assignment bugs early
- Rebalance timeout: always reschedule when partitions are pending
  revocation, not just on first entry, so the timeout tracks the
  latest revocation state
- Document deliberate divergence in member epoch advancement: kfake
  requires full convergence (current == target) before advancing,
  which is conservative but correct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
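
A minimal sketch of the epoch guard described in the commit above, under assumptions: the `topicPartition` type, the map layout, and the function signature are hypothetical, and the real `addPartitionEpochs` may track this state differently. The point is only that assigning an epoch that is not strictly higher than the recorded one panics immediately.

```go
package main

import "fmt"

// topicPartition is a hypothetical key type for this sketch.
type topicPartition struct {
	topic     string
	partition int32
}

// addPartitionEpochsSketch records epoch for every partition in parts.
// It panics if a partition already carries an equal or higher epoch,
// which would indicate the partition was assigned twice.
func addPartitionEpochsSketch(epochs map[topicPartition]int32, parts []topicPartition, epoch int32) {
	for _, p := range parts {
		if cur, ok := epochs[p]; ok && cur >= epoch {
			panic(fmt.Sprintf("partition %s/%d already has epoch %d, cannot assign epoch %d",
				p.topic, p.partition, cur, epoch))
		}
		epochs[p] = epoch
	}
}

func main() {
	epochs := make(map[topicPartition]int32)
	parts := []topicPartition{{"orders", 0}, {"orders", 1}}
	addPartitionEpochsSketch(epochs, parts, 3)
	addPartitionEpochsSketch(epochs, parts, 4) // strictly higher epoch: accepted
	fmt.Println(epochs[topicPartition{"orders", 0}])
}
```

Panicking rather than returning an error suits a test double: a double assignment is a bug in the fake itself, not a condition a client should ever observe.
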
- protocolsMatch: check that the joining member supports at least
  one protocol supported by all existing members, not just any
  existing protocol
- Rebalance timer: always reset with the current max timeout instead
  of only creating on first rebalance
- Pending sync timeout: track members that received JoinGroup
  responses but haven't sent SyncGroup, and remove them after
  the rebalance timeout fires
- Protocol selection: use Kafka-style voting where each member votes
  for their most-preferred universally-supported protocol

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
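
The protocol matching and voting rules in the commit above can be illustrated with a short sketch. The `member` type and the function signatures are hypothetical, and the tie-break (alphabetical by protocol name) is an assumption made here to keep the result deterministic; the commit message does not say how ties are broken.

```go
package main

import (
	"fmt"
	"sort"
)

// member lists its protocols in preference order, most preferred first.
type member struct {
	id        string
	protocols []string
}

// commonProtocols returns the protocols that every member supports.
func commonProtocols(members []member) map[string]bool {
	common := make(map[string]bool)
	for i, m := range members {
		supported := make(map[string]bool, len(m.protocols))
		for _, p := range m.protocols {
			supported[p] = true
		}
		if i == 0 {
			common = supported
			continue
		}
		for p := range common {
			if !supported[p] {
				delete(common, p)
			}
		}
	}
	return common
}

// protocolsMatch reports whether joiner supports at least one protocol
// that all existing members support. An empty group accepts anyone.
func protocolsMatch(existing []member, joiner member) bool {
	if len(existing) == 0 {
		return true
	}
	common := commonProtocols(existing)
	for _, p := range joiner.protocols {
		if common[p] {
			return true
		}
	}
	return false
}

// selectProtocol lets each member vote for its most-preferred protocol
// among those supported by everyone; the protocol with the most votes wins.
func selectProtocol(members []member) (string, bool) {
	common := commonProtocols(members)
	votes := make(map[string]int)
	for _, m := range members {
		for _, p := range m.protocols {
			if common[p] {
				votes[p]++
				break
			}
		}
	}
	if len(votes) == 0 {
		return "", false
	}
	names := make([]string, 0, len(votes))
	for p := range votes {
		names = append(names, p)
	}
	sort.Strings(names) // assumed tie-break: alphabetical order
	best := names[0]
	for _, p := range names[1:] {
		if votes[p] > votes[best] {
			best = p
		}
	}
	return best, true
}

func main() {
	group := []member{
		{"a", []string{"range", "roundrobin"}},
		{"b", []string{"roundrobin", "range"}},
		{"c", []string{"roundrobin", "sticky", "range"}},
	}
	fmt.Println(selectProtocol(group))
	fmt.Println(protocolsMatch(group, member{"d", []string{"sticky"}}))
}
```

In the example group, a joiner that only speaks `sticky` is rejected even though member `c` lists it, because `sticky` is not supported by all existing members. That is the distinction the `protocolsMatch` fix draws.
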
- Evict the oldest fetch session when per-broker session count
  exceeds the limit (1000), using lastUsed timestamp for LRU
  ordering
- Update skipped_features with deliberately skipped features for
  transactions, classic groups, produce, and 848 groups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
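
A rough sketch of the eviction rule in the commit above, assuming a simple map of sessions and a linear scan for the oldest `lastUsed` value. The `sessionCache` type, its methods, and the `at` helper are hypothetical; the actual bookkeeping in kfake may differ.

```go
package main

import (
	"fmt"
	"time"
)

// fetchSession is a hypothetical stand-in for a per-broker fetch session.
type fetchSession struct {
	id       int32
	lastUsed time.Time
}

// sessionCache holds at most maxSlots sessions.
type sessionCache struct {
	maxSlots int
	nextID   int32
	sessions map[int32]*fetchSession
}

func newSessionCache(maxSlots int) *sessionCache {
	return &sessionCache{maxSlots: maxSlots, sessions: make(map[int32]*fetchSession)}
}

// at builds a deterministic timestamp, sec seconds after the Unix epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// create registers a new session and, if the cache now exceeds its limit,
// evicts the least recently used session(s).
func (c *sessionCache) create(now time.Time) int32 {
	c.nextID++
	c.sessions[c.nextID] = &fetchSession{id: c.nextID, lastUsed: now}
	for len(c.sessions) > c.maxSlots {
		c.evictOldest()
	}
	return c.nextID
}

// evictOldest removes the session with the earliest lastUsed timestamp.
func (c *sessionCache) evictOldest() {
	var oldest *fetchSession
	for _, s := range c.sessions {
		if oldest == nil || s.lastUsed.Before(oldest.lastUsed) {
			oldest = s
		}
	}
	if oldest != nil {
		delete(c.sessions, oldest.id)
	}
}

// touch marks a session as used; it returns false if the session is gone,
// in which case a client would need to establish a new session.
func (c *sessionCache) touch(id int32, now time.Time) bool {
	s, ok := c.sessions[id]
	if ok {
		s.lastUsed = now
	}
	return ok
}

func main() {
	c := newSessionCache(3)
	first := c.create(at(0))
	second := c.create(at(1))
	c.create(at(2))
	c.touch(first, at(3)) // first is now the most recently used
	c.create(at(4))       // exceeds 3 slots, so the oldest is evicted
	fmt.Println(c.touch(first, at(5)), c.touch(second, at(5)))
}
```

A linear scan is fine at a limit of 1000 sessions in a test fake; a production cache would more likely keep an ordered structure so eviction does not walk every session.
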
Filter unchanged partitions from incremental fetch responses. A partition
is included if it has records, an error, a changed high watermark, or a
changed log start offset - matching Kafka's CachedPartition logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
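
The inclusion rule from the commit above, as a minimal sketch. Only the four conditions named in the commit message are modeled; the `cachedPartition` and `partitionResponse` types are hypothetical, and Kafka's own CachedPartition tracks more state than this.

```go
package main

import "fmt"

// cachedPartition is what the session remembers about a partition from the
// last response it sent (hypothetical type for this sketch).
type cachedPartition struct {
	highWatermark  int64
	logStartOffset int64
}

// partitionResponse is the candidate response for one partition.
type partitionResponse struct {
	errorCode      int16
	numRecords     int
	highWatermark  int64
	logStartOffset int64
}

// shouldInclude reports whether r must be sent in an incremental fetch
// response, and updates the cached state to match r either way.
func (c *cachedPartition) shouldInclude(r partitionResponse) bool {
	include := r.numRecords > 0 ||
		r.errorCode != 0 ||
		r.highWatermark != c.highWatermark ||
		r.logStartOffset != c.logStartOffset
	c.highWatermark = r.highWatermark
	c.logStartOffset = r.logStartOffset
	return include
}

func main() {
	c := &cachedPartition{highWatermark: 10, logStartOffset: 0}
	unchanged := partitionResponse{highWatermark: 10}
	advanced := partitionResponse{highWatermark: 12}
	fmt.Println(c.shouldInclude(unchanged)) // nothing changed: omitted
	fmt.Println(c.shouldInclude(advanced))  // high watermark moved: included
	fmt.Println(c.shouldInclude(advanced))  // same again: omitted
}
```
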
Replace per-batch txnFirstOffset tracking with a sorted index on
partData for AbortedTransactions in read_committed fetch responses.
This correctly handles fetches starting mid-transaction where the
transaction's firstOffset is before the fetchOffset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
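
To illustrate why a per-partition index handles mid-transaction fetches, here is a sketch under stated assumptions: the index is a slice sorted by the offset of each transaction's abort marker (`lastOffset`), and a fetch covers the half-open range from `fetchOffset` to `upperBound`. The `abortedTxn` type, the sort key, and `abortedInRange` are hypothetical and may not match the structure actually added to partData.

```go
package main

import (
	"fmt"
	"sort"
)

// abortedTxn records one aborted transaction on a partition.
type abortedTxn struct {
	producerID  int64
	firstOffset int64 // offset of the transaction's first batch
	lastOffset  int64 // offset of the abort marker
}

// abortedInRange returns the aborted transactions that overlap
// [fetchOffset, upperBound). index must be sorted by lastOffset.
func abortedInRange(index []abortedTxn, fetchOffset, upperBound int64) []abortedTxn {
	// Skip every transaction whose abort marker lies before the fetch.
	start := sort.Search(len(index), func(i int) bool {
		return index[i].lastOffset >= fetchOffset
	})
	var out []abortedTxn
	for _, t := range index[start:] {
		// firstOffset is not monotonic in this ordering, so filter
		// rather than stop at the first miss.
		if t.firstOffset < upperBound {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	index := []abortedTxn{
		{producerID: 1, firstOffset: 0, lastOffset: 5},
		{producerID: 2, firstOffset: 3, lastOffset: 12},
		{producerID: 3, firstOffset: 20, lastOffset: 25},
	}
	// A fetch starting at offset 8 lands in the middle of producer 2's
	// transaction, which began at offset 3.
	for _, t := range abortedInRange(index, 8, 15) {
		fmt.Println(t.producerID, t.firstOffset)
	}
}
```

In the example, producer 2's transaction is reported even though its `firstOffset` (3) is before the `fetchOffset` (8). That is the case per-batch tracking would miss if it only looked at batches inside the fetched range.
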
Add max.incremental.fetch.session.cache.slots broker config to control
the per-broker fetch session limit. Rewrite TestFetchSessionEviction to
use a small cache (3 slots) and verify evicted clients can re-establish
sessions and continue consuming. Add TxnOffsetCommit partition
validation coverage to TestTxnAddOffsetsWithoutGroup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@twmb twmb merged commit ffcae12 into master Feb 18, 2026
13 checks passed
@twmb twmb deleted the kfake branch February 18, 2026 19:03