
Kafka source issues after upgrade to edge: PollExceeded errors and consumption problems #6014

@earlbread

Description


Describe the bug
After upgrading from qw-airmail-20250522-hotfix (commit 488375a9) to edge (commit 660388a42756a739d0ef0aecd234ca953b85caf5), we are seeing repeated Kafka source failures with PollExceeded errors.

error_log.txt

2025-12-09T07:45:58.757Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-proud-RYA5 exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
2025-12-09T07:45:58.757Z ERROR quickwit_actors::spawn_builder: actor-exit actor_id="SourceActor-proud-RYA5" phase=handling(quickwit_indexing::source::Loop) exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
2025-12-09T07:45:58.476Z ERROR rdkafka::client: librdkafka: Global error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded): Application maximum poll interval (600000ms) exceeded by 16ms
2025-12-09T07:45:50.710Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-polished-D8qx exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
2025-12-09T07:45:50.710Z ERROR quickwit_actors::spawn_builder: actor-exit actor_id="SourceActor-polished-D8qx" phase=handling(quickwit_indexing::source::Loop) exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
2025-12-09T07:45:50.710Z ERROR rdkafka::client: librdkafka: Global error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded): Application maximum poll interval (600000ms) exceeded by 185ms
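To see whether the failures are concentrated on particular sources, the log can be tallied per source actor. A minimal sketch, assuming the line format shown in error_log.txt above; the sample log written to /tmp is only there to make the snippet self-contained, point the grep at your real Quickwit log instead:

```shell
set -eu

# Self-contained sample matching the error_log.txt format above (illustrative only).
cat > /tmp/quickwit_sample.log <<'EOF'
2025-12-09T07:45:58.757Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-proud-RYA5 exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
2025-12-09T07:45:50.710Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-polished-D8qx exit_status=Failure(Message consumption error: PollExceeded (Local: Maximum application poll interval (max.poll.interval.ms) exceeded))
EOF

# Count PollExceeded failures per source actor, most affected first.
grep 'PollExceeded' /tmp/quickwit_sample.log \
  | grep -o 'actor=SourceActor-[[:alnum:]-]*' \
  | sort | uniq -c | sort -rn
```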

Steps to reproduce (if applicable)
Steps to reproduce the behavior:

  1. Run indexing under sustained high load where backpressure occurs
  2. Observe PollExceeded errors or consumption stalls

Expected behavior
Kafka source should continue consuming messages under high load without max.poll.interval.ms exceeded errors, as it did in the previous version (qw-airmail-20250522-hotfix).
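For reference, librdkafka consumer properties can be overridden through the Kafka source's `client_params`. The sketch below is illustrative only: the `source_id`, `topic`, broker address, and the raised `max.poll.interval.ms` value are placeholder assumptions, not our production source config, and raising the ceiling is a mitigation attempt rather than a fix for the regression itself.

```yaml
# Kafka source config sketch (all values are illustrative assumptions)
version: 0.8
source_id: kafka-access-log
source_type: kafka
params:
  topic: log.common.access_log_v2
  client_params:
    bootstrap.servers: broker-1:9092
    # Default observed in the logs above is 600000ms; raised here as a stopgap.
    max.poll.interval.ms: 1200000
```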

Configuration:
Please provide:

  1. Quickwit version: edge (660388a42756a739d0ef0aecd234ca953b85caf5)
  2. The index_config.yaml:
version: 0.8

index_id: log.common.access_log_v2_quickwit

doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
      description: "unique identifier for the event"
    - name: specversion
      type: text
      stored: false
      indexed: false
      description: "version information about the CloudEvents specification"
    - name: source
      type: text
      tokenizer: raw
      description: "information about where the event occurred"
    - name: subject
      type: text
      tokenizer: raw
      description: "detailed information about the source where the event occurred"
    - name: time
      type: datetime
      input_formats:
        - unix_timestamp
        - iso8601
      output_format: unix_timestamp_nanos
      fast: true
      description: "timestamp of the event"
    - name: datacontenttype
      type: text
      tokenizer: raw
      description: "content type of the data"
    - name: requestId
      type: text
      tokenizer: raw
      description: "request id"
    - name: ip
      type: ip
      fast: true
      description: "ip address"
    - name: userAgent
      type: text
      tokenizer: default
    - name: xUserAgent
      type: text
      tokenizer: default
    - name: userId
      type: text
      tokenizer: raw
    - name: deviceId
      type: text
      tokenizer: raw
      description: "device id, e.g. 6ae1f6a6-107d-3183-a1df-adf7368f9d10"
    - name: latency
      type: text
      indexed: false
      stored: false
      description: "raw latency string; superseded by latencyNs, so neither indexed nor stored"
    - name: latencyNs
      type: i64
      fast: true
      description: "latency of the request in nanoseconds"
    - name: occurredAt
      type: datetime
      input_formats:
        - unix_timestamp
        - iso8601
      output_format: unix_timestamp_nanos
      description: "timestamp of the log entry"
    - name: httpStatusCode
      type: i64
      fast: true
      description: "HTTP status code"
    - name: httpMethod
      type: text
      fast: true
      description: "HTTP method"
    - name: httpPath
      type: text
      fast: true
      description: "HTTP path"
    - name: grpcStatusCode
      type: i64
      fast: true
      description: "gRPC status code"
    - name: grpcMethod
      type: text
      fast: true
      description: "gRPC method"
    - name: extra
      type: json
      expand_dots: true
      description: "extra fields"
    - name: kafkaConsumerGroupId
      type: text
      fast: false
      description: "Kafka Consumer Group Id"
    - name: kafkaConsumerClientId
      type: text
      fast: false
      description: "Kafka Consumer Client Id"
    - name: kafkaConsumerHostName
      type: text
      fast: false
      description: "Kafka Consumer Host Name"
    - name: kafkaTopic
      type: text
      fast: false
      description: "Kafka Topic"
    - name: kafkaPartition
      type: i64
      description: "Kafka Partition"
    - name: kafkaOffset
      type: i64
      description: "Kafka Offset"
    - name: kafkaMessageKey
      type: text
      tokenizer: raw
      description: "Kafka Message Key"
    - name: kafkaConsumingResult
      type: text
      fast: true
      description: "Kafka Consuming Result"
    - name: env
      type: text
      tokenizer: raw
      indexed: true
      description: "represents environmental information as an extension field in the cloudEvents specification"
    - name: region
      type: text
      tokenizer: raw
      fast: true
      indexed: true
      description: "represents region information as an extension field in the cloudEvents specification"
    - name: namespace
      type: text
      tokenizer: raw
      fast: true
      indexed: true
      description: "represents namespace information as an extension field in the cloudEvents specification"
  timestamp_field: time
  tag_fields: [region]
  partition_key: namespace

search_settings:
  default_search_fields: [id, requestId, userId, extra.grpc.headers.x-request-id]

indexing_settings:
  merge_policy:
    type: "stable_log"
    merge_factor: 10
    max_merge_factor: 12
    maturation_period: 48h
  commit_timeout_secs: 30

retention:
  period: 30 days
  schedule: daily

Labels: bug