Skip to content

[Bug]: Behavior change between Beam 2.66 and 2.71 causing severe buffering #37664

@raishakti-1983

Description

@raishakti-1983

What happened?

Environment:
Beam SDK: 2.66.0 (stable) vs 2.71.0 (problematic)
Runner: Google Cloud Dataflow (Streaming)
Language: Java
Source: KafkaIO

Pipeline Structure
KafkaIO
→ MapElements (KafkaRecordToKV)
→ Window.into(...)
→ GroupByKey
→ Sort
→ BigtableIO read/ write
→ Write to Kafka

Window / Trigger configuration
window
.triggering(
AfterWatermark.pastEndOfWindow()
.withEarlyFirings(
AfterPane.elementCountAtLeast(elementCount))
.withLateFirings(
AfterPane.elementCountAtLeast(lateElementCount)))
.discardingFiredPanes()
.withAllowedLateness(Duration.standardSeconds(allowedLateness));

window duration: 1 sec
allowed lateness: 600 sec
early count: 100
late count: 1

Observed behavior
Beam 2.66
Stable message delay, not a big accumulation at GroupAndSort step. Data Freshness graph shows a stable pattern
Beam 2.71
Delay shows saw-tooth pattern (buffer → flush cycles). Seems accumulate significantly larger panes before flushing

Important observation
No code or traffic change — only Beam version upgrade.

Image Image

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions