[FLINK-39909][state] Fix heap state backend savepoint NPE with null map state values #28389

Open
mukul-8 wants to merge 1 commit into apache:master from mukul-8:FLINK-39909/heap-savepoint-null-map-state

mukul-8 commented Jun 11, 2026

Contributor

What is the purpose of the change

Found while investigating FLINK-38144. After writing the null flag, HeapKeyValueStateIterator unconditionally calls userValueSerializer.serialize(userValue, valueOut). When a MapState contains null values and the value serializer is null-unsafe (IntSerializer, BooleanSerializer, LongSerializer, etc.), this throws a NullPointerException while taking a canonical savepoint.

The same asymmetric pattern was fixed for RocksDB/ForSt in #26831 (FLINK-38137). Only canonical savepoints are affected: heap checkpoints use MapSerializer.serialize(), which handles nulls internally.

Brief change log

  • Skip userValueSerializer.serialize() when value is null in HeapKeyValueStateIterator, matching the deserialization side which already checks the null flag
  • Add testMapStateWithNullUnsafeSerializerCheckpointingAndRestore to MapStateNullValueCheckpointingITCase using IntSerializer (null-unsafe) to cover this code path
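
For reviewers unfamiliar with the null-flag pattern, the asymmetry can be sketched roughly as below. This is not the code from the PR: it uses plain `java.io` streams instead of Flink's serializer and output-view types, and `NullFlagSketch`, `serializeInt`, `writeBuggy`, `writeFixed`, and `read` are hypothetical names chosen for illustration. `serializeInt` stands in for a null-unsafe serializer by unboxing its argument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class NullFlagSketch {

    // Hypothetical stand-in for a null-unsafe serializer:
    // auto-unboxing a null Integer throws NullPointerException.
    static void serializeInt(Integer value, DataOutputStream out) throws IOException {
        out.writeInt(value);
    }

    // Buggy shape: the null flag is written, but the value is serialized regardless.
    static void writeBuggy(Integer value, DataOutputStream out) throws IOException {
        out.writeBoolean(value == null);
        serializeInt(value, out);
    }

    // Fixed shape: the value is serialized only when the flag says it is present.
    static void writeFixed(Integer value, DataOutputStream out) throws IOException {
        boolean isNull = (value == null);
        out.writeBoolean(isNull);
        if (!isNull) {
            serializeInt(value, out);
        }
    }

    // Read side: consults the null flag before reading a value.
    static Integer read(DataInputStream in) throws IOException {
        if (in.readBoolean()) {
            return null;
        }
        return in.readInt();
    }

    // Writes with the fixed writer and reads the value back.
    static Integer roundTrip(Integer value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeFixed(value, new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the buggy writer throws NullPointerException for a null value.
    static boolean buggyWriteThrowsOnNull() {
        try {
            writeBuggy(null, new DataOutputStream(new ByteArrayOutputStream()));
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));            // 42
        System.out.println(roundTrip(null));          // null
        System.out.println(buggyWriteThrowsOnNull()); // true
    }
}
```

In this sketch the fixed writer emits only the flag byte for a null value, which is exactly what the reader expects, so the two sides stay symmetric.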

Verifying this change

  • Added integration test testMapStateWithNullUnsafeSerializerCheckpointingAndRestore — runs across all backends (hashmap, rocksdb, forst) and snapshot types (full checkpoint, incremental checkpoint, canonical savepoint, native savepoint)
  • Manually verified with a long-running job using hashmap backend + null map state values + savepoint/restore

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    kiro-cli

flinkbot commented Jun 11, 2026

Collaborator

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
