Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
0.8.2
Compute Engine
Flink
Minimal reproduce step
- Run a Flink job writing to Paimon on Object Storage (S3).
- Simulate a network timeout specifically during the "Commit/Rename" phase of a checkpoint.
- Trigger the file system client to retry the rename operation.
- Observe the deletion of the target snapshot file.
- Wait for Flink failover; the job will get stuck in a restart loop with FileNotFoundException.
2025-11-12 18:05:09
java.lang.RuntimeException: java.io.FileNotFoundException: File 's3://dt-warehouse/paimon-warehouse/nd_game_sjmy_cdm.db/dwd_sjmy_gmlog_boss_user_bekilllist/manifest/manifest-list-20f67c4d-ba05-4dbf-a26a-24f229b949cc-32' not found, Possible causes: 1.snapshot expires too fast, you can configure 'snapshot.time-retained' option with a larger value. 2. consumption is too slow, you can improve the performance of consumption (For example, increasing parallelism).
    at org.apache.paimon.flink.sink.AsyncLookupSinkWrite.<init>(AsyncLookupSinkWrite.java:75)
    at org.apache.paimon.flink.sink.FlinkSink.lambda$createWriteProvider$672a9d60$1(FlinkSink.java:147)
    at org.apache.paimon.flink.sink.TableWriteOperator.initializeState(TableWriteOperator.java:78)
    at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:316)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:306)
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:759)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:734)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:699)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:971)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:940)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:574)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 's3://dt-warehouse/paimon-warehouse/nd_game_sjmy_cdm.db/dwd_sjmy_gmlog_boss_user_bekilllist/manifest/manifest-list-20f67c4d-ba05-4dbf-a26a-24f229b949cc-32' not found, Possible causes: 1.snapshot expires too fast, you can configure 'snapshot.time-retained' option with a larger value. 2. consumption is too slow, you can improve the performance of consumption (For example, increasing parallelism).
    at org.apache.paimon.utils.ObjectsCache.readSegments(ObjectsCache.java:143)
    at org.apache.paimon.utils.ObjectsCache.read(ObjectsCache.java:93)
    at org.apache.paimon.utils.ObjectsFile.readWithIOException(ObjectsFile.java:149)
    at org.apache.paimon.utils.ObjectsFile.read(ObjectsFile.java:134)
    at org.apache.paimon.utils.ObjectsFile.read(ObjectsFile.java:105)
    at org.apache.paimon.manifest.ManifestList.readDataManifests(ManifestList.java:90)
    at org.apache.paimon.operation.ManifestsReader.readManifests(ManifestsReader.java:128)
    at org.apache.paimon.operation.ManifestsReader.read(ManifestsReader.java:114)
    at org.apache.paimon.operation.AbstractFileStoreScan.readManifests(AbstractFileStoreScan.java:417)
    at org.apache.paimon.operation.AbstractFileStoreScan.plan(AbstractFileStoreScan.java:257)
    at org.apache.paimon.operation.AbstractFileStoreWrite.scanExistingFileMetas(AbstractFileStoreWrite.java:491)
    at org.apache.paimon.operation.AbstractFileStoreWrite.createWriterContainer(AbstractFileStoreWrite.java:440)
What doesn't meet your expectations?
The rename operation should be idempotent or check for the existence of the target file before considering the operation failed. If the source is missing but the target exists and matches expectations during a retry, it should be treated as a success, or at least the target file should not be deleted.
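To make the expected behavior concrete, here is a minimal sketch of a retry-safe rename. It is not Paimon's actual commit code: the class `IdempotentRename` and the method `renameIdempotent` are hypothetical names, and `java.nio.file` on a local file system stands in for the S3 client. The check-then-move sequence is not atomic, so a real object-store implementation would still need whatever locking the commit path already relies on.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Paimon's API. */
final class IdempotentRename {

    private IdempotentRename() {}

    /**
     * Renames src to dst in a way that is safe to retry.
     *
     * Returns true if dst holds the expected content after the call,
     * either because this call moved it there or because an earlier
     * attempt already did. Returns false on a conflict. The target
     * file is never deleted or overwritten.
     */
    static boolean renameIdempotent(Path src, Path dst, byte[] expected) throws IOException {
        if (!Files.exists(src)) {
            // Source is gone: an earlier attempt may have succeeded before
            // the timeout was reported. Verify the target instead of failing.
            return Files.exists(dst) && Arrays.equals(Files.readAllBytes(dst), expected);
        }
        if (Files.exists(dst)) {
            // Both exist: another writer owns the target. Report the
            // conflict and leave the target untouched.
            return false;
        }
        Files.move(src, dst);
        return true;
    }
}
```

With this shape, a retry after a timed-out but successful rename returns `true` and leaves the snapshot file in place, while a genuine conflict returns `false` without deleting anything.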
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!