[SPARK-55701][SS] Fix race condition in CompactibleFileStreamLog.allFiles #54500

Open
zeruibao wants to merge 2 commits into apache:master from zeruibao:SPARK-55701-fix-race-condition

@zeruibao
Contributor

What changes were proposed in this pull request?

Changed the exception type thrown in CompactibleFileStreamLog.allFiles() from IllegalStateException to FileNotFoundException when a batch metadata file is missing (line 270). Since FileNotFoundException extends IOException, the existing retry loop (line 277) now catches this case and retries with an updated latestId.

Why are the changes needed?

There is a race condition between a batch reader (e.g., DESCRIBE TABLE via Thrift server) and a streaming writer performing compaction + cleanup concurrently:

  1. The reader calls getLatestBatchId() and observes latestId = N.
  2. The writer completes a new compaction batch and deleteExpiredLog removes old batch files.
  3. The reader tries to read the now-deleted batch files based on the stale latestId.
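
The interleaving above can be replayed deterministically with a small in-memory model. This is only a sketch: `RaceSketch`, `neededBatches`, and `missingAfterRace` are hypothetical names, the map stands in for the metadata log directory, and the cleanup step is simplified to delete every batch older than the new compaction batch.

```scala
import scala.collection.mutable

object RaceSketch {
  // Assumed model: with compactInterval = 3, batches 2, 5, 8, ... are compaction
  // batches, and a reader at `latestId` needs the most recent compaction batch
  // at or before `latestId` plus every batch after it.
  def neededBatches(latestId: Long, compactInterval: Int): Seq[Long] = {
    val start = math.max(0L, (latestId + 1) / compactInterval * compactInterval - 1)
    (start to latestId).toList
  }

  // Replays the three steps and returns the batch ids the reader can no longer find.
  def missingAfterRace(): Seq[Long] = {
    // Hypothetical stand-in for the log directory: batch id -> file name.
    val files = mutable.Map.empty[Long, String]
    (0L to 4L).foreach(id => files(id) = s"batch-$id")

    // Step 1: the reader observes latestId = 4.
    val staleLatestId = files.keys.max

    // Step 2: the writer finishes compaction batch 5, then cleanup removes old batches.
    files(5L) = "batch-5.compact"
    files --= (0L to 4L)

    // Step 3: the reader resolves batches from the stale id and finds them gone.
    neededBatches(staleLatestId, 3).filterNot(files.contains)
  }

  def main(args: Array[String]): Unit =
    println(s"missing batch files: ${missingAfterRace()}")
}
```

In this model the reader still asks for batches 2 through 4, although batch 5 already contains everything it needs.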

The allFiles() method already has a retry loop designed to handle this exact scenario: it catches IOException, refreshes latestId, and retries. However, the missing-file case threw IllegalStateException, which is not a subclass of IOException, so it escaped the retry loop entirely and surfaced to the user as a fatal error.
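
The type mismatch can be checked in isolation. In the sketch below, `CatchSketch.outcome` is a hypothetical helper, not Spark code: the inner handler plays the role of the retry loop's `catch`, and the outer handler only reports what got past it.

```scala
import java.io.{FileNotFoundException, IOException}

object CatchSketch {
  // Returns "retried" if an IOException handler sees the exception,
  // "escaped" if it propagates past that handler.
  def outcome(e: Exception): String =
    try {
      try throw e
      catch { case _: IOException => "retried" }
    } catch {
      case _: IllegalStateException => "escaped"
    }
}
```

Calling `CatchSketch.outcome(new IllegalStateException("missing"))` yields `"escaped"`, while `CatchSketch.outcome(new FileNotFoundException("missing"))` yields `"retried"`.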

The fix changes the exception to FileNotFoundException so the existing retry logic handles it correctly. The safety check on re-throw (lines 284-286) ensures that if no newer compaction exists, the exception is still propagated rather than silently swallowed.
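
For readers unfamiliar with the method, the retry and re-throw behaviour described here can be sketched roughly as follows. This is a simplified model, not the actual Spark source: `RetrySketch` and its three function parameters (`latest`, `read`, `nextCompaction`) are hypothetical stand-ins for the log's own lookups.

```scala
import java.io.{FileNotFoundException, IOException}

object RetrySketch {
  // latest:         re-reads the newest batch id (negative means the log is empty)
  // read:           loads all files as seen from a given latest id; may throw
  // nextCompaction: id of the first compaction batch expected after a given id
  def allFiles(
      latest: () => Long,
      read: Long => Seq[String],
      nextCompaction: Long => Long): Seq[String] = {
    var latestId = latest()
    while (latestId >= 0) {
      try {
        return read(latestId)
      } catch {
        case e: IOException =>
          // A missing file is only expected if a newer compaction batch exists.
          val expectedMinLatestId = nextCompaction(latestId)
          latestId = latest()
          // Safety check: no newer compaction, so this is a real failure.
          if (latestId < expectedMinLatestId) throw e
      }
    }
    Seq.empty
  }
}
```

With this shape, a `FileNotFoundException` raised by `read` is retried once a newer compaction batch is visible, and is re-thrown unchanged when the refreshed id has not advanced far enough.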

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (UT).

Was this patch authored or co-authored using generative AI tooling?

Yes. Written with help from Claude 4.6 Opus, and carefully reviewed by the author.
