[Fix][Zeta] CleanLogOperation receives jobId=0 in separated cluster mode #10797

Open
hyboll wants to merge 1 commit into apache:dev from hyboll:dev-zeta

hyboll commented Apr 21, 2026

Purpose of this pull request

In separated cluster mode, CleanLogOperation is serialized on the master node and deserialized on the worker node. Because jobId was not written in writeInternal or read back in readInternal, the worker always received jobId=0, so every log file whose name contains "0" was deleted by mistake.

Root Cause

CleanLogOperation extends TracingOperation, which extends Hazelcast's Operation. When Hazelcast deserializes an IdentifiedDataSerializable on a remote node, it calls the no-arg constructor and then readInternal to restore the fields. Because CleanLogOperation did not override writeInternal/readInternal, jobId was never written to the stream and kept its default value of 0 after deserialization. The getLogFiles method filters with name.contains(String.valueOf(jobId)), so jobId=0 matches every log file whose name contains the character "0", which is nearly every file in the directory.
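To make the effect of the substring filter concrete, here is a minimal sketch. The class name `LogFilterSketch` and the file names are hypothetical; only the predicate `name.contains(String.valueOf(jobId))` comes from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the filter described above; not the SeaTunnel source.
class LogFilterSketch {

    // Same predicate as described for getLogFiles: a plain substring match on the job id.
    static List<String> matching(List<String> names, long jobId) {
        String id = String.valueOf(jobId);
        return names.stream()
                .filter(name -> name.contains(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        List<String> names = Arrays.asList("job-1001.log", "job-2050.log", "job-777.log");

        // jobId=0 matches every name that happens to contain the digit 0.
        System.out.println(matching(names, 0L));   // [job-1001.log, job-2050.log]

        // A real id only matches its own file.
        System.out.println(matching(names, 777L)); // [job-777.log]
    }
}
```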

Changes:

  • Add writeInternal/readInternal to CleanLogOperation to properly
    serialize jobId across Hazelcast nodes
  • Add boundary guard (path == null || jobId <= 0) in
    TaskLogManagerService.clean() to prevent invalid cleanup attempts
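The first change can be sketched roughly as follows. This is not the PR's code: `CleanLogOperationSketch` is a hypothetical class that uses `java.io.DataOutput`/`DataInput` in place of Hazelcast's `ObjectDataOutput`/`ObjectDataInput` so that it runs without Hazelcast on the classpath. A real override would presumably also call the superclass `writeInternal`/`readInternal`, which is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CleanLogOperation; it does not extend the real Hazelcast Operation.
class CleanLogOperationSketch {
    private long jobId;

    // No-arg constructor, as used on the receiving side before fields are restored.
    CleanLogOperationSketch() {}

    CleanLogOperationSketch(long jobId) {
        this.jobId = jobId;
    }

    long getJobId() {
        return jobId;
    }

    // Without this pair, jobId never reaches the stream and stays 0 on the receiver.
    void writeInternal(DataOutput out) throws IOException {
        out.writeLong(jobId);
    }

    void readInternal(DataInput in) throws IOException {
        jobId = in.readLong();
    }

    // Simulates sender -> bytes -> receiver using in-memory streams.
    static CleanLogOperationSketch roundTrip(CleanLogOperationSketch op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                op.writeInternal(out);
            }
            CleanLogOperationSketch copy = new CleanLogOperationSketch();
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readInternal(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CleanLogOperationSketch.roundTrip(new CleanLogOperationSketch(12345L)).getJobId()` returns the original id, whereas an instance built only with the no-arg constructor reports 0.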

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests:

  • CleanLogOperationSerializationTest: Verifies that jobId is preserved through a Hazelcast SerializationService round-trip (serialize via writeInternal, deserialize via readInternal). Tests a normal jobId value.
  • TaskLogManagerServiceTest: Verifies the boundary guard in clean(long jobId):
    • jobId=0: guard takes effect, no files deleted
    • Normal jobId: matching log files are deleted, unrelated files are preserved
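The guard behaviour that the second test exercises could look roughly like the sketch below. `TaskLogCleanerSketch`, its return value, and the `tempDirWith` helper are hypothetical; only the condition `path == null || jobId <= 0` and the substring filter are taken from the description.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the guarded clean() logic; not the SeaTunnel source.
class TaskLogCleanerSketch {
    private final String path;

    TaskLogCleanerSketch(String path) {
        this.path = path;
    }

    // Returns the names of the files that were deleted (an assumption made for testability).
    List<String> clean(long jobId) {
        List<String> deleted = new ArrayList<>();
        // Boundary guard: refuse to clean without a log path or with a non-positive job id.
        if (path == null || jobId <= 0) {
            return deleted;
        }
        File[] files = new File(path).listFiles();
        if (files == null) {
            return deleted; // path does not exist or is not a directory
        }
        String id = String.valueOf(jobId);
        for (File file : files) {
            if (file.isFile() && file.getName().contains(id) && file.delete()) {
                deleted.add(file.getName());
            }
        }
        return deleted;
    }

    // Test helper: creates a temp directory containing empty files with the given names.
    static String tempDirWith(String... names) {
        try {
            Path dir = Files.createTempDirectory("task-logs");
            for (String name : names) {
                Files.createFile(dir.resolve(name));
            }
            return dir.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a directory holding `job-1001.log` and `job-777.log` (made-up names), `clean(0)` deletes nothing, while `clean(777)` removes only `job-777.log`.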

github-actions bot added the Zeta label Apr 21, 2026
@DanielLeens

Hi @hyboll, I rechecked the current PR head locally as seatunnel-review-10797 at c385cc761b14. I reviewed the full diff against upstream/dev and did not run local Maven/tests in this batch; this is a source-level review.

This PR fixes a real separated-cluster log-cleanup bug:

Job history expiration
  -> JobHistoryService.JobInfoExpiredListener.entryExpired(...)
  -> NodeEngineUtil.sendOperationToMemberNode(new CleanLogOperation(jobId), workerAddress)
  -> Hazelcast serializes CleanLogOperation
  -> worker deserializes and runs CleanLogOperation.runInternal()
  -> TaskLogManagerService.clean(jobId)

Before this patch, CleanLogOperation did not write/read jobId, so the worker-side operation could receive the default 0 and skip or mis-target cleanup. The new writeInternal/readInternal methods preserve the id, and TaskLogManagerService.clean() now returns early for invalid ids and logs concrete file paths.

I do not see a source-level blocker. The tests cover the serialization round-trip and the jobId=0 no-delete guard. Fetched CI metadata reports Build: CANCELLED, so CI needs a fresh green run.

Conclusion: can merge once CI is green

Blocking item:

  1. Rerun CI and merge only after Build is green.

chl-wxp requested a review from corgy-w April 29, 2026 03:48