Skip to content

Conversation

@JinZhou5042
Copy link
Member

@JinZhou5042 JinZhou5042 commented Nov 3, 2025

Proposed Changes

A file may be transferred to another worker because a task that declares it as input is scheduled there, resulting in an extra replica that consumes worker storage space.

This function is called every time a task completes, it evaluates whether its input files have excessive replicas and removes those on workers that do not execute their dependent tasks.

update: make it an tuning parameter

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

@JinZhou5042 JinZhou5042 self-assigned this Nov 3, 2025
@dthain
Copy link
Member

dthain commented Nov 3, 2025

Hmm, I am not sure that this is a global behavior that we want. The general function of taskvine is to replicate files on demand, and to only clean them up when prune is called.

Is there a reason to have this behavior in the manager (which doesn't know the future) as opposed to the workflow layer above it? (which may know the future)

@JinZhou5042
Copy link
Member Author

JinZhou5042 commented Nov 3, 2025

The workflow-level pruning removes all replicas for a file, because it has full knowledge of data dependencies.

In contrast, this manager-level cleanup only removes extra replicas beyond q->temp_replica_count to save space, which has nothing to do with the graph structure.

There will be at least 1 available replica remaining for future usage.

@dthain
Copy link
Member

dthain commented Nov 4, 2025

Ok, that makes more sense to me that "redundant" means "in excess of temp_replica_count".

I am wondering about the check for whether the file is currently in use. Does that apply in other situations where files are deleted as well?

@JinZhou5042
Copy link
Member Author

Ok, that makes more sense to me that "redundant" means "in excess of temp_replica_count".

I am wondering about the check for whether the file is currently in use. Does that apply in other situations where files are deleted as well?

No, that check only applies in this situation.

@JinZhou5042 JinZhou5042 mentioned this pull request Nov 4, 2025
7 tasks
@btovar
Copy link
Member

btovar commented Nov 4, 2025

Is it possible not to clear the redundant replica in the first place?

@JinZhou5042
Copy link
Member Author

JinZhou5042 commented Nov 4, 2025

Yeah, I’m on board with not doing it. Just want to make sure the part mentioned in the paper actually shows up in the code. If someone else wants to try it, they can turn the knob on and see the difference.

@JinZhou5042 JinZhou5042 mentioned this pull request Nov 4, 2025
7 tasks
@JinZhou5042
Copy link
Member Author

Closing this as it is incorporated in #4269

@JinZhou5042 JinZhou5042 closed this Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants