@jorgee
Contributor

@jorgee jorgee commented Nov 12, 2025

close #5888

A .command.stage file is created when the staging script is larger than 1MB and the stage file is in the default filesystem. With the Google Batch executor, the stage file path is under /mnt/xxx/ because of gcsfuse. Nextflow later tries to write the file to that path on the head node, where it doesn't exist, so it fails. The main cause is that targetStageFile returns the mount path instead of the remote path, unlike .command.run, etc.

In this PR, I have fixed the error by returning the remote version of the stage file instead of the local one when requesting the targetStageFile.
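
The failure mode described above can be reproduced in isolation. The following is a minimal Groovy sketch that does not use any Nextflow code; the temporary directory and the mnt/bucket/work path are hypothetical stand-ins for a head node whose disk lacks the gcsfuse mount that exists inside the task container.

```groovy
import java.nio.file.Files
import java.nio.file.NoSuchFileException
import java.nio.file.Path

// Hypothetical stand-in for the head node's local disk
Path headNode = Files.createTempDirectory('head-node')

// Hypothetical mount-style path: valid inside the container, absent on the head node
Path stageFile = headNode.resolve('mnt/bucket/work/.command.stage')

String outcome
try {
    Files.write(stageFile, 'echo staging'.bytes)
    outcome = 'written'
}
catch (NoSuchFileException e) {
    // The parent directories were never created on this machine
    outcome = 'missing parent directory'
}
println outcome
```

The write fails because the parent directories only exist where the bucket is mounted, which is the situation on the head node when the stage file resolves to the mount path.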

@bentsherman
Member

Ideally it should not write the stage script at all for google batch. That was the intention of this check:

    // enable only when the stage uses the default file system, i.e. it's not a remote object storage file
    // see https://github.com/nextflow-io/nextflow/issues/4279
    if( stageFile.fileSystem == FileSystems.default && stagingScript.size() >= stageFileThreshold.bytes ) {
        stageScript = stagingScript
        return header + "bash ${stageFile}"
    }
    else
        return header + stagingScript

But I didn't consider that the Google Batch script launcher overwrites the task workDir to be the local container-mounted path. So I'm not sure there is a clean way to fix this check.
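
To see why the file-system test in the check quoted above cannot tell a mounted bucket from a local disk, here is a small Groovy sketch. The path is a hypothetical example of how a container might see a bucket through gcsfuse; no Nextflow classes are involved.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical mount path, as a container would see a bucket through gcsfuse
Path mounted = Paths.get('/mnt/bucket/work/ab/123456/.command.stage')

// Same comparison as the first half of the check; getDefault() is the
// method behind the FileSystems.default property access
boolean looksLocal = mounted.fileSystem == FileSystems.getDefault()
println looksLocal
```

Any path built from a plain local string belongs to the default file system, so the comparison is true even though the data ultimately lives in remote object storage.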

@jorgee
Contributor Author

jorgee commented Nov 13, 2025

I fixed it this way because it was the simplest change and required the fewest modifications. The other easy approaches I was considering are overriding the stageCommand method (the one you mention) in the Google case with just the 'else' part, or encapsulating the check in a function and overriding it to return false for the Google Batch case. Do you think one of these is more suitable?
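
As a rough illustration of the second alternative (encapsulating the check and overriding it), a Groovy sketch might look like the following. All class and method names here are hypothetical and heavily simplified; they are not the Nextflow classes, and the threshold value is assumed.

```groovy
// Hypothetical classes for illustration only; these are not the Nextflow classes
class WrapperSketch {
    long thresholdBytes = 1_000_000   // assumed value for illustration

    // The check, pulled out into a method so that subclasses can override it
    protected boolean shouldUseStageFile(String stagingScript) {
        return stagingScript.size() >= thresholdBytes
    }

    String stageCommand(String stagingScript) {
        return shouldUseStageFile(stagingScript) ? 'bash .command.stage' : stagingScript
    }
}

class GoogleBatchWrapperSketch extends WrapperSketch {
    // Never use a separate stage file, whatever the script size
    @Override
    protected boolean shouldUseStageFile(String stagingScript) {
        return false
    }
}

def big = 'x' * 2_000_000
def base = new WrapperSketch().stageCommand(big)
def google = new GoogleBatchWrapperSketch().stageCommand(big)
```

With this shape, the Google-specific subclass always inlines the staging script, which corresponds to keeping only the 'else' branch.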

@bentsherman
Member

I feel like the cleanest solution is to add a method to Executor that specifies whether to use the stage script. It should be false by default and the AbstractGridExecutor should override it to be true.

For that I think you'll have to add a field to the TaskBean, since the BashWrapperBuilder doesn't have a reference to the task executor.
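
A minimal Groovy sketch of this suggestion, under stated assumptions: the classes below are simplified, hypothetical stand-ins (the method name isStageFileEnabled and the bean field are invented for illustration), not the real Executor, AbstractGridExecutor or TaskBean.

```groovy
// Hypothetical, simplified classes; names and fields are not the real Nextflow API
abstract class ExecutorSketch {
    // Off by default, as proposed
    boolean isStageFileEnabled() { false }
}

class GridExecutorSketch extends ExecutorSketch {
    // Grid executors opt in
    @Override
    boolean isStageFileEnabled() { true }
}

class CloudExecutorSketch extends ExecutorSketch { }

// Carries the executor's answer to the wrapper builder, which has no executor reference
class TaskBeanSketch {
    boolean stageFileEnabled
}

TaskBeanSketch beanFor(ExecutorSketch executor) {
    new TaskBeanSketch(stageFileEnabled: executor.isStageFileEnabled())
}

def gridBean = beanFor(new GridExecutorSketch())
def cloudBean = beanFor(new CloudExecutorSketch())
```

The design choice here is that the decision lives with the executor, and the bean only transports it to the code that builds the wrapper script.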

@jorgee
Contributor Author

jorgee commented Nov 14, 2025

I have finally implemented the fix by adding a stageFileEnabled flag to the BashWrapperBuilder, which is only set in the GridTaskHandler. Passing something from the Executor to the BashWrapperBuilder would require adding fields to TaskRun and TaskBean, so I find this approach cleaner and simpler.
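
The approach described in this comment could be sketched roughly as follows. Only the flag name stageFileEnabled comes from the comment; the classes, the threshold value and the returned command string are hypothetical simplifications, not the merged code.

```groovy
// Hypothetical, simplified classes; only the flag name stageFileEnabled comes from the comment above
class BuilderSketch {
    boolean stageFileEnabled = false   // off unless a handler opts in
    long thresholdBytes = 1_000_000    // assumed value for illustration

    String stageCommand(String stagingScript) {
        if( stageFileEnabled && stagingScript.size() >= thresholdBytes )
            return 'bash .command.stage'
        return stagingScript
    }
}

class GridHandlerSketch {
    BuilderSketch createBuilder() {
        def builder = new BuilderSketch()
        builder.stageFileEnabled = true   // only the grid handler sets the flag
        return builder
    }
}

class OtherHandlerSketch {
    BuilderSketch createBuilder() { new BuilderSketch() }
}

def largeScript = 'x' * 2_000_000
def gridCommand = new GridHandlerSketch().createBuilder().stageCommand(largeScript)
def otherCommand = new OtherHandlerSketch().createBuilder().stageCommand(largeScript)
```

Because the flag defaults to false, any handler that does not set it explicitly always gets the staging script inlined, independent of which file system the work directory appears to be on.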

Development

Successfully merging this pull request may close these issues.

Staging script not found when running on Google Batch