Report actual GCP zone in Google Batch trace records #6854
Closed
The Google Batch API does not expose the actual zone where a task executes. Query the GCP instance metadata service from within the task wrapper script to capture the real zone (e.g. europe-west2-a) into .command.zone, and read it back upon task completion to update the trace record. Falls back gracefully to the configured region when the metadata service is unavailable.
Signed-off-by: Jordi Martínez <jmarti@seqera.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jon Marti <jonathan.marti@seqera.io>
Member
|
When did I suggest this?! 😄
Collaborator
Author
|
@pditommaso @munishchouhan I checked #6646 today as part of the resolution of https://seqera.atlassian.net/browse/NF-353 and eventually this ticket in Expedite: https://seqera.atlassian.net/browse/ES-190
Member
|
Likely an agent, not me 😄
Member
|
I want to explore whether we can avoid using an external file and instead rely on the API to detect the zone. See #6855
Collaborator
Author
|
Closing in favor of #6855 — Paolo's approach of parsing the zone from
The only trade-off is one extra |
Summary
Fixes #6646
The Google Batch API does not expose the actual zone where a task executes; it only provides the configured region (e.g. `europe-west2`). This means trace records report the region rather than the specific zone (e.g. `europe-west2-a`), which is important for cost analysis and debugging placement decisions.
This PR implements the workaround suggested by @pditommaso: querying the GCP instance metadata service from within the task to capture the real zone.
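A minimal sketch of what such a capture step could look like in a task wrapper script. This is not the code from the PR: the `capture_zone` function name, the `ZONE_URL` variable, and the `-sf --max-time 2` flags are assumptions made for illustration. The endpoint and the `2>/dev/null || true` guard come from the description, and the `Metadata-Flavor: Google` header is required by the GCP metadata server.

```shell
# Sketch only: capture_zone and ZONE_URL are hypothetical names, and the
# curl flags other than the stderr redirect are assumptions.
ZONE_URL="${ZONE_URL:-http://metadata.google.internal/computeMetadata/v1/instance/zone}"

capture_zone() {
    # $1: file that receives the raw metadata response (e.g. .command.zone)
    # -s hides progress output, -f makes HTTP errors return a non-zero exit,
    # --max-time bounds the request when the metadata service is unreachable.
    # The trailing "|| true" keeps any failure from aborting the task.
    curl -sf --max-time 2 -H 'Metadata-Flavor: Google' "$ZONE_URL" > "$1" 2>/dev/null || true
}
```

Calling `capture_zone .command.zone` at the top of the wrapper would leave either the raw `projects/<id>/zones/<zone>` string or an empty file in the work directory. The empty file is what lets the reader side fall back to the configured region.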
Changes
- `TaskRun.groovy`: Added the `CMD_ZONE = '.command.zone'` constant for the zone metadata file
- `GoogleBatchScriptLauncher.groovy`: Inject a `curl` call to the GCP metadata endpoint (`http://metadata.google.internal/computeMetadata/v1/instance/zone`) in `headerScript()`, writing the result to `.command.zone` in the task work directory. The call is silent and non-fatal (`2>/dev/null || true`)
- `GoogleBatchTaskHandler.groovy`: On task completion, read `.command.zone`, parse the zone name from the metadata format (`projects/<id>/zones/<zone>`), and update the `CloudMachineInfo` with the actual zone. Uses a `volatile boolean zoneUpdated` flag to ensure the file is read at most once (avoiding repeated remote I/O on gcsfuse)
Design decisions
- `headerScript()`: This runs at the top of `.command.run`, which executes for both regular tasks and array task children (each child runs its own `.command.run`). The parent array launcher (`.command.sh`) does not capture the zone, since each child writes its own `.command.zone`
- […] `GoogleBatchScriptLauncher`), the trace record falls back to the configured region as before
- `zoneUpdated` flag is set in a `finally` block, so even if reading fails, we don't retry on every `getMachineInfo()` call (which is invoked frequently by the polling monitor, Tower observer, and error handlers)
Test plan
- `GoogleBatchScriptLauncherTest`: 2 new tests: zone capture present in header script; zone capture absent from array launch command
- `GoogleBatchTaskHandlerTest`: 6 new tests: […] `getMachineInfo()` calls
🤖 Generated with Claude Code
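The parsing and fallback behaviour described under Changes and Design decisions can be illustrated with a short shell sketch. The PR implements this logic in Groovy inside `GoogleBatchTaskHandler`, so what follows is only an approximation of the idea: `parse_zone` and `zone_or_region` are hypothetical helper names, and the project id in the usage line is made up.

```shell
# Sketch only: both helper names are hypothetical and do not appear in the PR.
parse_zone() {
    # $1: raw metadata value in the form projects/<id>/zones/<zone>
    # Prints the text after the last '/', or returns 1 if the format differs.
    case "$1" in
        projects/*/zones/?*) printf '%s\n' "${1##*/}" ;;
        *) return 1 ;;
    esac
}

zone_or_region() {
    # $1: path to the zone file, $2: configured region used as the fallback
    if [ -r "$1" ] && _raw="$(tr -d '[:space:]' < "$1")" && _zone="$(parse_zone "$_raw")"; then
        printf '%s\n' "$_zone"
    else
        # Missing, empty, or malformed file: report the region as before.
        printf '%s\n' "$2"
    fi
}
```

For example, `parse_zone 'projects/123456/zones/europe-west2-a'` prints `europe-west2-a`, while `zone_or_region` pointed at a missing or empty file prints whatever region is passed as its second argument.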