Add a new DAG that verifies Jobset Uptime metrics #1175

Merged
alfredyu-cienet merged 1 commit into GoogleCloudPlatform:master from CIeNET-International:tpu-obs/release/pr-169
Feb 6, 2026

@yuna-tzeng
Member

Description

This change introduces a new DAG that automates creating a TPU v6e-16 node pool, launching a jobset, and monitoring the jobset uptime metric to verify that it behaves correctly. It also includes a negative test case that verifies metric behavior over invalid time ranges.
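
The two checks described above could look roughly like the sketch below. This is not the code from this PR: `Point`, `points_in_range`, and `uptime_is_healthy` are hypothetical names, the samples are assumed to be already fetched from the monitoring backend, and an "invalid" time range is assumed here to mean one whose end does not come after its start.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Point:
    """One sample of the jobset uptime metric (hypothetical shape)."""
    timestamp: datetime
    uptime_seconds: float


def points_in_range(points, start, end):
    """Return the samples inside [start, end].

    Assumption: a range whose end is not after its start is treated as
    invalid and yields no samples, which is what the negative test expects.
    """
    if start >= end:
        return []
    return [p for p in points if start <= p.timestamp <= end]


def uptime_is_healthy(points):
    """Positive check: data exists, is non-negative, and never decreases."""
    if not points:
        return False
    ordered = sorted(points, key=lambda p: p.timestamp)
    values = [p.uptime_seconds for p in ordered]
    return values[0] >= 0 and all(b >= a for a, b in zip(values, values[1:]))
```

In a DAG, the positive task would call `uptime_is_healthy` on the samples from the window in which the jobset was running, and the negative task would assert that `points_in_range` returns nothing for the invalid window.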

Tests

GCP Composer name: tony-test (under GCP project: cloud-ml-auto-solutions)
GCP Composer version: 2.13.1

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@alfredyu-cienet merged commit 3fd17db into GoogleCloudPlatform:master on Feb 6, 2026
7 checks passed
@alfredyu-cienet deleted the tpu-obs/release/pr-169 branch on February 6, 2026 06:44
3 participants