Commit 68f4a34
authored
Add a new DAG that validates data provided by the tpu-info CLI (#1190)

This change adds an automated DAG that cross-validates TPU performance data. It checks that the metrics reported directly from the hardware are consistent with the data ingested into the cloud monitoring pipeline. The verification suite covers metrics such as TPU Utilization, TensorCore Activity, Memory Usage, and Latency, and is designed to scale automatically as new metric strategies are added to the validation library.

1 parent ae3b22d commit 68f4a34
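The diff content is not shown on this page, so the following is only a minimal sketch of how a strategy-based cross-validation like the one described could look. Every name here (`MetricStrategy`, `STRATEGIES`, `cross_validate`, the metric keys, and the tolerances) is hypothetical and not taken from the commit. It assumes each metric is a single float read once from the CLI and once from the monitoring pipeline.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetricStrategy:
    """Hypothetical description of how one metric is compared."""
    name: str        # key used in both data sources (assumed)
    rel_tol: float   # allowed relative difference (assumed value)


# Hypothetical registry: supporting a new metric means adding one entry here.
STRATEGIES: List[MetricStrategy] = [
    MetricStrategy("tpu_utilization", rel_tol=0.05),
    MetricStrategy("tensorcore_activity", rel_tol=0.05),
    MetricStrategy("memory_usage", rel_tol=0.05),
    MetricStrategy("latency", rel_tol=0.10),
]


def cross_validate(
    cli_values: Dict[str, float],
    monitoring_values: Dict[str, float],
    strategies: List[MetricStrategy],
) -> Dict[str, bool]:
    """Return, per metric, whether the two sources agree within tolerance.

    A metric missing from either source is reported as a failure.
    """
    results: Dict[str, bool] = {}
    for strategy in strategies:
        if strategy.name not in cli_values or strategy.name not in monitoring_values:
            results[strategy.name] = False
            continue
        results[strategy.name] = math.isclose(
            cli_values[strategy.name],
            monitoring_values[strategy.name],
            rel_tol=strategy.rel_tol,
        )
    return results
```

Because the loop iterates over whatever strategies it is given, a sketch like this picks up new metrics without any change to the validation function, which is one plausible reading of "automatically scale as new metric strategies are added".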
File tree
8 files changed, +1059 -10 lines changed
- dags
- common/scheduling_helper
- tpu_observability
- utils
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 52 | 52 | |
| 53 | 53 | |
| 54 | 54 | |
| | 55 | + |
| 55 | 56 | |
| 56 | 57 | |
| 57 | 58 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 162 | 162 | |
| 163 | 163 | |
| 164 | 164 | |
| 165 | | - |
| | 165 | + |
| 166 | 166 | |
| 167 | 167 | |
| 168 | 168 | |