Commit 02f90e3
feat: Enhance healthiness and interruption reason for "PLX dashboard" (#1191)
Add additional metrics to the XLML metadata logging for JobSet observability.
- `jobset_name`: add jobset name.
- `pod_names`: List of pod names associated with the JobSet, captured before the interruption event.
Co-authored-by: Chris liao <388chris@gmail.com>1 parent 68f4a34 commit 02f90e3
File tree
5 files changed
+76
-28
lines changed- dags/tpu_observability
- utils
5 files changed
+76
-28
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
156 | 156 | | |
157 | 157 | | |
158 | 158 | | |
159 | | - | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
160 | 162 | | |
161 | 163 | | |
162 | 164 | | |
163 | 165 | | |
164 | 166 | | |
165 | 167 | | |
166 | | - | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
167 | 173 | | |
168 | 174 | | |
169 | 175 | | |
170 | 176 | | |
171 | | - | |
| 177 | + | |
172 | 178 | | |
173 | 179 | | |
174 | 180 | | |
| |||
199 | 205 | | |
200 | 206 | | |
201 | 207 | | |
202 | | - | |
| 208 | + | |
203 | 209 | | |
204 | 210 | | |
205 | 211 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
59 | | - | |
| 59 | + | |
| 60 | + | |
60 | 61 | | |
61 | 62 | | |
62 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
406 | 406 | | |
407 | 407 | | |
408 | 408 | | |
409 | | - | |
410 | | - | |
411 | | - | |
412 | | - | |
| 409 | + | |
| 410 | + | |
413 | 411 | | |
414 | 412 | | |
415 | 413 | | |
416 | 414 | | |
417 | 415 | | |
418 | 416 | | |
419 | 417 | | |
420 | | - | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
421 | 423 | | |
422 | 424 | | |
423 | 425 | | |
424 | 426 | | |
425 | | - | |
| 427 | + | |
426 | 428 | | |
427 | 429 | | |
428 | 430 | | |
| |||
521 | 523 | | |
522 | 524 | | |
523 | 525 | | |
524 | | - | |
| 526 | + | |
525 | 527 | | |
526 | 528 | | |
527 | 529 | | |
| |||
Lines changed: 7 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
162 | 162 | | |
163 | 163 | | |
164 | 164 | | |
165 | | - | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
166 | 168 | | |
167 | 169 | | |
168 | 170 | | |
169 | 171 | | |
170 | 172 | | |
171 | 173 | | |
172 | 174 | | |
173 | | - | |
174 | | - | |
| 175 | + | |
| 176 | + | |
175 | 177 | | |
176 | 178 | | |
177 | 179 | | |
178 | 180 | | |
179 | 181 | | |
180 | 182 | | |
181 | | - | |
| 183 | + | |
182 | 184 | | |
183 | 185 | | |
184 | 186 | | |
| |||
202 | 204 | | |
203 | 205 | | |
204 | 206 | | |
205 | | - | |
| 207 | + | |
206 | 208 | | |
207 | 209 | | |
208 | 210 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
| |||
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| 42 | + | |
41 | 43 | | |
42 | 44 | | |
43 | 45 | | |
| |||
507 | 509 | | |
508 | 510 | | |
509 | 511 | | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
510 | 530 | | |
511 | 531 | | |
512 | 532 | | |
| |||
724 | 744 | | |
725 | 745 | | |
726 | 746 | | |
727 | | - | |
| 747 | + | |
| 748 | + | |
728 | 749 | | |
729 | 750 | | |
730 | 751 | | |
| |||
749 | 770 | | |
750 | 771 | | |
751 | 772 | | |
752 | | - | |
753 | | - | |
754 | 773 | | |
755 | 774 | | |
756 | 775 | | |
757 | 776 | | |
758 | 777 | | |
759 | | - | |
760 | | - | |
761 | | - | |
762 | | - | |
763 | | - | |
764 | | - | |
765 | | - | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
766 | 795 | | |
767 | 796 | | |
768 | | - | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
769 | 806 | | |
770 | 807 | | |
771 | 808 | | |
| |||
0 commit comments