Commit c14a40f

Release new version for Health Monitoring Agent (1.0.448.0_1.0.115.0) with minor improvements and bug fixes.
* Add support for TRN2 and G6 on Health Monitoring Agent
* Fix DetectionLatency metrics publish

This HMA version was tested on all newly supported instance types (G6, Gr6, G6e, and TRN2 families).
1 parent d1b88a0 commit c14a40f

3 files changed: 33 additions and 14 deletions

helm_chart/HyperPodHelmChart/charts/health-monitoring-agent/templates/health-monitoring-agent.yaml

Lines changed: 19 additions & 0 deletions
@@ -91,6 +91,25 @@ spec:
 - ml.inf2.48xlarge
 - ml.trn1.32xlarge
 - ml.trn1n.32xlarge
+- ml.g6.xlarge
+- ml.g6.2xlarge
+- ml.g6.4xlarge
+- ml.g6.8xlarge
+- ml.g6.16xlarge
+- ml.g6.12xlarge
+- ml.g6.24xlarge
+- ml.g6.48xlarge
+- ml.gr6.4xlarge
+- ml.gr6.8xlarge
+- ml.g6e.xlarge
+- ml.g6e.2xlarge
+- ml.g6e.4xlarge
+- ml.g6e.8xlarge
+- ml.g6e.16xlarge
+- ml.g6e.12xlarge
+- ml.g6e.24xlarge
+- ml.g6e.48xlarge
+- ml.trn2.48xlarge
 containers:
 - name: health-monitoring-agent
 args:

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 namespace: "aws-hyperpod"
-hmaimage: "905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0"
+hmaimage: "905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0"

helm_chart/readme.md

Lines changed: 13 additions & 13 deletions
@@ -108,19 +108,19 @@ helm upgrade dependencies helm_chart/HyperPodHelmChart --namespace kube-system
 - Training job auto resume is expected to work with Kubeflow training operator release v1.7.0, v1.8.0, v1.8.1 https://github.com/kubeflow/training-operator/releases
 - If you intend to use the Health Monitoring Agent container image from another region, please see below list to find relevant region's URI.
 ```
-IAD 767398015722.dkr.ecr.us-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-PDX 905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-CMH 851725546812.dkr.ecr.us-east-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-SFO 011528288828.dkr.ecr.us-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-FRA 211125453373.dkr.ecr.eu-central-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-ARN 654654141839.dkr.ecr.eu-north-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-DUB 533267293120.dkr.ecr.eu-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-LHR 011528288831.dkr.ecr.eu-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-NRT 533267052152.dkr.ecr.ap-northeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-BOM 011528288864.dkr.ecr.ap-south-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-SIN 905418428165.dkr.ecr.ap-southeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-SYD 851725636348.dkr.ecr.ap-southeast-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
-GRU 025066253954.dkr.ecr.sa-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.408.0_1.0.105.0
+IAD 767398015722.dkr.ecr.us-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+PDX 905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+CMH 851725546812.dkr.ecr.us-east-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+SFO 011528288828.dkr.ecr.us-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+FRA 211125453373.dkr.ecr.eu-central-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+ARN 654654141839.dkr.ecr.eu-north-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+DUB 533267293120.dkr.ecr.eu-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+LHR 011528288831.dkr.ecr.eu-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+NRT 533267052152.dkr.ecr.ap-northeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+BOM 011528288864.dkr.ecr.ap-south-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+SIN 905418428165.dkr.ecr.ap-southeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+SYD 851725636348.dkr.ecr.ap-southeast-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
+GRU 025066253954.dkr.ecr.sa-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.448.0_1.0.115.0
 ```


 ## 7. Troubleshooting
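
The readme list in this commit pairs each region with an ECR account ID, and every URI follows the same pattern. As a rough illustration only (not part of the commit), the lookup could be sketched in Python as below. The names `HMA_ACCOUNTS`, `HMA_TAG`, and `hma_image_uri` are hypothetical helpers invented for this sketch; the account IDs and the tag are copied from the list in this commit.

```python
# Hypothetical helper, not part of the Helm chart: builds the Health
# Monitoring Agent image URI for a region from the readme list above.

# ECR account ID per region, copied from the readme list in this commit.
HMA_ACCOUNTS = {
    "us-east-1": "767398015722",       # IAD
    "us-west-2": "905418368575",       # PDX
    "us-east-2": "851725546812",       # CMH
    "us-west-1": "011528288828",       # SFO
    "eu-central-1": "211125453373",    # FRA
    "eu-north-1": "654654141839",      # ARN
    "eu-west-1": "533267293120",       # DUB
    "eu-west-2": "011528288831",       # LHR
    "ap-northeast-1": "533267052152",  # NRT
    "ap-south-1": "011528288864",      # BOM
    "ap-southeast-1": "905418428165",  # SIN
    "ap-southeast-2": "851725636348",  # SYD
    "sa-east-1": "025066253954",       # GRU
}

# Image tag released by this commit.
HMA_TAG = "1.0.448.0_1.0.115.0"


def hma_image_uri(region: str, tag: str = HMA_TAG) -> str:
    """Return the image URI for `region`, or raise ValueError if unlisted."""
    try:
        account = HMA_ACCOUNTS[region]
    except KeyError:
        raise ValueError(f"no HMA image listed for region {region!r}") from None
    return (
        f"{account}.dkr.ecr.{region}.amazonaws.com/"
        f"hyperpod-health-monitoring-agent:{tag}"
    )
```

The returned string is what would go into `hmaimage` in the chart values when deploying outside `us-west-2`. Regions absent from the list raise an error rather than guessing an account ID.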
