
Commit 7606606

authored
Release new version for Health Monitoring Agent 1.0.1038.0_1.0.305.0 with minor improvements and bug fixes. (#302)
* Calibrate certain thresholds to make HMA more resilient to node-level OOM
* Add a warning label when thermal throttling drops the clock speed below a certain threshold
* Fix corner-case bugs
1 parent 7233490 commit 7606606

4 files changed: 15 additions, 18 deletions

helm_chart/HyperPodHelmChart/charts/health-monitoring-agent/templates/_helpers.tpl

Lines changed: 1 addition & 1 deletion
```diff
@@ -55,7 +55,7 @@ Generate the health monitoring agent image URI based on AWS region
 */}}
 {{- define "health-monitoring-agent.imageUri" -}}
 {{- $region := "" -}}
-{{- $imageTag := .Values.imageTag | default "1.0.935.0_1.0.282.0" -}}
+{{- $imageTag := .Values.imageTag | default "1.0.1038.0_1.0.305.0" -}}
 
 {{/* Debug: Show image tag selection if debug is enabled */}}
 {{- if .Values.debug -}}
```
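The `imageTag` line in `_helpers.tpl` falls back to the bundled tag only when `.Values.imageTag` is empty, and `values.yaml` ships `imageTag: ""`. The sketch below reproduces that fallback with Go's `text/template`, assuming `default` behaves like the string case of Sprig's `default` (return the piped-in value unless it is empty). `defaultFn` and `resolveImageTag` are hypothetical names used for illustration, not part of the chart.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// defaultFn approximates (as an assumption) the string case of Sprig's
// `default`, which Helm templates use: return given unless it is empty.
func defaultFn(def, given string) string {
	if given == "" {
		return def
	}
	return given
}

// resolveImageTag is a hypothetical helper that renders a cut-down copy of the
// imageTag line from _helpers.tpl. "missingkey=zero" makes an absent imageTag
// key behave like an empty string.
func resolveImageTag(values map[string]string) (string, error) {
	const src = `{{- $imageTag := .Values.imageTag | default "1.0.1038.0_1.0.305.0" -}}{{ $imageTag }}`
	t, err := template.New("imageTag").
		Funcs(template.FuncMap{"default": defaultFn}).
		Option("missingkey=zero").
		Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, map[string]any{"Values": values}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	cases := []map[string]string{
		{"imageTag": ""},              // values.yaml default: fall back to the bundled tag
		{"imageTag": "my-pinned-tag"}, // hypothetical user override
	}
	for _, values := range cases {
		tag, err := resolveImageTag(values)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(tag)
	}
}
```

Run as-is, this prints the bundled tag `1.0.1038.0_1.0.305.0` for the empty value and `my-pinned-tag` for the override, which is why bumping the `default` literal is enough to roll out a new release to users who never set `imageTag`.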

helm_chart/HyperPodHelmChart/charts/health-monitoring-agent/templates/health-monitoring-agent.yaml

Lines changed: 0 additions & 3 deletions
```diff
@@ -108,9 +108,6 @@ spec:
 - ml.p6e-gb200.36xlarge
 containers:
 - name: health-monitoring-agent
-args:
-- --enable-k8s-exporter=false
-- --config.system-log-monitor=/config/system-message-monitor.json
 image: {{ include "health-monitoring-agent.imageUri" . }}
 resources:
 limits:
```

helm_chart/HyperPodHelmChart/charts/health-monitoring-agent/values.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ imageTag: ""
 
 # Override the health monitoring agent image URI
 # If specified, this will override the automatic region-based URI selection
-# Example: "905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0"
+# Example: "905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0"
 hmaimage: ""
 
 # Enable debug output for region selection process
```
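The `values.yaml` comments describe a precedence rule: a non-empty `hmaimage` overrides the automatic region-based URI. A minimal sketch of that rule follows, assuming the region-based URI has the `<account>.dkr.ecr.<region>.amazonaws.com/hyperpod-health-monitoring-agent:<tag>` shape listed in `helm_chart/readme.md`. The helper `imageURI`, the two-entry `ecrAccounts` map, and the `123456789012` mirror URI are hypothetical; the chart's own template logic may differ.

```go
package main

import (
	"fmt"
	"os"
)

// ecrAccounts is a hypothetical, deliberately partial copy of the
// region-to-account table in helm_chart/readme.md.
var ecrAccounts = map[string]string{
	"us-east-1": "767398015722",
	"us-west-2": "905418368575",
}

// imageURI is a hypothetical helper: a non-empty hmaimage wins outright;
// otherwise the URI is assembled from the region's ECR account and the tag.
func imageURI(hmaimage, region, tag string) (string, error) {
	if hmaimage != "" {
		return hmaimage, nil
	}
	account, ok := ecrAccounts[region]
	if !ok {
		return "", fmt.Errorf("no ECR account known for region %q", region)
	}
	return fmt.Sprintf("%s.dkr.ecr.%s.amazonaws.com/hyperpod-health-monitoring-agent:%s",
		account, region, tag), nil
}

func main() {
	const tag = "1.0.1038.0_1.0.305.0"

	// No override: region-based selection.
	uri, err := imageURI("", "us-west-2", tag)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(uri)

	// Hypothetical override pointing at a private mirror.
	uri, _ = imageURI("123456789012.dkr.ecr.us-west-2.amazonaws.com/my-mirror:pinned", "us-west-2", tag)
	fmt.Println(uri)
}
```

For `us-west-2` with no override, the assembled URI matches the updated `# Example:` comment in `values.yaml`.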

helm_chart/readme.md

Lines changed: 13 additions & 13 deletions
````diff
@@ -234,19 +234,19 @@ helm upgrade dependencies helm_chart/HyperPodHelmChart --namespace kube-system
 
 - **Supported Regions and their ECR URIs**:
 ```
-us-east-1 (US East (N. Virginia)): 767398015722.dkr.ecr.us-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-us-west-2 (US West (Oregon)): 905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-us-east-2 (US East (Ohio)): 851725546812.dkr.ecr.us-east-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-us-west-1 (US West (N. California)): 011528288828.dkr.ecr.us-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-eu-central-1 (Europe (Frankfurt)): 211125453373.dkr.ecr.eu-central-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-eu-north-1 (Europe (Stockholm)): 654654141839.dkr.ecr.eu-north-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-eu-west-1 (Europe (Ireland)): 533267293120.dkr.ecr.eu-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-eu-west-2 (Europe (London)): 011528288831.dkr.ecr.eu-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-ap-northeast-1 (Asia Pacific (Tokyo)): 533267052152.dkr.ecr.ap-northeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-ap-south-1 (Asia Pacific (Mumbai)): 011528288864.dkr.ecr.ap-south-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-ap-southeast-1 (Asia Pacific (Singapore)): 905418428165.dkr.ecr.ap-southeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-ap-southeast-2 (Asia Pacific (Sydney)): 851725636348.dkr.ecr.ap-southeast-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
-sa-east-1 (South America (São Paulo)): 025066253954.dkr.ecr.sa-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0
+us-east-1 (US East (N. Virginia)): 767398015722.dkr.ecr.us-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+us-west-2 (US West (Oregon)): 905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+us-east-2 (US East (Ohio)): 851725546812.dkr.ecr.us-east-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+us-west-1 (US West (N. California)): 011528288828.dkr.ecr.us-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+eu-central-1 (Europe (Frankfurt)): 211125453373.dkr.ecr.eu-central-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+eu-north-1 (Europe (Stockholm)): 654654141839.dkr.ecr.eu-north-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+eu-west-1 (Europe (Ireland)): 533267293120.dkr.ecr.eu-west-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+eu-west-2 (Europe (London)): 011528288831.dkr.ecr.eu-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+ap-northeast-1 (Asia Pacific (Tokyo)): 533267052152.dkr.ecr.ap-northeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+ap-south-1 (Asia Pacific (Mumbai)): 011528288864.dkr.ecr.ap-south-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+ap-southeast-1 (Asia Pacific (Singapore)): 905418428165.dkr.ecr.ap-southeast-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+ap-southeast-2 (Asia Pacific (Sydney)): 851725636348.dkr.ecr.ap-southeast-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
+sa-east-1 (South America (São Paulo)): 025066253954.dkr.ecr.sa-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
 ```
 
 ## 7. Troubleshooting
````
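This commit bumps the same tag on all thirteen region lines of the readme table, so a small consistency check can catch a line that was missed. The sketch below is a hypothetical helper, not part of the chart or its tooling. It assumes the readme keeps one `region (Name): URI` entry per line, and flags any entry whose ECR hostname names a different region or whose tag differs from the expected release.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// linePattern matches entries shaped like the readme's table:
// "<region> (<name>): <12-digit account>.dkr.ecr.<region>.amazonaws.com/hyperpod-health-monitoring-agent:<tag>".
var linePattern = regexp.MustCompile(`^([a-z0-9-]+) \(.*\): (\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com/hyperpod-health-monitoring-agent:(\S+)$`)

// checkTable (hypothetical) returns one message per problem: unparsable lines,
// URIs whose registry region differs from the listed region, and stale tags.
func checkTable(table, want string) []string {
	var problems []string
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		m := linePattern.FindStringSubmatch(line)
		if m == nil {
			problems = append(problems, "unparsable: "+line)
			continue
		}
		region, uriRegion, tag := m[1], m[3], m[4]
		if region != uriRegion {
			problems = append(problems, fmt.Sprintf("%s: URI points at %s", region, uriRegion))
		}
		if tag != want {
			problems = append(problems, fmt.Sprintf("%s: tag %s, want %s", region, tag, want))
		}
	}
	return problems
}

func main() {
	// Two entries in the readme's format: one updated, one left on the old tag.
	table := `us-west-2 (US West (Oregon)): 905418368575.dkr.ecr.us-west-2.amazonaws.com/hyperpod-health-monitoring-agent:1.0.1038.0_1.0.305.0
sa-east-1 (South America (São Paulo)): 025066253954.dkr.ecr.sa-east-1.amazonaws.com/hyperpod-health-monitoring-agent:1.0.935.0_1.0.282.0`
	for _, p := range checkTable(table, "1.0.1038.0_1.0.305.0") {
		fmt.Println(p)
	}
}
```

With the sample above, only the `sa-east-1` line is reported, because it still carries the previous `1.0.935.0_1.0.282.0` tag.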
