Commit 93da1ec

Merge branch 'main' into fix/deffered-imports
2 parents c498848 + 96c5b2b commit 93da1ec

31 files changed, with 2725 additions and 276 deletions

.github/workflows/codebuild-ci.yml

Lines changed: 1 addition & 2 deletions
@@ -2,8 +2,7 @@ name: PR Checks
 on:
   pull_request_target:
     branches:
-      - "master*"
-      - "main*"
+      - "*"
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.head_ref }}

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Changelog
 
+## v3.1.0 (2025-08-13)
+
+### Features
+
+* Task Governance feature for training jobs.
+
 ## v3.0.2 (2025-07-31)
 
 ### Features

helm_chart/HyperPodHelmChart/charts/health-monitoring-agent/templates/health-monitoring-agent.yaml

Lines changed: 1 addition & 0 deletions
@@ -111,6 +111,7 @@ spec:
 - ml.g6e.48xlarge
 - ml.trn2.48xlarge
 - ml.p6-b200.48xlarge
+- ml.p6e-gb200.36xlarge
 containers:
 - name: health-monitoring-agent
 args:

helm_chart/HyperPodHelmChart/values.yaml

Lines changed: 5 additions & 0 deletions
@@ -180,6 +180,8 @@ nvidia-device-plugin:
 - ml.p5.48xlarge
 - ml.p5e.48xlarge
 - ml.p5en.48xlarge
+- ml.p6-b200.48xlarge
+- ml.p6e-gb200.36xlarge
 tolerations:
 - key: nvidia.com/gpu
 operator: Exists
@@ -197,6 +199,7 @@ aws-efa-k8s-device-plugin:
 devicePlugin:
 enabled: true
 supportedInstanceLabels:
+# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types
 values:
 - ml.c5n.9xlarge
 - ml.c5n.18xlarge
@@ -237,6 +240,8 @@ aws-efa-k8s-device-plugin:
 - ml.p5.48xlarge
 - ml.p5e.48xlarge
 - ml.p5en.48xlarge
+- ml.p6-b200.48xlarge
+- ml.p6e-gb200.36xlarge
 - ml.r7i.large
 - ml.r7i.xlarge
 - ml.r7i.2xlarge

hyperpod-custom-inference-template/pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -20,4 +20,5 @@ include-package-data = true
 
 [tool.setuptools.package-data]
 # for each versioned subpackage, include schema.json
-"hyperpod_custom_inference_template.v1_0" = ["schema.json"]
+"*" = ["schema.json"]
+

hyperpod-jumpstart-inference-template/pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -20,4 +20,5 @@ include-package-data = true
 
 [tool.setuptools.package-data]
 # for each versioned subpackage, include schema.json
-"hyperpod_jumpstart_inference_template.v1_0" = ["schema.json"]
+"*" = ["schema.json"]
+

hyperpod-pytorch-job-template/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+## v1.1.0 (2025-08-14)
+
+### Features
+
+* Added parameters for task governance feature
+
 ## v1.0.2 (2025-07-31)
 
 ### Features

hyperpod-pytorch-job-template/hyperpod_pytorch_job_template/registry.py

Lines changed: 4 additions & 2 deletions
@@ -10,11 +10,13 @@
 # distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
 # ANY KIND, either express or implied. See the License for the specific
 # language governing permissions and limitations under the License.
-from .v1_0.model import PyTorchJobConfig # Import your model
+from .v1_0 import model as v1_0_model # Import your model
+from .v1_1 import model as v1_1_model
 from typing import Dict, Type
 from pydantic import BaseModel
 
 # Direct version-to-model mapping
 SCHEMA_REGISTRY: Dict[str, Type[BaseModel]] = {
-    "1.0": PyTorchJobConfig,
+    "1.0": v1_0_model.PyTorchJobConfig,
+    "1.1": v1_1_model.PyTorchJobConfig,
 }
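
Review note: the registry change switches to module-qualified imports because both versions export a class with the same name, PyTorchJobConfig. The sketch below shows one way a caller might resolve a version string against such a mapping. It is illustrative only: the two dataclasses are hypothetical stand-ins for the real pydantic models, the `priority` field is invented for the example, and `resolve_model` is a hypothetical helper that does not appear in this commit.

```python
from dataclasses import dataclass
from typing import Dict, Type


# Hypothetical stand-ins for v1_0.model.PyTorchJobConfig and
# v1_1.model.PyTorchJobConfig; the real classes are pydantic models.
@dataclass
class PyTorchJobConfigV1_0:
    job_name: str


@dataclass
class PyTorchJobConfigV1_1:
    job_name: str
    priority: str = ""  # invented field, for illustration only


# Same shape as the mapping in registry.py: version string -> model class.
SCHEMA_REGISTRY: Dict[str, Type] = {
    "1.0": PyTorchJobConfigV1_0,
    "1.1": PyTorchJobConfigV1_1,
}


def resolve_model(version: str) -> Type:
    """Return the class registered for `version` (hypothetical helper)."""
    try:
        return SCHEMA_REGISTRY[version]
    except KeyError:
        known = ", ".join(sorted(SCHEMA_REGISTRY))
        raise ValueError(
            f"Unsupported schema version {version!r}; known versions: {known}"
        ) from None


# Look up the 1.1 model and build a config from keyword arguments.
config = resolve_model("1.1")(job_name="demo")
```

Keying the mapping by version string keeps older templates loadable while new versions are added alongside them.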

hyperpod-pytorch-job-template/hyperpod_pytorch_job_template/v1_0/model.py

Lines changed: 1 addition & 0 deletions
@@ -353,6 +353,7 @@ def to_domain(self) -> Dict:
         result = {
             "name": self.job_name,
             "namespace": self.namespace,
+            "labels": metadata_labels,
             "spec": job_kwargs,
         }
         return result
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+from .model import PyTorchJobConfig
+
+def validate(data: dict):
+    return PyTorchJobConfig(**data)
+
+
+__all__ = ["validate", "PyTorchJobConfig"]
