[ROCm][CI] Added MI325 mirrors (stage C)#35239
[ROCm][CI] Added MI325 mirrors (stage C)#35239AndreasKaratzas wants to merge 4 commits intovllm-project:mainfrom
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
There was a problem hiding this comment.
Code Review
This pull request adds CI configuration to run more tests on MI325 hardware. While the changes are functionally correct, they introduce significant duplication of the mirror configuration across multiple YAML files. This is a maintainability concern, as it can make future updates difficult and error-prone. My review provides suggestions to use YAML anchors to centralize these configurations, which will improve the long-term health of the CI pipeline definitions.
| mirror: | ||
| amd: | ||
| device: mi325_1 | ||
| depends_on: | ||
| - image-build-amd |
There was a problem hiding this comment.
This mirror configuration is duplicated for the 'Entrypoints Integration (v1)' step later in this file (lines 90-94). To improve maintainability and reduce redundancy, consider using YAML anchors. This will make future updates to the mirror configuration easier and less error-prone.
For example, you could define an anchor (e.g., at the top of the file, before steps:) and then reference it:
.amd_mi325_1_mirror: &amd_mi325_1_mirror
mirror:
amd:
device: mi325_1
depends_on:
- image-build-amd
steps:
- label: Entrypoints Integration (API Server)
# ... other properties
<<: *amd_mi325_1_mirror
# ... other steps
- label: Entrypoints Integration (v1)
# ... other properties
<<: *amd_mi325_1_mirror| mirror: | ||
| amd: | ||
| device: mi325_1 | ||
| depends_on: | ||
| - image-build-amd |
There was a problem hiding this comment.
This mirror configuration for mi325_1 is identical to configurations added in other files in this PR (e.g., .buildkite/test_areas/entrypoints.yaml, .buildkite/test_areas/models_language.yaml). To centralize configuration and improve maintainability, consider defining this as a reusable YAML anchor in a shared file if your Buildkite pipeline setup supports it. This would prevent potential inconsistencies if the mi325_1 configuration needs to be updated in the future.
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
Added the ready label for evaluations to start running on this. |
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
| amd: | ||
| device: mi325 | ||
| depends_on: | ||
| - image-build-amd |
There was a problem hiding this comment.
I'll see if I need num_devices: 2 property here as well.
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
In this PR, we are gating more tests. This PR is dependent on:
I've put this PR up to start the evaluation of these tests and get more gating tests ready for merging as soon as issues are confirmed to be resolved.
Proposed list of mirrors:
cc @kenroche