Skip to content

Comments

[ROCm][CI] Added MI325 mirrors (stage C)#35239

Open
AndreasKaratzas wants to merge 4 commits intovllm-project:mainfrom
ROCm:akaratza_gating_amdci_stage_c
Open

[ROCm][CI] Added MI325 mirrors (stage C)#35239
AndreasKaratzas wants to merge 4 commits intovllm-project:mainfrom
ROCm:akaratza_gating_amdci_stage_c

Conversation

@AndreasKaratzas
Copy link
Collaborator

@AndreasKaratzas AndreasKaratzas commented Feb 24, 2026

In this PR, we are gating more tests. This PR is dependent on:

I've put this PR up to start the evaluation of these tests and get more gating tests ready for merging as soon as issues are confirmed to be resolved.

Proposed list of mirrors:

  • Entrypoints Integration (API Server 1)
  • Entrypoints V1
  • Examples
  • Language Models Tests (Standard)
  • Multi-Modal Models (Standard)
  • Plugins

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify mergify bot added ci/build rocm Related to AMD ROCm labels Feb 24, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 24, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds CI configuration to run more tests on MI325 hardware. While the changes are functionally correct, they introduce significant duplication of the mirror configuration across multiple YAML files. This is a maintainability concern, as it can make future updates difficult and error-prone. My review provides suggestions to use YAML anchors to centralize these configurations, which will improve the long-term health of the CI pipeline definitions.

Comment on lines +39 to +43
mirror:
amd:
device: mi325_1
depends_on:
- image-build-amd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This mirror configuration is duplicated for the 'Entrypoints Integration (v1)' step later in this file (lines 90-94). To improve maintainability and reduce redundancy, consider using YAML anchors. This will make future updates to the mirror configuration easier and less error-prone.

For example, you could define an anchor (e.g., at the top of the file, before steps:) and then reference it:

.amd_mi325_1_mirror: &amd_mi325_1_mirror
  mirror:
    amd:
      device: mi325_1
      depends_on:
      - image-build-amd

steps:
  - label: Entrypoints Integration (API Server)
    # ... other properties
    <<: *amd_mi325_1_mirror

  # ... other steps

  - label: Entrypoints Integration (v1)
    # ... other properties
    <<: *amd_mi325_1_mirror

Comment on lines +89 to +93
mirror:
amd:
device: mi325_1
depends_on:
- image-build-amd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This mirror configuration for mi325_1 is identical to configurations added in other files in this PR (e.g., .buildkite/test_areas/entrypoints.yaml, .buildkite/test_areas/models_language.yaml). To centralize configuration and improve maintainability, consider defining this as a reusable YAML anchor in a shared file if your Buildkite pipeline setup supports it. This would prevent potential inconsistencies if the mi325_1 configuration needs to be updated in the future.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 24, 2026
@AndreasKaratzas
Copy link
Collaborator Author

Added the ready label for evaluations to start running on this.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
amd:
device: mi325
depends_on:
- image-build-amd
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I need num_devices: 2 property here as well.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

1 participant