Migrate to consolidated e2e test suite by mamy-CS · Pull Request #760 · llm-d/llm-d-workload-variant-autoscaler

mamy-CS · 2026-02-18T19:27:25Z

Summary

Consolidated e2e test suite (test/e2e/) infrastructure

All tests use an environment-agnostic consolidated suite
Added test fixtures and builders
Added test configuration system
Improved error handling and diagnostics throughout the test suite
test-e2e-smoke-with-setup now runs automatically on PRs with code changes
Added deprecation notes to the test-e2e and test-e2e-openshift Makefile targets
Old targets remain functional for backward compatibility during migration and will be removed after full verification
Some tests are labeled flaky or skipped (others are currently working on them)
More PRs are coming to update, add, or remove e2e tests as needed; this PR focuses on e2e infrastructure

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>


mamy-CS · 2026-02-18T21:27:15Z

/ok-to-test

github-actions · 2026-02-18T21:27:25Z

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

github-actions · 2026-02-18T21:29:57Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	18	32

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)


lionelvillard · 2026-02-19T18:06:42Z

deploy/kind-emulator/install.sh

    else
-        log_success "Successfully pulled image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' from registry"
+        # Try to pull the image, or use local image if pull fails
+        if ! docker pull "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"; then


What's the use case for pulling the WVA image from a (I suppose) remote repository?

This is for testing a released version without building locally, if I understand your question correctly.

lionelvillard · 2026-02-19T18:10:25Z

internal/engines/saturation/engine.go

 )

 // Constants for MetricsAvailable condition
+// Note: Reasons should match api/v1alpha1 constants for consistency


What about moving the 4 constants below to llmdVariantAutoscalingV1alpha1? There would be less code that way.

lionelvillard · 2026-02-19T18:19:12Z

test/e2e/fixtures/hpa_builder.go

+)
+
+// CreateHPA creates a HorizontalPodAutoscaler resource for WVA integration
+func CreateHPA(


People familiar with the k8s API verbs would expect Create to fail if the object already exists, so this may be a source of confusion. What about having (at least) three functions: CreateHPA, DeleteHPA, and EnsureHPA? The first two match the semantics of the k8s API verbs, and the third is a convenience function that calls the lower-level functions. If you agree, make sure to also update the other builder functions.

lionelvillard

LGTM, thanks! I added minor comments.

lionelvillard · 2026-02-19T18:21:10Z

test/e2e/fixtures/infra_builder.go

+// InferencePool compatibility via llm-d.ai/model-pool label.
+// This function is idempotent: it will delete any existing deployment with the same name
+// before creating a new one to handle leftover resources from previous test runs.
+func CreateModelService(ctx context.Context, k8sClient *kubernetes.Clientset, namespace, name, poolName, modelID string, useSimulator bool, maxNumSeqs int) error {


I would put this builder function in model_service_builder.go for consistency.

lionelvillard · 2026-02-19T18:23:00Z

test/e2e/config.go

+}
+
+// LoadConfigFromEnv reads e2e test configuration from environment variables
+func LoadConfigFromEnv() E2EConfig {


Maybe consider returning a pointer? Not very important.

Yes, the struct isn't large enough to justify a pointer. Better to keep the current code simple and clear.

lionelvillard · 2026-02-19T18:23:52Z

test/e2e/config.go

+
+// Helper functions for environment variable parsing
+
+func getEnv(key, defaultValue string) string {


I'm not 100% sure, but all the getEnvXX functions could probably be replaced by one generic function.

Yes, that would reduce verbosity there, but the intent is to keep it clear and explicit.

lionelvillard · 2026-02-19T18:39:25Z

test/e2e/config.go

+		// Feature gate defaults
+		ScaleToZeroEnabled: getEnvBool("SCALE_TO_ZERO_ENABLED", false),
+
+		// EPP defaults


It would be cleaner (IMO) to create "stacks" (InferencePool, VA, ModelService, etc.) when running the tests, so that this logic is not duplicated. Eventually the e2e tests may deploy Helm releases, one per stack. This is beyond the scope of this PR, just something to keep in mind.

Yes, good suggestion.

asm582

/lgtm

We need to evaluate the use of hermetic tests and remove redundant tests in the subsequent PRs.

mamy-CS · 2026-02-19T20:32:26Z

Thanks for the review. Merging this PR to unblock other work. I kept a note of the relevant comments and will address them in a subsequent PR.

mamy-CS added 8 commits February 18, 2026 12:48

consolidate e2es infra

d9445c0

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

Merge remote-tracking branch 'upstream/main' into e2e-reorganize

2735d07

update ci

cbd2dfc

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

add full e2e tests on kind with approval on ci

a4b0b90

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

rm unused file

92e8611

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

update ci image build

3209d24

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

ci updates

4845536

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

make ci final output clear

40d97af

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS force-pushed the e2e-reorganize branch from 3d2cb38 to 40d97af February 18, 2026 20:43

update e2e test full trigger ci

21c173b

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS requested review from asm582, clubanderson and lionelvillard February 18, 2026 21:19

mamy-CS added 2 commits February 18, 2026 16:39

ci manual trigger update

639a443

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

valid ci e2e triggers group

04e081e

Signed-off-by: Mohammed Abdi <mohammed.munir.abdi@ibm.com>

mamy-CS self-assigned this Feb 18, 2026

mamy-CS force-pushed the e2e-reorganize branch from a6175b4 to 04e081e February 18, 2026 22:13

lionelvillard reviewed Feb 19, 2026


asm582 approved these changes Feb 19, 2026


mamy-CS merged commit f8d74f2 into llm-d:main Feb 19, 2026
46 checks passed


