test: Scale up and down the model server during an end to end test #354

shmuelk · 2025-09-17T12:40:14Z

This PR adds a test to the end-to-end tests.

In particular it adds a test in which:

1. The "system" is brought up.
2. Inference requests are sent.
3. The model server deployment is scaled up by one pod.
4. Additional inference requests are made.
5. The test validates that some of the requests go to the added pod.
6. The model server deployment is scaled down by one pod.
7. Additional inference requests are made.
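
The validation step (some of the requests go to the added pod) could be sketched roughly as below. This is a minimal, standard-library-only illustration and not the PR's actual test code: `addedPods`, `countByPod`, and the pod names are hypothetical, and it assumes the test can learn which pod served each request.

```go
package main

import "fmt"

// addedPods returns the pods present in after but not in before,
// i.e. the pods created by the scale-up. (Hypothetical helper.)
func addedPods(before, after []string) []string {
	seen := make(map[string]bool, len(before))
	for _, p := range before {
		seen[p] = true
	}
	var added []string
	for _, p := range after {
		if !seen[p] {
			added = append(added, p)
		}
	}
	return added
}

// countByPod tallies how many requests each pod served.
// servedBy holds one pod name per completed request. (Hypothetical helper.)
func countByPod(servedBy []string) map[string]int {
	counts := make(map[string]int)
	for _, p := range servedBy {
		counts[p]++
	}
	return counts
}

func main() {
	// Illustrative pod names only; real names come from the cluster.
	before := []string{"decode-0"}
	after := []string{"decode-0", "decode-1"}
	servedBy := []string{"decode-0", "decode-1", "decode-0", "decode-1"}

	counts := countByPod(servedBy)
	for _, p := range addedPods(before, after) {
		// The test would fail here if counts[p] were zero.
		fmt.Printf("%s served %d request(s)\n", p, counts[p])
	}
}
```

In the real test the equivalent check would presumably be a gomega assertion that the count for each added pod is greater than zero.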

Fixes: #347

Signed-off-by: Shmuel Kallner <[email protected]>

elevran · 2025-10-20T13:38:50Z

test/e2e/e2e_test.go

 	})
+
+	ginkgo.When("Scaling up and down the model servers", func() {
+		ginkgo.It("work should be distributed across all model servers", func() {


nit: if using ginkgo.It, the description should probably read naturally after "It"...

Suggested change

- ginkgo.It("work should be distributed across all model servers", func() {
+ ginkgo.It("should distribute inference requests across all model servers", func() {

elevran · 2025-10-20T13:40:18Z

test/e2e/e2e_test.go

+			epp := createEndPointPicker(scaleConfig)
+
+			prefillPods, decodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
+			gomega.Expect(prefillPods).Should(gomega.BeEmpty())


What is the goal of validating that there are no prefill Pods?
Can it somehow fail and, if so, how does a failure affect the test, if at all?

elevran · 2025-10-20T13:40:43Z

test/e2e/e2e_test.go

+			scaleDeployment(modelServers, 1)
+
+			scaledUpPrefillPods, scaledUpDecodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
+			gomega.Expect(scaledUpPrefillPods).Should(gomega.BeEmpty())


elevran · 2025-10-20T13:41:58Z

test/e2e/e2e_test.go

+			gomega.Expect(scaledUpDecodePods).Should(gomega.HaveLen(2))
+
+			time.Sleep(time.Second)


Might it be worthwhile to check that the pods are in the Ready state rather than relying on the 1s sleep?
Or maybe there's a different reason for sleeping...?
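
As a rough illustration of the alternative raised here, the sketch below polls a condition until it holds or a timeout expires, instead of sleeping for a fixed second. It uses only the standard library; `waitFor` and the `podsReady` stand-in are hypothetical names, and in the actual test suite gomega's `Eventually` would likely play this role.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor calls cond immediately and then once per interval until it
// returns true or the timeout elapses. (Hypothetical helper.)
func waitFor(timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("condition not met before timeout")
		}
		time.Sleep(interval)
	}
}

func main() {
	// Stand-in for "all expected pods report Ready": true on the third poll.
	polls := 0
	podsReady := func() bool {
		polls++
		return polls >= 3
	}

	if err := waitFor(2*time.Second, 10*time.Millisecond, podsReady); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("condition met after", polls, "polls")
}
```

Polling returns as soon as the condition holds, so it is typically faster than a fixed sleep when pods become ready quickly, and more reliable when they take longer than expected.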

elevran · 2025-10-20T13:42:41Z

test/e2e/e2e_test.go

+			scaleDeployment(modelServers, -1)
+
+			scaledDownPrefillPods, scaledDownDecodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
+			gomega.Expect(scaledDownPrefillPods).Should(gomega.BeEmpty())


elevran · 2025-10-20T13:43:11Z

test/e2e/e2e_test.go

+			gomega.Expect(scaledDownDecodePods).Should(gomega.HaveLen(1))
+			gomega.Expect(scaledDownDecodePods[0]).Should(gomega.BeElementOf(scaledUpDecodePods))
+
+			time.Sleep(time.Second)


Same here. Expecting on a condition (with Eventually...) would be quicker.

elevran · 2025-10-20T13:45:04Z

test/e2e/utils_test.go

+
+			scale.Spec.Replicas += int32(increment)
+			_, err = client.AppsV1().Deployments(nsName).UpdateScale(ctx, split[1], scale, v1.UpdateOptions{})
+			gomega.Expect(err).NotTo(gomega.HaveOccurred())


The scale update may have succeeded while the replica count has not yet changed, or while the new pods are not yet ready (or the removed pods not yet deleted)...
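
To make the concern concrete, here is a small sketch of a "has the scale change settled?" predicate. The `deploymentState` struct is a hypothetical, trimmed-down stand-in for a Deployment's spec and status, and the mapping to `spec.replicas`, `status.replicas`, and `status.readyReplicas` is an assumption about how a helper might read them, not the PR's code.

```go
package main

import "fmt"

// deploymentState is a hypothetical, simplified view of a Deployment.
type deploymentState struct {
	Desired int32 // assumed to mirror spec.replicas after the scale update
	Current int32 // assumed to mirror status.replicas
	Ready   int32 // assumed to mirror status.readyReplicas
}

// scaleSettled reports whether the observed pod counts have caught up
// with the desired replica count.
func scaleSettled(s deploymentState) bool {
	return s.Current == s.Desired && s.Ready == s.Desired
}

func main() {
	states := []deploymentState{
		{Desired: 2, Current: 1, Ready: 1}, // update accepted, new pod not created yet
		{Desired: 2, Current: 2, Ready: 1}, // new pod exists but is not ready
		{Desired: 2, Current: 2, Ready: 2}, // scale-up settled
		{Desired: 1, Current: 2, Ready: 2}, // scale-down accepted, extra pod still counted
	}
	for _, s := range states {
		fmt.Printf("%+v settled=%t\n", s, scaleSettled(s))
	}
}
```

A scaling helper could poll a predicate like this (for example with gomega's `Eventually`) before returning, so callers only see the deployment once the change has taken effect.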

shmuelk force-pushed the scale-up-down-test branch from 8f159b3 to 22c5fa8 on September 17, 2025 12:53

shmuelk added 6 commits September 28, 2025 16:31

Added a helper to scale up/down deployments

84640ce

Signed-off-by: Shmuel Kallner <[email protected]>

Added a test in which the model server is scaled up and down

c71b925

Signed-off-by: Shmuel Kallner <[email protected]>

Use the latest llm-d-inference-sim release

ae59884

Signed-off-by: Shmuel Kallner <[email protected]>

Fixed typo

c9b5053

Signed-off-by: Shmuel Kallner <[email protected]>

Fixed lint issue

36fe769

Signed-off-by: Shmuel Kallner <[email protected]>

Restored code commented out for debugging

793c1de

Signed-off-by: Shmuel Kallner <[email protected]>

shmuelk force-pushed the scale-up-down-test branch from cd7e869 to 793c1de on September 28, 2025 13:33

elevran added this to llm-d-inference-scheduler Oct 16, 2025

elevran moved this to In review in llm-d-inference-scheduler Oct 19, 2025

elevran requested changes Oct 20, 2025
