Skip to content

Conversation

shmuelk
Copy link
Collaborator

@shmuelk shmuelk commented Sep 17, 2025

This PR adds a test to the end to tests.

In particular it adds a test in which:

  1. The "system" is brought up
  2. Inference requests are sent.
  3. The model server deployment is scaled up by one pod
  4. Additional inference requests are made
  5. The test validates that some of the requests go to the added pod.
  6. The model server deployment is scaled down by one
  7. Additional inference requests are made

Fixes: #347

})

ginkgo.When("Scaling up and down the model servers", func() {
ginkgo.It("work should be distributed across all model servers", func() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if using ginkgo.It the text should probably align with that...

Suggested change
ginkgo.It("work should be distributed across all model servers", func() {
ginkgo.It("should distribute inference requests across all model servers", func() {

epp := createEndPointPicker(scaleConfig)

prefillPods, decodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
gomega.Expect(prefillPods).Should(gomega.BeEmpty())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is the goal of validating that there are no prefill Pods?
Can it somehow fail and if so, how does a failure affect the test if at all?

scaleDeployment(modelServers, 1)

scaledUpPrefillPods, scaledUpDecodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
gomega.Expect(scaledUpPrefillPods).Should(gomega.BeEmpty())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines +153 to +155
gomega.Expect(scaledUpDecodePods).Should(gomega.HaveLen(2))

time.Sleep(time.Second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might it be worthwhile to check that pods are in ready state and not rely on the 1s sleep?
Or maybe there's a different reason for sleeping...?

scaleDeployment(modelServers, -1)

scaledDownPrefillPods, scaledDownDecodePods := getModelServerPods(podSelector, prefillSelector, decodeSelector)
gomega.Expect(scaledDownPrefillPods).Should(gomega.BeEmpty())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

gomega.Expect(scaledDownDecodePods).Should(gomega.HaveLen(1))
gomega.Expect(scaledDownDecodePods[0]).Should(gomega.BeElementOf(scaledUpDecodePods))

time.Sleep(time.Second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. Expect on a condition (with Eventually...) would be quicker.


scale.Spec.Replicas += int32(increment)
_, err = client.AppsV1().Deployments(nsName).UpdateScale(ctx, split[1], scale, v1.UpdateOptions{})
gomega.Expect(err).NotTo(gomega.HaveOccurred())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scale change can have succeeded but the replica count not have increased or the new pods not ready (or deleted) yet...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

Create scale up/down test

2 participants