@sudiptob2
Collaborator

@sudiptob2 sudiptob2 force-pushed the feat/153/client-mode-e2e branch from 3e29491 to e21a03c Compare December 22, 2025 21:35
@sudiptob2 sudiptob2 marked this pull request as ready for review December 24, 2025 00:18
@sudiptob2 sudiptob2 requested a review from GeorgeJahad January 6, 2026 20:20
Signed-off-by: Sudipto Baral <[email protected]>
Executors were removed from pendingExecutors in onExecutorRunning() before they registered with Spark, causing the allocator's gap calculation to see pending=0 and submit duplicates. Keep executors in pendingExecutors until they register. getPendingExecutorCount() removes registered executors during the count check, ensuring accurate gap calculation and preventing duplicate submissions.
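
A minimal sketch of the bookkeeping this commit message describes, for illustration only. The class `PendingExecutorTracker`, the `onExecutorSubmitted` hook, and the `gap` helper are hypothetical names, not the project's actual allocator code; only `pendingExecutors`, `onExecutorRunning()` and `getPendingExecutorCount()` are named in the message above, and their signatures here are assumptions.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the allocator's bookkeeping; not the project's real class.
final class PendingExecutorTracker {
  private val pendingExecutors = mutable.Set.empty[String]

  // Assumed hook: record an executor as pending when it is submitted.
  def onExecutorSubmitted(id: String): Unit = pendingExecutors += id

  // Deliberately a no-op for pending state: a running pod has not
  // necessarily registered with Spark yet, so it must stay pending.
  def onExecutorRunning(id: String): Unit = ()

  // Prune executors that have registered, then report how many remain pending.
  def getPendingExecutorCount(registered: Set[String]): Int = {
    pendingExecutors --= registered
    pendingExecutors.size
  }

  // Assumed gap formula: executors still needed beyond registered and pending ones.
  def gap(target: Int, registered: Set[String]): Int =
    math.max(0, target - registered.size - getPendingExecutorCount(registered))
}
```

With this shape, an executor that is running but not yet registered still counts as pending, so the gap stays at zero and no duplicate is submitted.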

Signed-off-by: Sudipto Baral <[email protected]>
@sudiptob2 sudiptob2 force-pushed the feat/153/client-mode-e2e branch from 3eda9d6 to 0db8465 Compare January 8, 2026 22:23
@GeorgeJahad
Collaborator

Still not quite what I'm looking for.

Take a look at this code for a "more complete" baseSparkPiTest(), which is meant to be used with all SparkPi tests:

https://github.com/GeorgeJahad/armada-spark/blob/04aaed7b7f33b7db966bc5957bf560b88047c9f1/src/test/scala/org/apache/spark/deploy/armada/e2e/ArmadaSparkE2E.scala#L184-L203

and this is my version of baseSparkPiGangTest():

https://github.com/GeorgeJahad/armada-spark/blob/04aaed7b7f33b7db966bc5957bf560b88047c9f1/src/test/scala/org/apache/spark/deploy/armada/e2e/ArmadaSparkE2E.scala#L208-L220

With those methods defined, the deployMode specific gang tests become very simple:

  test("Basic SparkPi job with gang scheduling - staticCluster", E2ETest) {
    baseSparkPiGangTest("cluster")
      .assertGangJob("armada-spark", 3) // 1 driver + 2 executors
      .run()
  }

  test("Basic SparkPi job with gang scheduling - staticClient", E2ETest) {
    baseSparkPiGangTest("client")
      .assertExecutorGangJob("armada-spark", 2) // Only 2 executors, no driver
      .run()
  }

All the boilerplate is in the moreCompleteBaseSparkPiTest(). The baseSparkPiGangTest() contains just the interesting parts of the test that are common to both. The deployMode specific tests just contain the parts that apply to that deployMode. Doesn't that seem clearer?
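
The layering described above can be sketched in a self-contained way. This is only an illustration under stated assumptions: `SketchTestBuilder`, its fields, and the `example.*` configuration keys are hypothetical stand-ins, not the real `E2ETestBuilder` API or real Armada/Spark settings. Only the helper names `baseSparkPiTest` and `baseSparkPiGangTest` come from the comment.

```scala
// Hypothetical, simplified stand-in for the real E2ETestBuilder.
final case class SketchTestBuilder(
    name: String,
    deployMode: String,
    conf: Map[String, String] = Map.empty,
    assertions: Vector[String] = Vector.empty
) {
  def withConf(key: String, value: String): SketchTestBuilder =
    copy(conf = conf + (key -> value))

  def withAssertion(description: String): SketchTestBuilder =
    copy(assertions = assertions :+ description)
}

object LayeredHelpers {
  // Layer 1: boilerplate shared by every SparkPi test (key is a placeholder).
  def baseSparkPiTest(name: String, deployMode: String): SketchTestBuilder =
    SketchTestBuilder(name, deployMode).withConf("example.shared.setting", "value")

  // Layer 2: the parts common to gang tests in both deploy modes (key is a placeholder).
  def baseSparkPiGangTest(deployMode: String): SketchTestBuilder =
    baseSparkPiTest(s"spark-pi-gang-$deployMode", deployMode)
      .withConf("example.gang.enabled", "true")

  // Layer 3: each deploy-mode test adds only what is specific to that mode.
  val clusterGang: SketchTestBuilder =
    baseSparkPiGangTest("cluster").withAssertion("gang of 3: 1 driver + 2 executors")

  val clientGang: SketchTestBuilder =
    baseSparkPiGangTest("client").withAssertion("gang of 2: executors only")
}
```

Each layer only adds to the one below it, so the two deploy-mode tests differ in a single assertion while sharing all configuration.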

FYI I have run my version of the gang tests and they seem to work as expected:
https://github.com/GeorgeJahad/armada-spark/actions/runs/20840130283

@sudiptob2
Collaborator Author

@GeorgeJahad updated now. While consolidating under a common base method, I found at least one test that was doing something different, so I refactored those tests slightly to make use of the common base methods.

test("Basic SparkPi job with gang scheduling", E2ETest) {
E2ETestBuilder("basic-spark-pi-gang")
// ========================================================================
// Base helper method
Collaborator

does this comment need to be moved?

Collaborator Author

Yes, let's move it before the base method.

@sudiptob2 sudiptob2 requested a review from GeorgeJahad January 9, 2026 22:06
set("armada.queue", "ARMADA_QUEUE")
private def baseIngressCLITest(executorCount: Int): E2ETestBuilder = {
baseSparkPiTest("spark-pi-ingress", "cluster", Map("test-type" -> "ingress"))
.withDriverIngress(ingressAnnotations)
Collaborator

Everything looks good. The last question I have is that the ingressAnnotations for the cli test now include the "kubernetes.io/ingress.class", but didn't before: https://github.com/armadaproject/armada-spark/blob/master/src/test/scala/org/apache/spark/deploy/armada/e2e/ArmadaSparkE2E.scala#L232-L235

Why the change?

Collaborator Author

I added it to make the two tests similar. As far as I understand, it doesn't affect the intention of the test. Let me know if you know of any specific reason it might affect the test.

Collaborator

@GeorgeJahad GeorgeJahad left a comment

lgtm, thanks @sudiptob2 !

@sudiptob2 sudiptob2 merged commit 43fbe54 into armadaproject:master Jan 12, 2026
12 checks passed
@sudiptob2 sudiptob2 deleted the feat/153/client-mode-e2e branch January 12, 2026 14:22

Development

Successfully merging this pull request may close these issues.

Add static client mode e2e tests (see #148)

3 participants