
[Feature]: Do we have any plans to support Spark clusters in Testcontainers?  #7657

@Rembrant777

Description

Module

None

Problem

Hello everyone,

I'm a newcomer to Testcontainers. Due to my job requirements, I frequently need to develop Spark applications. For me, the most time-consuming part of working on a Spark application, from design and development through debugging and deployment, is having to recompile the code and generate a JAR file every time I change the logic, then submit it to the cluster and wait for the results.

While Spark provides relatively robust JUnit test support, setting the master to local doesn't truly replicate the issues that arise in a distributed environment. For example, a job may encounter data skew when consuming from a Kafka cluster with more than 3 partitions, and if I want to develop Spark Shuffle components further, the existing JUnit test cases don't cover the potential problems of a distributed deployment.
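For context, the local-master test pattern mentioned above usually looks like the following sketch (assuming the `spark-sql` Java dependency on the classpath). Everything runs in a single JVM, so distributed failure modes such as skew across real executors or shuffle fetch failures are never exercised:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LocalModeExample {
    public static void main(String[] args) {
        // master = local[2]: driver and executors share one JVM, so
        // distributed issues (data skew across nodes, cross-node
        // serialization, shuffle failures) never surface in the test.
        SparkSession spark = SparkSession.builder()
                .appName("local-mode-test")
                .master("local[2]")
                .getOrCreate();

        Dataset<Row> df = spark.range(100).toDF("id");
        System.out.println(df.count()); // prints 100
        spark.stop();
    }
}
```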

So when I first added Testcontainers to my JUnit environment and ran it, I was very excited about the convenience it provides; it really let me focus on the inner logic of the code.

That's why I wonder whether the Testcontainers team plans to support Spark in the future, for example on YARN, Mesos, or even Kubernetes?

While I am aware that AWS and Azure provide robust managed Spark solutions through Databricks and related serverless services, I still believe there is a pressing need to make heavy-duty computing frameworks like Spark and Flink approachable for beginners and application developers. I also think I'm certainly not the only one who has lost development efficiency to environment issues.

If the solution is feasible, I will actively participate in building this feature. I'm looking forward to your reply and your plans in this direction.

Solution

After reading the implementation of KafkaContainerCluster, I believe its container construction approach is quite similar to how a Spark cluster is deployed: in KafkaContainerCluster, a cluster is built from one Zookeeper container and three KafkaContainers.
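Following that analogy, a hypothetical `SparkContainerCluster` could wire one master and N workers onto a shared network with the existing `GenericContainer` and `Network` APIs. This is only a sketch, not an existing Testcontainers module; the `bitnami/spark` image and its `SPARK_MODE` / `SPARK_MASTER_URL` environment variables are assumptions about one possible base image:

```java
import java.util.ArrayList;
import java.util.List;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.utility.DockerImageName;

// Hypothetical sketch of a Spark standalone cluster, mirroring the
// KafkaContainerCluster structure (one coordinator + N members).
// Assumes the bitnami/spark image, configured via SPARK_MODE and
// SPARK_MASTER_URL environment variables.
public class SparkContainerCluster {
    private static final DockerImageName IMAGE =
            DockerImageName.parse("bitnami/spark:3.5");

    private final Network network = Network.newNetwork();
    private final GenericContainer<?> master;
    private final List<GenericContainer<?>> workers = new ArrayList<>();

    public SparkContainerCluster(int workerCount) {
        master = new GenericContainer<>(IMAGE)
                .withNetwork(network)
                .withNetworkAliases("spark-master")
                .withEnv("SPARK_MODE", "master")
                .withExposedPorts(7077, 8080); // cluster port + web UI

        for (int i = 0; i < workerCount; i++) {
            workers.add(new GenericContainer<>(IMAGE)
                    .withNetwork(network)
                    .withNetworkAliases("spark-worker-" + i)
                    .withEnv("SPARK_MODE", "worker")
                    .withEnv("SPARK_MASTER_URL", "spark://spark-master:7077"));
        }
    }

    public void start() {
        master.start(); // master must be reachable before workers register
        workers.forEach(GenericContainer::start);
    }

    public void stop() {
        workers.forEach(GenericContainer::stop);
        master.stop();
    }

    public GenericContainer<?> getMaster() {
        return master;
    }
}
```

A test could then start the cluster in a `@BeforeAll` hook and submit jobs against `spark://spark-master:7077`, just as KafkaContainerCluster tests point clients at the bootstrap servers.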

Following the same approach for Spark's standalone deployment mode, we can take that folder's docker-compose.yml and Dockerfile(s) as a reference and deploy the different components into separate containers to achieve a cluster deployment. (This solution may not be fully mature yet, and the details need more in-depth discussion.)
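As a concrete starting point, a standalone-mode cluster along those lines might look like the following docker-compose.yml sketch. The `bitnami/spark` image and its environment variables are assumptions here; a dedicated Testcontainers module would presumably bake this wiring into container definitions instead of a compose file:

```yaml
# Hypothetical standalone-mode cluster: one master, two workers.
version: "3"
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # cluster manager port
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    deploy:
      replicas: 2
```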

Benefit

For Spark Beginners:

  1. Simplify the process of running Spark code for beginners.

For Spark application developers:

  1. Require minimal resources and datasets to validate the data processing logic of their Spark applications, enabling debugging and optimization in a local environment.

  2. Reduce the compilation time of the Spark application JAR file after each code modification and minimize the time spent on submitting the Spark application to a remote or local cluster for execution.

  3. Enable quick regression testing of Spark application functionality improvements through JUnit test cases, reducing the need for additional maintenance costs in DevOps for operations personnel (as the setup is already established at the JUnit test level).

For Spark Infra developers:

  1. Some of the underlying logic, which originally required the development of a MockServer, can be completely achieved through the use of testcontainers. This saves a significant amount of development time for creating mock cases.

Alternatives

None, since this would be a new module. As far as I know, it will not affect other modules.

Would you like to help contributing this feature?

Yes
