Module
New Module
Problem
We don't have a simple way to run integration tests for Spark programs. In my case, I either mock them to test the business logic, which defeats the purpose, or end up maintaining a large docker-compose file, manually tweaking the ports, environment variables, memory, and the number of worker nodes for each use case. If a co-worker wants to run the same program, I need to share the docker-compose file, or we end up standing up a Spark cluster (like EMR) with one of the cloud providers. It would be great if Testcontainers supported this use case so that we can have a consistently reproducible environment for running distributed Spark programs with custom configurations. I would be happy to contribute this feature if we agree to go with it.
Solution
A new Spark module that would allow users to create custom Spark clusters based on their business needs.
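For illustration, here is a minimal sketch of how such a cluster can be wired together today with the generic `GenericContainer` API, and what a dedicated module could wrap behind a friendlier interface. The `bitnami/spark` image, its `SPARK_MODE`/`SPARK_MASTER_URL`/`SPARK_WORKER_MEMORY` environment variables, and all names below are assumptions for the sketch, not a proposed final API:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.Network;
import org.testcontainers.containers.wait.strategy.Wait;
import org.testcontainers.utility.DockerImageName;

public class SparkClusterSketch {

    // Image and environment variables assumed from the bitnami/spark image docs.
    private static final DockerImageName SPARK_IMAGE = DockerImageName.parse("bitnami/spark:3.5");

    public static void main(String[] args) {
        Network network = Network.newNetwork();

        // Spark master, reachable from the workers via its network alias.
        GenericContainer<?> master = new GenericContainer<>(SPARK_IMAGE)
                .withNetwork(network)
                .withNetworkAliases("spark-master")
                .withEnv("SPARK_MODE", "master")
                .withExposedPorts(7077, 8080)
                .waitingFor(Wait.forListeningPort());

        // One worker; a module could expose something like withWorkers(n) or
        // withWorkerMemory(...) instead of users repeating this block per worker.
        GenericContainer<?> worker = new GenericContainer<>(SPARK_IMAGE)
                .withNetwork(network)
                .withEnv("SPARK_MODE", "worker")
                .withEnv("SPARK_MASTER_URL", "spark://spark-master:7077")
                .withEnv("SPARK_WORKER_MEMORY", "1G")
                .withEnv("SPARK_WORKER_CORES", "1")
                .waitingFor(Wait.forListeningPort());

        master.start();
        worker.start();

        // A test could point its SparkSession at the mapped master port
        // (subject to the usual driver/executor networking caveats).
        String masterUrl = "spark://" + master.getHost() + ":" + master.getMappedPort(7077);
        System.out.println("Spark master available at " + masterUrl);

        worker.stop();
        master.stop();
    }
}
```

A dedicated module could hide the image choice, wait strategies, and per-worker wiring behind a single entry point, e.g. a `SparkContainer` with a fluent worker configuration (names purely hypothetical).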
Benefit
Here are a few benefits:
- Eliminates the need to manage a separate docker-compose file for each Spark program
- Provides a consistently reproducible environment for running distributed Spark programs
- Adds no extra cost and removes the need to host Spark clusters in the cloud
- Simplifies the developer and QA experience for building big data solutions with Spark
Alternatives
Here are a few alternatives that we follow:
- Pass around multiple versions of a `docker-compose.yml` file that creates different Spark cluster configurations depending upon the business needs
- Host a monolithic cluster or multiple small Spark clusters with a cloud provider like AWS, GCP, or Azure
Would you like to help contributing this feature?
Yes