[Feature]: Module to support execution for spark programs using testcontainers #5856

@hariohmprasath

Description

Module

New Module

Problem

We don't have a simple way to run integration tests for Spark programs. In my case, I either mock Spark to test the business logic, which defeats the purpose of an integration test, or I end up with a large docker-compose file in which I manually update the ports, environment variables, memory, and number of worker nodes for each use case. If a co-worker wants to run the same program, I either have to share the docker-compose file or we end up setting up a Spark cluster (such as EMR) with one of the cloud providers. It would be great if Testcontainers could support this use case and provide a consistently reproducible environment, driven by custom configuration, for running distributed Spark programs. I would be happy to contribute this feature if we agree to go ahead with it.

Solution

A new Spark module that lets users create custom Spark clusters based on their business needs.
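
To make the idea more concrete, here is a rough sketch of how such a module might expand a small configuration (worker count, memory, cores) into the set of containers it would need to start. Everything in it is hypothetical: `SparkClusterPlan`, `ContainerSpec`, and the `spark-master` alias are illustrative names, not existing Testcontainers API. The environment variable names are assumptions modelled on what common Spark images accept for standalone mode, and 7077 is Spark's default standalone master port. It uses only the JDK, so it plans the cluster without starting any containers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Testcontainers today.
public class SparkClusterPlan {

    /** One container the proposed module would have to start. */
    static final class ContainerSpec {
        final String name;
        final Map<String, String> env;

        ContainerSpec(String name, Map<String, String> env) {
            this.name = name;
            this.env = env;
        }
    }

    static final String MASTER_ALIAS = "spark-master"; // assumed network alias
    static final int MASTER_PORT = 7077;               // Spark standalone default port

    /** Expands a small configuration into one master plus N workers. */
    static List<ContainerSpec> plan(int workers, String workerMemory, int workerCores) {
        if (workers < 1) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        List<ContainerSpec> specs = new ArrayList<>();

        // Assumption: the image selects its role from SPARK_MODE.
        Map<String, String> masterEnv = new LinkedHashMap<>();
        masterEnv.put("SPARK_MODE", "master");
        specs.add(new ContainerSpec(MASTER_ALIAS, masterEnv));

        // Every worker registers with the master through the same URL.
        String masterUrl = "spark://" + MASTER_ALIAS + ":" + MASTER_PORT;
        for (int i = 1; i <= workers; i++) {
            Map<String, String> workerEnv = new LinkedHashMap<>();
            workerEnv.put("SPARK_MODE", "worker");
            workerEnv.put("SPARK_MASTER_URL", masterUrl);
            workerEnv.put("SPARK_WORKER_MEMORY", workerMemory);
            workerEnv.put("SPARK_WORKER_CORES", String.valueOf(workerCores));
            specs.add(new ContainerSpec("spark-worker-" + i, workerEnv));
        }
        return specs;
    }

    public static void main(String[] args) {
        // Print the planned containers for a two-worker cluster.
        for (ContainerSpec spec : plan(2, "1g", 1)) {
            System.out.println(spec.name + " " + spec.env);
        }
    }
}
```

In a real module, each planned entry would presumably map to a container on a shared network, which is what the hand-maintained docker-compose files do today.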

Benefit

Here are a few benefits:

  • Eliminates the need to manage a separate docker-compose file for each Spark program
  • Provides a consistently reproducible environment for running distributed Spark programs
  • Adds no cost, since there is no need to host Spark clusters in the cloud
  • Simplifies the developer and QA experience of building big data solutions with Spark

Alternatives

Here are a few alternatives that we follow:

  • Pass around multiple versions of a docker-compose.yml file, each creating a different Spark cluster configuration depending on the business needs
  • Host one monolithic Spark cluster or multiple small ones with a cloud provider such as AWS, GCP, or Azure

Would you like to help contribute this feature?

Yes
