### So, what is it about?
I propose we containerize the ETL pipeline using Docker and Docker Compose.
This will create a consistent, isolated, and reproducible development environment for all contributors. Currently, the project setup relies on a contributor's local Python environment, which can lead to setup friction (OS differences, Python versions, package conflicts).
A containerized setup means anyone can clone the repo and get the pipeline running with a single command (`docker-compose up`), without worrying about local Python setup.
### Acceptance Criteria
- Create a `Dockerfile`
  - Use a lightweight base image (e.g., `python:3.10-slim`).
  - Copy and install dependencies from `requirements.txt`.
  - Set the default command to run the pipeline (e.g., `CMD ["python", "main.py"]`).
- Create a `docker-compose.yml` file
  - Define a single service (e.g., `etl`).
  - It should build from the local `Dockerfile`.
  - It must use a volume to mount the local code directory into the container (e.g., `volumes: ['.:/app']`). This is the most important part, as it allows contributors to fix the `TODO`s in the code and see their changes reflected inside the container without rebuilding.
- Create a `.dockerignore` file
  - Should ignore common files like `venv/`, `__pycache__/`, `.git`, `.vscode/`, `.idea/`, etc., to keep the build context clean and fast.
- Update `README.md`
  - Add a new "Running with Docker" section with the new setup instructions.

Rough sketches of what each of these files might look like follow below.
Adding Docker is a valuable learning opportunity in itself, as it's a core tool in modern data engineering and lowers the barrier to entry for new contributors.
### Code of Conduct
- [x] I agree to follow this project's Code of Conduct