Commit e94aec0

Merge pull request #6 from labrijisaad/dockerize-the-code
Added the Docker version of the Pipeline
2 parents 1654654 + 8917c62 commit e94aec0

15 files changed, +126 -3164 lines changed

.dockerignore

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# ignore Dockerfile and .dockerignore
+Dockerfile
+.dockerignore
+
+# ignore potentially sensitive credentials files
+conf/**/*credentials*
+
+# ignore all local configuration
+conf/local
+!conf/local/.gitkeep
+
+# ignore everything in the following folders
+data
+logs
+notebooks
+references
+results
+
+# except the following
+!logs/.gitkeep
+!notebooks/.gitkeep
+!references/.gitkeep
+!results/.gitkeep

Dockerfile

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+ARG BASE_IMAGE=python:3.9-slim
+FROM $BASE_IMAGE as runtime-environment
+
+# install project requirements
+COPY docker-requirements.txt /tmp/requirements.txt
+RUN pip install --no-cache -r /tmp/requirements.txt && rm -f /tmp/requirements.txt
+
+# add kedro user
+ARG KEDRO_UID=999
+ARG KEDRO_GID=0
+RUN groupadd -f -g ${KEDRO_GID} kedro_group && \
+    useradd -m -d /home/kedro_docker -s /bin/bash -g ${KEDRO_GID} -u ${KEDRO_UID} kedro_docker
+
+WORKDIR /home/kedro_docker
+USER kedro_docker
+
+FROM runtime-environment
+
+# copy the whole project except what is in .dockerignore
+ARG KEDRO_UID=999
+ARG KEDRO_GID=0
+COPY --chown=${KEDRO_UID}:${KEDRO_GID} . .
+
+EXPOSE 8888
+
+CMD ["kedro", "run"]
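
Note on the build arguments: the Dockerfile above declares `BASE_IMAGE`, `KEDRO_UID` and `KEDRO_GID` as `ARG`s, so they can be overridden at build time. The following is a minimal sketch of doing that with plain `docker build`, not necessarily what `kedro docker build` runs internally. The image tag `kedro-energy-forecasting` and the `run` dry-run helper are assumptions made for illustration; by default the script only prints the command.

```shell
#!/bin/sh
# Sketch: override the ARGs declared in the Dockerfile with plain `docker build`.
set -eu

# Hypothetical image tag; not taken from the commit.
IMAGE_NAME="${IMAGE_NAME:-kedro-energy-forecasting}"

# Illustrative helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pass the host user's UID/GID so files copied with --chown match the host user.
run docker build \
  --build-arg "BASE_IMAGE=python:3.9-slim" \
  --build-arg "KEDRO_UID=$(id -u)" \
  --build-arg "KEDRO_GID=$(id -g)" \
  -t "$IMAGE_NAME" \
  .
```

Run it with `DRY_RUN=0` from the project root to actually build. If the host user is root (UID 0), the `useradd` step would likely fail because that UID already exists in the base image, so the Dockerfile defaults are probably the safer choice in that case.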

Makefile

Lines changed: 29 additions & 12 deletions
@@ -1,29 +1,46 @@
+# Author information
 AUTHOR := Labriji Saad
 
-# Default target
+# Default target when no arguments are provided to make
 .DEFAULT_GOAL := help
 
-# run Jupyter Lab
+# Run Jupyter Lab - starts Jupyter Lab to allow for interactive development
 jupy:
+	@echo "Starting Jupyter Lab..."
 	@jupyter lab
 
-# run Kedro pipelines
+# Run Kedro pipelines - executes the main pipeline defined in your Kedro project
 run:
+	@echo "Running Kedro pipeline..."
 	@kedro run
 
-# run Kedro Viz
+# Run Kedro Viz - launches Kedro's visualization tool to view the pipeline structure
 viz:
+	@echo "Running Kedro Viz..."
 	@kedro viz run
 
-# run Kedro Viz in autoreload mode
+# Run Kedro Viz in autoreload mode - automatically refreshes the visualization when changes are detected
 autoviz:
+	@echo "Running Kedro Viz in autoreload mode..."
 	@kedro viz run --autoreload
 
-# Display available make targets
+# Build Docker image for the project - creates a Docker image based on your Kedro project's specifications
+build:
+	@echo "Building Docker image..."
+	@kedro docker build
+
+# Run Kedro project inside a Docker container - executes the project within a Docker container
+dockerun:
+	@echo "Running Kedro project in Docker..."
+	@kedro docker run
+
+# Display help with available make targets
 help:
-	@echo Available targets:
-	@echo make jupy - Activate the virtual environment and run Jupyter Lab
-	@echo make run - Run Kedro pipelines
-	@echo make viz - Run Kedro Viz
-	@echo make autoviz - Run Kedro Viz in autoreload mode
-	@echo Author: $(AUTHOR)
+	@echo Available targets:
+	@echo make jupy - Activate the virtual environment and run Jupyter Lab
+	@echo make run - Run Kedro pipelines
+	@echo make viz - Run Kedro Viz
+	@echo make autoviz - Run Kedro Viz in autoreload mode
+	@echo make build - Build Docker image for the project
+	@echo make dockerun - Run Kedro project inside a Docker container
+	@echo Author: $(AUTHOR)
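
Note on the new `build` and `dockerun` targets: both assume a working Docker installation. A small preflight check along the following lines could be run before invoking them. This is only a sketch; the function name `docker_status` and its three status strings are made up for illustration and are not part of the commit.

```shell
#!/bin/sh
# Sketch: report whether the Docker CLI is installed and the daemon is reachable.

docker_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "missing"       # docker CLI is not on PATH
  elif ! docker info >/dev/null 2>&1; then
    echo "not-running"   # CLI found, but the daemon did not answer
  else
    echo "ok"
  fi
}

echo "docker: $(docker_status)"
```

Only when the status is `ok` are `make build` and `make dockerun` likely to succeed.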

README.md

Lines changed: 32 additions & 8 deletions
@@ -94,22 +94,46 @@ Kedro-Energy-Forecasting/
 
 ├── .gitignore # Untracked files to ignore
 ├── Makefile # Set of tasks to be executed
+├── Dockerfile # Instructions for building a Docker image
+├── .dockerignore # Files and directories to ignore in Docker builds
 ├── README.md # Project documentation and setup guide
 └── requirements.txt # Project dependencies
 ```
 
 ## 🚀 Getting Started
 
-Turn **raw CSV data** into a **trained pickle Machine Learning model** with these steps:
+First, **Clone the Repository** to download a copy of the code onto your local machine, and before diving into transforming **raw data** into a **trained pickle Machine Learning model**, please note:
 
-1. **Clone the Repository**: Download a copy of the code to your computer.
-2. **Set Up the Environment**: Create a virtual environment using Conda or venv.
-3. **Install Dependencies**: Run `pip install -r requirements.txt` in your environment to install the required libraries.
-4. **Run the Kedro Pipeline**: `make run` or `kedro run` – and witness magic 🪄
-5. **Review the Results**: After running the pipeline, look in the `04_reporting` and `05_model_output` directories to see your model's performance and results.
-6. **(Optional) Launch Kedro Viz**: To see a visual representation of your pipeline, run `make viz` or `kedro run viz`.
+🔴 **Important Preparation Steps**:
+- If you intend to run the code, it's better to remove the following directories if they exist: `data/02_processed`, `data/03_training_data`, `data/04_reporting`, and `data/05_model_output`. These directories will be regenerated or overwritten after executing the pipeline. They are **included** in the version control to **give you a preview of the expected outcomes**.
 
-_Need guidance on commands? Peek into the **Makefile** or use `kedro --help` for assistance._
+
+
+### Standard Method (Conda / venv) 🌿
+
+Adopt this method if you prefer a traditional Python development environment setup using Conda or venv.
+
+1. **Set Up the Environment**: Initialize a virtual environment with Conda or venv to isolate and manage your project's dependencies.
+
+2. **Install Dependencies**: Inside your virtual environment, execute `pip install -r dev-requirements.txt` to install the necessary Python libraries.
+
+3. **Run the Kedro Pipeline**: Trigger the pipeline processing by running `make run` or directly with `kedro run`. This step orchestrates your data transformation and modeling.
+
+4. **Review the Results**: Inspect the `04_reporting` and `05_model_output` directories to assess the performance and outcomes of your models.
+
+5. **(Optional) Explore with Kedro Viz**: To visually explore your pipeline's structure and data flows, initiate Kedro Viz using `make viz` or `kedro run viz`.
+
+### Docker Method 🐳
+
+Prefer this method for a containerized approach, ensuring a consistent development environment across different machines. Ensure Docker is operational on your system before you begin.
+
+1. **Build the Docker Image**: Construct your Docker image with `make build` or `kedro docker build`. This command leverages `dev-requirements.txt` for environment setup. For advanced configurations, see the [Kedro Docker Plugin Documentation](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-docker).
+
+2. **Run the Pipeline Inside a Container**: Execute the pipeline within Docker using `make dockerun` or `kedro docker run`. Kedro-Docker meticulously handles volume mappings to ensure seamless data integration between your local setup and the Docker environment.
+
+3. **Access the Results**: Upon completion, the `04_reporting` and `05_model_output` directories will contain your model's reports and trained files, ready for review.
+
+For additional assistance or to explore more command options, refer to the **Makefile** or consult `kedro --help`.
 
 ## 🌐 Let's Connect!
 
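
Note on the README's preparation step and volume mappings: because `.dockerignore` excludes `data`, the image itself contains no data, so the container only sees the datasets if the host directory is mounted. The sketch below shows roughly what that looks like with plain `docker run`, preceded by the cleanup of the regenerated output directories listed in the README. The image tag and the `run` dry-run helper are assumptions for illustration, and the single `data` mount is a simplification rather than the exact set of volumes Kedro-Docker configures.

```shell
#!/bin/sh
# Sketch: clean regenerated outputs, then run the pipeline with ./data mounted.
set -eu

IMAGE_NAME="${IMAGE_NAME:-kedro-energy-forecasting}"  # hypothetical image tag
CONTAINER_HOME=/home/kedro_docker                      # WORKDIR set in the Dockerfile

# Illustrative helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Remove the output directories that the pipeline regenerates.
for d in 02_processed 03_training_data 04_reporting 05_model_output; do
  run rm -rf "data/$d"
done

# Mount the host data directory so inputs are visible and outputs persist.
run docker run --rm \
  -v "$(pwd)/data:$CONTAINER_HOME/data" \
  "$IMAGE_NAME" kedro run
```

With the default `DRY_RUN=1` the script only prints the commands, which makes it easy to review the `rm -rf` targets before running it for real with `DRY_RUN=0`.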