Docker for Data Science Projects

A starter template for data science workflows utilizing Docker as an alternative to Conda or venv environments.

1. Table of Contents

1. About this Repository

1.1. Who Is This Project For?
1.2. What Will You Learn?
1.3. Prerequisites
1.4. Contents of this Repository

2. Docker Concepts

2.1. Dockerfile
2.2. Build Command
2.3. Docker Image
2.4. Run Command
2.5. Docker Container
2.6. Docker Ignore (.dockerignore)

3. Installing Docker

3.1. Installing Docker on Ubuntu
3.2. Installing Docker on Windows
3.3. After Installing Docker
3.4. Automating Docker Startup in WSL

4. Setting Up Docker for a Data Science Project

4.1. Step 1: Install Prerequisites
4.2. Step 2: Set Up Your Project Repository
4.3. Step 3: Write the Dockerfile
4.4. Step 4: Write the .dockerignore file
4.5. Step 5: Write the Docker Compose File
4.6. Step 6: requirements.txt
4.7. Step 7: Build and Run Your Container
4.8. Step 8: Verify the Container
4.9. Step 9: Attach VS Code to the Container
4.10. Step 10: Run the Python Script
4.11. Step 11: Work with Jupyter Notebooks in VS Code
4.12. Step 12: Stop and remove the container
4.13. Note 1: Jupyter on browser
4.14. Note 2: Keeping Your Environment Up-to-Date

5. Essential Docker Commands

5.1. Managing Images
5.2. Managing Containers
5.3. Port Mapping Commands
5.4. Working with Containers
5.5. Custom Container Names

6. Advanced Topics and FAQ

6.1. Understanding Network Ports
6.2. Docker Port Mapping in Detail
6.3. Common Issues and Solutions
6.4. Data Science Specific Considerations
6.5. Docker Shortcuts (alias)
6.6. Understanding and Cleaning Dangling Images
6.7. Tagging Docker Images
6.8. Working with Docker Volumes
6.9. Frequently Asked Questions (FAQ)

1. About this Repository

This repository provides a complete Docker workflow for data science projects. It enables you to train machine learning models, develop Python scripts, experiment with Jupyter notebooks, and manage datasets—all within Docker containers. The setup is designed for reproducibility and maintainability.

1.1. Who Is This Project For?

Anyone interested in data science, Python programming, or Docker containerization can benefit from this project. Whether you are a student, developer, or data scientist, this resource will walk you through building and deploying a data science environment using Docker.

1.2. What Will You Learn?

By working through this project, you will:

Gain a solid understanding of Docker and containerization
Learn to set up a full data science environment inside containers
Discover how to manage dependencies with Docker
See how to develop and execute Python scripts and Jupyter notebooks in containers
Work through practical examples for reproducible data science workflows
Learn Docker best practices tailored for data science

1.3. Prerequisites

This project is suitable for three types of users:

If you already know Docker:
- Jump right into the data science applications. The provided examples and configurations will help you refine your skills and explore best practices.
If you know Python/data science but are new to Docker:
- This project will introduce you to containerization, guiding you through building and deploying reproducible environments.
If you are a beginner:
- This project is beginner-friendly. You will start with the basics, learning how to set up Docker, then move on to building data science applications in containers.

1.4. Contents of this Repository

Folder structure:
.
+---data                          <-- Contains sample datasets
|       README.md                 <-- Documentation for the data folder
|       sample.csv                <-- Example dataset for experimentation
|
+---figures                       <-- Contains images for documentation
|       README.md                 <-- Documentation for the figures folder
|       docker.jpg                <-- Docker concepts illustration
|       port.jpg                  <-- Network port illustration
|       volume.jpg                <-- Docker volumes illustration
|
+---notebooks                     <-- Jupyter notebooks
|       README.md                 <-- Documentation for the notebooks folder
|       01_data_exploration.ipynb <-- Notebook for data exploration
|       02_model_training.ipynb   <-- Notebook for model training
|
+---scripts                       <-- Python scripts
|       README.md                 <-- Documentation for the scripts folder
|       data_prep.py              <-- Sample data preparation script
|
|   .dockerignore                 <-- Files to exclude from Docker build
|   .gitignore                    <-- Files to exclude from git
|   docker-compose.yml            <-- Docker Compose configuration
|   Dockerfile                    <-- Docker image definition
|   LICENSE                       <-- License information
|   README.md                     <-- This documentation file
|   requirements.txt              <-- Python dependencies

2. Docker Concepts

In simple terms:

Docker: An environment manager that packages the operating system layer together with your project's dependencies
Dockerfile: A recipe for a dish
Docker Image: A cooked dish
Docker Compose: Instructions for serving the dish
Docker Container: A served dish

In technical terms:

2.1. Dockerfile

The "Dockerfile" (capital D) specifies how to build the image. For example, it defines the Python version and points to the requirements.txt file for dependencies.
This file is typically located at the root of your project.

2.2. Build Command

This command creates an image based on the instructions in the Dockerfile.

2.3. Docker Image

The resulting image is essentially a file containing a lightweight Linux system (the python:3.9 base image used in this project is Debian-based) with installed packages, such as Python and its libraries.
The image acts like a compressed archive.
It is portable and easy to share.
However, it cannot be used until it is unpacked.

2.4. Run Command

This command generates a container from an image.

It unpacks the image (like extracting a compressed file) to make it usable.
The command is often lengthy and varies for each image, making it hard to memorize.
docker-compose.yml file: To simplify running containers, this command is written in a yml file and placed at the root of the project. From then on, you can start and stop containers with a simple, consistent command:
```
docker-compose up --build -d
docker-compose down
```
Writing the docker-compose.yml file is often the most challenging part of Docker and has project-specific requirements. This repository provides a ready-to-use file for common data science tasks. For other use cases, such as web development, you may need to learn more or consult ChatGPT.

2.5. Docker Container

A container is a running instance of an image: an isolated, lightweight Linux environment with the installed packages.
Containers are not portable or shareable. If you make changes and want to share them, you must create a new image and distribute that image.

2.6. Docker Ignore (.dockerignore)

The following questions are addressed:

Is a .dockerignore file necessary if there is already a .gitignore? Yes.
What distinguishes .dockerignore from .gitignore?
What should a .dockerignore file for a data science project look like?
Explanation of .dockerignore contents.

A .dockerignore file is essential even if you have a .gitignore. While both exclude files, they serve different purposes:

.gitignore prevents files from being tracked by Git
.dockerignore prevents files from being copied into Docker images during builds

A .dockerignore file is important because it:

Reduces the build context size, speeding up builds
Keeps sensitive files out of Docker images
Improves build cache efficiency
Prevents unnecessary files from bloating images

A .dockerignore for data science should exclude:

Python-specific: Compiled files, cache, and build artifacts
Virtual environments: Local venvs should not be copied
Development/IDE files: Editor configs and Git files
Docker-specific: Dockerfile, docker-compose files, and .dockerignore itself
Build/distribution: Local build artifacts
System files: OS-specific files like .DS_Store and Windows Zone identifiers
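
As a rough illustration of the categories above, the following Python sketch renders a starter .dockerignore. The specific patterns, the DOCKERIGNORE_PATTERNS name, and the render_dockerignore helper are assumptions chosen for this example, not the file shipped with this repository; adjust them to your project.

```python
# Hypothetical starter patterns, grouped by the categories listed above.
# "**/" makes a pattern match at any depth of the build context.
DOCKERIGNORE_PATTERNS = {
    "Python": ["**/__pycache__", "**/*.pyc"],
    "Virtual environments": [".venv", "venv"],
    "Development/IDE": [".vscode", ".idea", ".git", ".gitignore"],
    "Docker": ["Dockerfile", "docker-compose.yml", ".dockerignore"],
    "Build/distribution": ["build", "dist", "*.egg-info"],
    "System files": ["**/.DS_Store", "**/*:Zone.Identifier"],
}


def render_dockerignore(groups=DOCKERIGNORE_PATTERNS):
    """Return the .dockerignore text: one commented section per category."""
    lines = []
    for title, patterns in groups.items():
        lines.append(f"# {title}")
        lines.extend(patterns)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


print(render_dockerignore())
```

To write the file, you could run pathlib.Path(".dockerignore").write_text(render_dockerignore()) from the project root.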

3. Installing Docker

3.1. Installing Docker on Ubuntu

Install Docker using the official documentation or with ChatGPT's help. After installation, verify with commands like docker images, docker ps, and by running the hello-world container.

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Docker and Docker Compose
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Optional: also install the standalone docker-compose binary (provides the hyphenated docker-compose command used in this guide)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

sudo chmod +x /usr/local/bin/docker-compose

3.2. Installing Docker on Windows

The process is straightforward. Download and install the 64-bit Docker Desktop for Windows.

Note: To connect VS Code to Docker, Docker must be installed on Windows itself; installing it in WSL is not enough.

3.3. After Installing Docker

After installation, confirm Docker is working:

docker --version
sudo systemctl enable docker
sudo service docker start

Note for WSL users: Older WSL setups do not run systemd, so systemctl commands fail there (newer WSL versions can enable systemd through the [boot] section of /etc/wsl.conf). Without systemd, you must run sudo service docker start each time WSL boots. You can automate this with a script or alias.

# Check Docker installation
sudo docker images
sudo docker ps

To use Docker without sudo:

sudo usermod -aG docker $USER

Log out and back in (or run newgrp docker) so the group change takes effect, then test it:

docker images
docker ps

3.4. Automating Docker Startup in WSL

In a WSL setup without systemd, sudo systemctl enable docker does not work. Here are three ways to start Docker, from manual to fully automatic:

Option 1: Manual

If you do not mind running a command daily, use:

sudo service docker start

Option 2: Using an alias

Create an alias to shorten the command:

echo 'alias start-docker="sudo service docker start"' >> ~/.bashrc
source ~/.bashrc

Now, you can simply type:

start-docker

Option 3: Automatic

To start Docker automatically when WSL starts:

Open WSL and edit the WSL configuration file:
```
sudo nano /etc/wsl.conf
```
Add these lines:
```
[boot]
command="service docker start"
```
Save and exit (Ctrl + X, then Y, then Enter).
Restart WSL:
```
wsl --shutdown
```

4. Setting Up Docker for a Data Science Project

A guide to creating a portable and reproducible Docker project template for developing Python scripts and Jupyter notebooks in a containerized environment using VS Code.

4.1. Step 1: Install Prerequisites

Install Docker Desktop with WSL integration on Windows 11.
Install Visual Studio Code.
In VS Code, add these extensions: Docker, Dev Containers (formerly Remote - Containers), Python, and Jupyter.

4.2. Step 2: Set Up Your Project Repository

Create a new Git repository (or clone an existing one).
In the repository folder, create these files:
- Dockerfile
- .dockerignore
- docker-compose.yml
- requirements.txt
- scripts/data_prep.py
- notebooks/01_data_exploration.ipynb
- notebooks/02_model_training.ipynb

4.3. Step 3: Write the Dockerfile

Add the following to your Dockerfile:

# Base image with Python 3.9
FROM python:3.9

# Set the working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install Jupyter Notebook and JupyterLab
RUN pip install notebook jupyterlab

# Expose port 8888 for Jupyter
EXPOSE 8888

# Start Jupyter Notebook with no token for development
ENTRYPOINT ["sh", "-c", "exec jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root --NotebookApp.token=''" ]

4.4. Step 4: Write the .dockerignore file

Create a .dockerignore file in your project root to exclude unnecessary files from your Docker image.

4.5. Step 5: Write the Docker Compose File

In docker-compose.yml, include:

services:
  your-project:
    build: .
    image: your-project_image
    container_name: your-project_container
    volumes:
      - .:/app
    stdin_open: true
    tty: true
    ports:
      - "8888:8888"

This mounts your entire project folder into the container at /app. Note: Replace your-project in the service name, image, and container_name entries (lines 2, 4, and 5) with your project's name, for example: dockerproject1.

4.6. Step 6: requirements.txt

Keep the requirements.txt file clean and current: list only the packages the project needs and pin their versions so that every build installs the same environment. The ipykernel package is essential for Jupyter notebook support.

ipykernel # This package is essential for running Jupyter notebooks.
numpy==1.26.0
pandas==2.1.3
matplotlib==3.8.0

Best Practice: Use pip in Docker unless Conda is required. Stick to requirements.txt for best compatibility and performance.
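
To check how "clean" a requirements file is, a small script can list which packages lack an exact version pin. The sketch below is a simplified illustration: parse_requirements is a hypothetical helper that only understands name==version lines and # comments, not the full pip requirements syntax.

```python
def parse_requirements(text):
    """Map package name -> pinned version, or None when no '==' pin is given."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


# The requirements.txt shown above, as a string.
SAMPLE = """\
ipykernel # This package is essential for running Jupyter notebooks.
numpy==1.26.0
pandas==2.1.3
matplotlib==3.8.0
"""

pins = parse_requirements(SAMPLE)
unpinned = [name for name, version in pins.items() if version is None]
print("Unpinned packages:", unpinned)
```

For the sample file this reports ipykernel as the only unpinned package, which is a candidate for pinning if you need fully repeatable builds.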

4.7. Step 7: Build and Run Your Container

On your host machine (in the project folder), you have two options:

First (recommended): This method extracts the project name to use as the image and container names.

To make start.sh executable if it is not:
```
chmod +x start.sh 
```
To extract the project name and then build the image and run the container:
```
./start.sh 
```
Second:
In this method, the image and container names default to data-science-project.
```
docker-compose up --build -d
```

Note:

--build: Omitting "--build" means changes to Dockerfile or dependencies will not be applied.
-d: The "-d" flag runs the container in detached mode, so you can keep using the terminal.
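
The contents of start.sh are not shown in this guide. As a rough illustration of what "extracting the project name" could involve, the Python sketch below derives a lowercase name from the project folder. The function name and the sanitising rule are assumptions made for this example, not the script's actual logic.

```python
import re
from pathlib import Path


def project_name_from_path(path="."):
    """Derive a lowercase, Docker-friendly name from a folder path (hypothetical helper)."""
    folder = Path(path).resolve().name.lower()
    # Replace every run of characters outside a-z, 0-9, '_' and '-' with one hyphen.
    cleaned = re.sub(r"[^a-z0-9_-]+", "-", folder).strip("-_")
    return cleaned or "project"  # fallback when nothing usable remains


print(project_name_from_path("/home/username/My Project"))  # my-project
```

Lowercasing matters because Docker image names may not contain uppercase letters.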

4.8. Step 8: Verify the Container

Run:

docker-compose ps

Ensure the container status is "Up" and port 8888 is mapped.

4.9. Step 9: Attach VS Code to the Container

Follow these steps:

Press Ctrl+Shift+P to open the command palette.
Type and select Dev Containers: Attach to Running Container….
Choose the container named your-project_container (the container_name set in docker-compose.yml). A second VS Code window will open.
In the new window, click Open Folder. At the top, you will see /root. Delete root to reveal app. Select app and click OK. You will then see all your project's folders and files.
In the second VS Code window, install the following extensions: Docker, Dev Containers, Python, and Jupyter. If you see a Reload the window button after installing each extension, make sure to click it every time.
You are all set and can continue.

Note: In Step 11, if you cannot select the kernel, close the second VS Code window and repeat steps 1–4. The correct kernel will then be automatically attached to the notebooks.

4.10. Step 10: Run the Python Script

In the second VS Code window, open the integrated terminal. You will see a bash prompt, indicating you are inside the container. Run:

python scripts/data_prep.py

You should see the expected output (for example, "hi").
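
The actual contents of scripts/data_prep.py are not reproduced here. If you want a slightly more useful smoke test than printing "hi", a minimal sketch could summarize the sample dataset. The summarize_csv function is a hypothetical example, and it assumes data/sample.csv has a header row; it uses only the standard library, so it runs even before pandas is installed.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    data_file = Path("data/sample.csv")  # path taken from the repository tree
    if data_file.exists():
        columns, rows = summarize_csv(data_file)
        print(f"{data_file}: {rows} rows, columns = {columns}")
    else:
        print(f"{data_file} not found; run this from the project root (/app)")
```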

4.11. Step 11: Work with Jupyter Notebooks in VS Code

Open 01_data_exploration.ipynb in VS Code.
In the top-right corner of the notebook, you should see a kernel with the same name as your project. If not, click the Select Kernel button and choose the Jupyter kernel option. This will display a kernel with your project's name and the Python kernel specified in the Dockerfile. The libraries from the requirements.txt file, installed in the Docker container, will be automatically available for use.
You can now run and edit cells within the container.
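
To confirm that the notebook is really using the container's environment, a first cell along these lines can help. It is a small sketch: installed_versions is a hypothetical helper, and the package list simply mirrors the requirements.txt shown in Step 6.

```python
import sys
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", sys.version.split()[0])
for name, version in installed_versions(
    ["ipykernel", "numpy", "pandas", "matplotlib"]
).items():
    print(f"{name:12} {version or 'NOT INSTALLED'}")
```

If a package shows as NOT INSTALLED, the notebook is probably attached to a kernel outside the container, or the image needs to be rebuilt after a change to requirements.txt.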

4.12. Step 12: Stop and remove the container

docker-compose down

4.13. Note 1: Jupyter on browser

To use Jupyter in a browser instead of VS Code, open localhost:8888/tree on the host while the container is running.

4.14. Note 2: Keeping Your Environment Up-to-Date

To rebuild your container with any changes, run on your host:
```
docker-compose up --build
```
After installing a new package, update requirements.txt inside the container by running:
```
pip freeze > requirements.txt
```
For pulling the latest base image, run:
```
docker-compose build --pull
```

5. Essential Docker Commands

5.1. Managing Images

# Pull images from Docker Hub
docker pull nginx
docker pull hello-world

# List all images
docker images

# Remove images
docker rmi <image1> <image2> ...

5.2. Managing Containers

# List running containers
docker ps

# List all containers (including stopped ones)
docker ps -a

# List only container IDs
docker ps -aq

# Remove containers
docker rm <CONTAINER1> <CONTAINER2> ...

# Remove all containers (running containers must be stopped first, or use docker rm -f)
docker rm $(docker ps -aq)

# Run a container in detached mode
docker run -d <IMAGE name or ID>

# Start/stop containers
docker start <CONTAINER name or ID>
docker stop <CONTAINER name or ID>

# Start/stop all containers at once
docker start $(docker ps -aq)
docker stop $(docker ps -aq)

Note: You can identify a container by any unique prefix of its ID, often just the first two or three characters. For example: docker stop 2f

5.3. Port Mapping Commands

# Run nginx and map port 80 of the host to port 80 of the container
docker run -d -p 80:80 nginx

# Run another nginx instance on a different host port
docker run -d -p 8080:80 nginx

# Map multiple ports
docker run -d -p 80:80 -p 443:443 nginx

# Map all exposed ports to random ports
docker run -d -P nginx

The -p host_port:container_port option maps ports between your host system and the container.
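
The order of the two numbers is easy to mix up. The following Python sketch makes it explicit; parse_port_mapping is a hypothetical helper that handles only the plain host_port:container_port form (not IP prefixes or /udp suffixes).

```python
def parse_port_mapping(spec):
    """Split 'host_port:container_port' into a (host_port, container_port) tuple."""
    host_text, sep, container_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'host_port:container_port', got {spec!r}")
    host_port, container_port = int(host_text), int(container_text)
    for port in (host_port, container_port):
        if not 0 < port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return host_port, container_port


host_port, container_port = parse_port_mapping("8080:80")
# You browse to the HOST port; the service listens on the CONTAINER port.
print(f"Open http://localhost:{host_port} (container listens on {container_port})")
```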

5.4. Working with Containers

# Enter a container's bash shell
docker exec -it <CONTAINER name or ID> bash

# Save an image to a tar file
docker save -o /home/mostafa/docker-projects/nginx.tar nginx

# Load an image from a tar file
docker load -i /home/mostafa/docker-projects/nginx.tar

5.5. Custom Container Names

Docker assigns random names to containers by default. To specify a custom name:

docker run -d --name <arbitrary-name> -p 80:80 <image-name>

Example:

docker run -d --name webserver -p 80:80 nginx

6. Advanced Topics and FAQ

6.1. Understanding Network Ports

In networking:

IP address identifies which device you're communicating with ("who")
Port number specifies which service or application on that device ("what")

For example, suppose google.com resolves to 215.114.85.17 (an illustrative address) and you connect to 215.114.85.17:80:

215.114.85.17 is the server's IP address (who you're talking to)
80 is the port number for HTTP (what service you're requesting)

Ports can range from 0 to 65,535 (2^16 - 1), with standard services typically using well-known ports:

Web servers:
- HTTP: port 80
- HTTPS: port 443
Development servers:
- FastAPI: port 8000
- Jupyter: port 8888
Remote access:
- SSH: port 22
Database Management Systems (DBMS):
- MySQL: port 3306
- PostgreSQL: port 5432
- MongoDB: port 27017

Important Notes on Database Ports:

Databases themselves don't have ports; the Database Management Systems (DBMS) do.
All databases within a single DBMS instance typically use the same port.
If you want to run two versions of the same DBMS on one server, you must use different ports.
Exception: You can give databases separate ports by running each one in its own DBMS instance (for example, several mongod processes, each started with a different port), but within a single instance all databases share one port.

6.2. Docker Port Mapping in Detail

Port mapping in Docker (-p 80:80) allows you to:

Access containerized services from your host machine
Run multiple instances of the same service on different host ports
Avoid port conflicts when multiple containers need the same internal port

With the two nginx commands from section 5.3 (-p 80:80 and -p 8080:80):

First container: access via localhost:80 in browser
Second container: access via localhost:8080 in browser
Both containers are running nginx on their internal port 80

This approach is especially useful for data science projects when you need to:

Run multiple Jupyter servers
Access databases from both containerized applications and host tools
Expose machine learning model APIs

6.3. Common Issues and Solutions

Container Won't Start

If your container won't start, check:

Port conflicts: Is another service using the same port?
Resource limitations: Do you have enough memory/CPU?
Permission issues: Are volume mounts correctly configured?
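
For the port-conflict case, you can check from the host whether something is already listening before starting the container. This is a minimal sketch using Python's standard socket module; port_in_use is a hypothetical helper name, and it only tests TCP on the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success and an error code otherwise.
            return sock.connect_ex((host, port)) == 0
        except OSError:
            return False


if port_in_use(8888):
    print("Port 8888 is busy: stop the other service or map a different host port.")
else:
    print("Port 8888 looks free.")
```

If the port is busy, changing the left-hand side of the mapping (for example "8889:8888" in docker-compose.yml) avoids the conflict without touching the container's internal port.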

File Permissions Issues

When using volume mounts, file permission issues can occur. Solutions:

Use the --user flag when running the container
Set appropriate permissions in the Dockerfile
Use Docker Compose's user option

Performance Considerations

Use .dockerignore to reduce build context size
Minimize the number of layers in your Dockerfile
Consider multi-stage builds for smaller images

6.4. Data Science Specific Considerations

Jupyter Notebook Security

For production:

Don't use --NotebookApp.token=''
Set up proper authentication
Use HTTPS for connections

GPU Support

For deep learning:

Install NVIDIA Container Toolkit
Use the --gpus all flag with docker run
Use appropriate base images (e.g., tensorflow/tensorflow:latest-gpu)

Large Data Files

When working with large datasets:

Don't include data in the Docker image
Use volume mounts for data directories
Consider using data volumes or bind mounts

6.5. Docker Shortcuts (alias)

Add these aliases to your .bashrc or .zshrc file to make Docker commands more convenient:

#-----------------------------------------------------------------------------------------
# Docker aliases

# --- Image Management ---
alias di="    docker images    --format 'table {{.ID}}\t{{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedSince}}'"
alias dia="   docker images -a --format 'table {{.ID}}\t{{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedSince}}'"
alias drmi="  docker rmi"

drmia() {     docker rmi $(docker images -aq);      }  # Remove All Images
drmif() {                                              # Remove All dangling images
 local images=$(docker images -q -f dangling=true)
 if [ -n "$images" ]; then
   echo "Removing dangling images: $images"
   docker rmi $images
 else
   echo "No dangling images to remove."
 fi
}

# --- Container Management ---
alias dps="   docker ps     --format 'table {{.ID}}\t{{.Image}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}'"
alias dpsa="  docker ps -a  --format 'table {{.ID}}\t{{.Image}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}'"
alias dpsaq=" docker ps -aq"

alias dst="   docker start"
alias dsp="   docker stop"
alias drm="   docker rm"

dsta() {      docker start $(docker ps -aq);  }  # Start  All Containers
dspa() {      docker stop  $(docker ps -aq);  }  # Stop   All Containers
drma() {      docker rm    $(docker ps -aq);  }  # Remove All Containers

# --- Docker Compose Commands ---
alias dcu="   docker compose up   -d --build"
alias dcd="   docker compose down"

# --- Docker Exec Bash ---
deb() {       docker exec -it "$1" bash;  }

These shortcuts provide:

Better Formatted Output

di: Lists images with formatted output showing ID, repository, tag, size, and age
dps/dpsa: Shows running/all containers with formatted output

Bulk Operations

drmia: Removes all images
drmif: Removes only "dangling" images (untagged images)
dsta/dspa: Starts/stops all containers
drma: Removes all containers

Shorter Commands

dst/dsp: Quick container start/stop
dcu/dcd: Docker compose up/down with build and detached mode

To use these aliases:

Add the code block to your shell profile file (~/.bashrc or ~/.zshrc)
Run source ~/.bashrc or source ~/.zshrc to apply changes
Start using the shortened commands

6.6. Understanding and Cleaning Dangling Images

When you run docker images, you may see entries with <none> as their repository and tag:

REPOSITORY                                 TAG     IMAGE ID       CREATED        SIZE
p1-ml-engineering-api-fastapi-docker-jupyter  latest  5afe18f4594a  13 hours ago   745MB
<none>                                     <none>  808f843b9362  13 hours ago   748MB
<none>                                     <none>  5706fd96eca0  14 hours ago   742MB
<none>                                     <none>  1e904ba38c6d  14 hours ago   742MB

What are these `<none>` images?

These are called "dangling images" and usually appear when:

You rebuild an image with the same tag—the old image becomes "dangling" and shows as <none>:<none>
A build fails or is interrupted
You pull a new version of an image, and the old one loses its tag

Why should you care?

Dangling images:

Consume disk space unnecessarily
Make your image list harder to read
Serve no practical purpose

How to remove dangling images:

You can safely remove all dangling images with:

docker image prune -f

Or use the alias defined earlier:

drmif

After running this command, you'll see output listing all deleted images:

Deleted Images:
deleted: sha256:1e904ba38c6dabb0c8c9dd896954c07b5f1b1cf196364ff1de5da46d18aa9fb
deleted: sha256:c73b8c1cc3550886ac1cc5965f89c6c2553b08fb0c472e1a1f9106b26ee4b14
...

This helps keep your Docker environment clean and efficient.

6.7. Tagging Docker Images

Proper tagging of Docker images is crucial for organizing, versioning, and deploying your containerized applications, especially in data science projects where model versions matter.

Best Practices for Tagging Images

Use semantic versioning (e.g., v1.0.1, v2.1)
Avoid using latest in production
Use environment-specific tags (dev, staging, prod)
Tag images before pushing to a registry

Basic Tagging Command

To tag a Docker image, use:

docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]

Examples

Simple version tagging:

# Tag the current 'latest' image with a version number
docker tag my-datascience-app:latest my-datascience-app:v1.0

Preparing for Docker Hub:

# Tag for pushing to Docker Hub
docker tag my-datascience-app:latest username/my-datascience-app:v1.0

# Then push to Docker Hub
docker push username/my-datascience-app:v1.0

Multiple tags for different environments:

# Create production-ready tag
docker tag my-ml-model:v1.2.3 my-ml-model:prod

# Create development tag
docker tag my-ml-model:latest my-ml-model:dev

For Data Science Projects

For data science projects, consider including model information in your tags:

# Include model architecture and training data version
docker tag my-model:latest my-model:lstm-v2-dataset20230512

# Include accuracy metrics
docker tag my-model:latest my-model:v1.2-acc95.4

Proper tagging helps maintain reproducibility and track which model version is deployed where.
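
If tags like these are produced by a training pipeline, generating them in code keeps the format consistent. The sketch below is one possible approach: build_model_tag and is_valid_tag are hypothetical helpers, and the tag layout simply follows the lstm-v2-dataset20230512 example above. The validity check reflects Docker's tag rules (letters, digits, underscores, periods, and dashes; at most 128 characters; not starting with a period or dash).

```python
import re

# Docker tag rules: [A-Za-z0-9_.-], max 128 chars, no leading '.' or '-'.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag):
    """Return True if the string is usable as a Docker image tag."""
    return _TAG_PATTERN.fullmatch(tag) is not None


def build_model_tag(architecture, model_version, dataset_date):
    """Compose a tag such as 'lstm-v2-dataset20230512' and validate it."""
    tag = f"{architecture}-v{model_version}-dataset{dataset_date}"
    if not is_valid_tag(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag


tag = build_model_tag("lstm", 2, "20230512")
print(f"docker tag my-model:latest my-model:{tag}")
```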

6.8. Working with Docker Volumes

By default, data written inside a container lives in the container's writable layer: it survives a stop and restart but is lost when the container is removed. Docker volumes provide persistent storage that exists outside of containers.

Why Use Volumes?

Data Persistence: Retain data even when containers are removed
Data Sharing: Share data between multiple containers
Performance: Better I/O performance than bind mounts, especially on Windows/Mac
Isolation: Manage container data separately from the host filesystem

Basic Volume Usage

Syntax for mounting a host directory into a container:

docker run -v /host/path:/container/path[:options] image_name

Examples

Example 1: Exploring a Container's Default Storage

First, see what's inside a container without volumes:

# Start an nginx container
docker run -d --name nginx-test -p 80:80 nginx

# Enter the container
docker exec -it nginx-test bash

# Check the content of nginx's web directory
cd /usr/share/nginx/html
ls -la

Example 2: Using a Volume for Persistence

Now mount a local directory to nginx's web directory:

docker run -d -p 3000:80 -v /home/username/projects/my-website:/usr/share/nginx/html nginx

This mounts your local directory /home/username/projects/my-website to the container's /usr/share/nginx/html directory. Any changes in either location will be reflected in the other.

Security Considerations

The previous example gives the container full read/write access to the host directory. For better security, add the :ro (read-only) option:

docker run -d -p 3000:80 -v /home/username/projects/my-website:/usr/share/nginx/html:ro nginx

This prevents the container from modifying files in your local directory.

Volumes in Data Science Projects

For data science projects, volumes are especially useful for:

Persisting Jupyter notebooks and data:

docker run -d -p 8888:8888 -v /home/username/ds-project:/app jupyter/datascience-notebook

Sharing datasets between containers:

# Create a named volume
docker volume create dataset-vol

# Mount the volume to multiple containers
docker run -d --name training -v dataset-vol:/data training-image
docker run -d --name inference -v dataset-vol:/data inference-image

Storing model artifacts:

docker run -d -p 8501:8501 -v /home/username/models:/models -e MODEL_PATH=/models/my_model ml-serving-image

Volume Types in Docker

Named Volumes (managed by Docker):

docker volume create my-volume
docker run -v my-volume:/container/path image_name

Bind Mounts (direct mapping to host):

docker run -v /absolute/host/path:/container/path image_name

Tmpfs Mounts (stored in host memory):

docker run --tmpfs /container/path image_name
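
With the -v flag, the difference between a named volume and a bind mount comes down to the part before the first colon: a name versus a host path. The Python sketch below illustrates that rule; classify_mount is a hypothetical helper that assumes Unix-style absolute paths and does not handle Windows drive letters.

```python
def classify_mount(spec):
    """Describe a '-v source:target[:options]' value as a small dictionary."""
    source, sep, rest = spec.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected 'source:target[:options]', got {spec!r}")
    target, _, options = rest.partition(":")
    # An absolute host path means a bind mount; a bare name means a named volume.
    kind = "bind mount" if source.startswith("/") else "named volume"
    return {
        "kind": kind,
        "source": source,
        "target": target,
        "read_only": "ro" in options.split(","),
    }


for example in ("dataset-vol:/data", "/home/username/models:/models:ro"):
    print(example, "->", classify_mount(example))
```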


Alireza-Motazedian/Dockerized_LSTM
