Python Environment Management and Docker Configuration

Date: 2024-11-18
Topic: Conda vs Pip, and Docker network optimization


Background

Today I faced a decision point: should we use Conda or Pip for dependency management? Answering that meant understanding the fundamental differences between the two tools.


Conda vs Pip

The core difference:

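Conda is both an environment manager and a package manager: it creates isolated environments and can install non-Python packages (including the Python interpreter itself) into them. Pip is purely a package manager for Python packages; environment isolation has to come from a separate tool such as venv or virtualenv.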

Key considerations:

  • Complexity of environment management
  • Build process efficiency
  • Deployment environment consistency
  • Maintenance costs
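
The difference is visible from inside a running interpreter. A venv or virtualenv only redirects `sys.prefix` away from `sys.base_prefix`, while a Conda environment carries its own `conda-meta` directory of package records. The sketch below leans on those two conventions; the function name `describe_environment` is my own, and the check is a heuristic, not an official API.

```python
import sys
from pathlib import Path


def describe_environment(prefix=None, base_prefix=None):
    """Guess which kind of environment an interpreter prefix belongs to.

    Both arguments default to the running interpreter's values; they are
    parameters only so the heuristic can be tried on other paths.
    """
    prefix = Path(prefix if prefix is not None else sys.prefix)
    base_prefix = Path(base_prefix if base_prefix is not None else sys.base_prefix)

    # Conda keeps one JSON record per installed package under <env>/conda-meta.
    if (prefix / "conda-meta").is_dir():
        return "conda"
    # venv (and recent virtualenv) point sys.prefix at the environment
    # directory while sys.base_prefix still names the base installation.
    if prefix != base_prefix:
        return "venv"
    return "system"


if __name__ == "__main__":
    print(describe_environment())
```

Inside a `python:3.9-slim` container this should report "system", since pip installs straight into the image's only interpreter.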

Docker Environment Configuration

Infrastructure setup:

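# Pin the Debian release so the base image matches the "bullseye" mirror entry below
FROM python:3.9-slim-bullseye

WORKDIR /app

# Point apt at the Tsinghua (TUNA) mirror for faster downloads inside China
RUN echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free" \
    > /etc/apt/sources.list

# System dependencies: git, poppler-utils (PDF rendering for pdf2image),
# libGL and GLib (shared libraries that packages such as OpenCV load at import time)
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    poppler-utils \
    libgl1-mesa-glx \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*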
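
Since the system packages live outside pip's view, a quick check at container start can confirm they actually landed in the image. This is a minimal sketch using only the standard library; `missing_system_deps` is a hypothetical helper name, and the executable and library names in the example call are my mapping from the apt packages above (`pdftoppm` ships with poppler-utils, `GL` and `glib-2.0` are the library names behind libgl1-mesa-glx and libglib2.0-0).

```python
import ctypes.util
import shutil


def missing_system_deps(executables, libraries):
    """Return the names that cannot be found on this machine.

    executables: command names looked up on PATH.
    libraries: shared-library names without the "lib" prefix or ".so" suffix.
    """
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [lib for lib in libraries if ctypes.util.find_library(lib) is None]
    return missing


if __name__ == "__main__":
    problems = missing_system_deps(
        executables=["git", "pdftoppm"],
        libraries=["GL", "glib-2.0"],
    )
    print("missing system dependencies:", problems or "none")
```

Running this as the first step of the container entrypoint turns a confusing import-time error into a one-line message.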

Network Configuration

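Docker daemon settings in /etc/docker/daemon.json (registry mirrors for image pulls from China, explicit DNS servers, capped concurrent transfers, and a reduced MTU):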

{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com",
    "https://registry.docker-cn.com"
  ],
  "dns": [
    "8.8.8.8",
    "8.8.4.4"
  ],
  "max-concurrent-downloads": 3,
  "max-concurrent-uploads": 3,
  "mtu": 1400
}
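
A typo in this file can stop the daemon from starting, so a quick sanity pass before restarting Docker is cheap insurance. The sketch below is my own helper, not part of Docker: `check_daemon_config` is a hypothetical name, and the MTU range it accepts (576 to 1500) is an assumption that fits ordinary Ethernet and VPN links and would need widening for jumbo frames.

```python
import json
from urllib.parse import urlparse


def check_daemon_config(config):
    """Return a list of human-readable warnings for a parsed daemon.json dict."""
    warnings = []

    # Every registry mirror should be a full http(s) URL.
    for url in config.get("registry-mirrors", []):
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            warnings.append(f"registry mirror is not an http(s) URL: {url!r}")

    # Concurrency limits only make sense as positive integers.
    for key in ("max-concurrent-downloads", "max-concurrent-uploads"):
        value = config.get(key)
        if value is not None and (not isinstance(value, int) or value < 1):
            warnings.append(f"{key} should be a positive integer, got {value!r}")

    # Assumed sanity range; adjust for networks with jumbo frames.
    mtu = config.get("mtu")
    if mtu is not None and not (isinstance(mtu, int) and 576 <= mtu <= 1500):
        warnings.append(f"mtu {mtu!r} is outside the assumed 576-1500 range")

    return warnings


if __name__ == "__main__":
    with open("/etc/docker/daemon.json", encoding="utf-8") as fh:
        for line in check_daemon_config(json.load(fh)):
            print("warning:", line)
```

`json.load` already catches syntax errors such as a trailing comma; the helper only adds checks on the values used above.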

Python Package Management Evolution

Historical progression:

Basic Package Management (pip) → Virtual Environments (virtualenv) → Integrated Environment Management (Conda)

Modern trends:

  • Rise of Poetry
  • Containerized deployment
  • Improved dependency resolution algorithms

Version Constraint Strategy

# requirements.txt
fastapi>=0.68.0
uvicorn>=0.15.0
python-multipart>=0.0.5
pillow>=8.3.1
pdf2image>=1.16.0
python-magic>=0.4.24
loguru>=0.5.3
pydantic<2.0.0  # Upper bound: stay on pydantic 1.x (an exact pin would use ==)

Best practices:

  • Clear version constraints
  • Grouped dependency management
  • Environment isolation
  • Regular updates
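
To make "clear version constraints" checkable, it helps to see what kind of constraint each line really is: `>=` alone leaves the upper end open, `<` alone leaves the lower end open, and only `==` pins. The sketch below classifies lines with a small regex; `classify_requirements` is a hypothetical helper of mine, it covers only simple `name<op>version` lines, and a real project would more likely use the `packaging` library for full specifier parsing.

```python
import re

# Longest operators first so ">=" is not read as ">" followed by "=".
_SPEC = re.compile(r"(===|==|~=|!=|>=|<=|>|<)\s*[^\s,;]+")
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def classify_requirements(text):
    """Map each package name to the kind of version constraint it carries."""
    result = {}
    for raw in text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        match = _NAME.match(line)
        if match is None:
            continue  # blank lines and options such as "-r other.txt"
        ops = set(_SPEC.findall(line[match.end():]))
        has_lower = bool(ops & {">=", ">", "~="})
        has_upper = bool(ops & {"<=", "<", "~="})
        if ops & {"==", "==="}:
            kind = "pinned"
        elif has_lower and has_upper:
            kind = "range"
        elif has_lower:
            kind = "lower bound only"
        elif has_upper:
            kind = "upper bound only"
        else:
            kind = "unconstrained"
        result[match.group(0)] = kind
    return result


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        for name, kind in classify_requirements(fh.read()).items():
            print(f"{name}: {kind}")
```

On the file above, every package except pydantic comes out as "lower bound only", which means a fresh build months later may pull in a new major version.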

Technology Selection Criteria

When choosing a technology stack, consider:

Dimension          Considerations
Maturity           How stable is it?
Community          Is there active support?
Learning Curve     How long to get productive?
Maintenance Cost   Long-term overhead?
Ecosystem          Integration options?
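
One way to apply these dimensions is a simple weighted score per candidate. The sketch below shows only the arithmetic; the function name `weighted_score` is mine, and any weights or scores fed into it are judgment calls the team has to supply, not measurements.

```python
def weighted_score(scores, weights):
    """Combine per-dimension scores into one number using the given weights.

    scores: dict mapping dimension name to a score (for example 1 to 5).
    weights: dict mapping dimension name to its relative importance.
    The result is normalized by the total weight, so weights need not sum to 1.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Placeholder numbers purely to show the call shape.
    weights = {"maturity": 3, "community": 2, "learning_curve": 2,
               "maintenance_cost": 2, "ecosystem": 1}
    option = {"maturity": 4, "community": 4, "learning_curve": 3,
              "maintenance_cost": 3, "ecosystem": 5}
    print(round(weighted_score(option, weights), 2))
```

The number itself matters less than the conversation it forces about which dimension deserves the most weight.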

Environment Configuration Principles

Simplicity: Avoid unnecessary complexity

Maintainability: Focus on long-term maintenance costs

Scalability: Reserve space for future expansion


Today's Reflection

Today's main learning was about making technology choices. The Conda vs Pip debate is a good example of "no best choice, only the most suitable choice."

Conda is powerful but adds complexity. It manages its own Python installation, which can conflict with system Python. For simple projects, this overhead isn't justified.

Pip is simpler but only installs Python packages. It cannot install system shared libraries, so if a package needs them (OpenCV, for example, needs libGL and GLib at import time), you have to provide them yourself.

For our Docker-based deployment, we chose pip inside Docker. Docker handles system dependencies through apt-get, and pip handles Python packages. This separation keeps each tool doing what it does best.


Further Learning

  • Poetry for modern Python packaging
  • Docker network modes (bridge, host, overlay)
  • Dependency resolution algorithms
  • Virtual environment internals