When using the quantization features (uv sync --extra quantization) on Ubuntu systems, you may encounter compilation errors during runtime when initializing models with quantization enabled.
Error:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
Root Cause:
bitsandbytesrequires compilation of CUDA utilities at runtime- Triton (dependency of bitsandbytes) needs to compile C code on-the-fly
- Ubuntu systems often don't have build tools installed by default
Solution:
sudo apt-get update && sudo apt-get install -y build-essentialWhat this installs:
gcc- GNU C compilerg++- GNU C++ compilermake- Build automation tool- Other essential build tools
Error:
fatal error: Python.h: No such file or directory
5 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
Root Cause:
- Triton needs Python development headers to compile C extensions
Python.his required for creating Python C extensions- Standard Python installation doesn't include development headers
Solution:
sudo apt-get install -y python3.12-devWhat this installs:
- Python development headers (
Python.h) - Static libraries for Python
- Configuration files needed for compiling Python extensions
# Install both dependencies in one command
sudo apt-get update && sudo apt-get install -y build-essential python3.12-dev- Your Code → Uses
load_in_4bit=True - Transformers → Calls quantization backend
- BitsAndByes → Provides 4-bit/8-bit quantization
- Triton → GPU kernel compilation framework
- CUDA Utils → Need C compilation at runtime
Unlike pre-compiled packages, quantization libraries often:
- Compile optimized kernels on first use
- Generate hardware-specific code
- Require build tools to be available at runtime
When building Docker images with quantization support:
# Ubuntu-based Dockerfile
FROM ubuntu:24.04
# Install system dependencies for quantization
RUN apt-get update && apt-get install -y \
build-essential \
python3.12-dev \
python3.12-venv \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Install uv
RUN pip install uv
# Copy project files
COPY . /app
WORKDIR /app
# Install with quantization support
RUN uv sync --extra quantization
# Your application setup...# Required for quantization features
sudo apt-get install -y build-essential python3.12-dev
# Optional but recommended
sudo apt-get install -y \
git \
curl \
nvidia-cuda-toolkit # If using CUDA- Same build dependencies required
- CUDA toolkit if using GPU quantization
- Sufficient GPU memory for quantized models
After installing the dependencies, verify everything works:
# Test C compiler
gcc --version
# Test Python headers
python3.12-config --includes
# Test quantization import
uv run python -c "import bitsandbytes; print('✅ BitsAndByes working')"
# Test your adapter
uv run python tests/test_lightllm.pysudo apt-get install -y build-essential python3.12-devThis issue is specific to systems where Python packages need to compile C extensions at runtime. The quantization libraries are performance-critical and often compile optimized code for your specific hardware configuration.