Commit e33f4fe

fix(docker): make ChromaDB optional and optimize build (#51)

* fix(docker): make ChromaDB optional and optimize build
  - Add .dockerignore to exclude unnecessary files while keeping .git and README.md for build
  - Remove chromadb from default dependencies to reduce image size
  - Make ChromaVectorStore optional (import only if chromadb is installed)
  - Add APT_MIRROR and PYPI_INDEX_URL build args for faster downloads in China
  - Optimize COPY order for better layer caching
  - Clean up .git after build to reduce image size

* fix(docker): correct volume mount paths
  - Fix uploads path: /opt/xagent/uploads → /opt/xagent/src/xagent/web/uploads
  - Fix .xagent path: /home/xagent/.xagent → /root/.xagent (container runs as root)

* fix(storage): check legacy path first, then use new path
  - Check legacy location first (if has data)
  - Otherwise use ~/.xagent/data/lancedb or ~/.xagent/memory_store
  - Ensures backward compatibility for existing users
  - Add @lru_cache to get_default_lancedb_dir() to avoid repeated filesystem checks

* test: update test for new default LanceDB path
  Update test_default_lancedb_dir_when_missing_env to expect ~/.xagent/data/lancedb as the new default path instead of project_root/data/lancedb.

* fix(docker): add WebSocket support and Playwright dependencies
  - Move websockets from dev to main dependencies for WebSocket support
  - Add missing Playwright Chromium system libraries
  - Fix lancedb.py to use .is_dir() instead of .exists()

* fix(deps): move matplotlib to main dependencies
  - Move matplotlib from dev to production dependencies

* docs(lancedb): update get_connection_from_env docstring for legacy path behavior
  - Reflect current legacy-check and fallback behavior in documentation
  - Clarify default path resolution logic in both class method and module function

1 parent 710a531 commit e33f4fe
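
The storage change described in the commit message (legacy location first, new default under ~/.xagent otherwise, result cached with @lru_cache, .is_dir() instead of .exists()) can be sketched roughly as below. The real lancedb.py is not part of the diff shown here, so the function names, the use of the current working directory as a stand-in for the project root, and the reading of "has data" as "directory is non-empty" are all assumptions made for illustration.

```python
from functools import lru_cache
from pathlib import Path


def resolve_lancedb_dir(legacy_dir: Path, home: Path) -> Path:
    """Hypothetical helper: prefer the legacy directory when it already holds data."""
    # is_dir() is False for a missing path and for a regular file, whereas
    # exists() would also accept a stray file sitting at that location.
    if legacy_dir.is_dir() and any(legacy_dir.iterdir()):
        return legacy_dir
    # Assumed new default, taken from the commit message.
    return home / ".xagent" / "data" / "lancedb"


@lru_cache(maxsize=1)
def get_default_lancedb_dir_sketch() -> Path:
    """Hypothetical stand-in for get_default_lancedb_dir(); not the project's code."""
    # Assumption: the legacy location is <project root>/data/lancedb, with the
    # current working directory standing in for the project root.
    legacy = Path.cwd() / "data" / "lancedb"
    # The cache means the filesystem is inspected once per process.
    return resolve_lancedb_dir(legacy, Path.home())
```

One consequence of caching a zero-argument function like this is that a directory created later in the same process is not picked up until the cache is cleared with `get_default_lancedb_dir_sketch.cache_clear()`.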

9 files changed, +218 -70 lines changed

.dockerignore

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# Git
+# .git is needed for hatch-vcs version detection in backend build
+.github
+
+# Python
+__pycache__
+*.py[cod]
+*$py.class
+*.so
+.Python
+.pytest_cache
+.mypy_cache
+.ruff_cache
+.coverage
+htmlcov/
+
+# Virtual environments
+.venv
+venv/
+ENV/
+env/
+
+# IDE
+.vscode
+.idea
+*.swp
+*.swo
+*~
+
+# Documentation
+# README.md is needed for pyproject.toml
+docs/
+assets/
+
+# Testing
+tests/
+.pytest_cache/
+.tox/
+
+# CI/CD
+.github/
+.gitlab-ci.yml
+.travis.yml
+
+# Docker
+docker-compose*.yml
+Dockerfile*
+.dockerignore
+
+# Data directories
+data/
+memory_store/
+uploads/
+
+# Build artifacts
+dist/
+build/
+*.egg-info/
+
+# Logs
+*.log
+logs/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Misc
+*.env
+*.local
+.example.env

docker-compose.yml

Lines changed: 1 addition & 7 deletions
@@ -41,10 +41,8 @@ services:
     env_file:
       - .env
     volumes:
-      - xagent_data:/home/xagent/.xagent
+      - xagent_data:/root/.xagent
       - xagent_uploads:/opt/xagent/src/xagent/web/uploads
-      - xagent_lancedb:/opt/xagent/data
-      - xagent_memory:/opt/xagent/memory_store
     depends_on:
       postgres:
         condition: service_healthy
@@ -84,10 +82,6 @@ volumes:
     driver: local
   xagent_uploads:
     driver: local
-  xagent_lancedb:
-    driver: local
-  xagent_memory:
-    driver: local

 networks:
   xagent_network:

docker/Dockerfile.backend

Lines changed: 69 additions & 35 deletions
@@ -1,14 +1,25 @@
 # syntax=docker/dockerfile:1.6
 ARG PYTHON_VERSION=3.11
 ARG NODE_VERSION=22
+ARG PYPI_INDEX_URL=""
+ARG APT_MIRROR=""

 ############################
 # Backend build
 ############################
 FROM python:${PYTHON_VERSION}-bookworm AS backend-build

+ARG PYPI_INDEX_URL=""
+ARG APT_MIRROR=""
+
 WORKDIR /opt/xagent

+# Use APT mirror if provided (for faster downloads in China)
+RUN if [ -n "$APT_MIRROR" ]; then \
+        sed -i "s|http://deb.debian.org/debian|$APT_MIRROR|g" /etc/apt/sources.list.d/debian.sources \
+        || sed -i "s|http://deb.debian.org/debian|$APT_MIRROR|g" /etc/apt/sources.list; \
+    fi
+
 RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     gcc \
@@ -21,43 +32,55 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     libsm6 \
     libxrender1 \
     libxext6 \
+    && git config --global http.version HTTP/1.1 \
+    && git config --global http.postBuffer 524288000 \
     && rm -rf /var/lib/apt/lists/*

-# Improve git reliability for large repos / HTTP2 issues
-RUN git config --global http.version HTTP/1.1 \
-    && git config --global http.postBuffer 524288000
+# Copy dependency files first for better layer caching
+COPY pyproject.toml .
+COPY README.md .

+# Copy .git directory for version detection (hatch-vcs)
 COPY .git .git
-COPY pyproject.toml README.md .
-COPY src ./src
-COPY alembic.ini .

 # Create venv and install build tools
 RUN python -m venv /opt/venv \
-    && /opt/venv/bin/pip install --no-cache-dir --upgrade pip
+    && /opt/venv/bin/pip install --no-cache-dir ${PYPI_INDEX_URL:+--index-url $PYPI_INDEX_URL} --upgrade pip
+
+# Install torch CPU version first (to avoid CUDA version from docling dependency)
+RUN /opt/venv/bin/pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu torch

-# Install app + deepdoc + chromadb into venv
-# Patch deepdoc to use CPU onnxruntime on x86_64
-RUN /opt/venv/bin/pip install --no-cache-dir . \
+# Copy source code (changes frequently)
+COPY src ./src
+COPY alembic.ini .
+
+# Install app + deepdoc into venv
+RUN /opt/venv/bin/pip install --no-cache-dir ${PYPI_INDEX_URL:+--index-url $PYPI_INDEX_URL} . \
     && git clone https://github.com/xorbitsai/deepdoc-lib /opt/xagent/deepdoc-lib \
     && cd /opt/xagent/deepdoc-lib && git checkout fe75bbb676b141a1ea4257b198f6a04a89e5ac5c \
-    && /opt/venv/bin/pip install --no-cache-dir /opt/xagent/deepdoc-lib \
-    && /opt/venv/bin/pip install --no-cache-dir chromadb \
-    && rm -rf /opt/xagent/deepdoc-lib
+    && /opt/venv/bin/pip install --no-cache-dir ${PYPI_INDEX_URL:+--index-url $PYPI_INDEX_URL} /opt/xagent/deepdoc-lib \
+    && rm -rf /opt/xagent/deepdoc-lib .git

 ############################
 # Runtime
 ############################
 FROM python:${PYTHON_VERSION}-bookworm AS runtime

 ARG NODE_VERSION=22
+ARG APT_MIRROR=""
 ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
 ENV VIRTUAL_ENV=/opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 WORKDIR /opt/xagent

-# System deps + Node.js + LibreOffice
+# Use APT mirror if provided (for faster downloads in China)
+RUN if [ -n "$APT_MIRROR" ]; then \
+        sed -i "s|http://deb.debian.org/debian|$APT_MIRROR|g" /etc/apt/sources.list.d/debian.sources \
+        || sed -i "s|http://deb.debian.org/debian|$APT_MIRROR|g" /etc/apt/sources.list; \
+    fi
+
+# System deps + Node.js + LibreOffice + Playwright dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     gnupg \
@@ -67,6 +90,25 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     libxrender1 \
     libxext6 \
     libgl1 \
+    libatk1.0-0 \
+    libatk-bridge2.0-0 \
+    libcups2 \
+    libdbus-1-3 \
+    libdrm2 \
+    libxkbcommon0 \
+    libxcomposite1 \
+    libxdamage1 \
+    libxfixes3 \
+    libxi6 \
+    libxtst6 \
+    libpango-1.0-0 \
+    libcairo2 \
+    libgbm1 \
+    libnss3 \
+    libnspr4 \
+    libxrandr2 \
+    libasound2 \
+    libpangocairo-1.0-0 \
     && rm -rf /var/lib/apt/lists/*

 # Install Node.js
@@ -77,31 +119,23 @@ RUN curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg -
     && apt-get install -y --no-install-recommends nodejs \
     && rm -rf /var/lib/apt/lists/*

-# Copy venv + app
+# Copy entrypoint script early (changes rarely)
+COPY docker/entrypoint.sh /opt/xagent/deploy/entrypoint.sh
+
+# Copy venv + app (exclude .git directory)
 COPY --from=backend-build /opt/venv /opt/venv
-COPY --from=backend-build /opt/xagent /opt/xagent
+COPY --from=backend-build /opt/xagent/src /opt/xagent/src
+COPY --from=backend-build /opt/xagent/pyproject.toml /opt/xagent/
+COPY --from=backend-build /opt/xagent/alembic.ini /opt/xagent/

 # Copy frontend package files for npm tasks (e.g., PPTX generation)
 COPY frontend/package.json frontend/package-lock.json /opt/xagent/frontend/
-RUN cd /opt/xagent/frontend && npm ci
-
-# Install pptxgenjs globally for JavaScript executor tool
-RUN npm install -g pptxgenjs@4.0.1
-
-# Playwright browser + deps
-RUN python -m playwright install --with-deps chromium
-
-# Download NLTK data and tiktoken models
-RUN python -c "import nltk; nltk.download('punkt_tab')" \
-    && python -c "import tiktoken; tiktoken.encoding_for_model('gpt-4')"
-
-# Copy entrypoint script
-COPY docker/entrypoint.sh /opt/xagent/deploy/entrypoint.sh
-RUN chmod +x /opt/xagent/deploy/entrypoint.sh
-
-# Create directory for uploads and data
-RUN mkdir -p /opt/xagent/uploads \
-    && mkdir -p /home/xagent/.xagent
+RUN cd /opt/xagent/frontend && npm ci \
+    && npm install -g pptxgenjs@4.0.1 \
+    && python -m playwright install chromium \
+    && python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')" \
+    && python -c "import tiktoken; tiktoken.encoding_for_model('gpt-4')" \
+    && chmod +x /opt/xagent/deploy/entrypoint.sh

 EXPOSE 8000
 ENTRYPOINT ["/opt/xagent/deploy/entrypoint.sh"]
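
The `${PYPI_INDEX_URL:+--index-url $PYPI_INDEX_URL}` expression in the pip commands above is shell parameter expansion: it expands to the `--index-url` flag only when the variable is set and non-empty, and to nothing otherwise. A small Python model of that behavior may make the resulting command lines easier to picture. The function name and the mirror URL are hypothetical; the Dockerfile itself contains no such helper.

```python
def pip_install_args(target: str, index_url: str = "") -> list[str]:
    """Hypothetical model of the pip command line the Dockerfile builds."""
    args = ["pip", "install", "--no-cache-dir"]
    # Mirrors ${PYPI_INDEX_URL:+--index-url $PYPI_INDEX_URL}: the flag and its
    # value are emitted only when the build arg is non-empty.
    if index_url:
        args += ["--index-url", index_url]
    args.append(target)
    return args


# Default build (PYPI_INDEX_URL=""): no extra flag is added.
default_cmd = pip_install_args(".")

# Build with a mirror; the URL is a placeholder, not a real index.
mirror_cmd = pip_install_args(".", "https://mirror.example/simple")
```

Because the default value of the build arg is the empty string, a plain `docker build` behaves as before the change; the mirror only takes effect when the arg is passed explicitly.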

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -30,6 +30,7 @@ dependencies = [
     "PyYAML >= 6.0.2",
     "fastapi >= 0.35.0",
     "uvicorn >= 0.35.0",
+    "websockets>=15.0.1",
     "httpx >= 0.27.2",
     "rich >= 14.0.0",
     "prompt_toolkit >= 3.0.0",
@@ -61,6 +62,7 @@ dependencies = [
     "jsonschema>=4.25.1",
     "playwright>=1.40.0",
     "beartype>=0.18.5",
+    "matplotlib>=3.5.0",
 ]

 [project.optional-dependencies]
@@ -84,13 +86,11 @@ dev = [
     "types-openpyxl>=3.1.5",
     "openpyxl>=3.1.5",
     "pytest-timeout>=2.4.0",
-    "websockets>=15.0.1",
     "types-setuptools>=69.0.0",
     "pytest-socket>=0.7.0",
     # Data science libraries for testing
     "numpy>=1.21.0",
     "pandas>=1.3.0",
-    "matplotlib>=3.5.0",
     "types-docker>=7.1.0.20251009",
     "copilotkit==0.1.69",
 ]

src/xagent/providers/vector_store/__init__.py

Lines changed: 22 additions & 9 deletions
@@ -5,16 +5,29 @@
 the standard VectorStore interface.
 """

-from xagent.providers.vector_store.base import VectorStore
-from xagent.providers.vector_store.chroma import ChromaVectorStore
-from xagent.providers.vector_store.lancedb import (
+import importlib.util
+
+from .base import VectorStore
+from .lancedb import (
     LanceDBConnectionManager,
     LanceDBVectorStore,
 )

-__all__ = [
-    "VectorStore",
-    "ChromaVectorStore",
-    "LanceDBVectorStore",
-    "LanceDBConnectionManager",
-]
+# ChromaVectorStore is optional (requires chromadb)
+_chroma_available = importlib.util.find_spec("chromadb") is not None
+
+if _chroma_available:
+    from .chroma import ChromaVectorStore
+
+    __all__ = [
+        "VectorStore",
+        "LanceDBVectorStore",
+        "LanceDBConnectionManager",
+        "ChromaVectorStore",
+    ]
+else:
+    __all__ = [
+        "VectorStore",
+        "LanceDBVectorStore",
+        "LanceDBConnectionManager",
+    ]
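
The optional-export pattern in this file relies on `importlib.util.find_spec`, which locates a top-level module without importing it. The sketch below isolates that check and the resulting export list so the two branches can be compared side by side. `is_installed` and `public_names` are hypothetical helpers written for this illustration; they are not part of the xagent package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def public_names(chroma_available: bool) -> list[str]:
    """Hypothetical model of the __all__ list built in the diff above."""
    names = ["VectorStore", "LanceDBVectorStore", "LanceDBConnectionManager"]
    if chroma_available:
        # Exported only when the optional chromadb dependency is present.
        names.append("ChromaVectorStore")
    return names


# Whether chromadb is found depends on the environment running this code.
exports = public_names(is_installed("chromadb"))
```

Callers that need the Chroma backend can therefore no longer assume `ChromaVectorStore` is importable from the package; when chromadb is absent, the name is simply not defined there.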

src/xagent/providers/vector_store/chroma.py

Lines changed: 8 additions & 3 deletions
@@ -3,9 +3,14 @@
 from typing import Any, ClassVar, Optional
 from uuid import uuid4

-from chromadb import Client
-from chromadb.config import Settings
-from chromadb.utils.embedding_functions import DefaultEmbeddingFunction
+try:
+    from chromadb import Client
+    from chromadb.config import Settings
+    from chromadb.utils.embedding_functions import DefaultEmbeddingFunction
+except ImportError as e:
+    raise ImportError(
+        "ChromaDB is not installed. Please install it with: pip install chromadb"
+    ) from e

 from .base import VectorStore
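
The guarded import above re-raises `ImportError` with an installation hint while keeping the original error attached through `from e`. A generic version of the same idea might look like the following; `require` is a hypothetical helper used only to show the chaining behavior, not something defined in chroma.py.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, or raise ImportError carrying an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # "from e" stores the original failure in __cause__, so the traceback
        # shows both the friendly hint and the underlying import error.
        raise ImportError(
            f"{module_name} is not installed. Please install it with: {install_hint}"
        ) from e


# A standard-library module imports normally.
json_module = require("json", "n/a")
```

With this shape, a user who imports the Chroma backend directly without chromadb installed sees an actionable message instead of a bare missing-module error.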
