
Commit 619e4f8

Merge pull request #20 from voarsh2/multi-repo-support-collections-11
Add multi-collection support and remote delta upload tooling
2 parents 1629c52 + b45b582


45 files changed: +7450 additions, -1046 deletions

.env

Lines changed: 9 additions & 1 deletion

@@ -3,6 +3,11 @@
 QDRANT_URL=http://qdrant:6333
 # QDRANT_API_KEY= # not needed for local
 
+# Repository mode: 0=single-repo (default), 1=multi-repo
+# Single-repo: All files go into one collection (COLLECTION_NAME)
+# Multi-repo: Each subdirectory gets its own collection
+MULTI_REPO_MODE=0
+
 # Single unified collection for seamless cross-repo search
 # Default: "codebase" - all your code in one collection for unified search
 # This enables searching across multiple repos/workspaces without fragmentation
@@ -144,7 +149,7 @@ MEMORY_COLLECTION_TTL_SECS=300
 # INDEX_UPSERT_BATCH=128
 # INDEX_UPSERT_RETRIES=5
 # INDEX_UPSERT_BACKOFF=0.5
-WATCH_DEBOUNCE_SECS=4
+WATCH_DEBOUNCE_SECS=4
 
 
 # Duplicate Streamable HTTP MCP instances (run alongside SSE)
@@ -161,3 +166,6 @@ HYBRID_RESULTS_CACHE_ENABLED=1
 INDEX_CHUNK_LINES=60
 INDEX_CHUNK_OVERLAP=10
 USE_GPU_DECODER=0
+
+# Development Remote Upload Configuration
+HOST_INDEX_PATH=./dev-workspace
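The new MULTI_REPO_MODE flag changes how files map to Qdrant collections. A minimal sketch of how an indexer might branch on it; the function name and the per-subdirectory naming rule here are illustrative, not code from this commit:

```python
import os
from pathlib import Path

def target_collections(root: str) -> list[str]:
    """Illustrative: choose collection names based on MULTI_REPO_MODE."""
    if os.environ.get("MULTI_REPO_MODE", "0") != "1":
        # Single-repo mode: everything lands in one collection (COLLECTION_NAME).
        return [os.environ.get("COLLECTION_NAME", "codebase")]
    # Multi-repo mode: one collection per immediate subdirectory.
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())
```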

.env.example

Lines changed: 11 additions & 0 deletions

@@ -1,10 +1,21 @@
 # Qdrant connection
 QDRANT_URL=http://localhost:6333
 QDRANT_API_KEY=
+
+# Multi-repo mode: 0=single-repo (default), 1=multi-repo
+# Single-repo: All files go into one collection (COLLECTION_NAME)
+# Multi-repo: Each subdirectory gets its own collection
+MULTI_REPO_MODE=0
+
 # Single unified collection for seamless cross-repo search (default: "codebase")
 # Leave unset or use "codebase" for unified search across all your code
 COLLECTION_NAME=codebase
 
+# Repository mode: 0=single-repo (default), 1=multi-repo
+# Single-repo: All files go into one collection (COLLECTION_NAME)
+# Multi-repo: Each subdirectory gets its own collection
+MULTI_REPO_MODE=0
+
 # Embeddings
 EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
 EMBEDDING_PROVIDER=fastembed

Dockerfile.mcp

Lines changed: 6 additions & 1 deletion

@@ -3,11 +3,16 @@ FROM python:3.11-slim
 
 ENV PYTHONDONTWRITEBYTECODE=1 \
     PYTHONUNBUFFERED=1 \
-    WORK_ROOTS="/work,/app"
+    WORK_ROOTS="/work,/app" \
+    HF_HOME=/tmp/cache \
+    TRANSFORMERS_CACHE=/tmp/cache
 
 # Install latest FastMCP with Streamable HTTP (RMCP) support + deps
 RUN pip install --no-cache-dir --upgrade mcp fastmcp qdrant-client fastembed
 
+# Create cache directory with proper permissions
+RUN mkdir -p /tmp/cache && chmod 755 /tmp/cache
+
 # Bake scripts into image so server can run even when /work points elsewhere
 COPY scripts /app/scripts
 

Dockerfile.upload-service

Lines changed: 56 additions & 0 deletions

@@ -0,0 +1,56 @@
+# Dockerfile for Context-Engine Delta Upload Service
+FROM python:3.11-slim
+
+# Set environment variables
+ENV PYTHONUNBUFFERED=1 \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PIP_NO_CACHE_DIR=1 \
+    PIP_DISABLE_PIP_VERSION_CHECK=1 \
+    PYTHONPATH=/app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Create app directory
+WORKDIR /app
+
+# Copy requirements first for better caching
+COPY requirements.txt .
+
+# Install Python dependencies
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+
+# Copy application code
+COPY scripts/ ./scripts/
+COPY . .
+
+# Create work directory for repositories
+RUN mkdir -p /work && \
+    chmod 755 /work
+
+# Create non-root user for security
+RUN useradd --create-home --shell /bin/bash app && \
+    chown -R app:app /app /work
+USER app
+
+# Expose port
+EXPOSE 8002
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8002/health || exit 1
+
+# Default environment variables
+ENV UPLOAD_SERVICE_HOST=0.0.0.0 \
+    UPLOAD_SERVICE_PORT=8002 \
+    QDRANT_URL=http://qdrant:6333 \
+    WORK_DIR=/work \
+    MAX_BUNDLE_SIZE_MB=100 \
+    UPLOAD_TIMEOUT_SECS=300
+
+# Run the upload service
+CMD ["python", "scripts/upload_service.py"]
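The service caps bundles at MAX_BUNDLE_SIZE_MB. A sketch of the kind of pre-check an upload handler might run before unpacking a bundle; validate_bundle is a hypothetical helper, not shown in scripts/upload_service.py in this diff:

```python
import os
import tarfile

MAX_BUNDLE_SIZE_MB = int(os.environ.get("MAX_BUNDLE_SIZE_MB", "100"))

def validate_bundle(path: str) -> None:
    """Reject oversized or malformed delta bundles before extraction."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > MAX_BUNDLE_SIZE_MB:
        raise ValueError(f"bundle is {size_mb:.1f} MB; limit is {MAX_BUNDLE_SIZE_MB} MB")
    if not tarfile.is_tarfile(path):
        raise ValueError("bundle is not a valid tar archive")
```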

Makefile

Lines changed: 62 additions & 3 deletions

@@ -4,8 +4,8 @@ SHELL := /bin/bash
 # An empty export forces docker to use its default context/socket.
 export DOCKER_HOST =
 
-.PHONY: help up down logs ps restart rebuild index reindex watch env hybrid bootstrap history rerank-local setup-reranker prune warm health
-.PHONY: venv venv-install
+.PHONY: help up down logs ps restart rebuild index reindex watch watch-remote env hybrid bootstrap history rerank-local setup-reranker prune warm health test-e2e
+.PHONY: venv venv-install dev-remote-up dev-remote-down dev-remote-logs dev-remote-restart dev-remote-bootstrap dev-remote-test dev-remote-client dev-remote-clean
 
 .PHONY: qdrant-status qdrant-list qdrant-prune qdrant-index-root
 
@@ -77,6 +77,23 @@ index-here: ## index the current directory: make index-here [RECREATE=1] [REPO_N
 watch: ## watch mode: reindex changed files on save (Ctrl+C to stop)
 	docker compose run --rm --entrypoint python indexer /work/scripts/watch_index.py
 
+watch-remote: ## remote watch mode: upload delta bundles to remote server (Ctrl+C to stop)
+	@echo "Starting remote watch mode..."
+	@if [ -z "$(REMOTE_UPLOAD_ENDPOINT)" ]; then \
+		echo "Error: REMOTE_UPLOAD_ENDPOINT is required"; \
+		echo "Usage: make watch-remote REMOTE_UPLOAD_ENDPOINT=http://your-server:8080 [REMOTE_UPLOAD_MAX_RETRIES=3] [REMOTE_UPLOAD_TIMEOUT=30]"; \
+		exit 1; \
+	fi
+	@echo "Remote upload endpoint: $(REMOTE_UPLOAD_ENDPOINT)"
+	@echo "Max retries: $${REMOTE_UPLOAD_MAX_RETRIES:-3}"
+	@echo "Timeout: $${REMOTE_UPLOAD_TIMEOUT:-30} seconds"
+	docker compose run --rm --entrypoint python \
+		-e REMOTE_UPLOAD_ENABLED=1 \
+		-e REMOTE_UPLOAD_ENDPOINT=$(REMOTE_UPLOAD_ENDPOINT) \
+		-e REMOTE_UPLOAD_MAX_RETRIES=$${REMOTE_UPLOAD_MAX_RETRIES:-3} \
+		-e REMOTE_UPLOAD_TIMEOUT=$${REMOTE_UPLOAD_TIMEOUT:-30} \
+		indexer /work/scripts/watch_index.py
+
 rerank: ## multi-query re-ranker helper example
 	docker compose run --rm --entrypoint python indexer /work/scripts/rerank_query.py \
 		--query "chunk code by lines with overlap for indexing" \
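The watch-remote target forwards REMOTE_UPLOAD_MAX_RETRIES and REMOTE_UPLOAD_TIMEOUT into the watcher. A sketch of the retry loop those settings imply; upload_with_retries and the exponential backoff policy are assumptions, not code from this commit:

```python
import os
import time
import urllib.error
import urllib.request

def upload_with_retries(endpoint: str, data: bytes) -> int:
    """POST a delta bundle, retrying on failure (env names from the Makefile target)."""
    retries = int(os.environ.get("REMOTE_UPLOAD_MAX_RETRIES", "3"))
    timeout = int(os.environ.get("REMOTE_UPLOAD_TIMEOUT", "30"))
    for attempt in range(1, retries + 1):
        try:
            req = urllib.request.Request(endpoint, data=data, method="POST")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.URLError:
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # exponential backoff between attempts (assumed policy)
    raise RuntimeError("unreachable")
```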
@@ -216,12 +233,54 @@ llamacpp-build-image: ## build custom llama.cpp image with baked model (override
 # Download a tokenizer.json for micro-chunking (default: BAAI/bge-base-en-v1.5)
 TOKENIZER_URL ?= https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/tokenizer.json
 TOKENIZER_PATH ?= models/tokenizer.json
-
 tokenizer: ## download tokenizer.json to models/tokenizer.json (override with TOKENIZER_URL/TOKENIZER_PATH)
 	@mkdir -p $(dir $(TOKENIZER_PATH))
 	@echo "Downloading: $(TOKENIZER_URL) -> $(TOKENIZER_PATH)" && \
 	curl -L --fail --retry 3 -C - "$(TOKENIZER_URL)" -o "$(TOKENIZER_PATH)"
 
+# --- Development Remote Upload System Targets ---
+
+dev-remote-up: ## start dev-remote stack with upload service
+	@echo "Starting development remote upload system..."
+	@mkdir -p dev-workspace/.codebase
+	docker compose -f docker-compose.dev-remote.yml up -d --build
+
+dev-remote-down: ## stop dev-remote stack
+	@echo "Stopping development remote upload system..."
+	docker compose -f docker-compose.dev-remote.yml down
+
+dev-remote-logs: ## follow logs for dev-remote stack
+	docker compose -f docker-compose.dev-remote.yml logs -f --tail=100
+
+dev-remote-restart: ## restart dev-remote stack (rebuild)
+	docker compose -f docker-compose.dev-remote.yml down && docker compose -f docker-compose.dev-remote.yml up -d --build
+
+dev-remote-bootstrap: env dev-remote-up ## bootstrap dev-remote: up -> wait -> init -> index -> warm
+	@echo "Bootstrapping development remote upload system..."
+	./scripts/wait-for-qdrant.sh
+	docker compose -f docker-compose.dev-remote.yml run --rm init_payload || true
+	$(MAKE) tokenizer
+	docker compose -f docker-compose.dev-remote.yml run --rm indexer --root /work --recreate
+	$(MAKE) warm || true
+	$(MAKE) health
+
+dev-remote-test: ## test remote upload workflow
+	@echo "Testing remote upload workflow..."
+	@echo "Upload service should be accessible at http://localhost:8004"
+	@echo "Health check: curl http://localhost:8004/health"
+	@echo "Status check: curl 'http://localhost:8004/api/v1/delta/status?workspace_path=/work/test-repo'"
+	@echo "Test upload: curl -X POST -F 'bundle=@test-bundle.tar.gz' -F 'workspace_path=/work/test-repo' http://localhost:8004/api/v1/delta/upload"
+
+dev-remote-client: ## start remote upload client for testing
+	@echo "Starting remote upload client..."
+	docker compose -f docker-compose.dev-remote.yml --profile client up -d remote_upload_client
+
+dev-remote-clean: ## clean up dev-remote volumes and containers
+	@echo "Cleaning up development remote upload system..."
+	docker compose -f docker-compose.dev-remote.yml down -v
+	docker volume rm context-engine_shared_workspace context-engine_shared_codebase context-engine_upload_temp context-engine_qdrant_storage_dev_remote 2>/dev/null || true
+	rm -rf dev-workspace
+
 
 # Router helpers
 Q ?= what is hybrid search?
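The dev-remote-test target posts a test-bundle.tar.gz to the upload API. A sketch of packing such a bundle from a list of changed files; the bundle layout (flat tar.gz of file contents) is an assumption, as the diff does not show the real format:

```python
import pathlib
import tarfile

def make_delta_bundle(files: list[str], out_path: str) -> str:
    """Pack changed files into a tar.gz 'delta bundle' (layout assumed)."""
    with tarfile.open(out_path, "w:gz") as tar:
        for f in files:
            # Store each file under its bare name; a real client would
            # likely preserve workspace-relative paths instead.
            tar.add(f, arcname=pathlib.Path(f).name)
    return out_path
```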

README.md

Lines changed: 17 additions & 1 deletion

@@ -730,6 +730,8 @@ Indexer/Search MCP (8001 SSE, 8003 RMCP):
 - search_callers_for — intent wrapper for probable callers/usages
 - search_importers_for — intent wrapper for files importing a module/symbol
 - change_history_for_path(path) — summarize recent changes using stored metadata
+- collection_map - return collection↔repo mappings
+- default_collection - set the collection to use for the session
 
 Notes:
 - Most search tools accept filters like language, under, path_glob, kind, symbol, ext.
@@ -888,11 +890,25 @@ For production-grade backup/migration strategies, see the official Qdrant docume
 
 Operational notes:
 - Collection name comes from `COLLECTION_NAME` (see .env). This stack defaults to a single collection for both code and memories; filtering uses `metadata.kind`.
-- If you switch to a dedicated memory collection, update the MCP Memory server and the Indexers memory blending env to point at it.
+- If you switch to a dedicated memory collection, update the MCP Memory server and the Indexer's memory blending env to point at it.
 - Consider pruning expired memories by filtering `expires_at < now`.
 
 - Call `context_search` on :8001 (SSE) or :8003 (RMCP) with `{ "include_memories": true }` to return both memory and code results.
 
+### Collection Naming Strategies
+
+Different hash lengths are used for different workspace types:
+
+**Local Workspaces:** `repo-name-8charhash`
+- Example: `Anesidara-e8d0f5fc`
+- Used by local indexer/watcher
+- Assumes unique repo names within workspace
+
+**Remote Uploads:** `folder-name-16charhash-8charhash`
+- Example: `testupload2-04e680d5939dd035-b8b8d4cc`
+- Collision avoidance for duplicate folder names for different codebases
+- 16-char hash identifies workspace, 8-char hash identifies collection
+
 
 ### Enable memory blending (for context_search)
 
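The naming scheme added to the README can be reproduced with truncated digests. A sketch assuming SHA-256 over paths; the diff shows neither the hash function nor the exact inputs, so both are assumptions here:

```python
import hashlib

def short_hash(text: str, n: int) -> str:
    """First n hex chars of a digest (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:n]

def local_collection(repo_name: str, repo_path: str) -> str:
    # Local workspaces: repo-name-8charhash, e.g. "Anesidara-e8d0f5fc"
    return f"{repo_name}-{short_hash(repo_path, 8)}"

def remote_collection(folder: str, workspace_path: str) -> str:
    # Remote uploads: folder-name-16charhash-8charhash. Per the README,
    # the 16-char hash identifies the workspace and the 8-char hash the
    # collection; the exact hash inputs are assumptions.
    return f"{folder}-{short_hash(workspace_path, 16)}-{short_hash(folder + workspace_path, 8)}"
```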