This project provides an HTTP service to generate text embeddings using HuggingFace models. The API is built with FastAPI and containerized with Docker, ready to use locally or in test/prototype environments.
- `/embed` endpoint that accepts a list of texts and returns embedding vectors.
- Model and device configurable via environment variables.
- Docker image includes the model pre-downloaded for immediate startup.
- Gunicorn + Uvicorn used to handle multiple concurrent requests.
- `EMBEDDING_MODEL_ID`: HuggingFace model ID to use (default: `sentence-transformers/all-MiniLM-L6-v2`).
- `DEVICE`: set to `cpu` or `cuda` depending on the available hardware (default: `cpu`).
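A hypothetical sketch of how the service might read these variables at startup; the variable names and defaults come from this README, while `MODEL_ID`/`DEVICE` as Python names are illustrative assumptions:

```python
import os

# Hypothetical config loading; env var names and defaults match the README.
MODEL_ID = os.environ.get("EMBEDDING_MODEL_ID", "sentence-transformers/all-MiniLM-L6-v2")
DEVICE = os.environ.get("DEVICE", "cpu")
```

Reading configuration from the environment is what makes the `docker run -e ...` overrides below work without rebuilding the image.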
- Build the image:

  ```shell
  docker build -t embeddings-service .
  ```

- Run the container with defaults:

  ```shell
  docker run -p 8000:8000 embeddings-service
  ```

- Override environment variables if needed:

  ```shell
  docker run -e EMBEDDING_MODEL_ID=Qwen/Qwen3-Embedding-0.6B -e DEVICE=cuda -p 8000:8000 embeddings-service
  ```
```yaml
services:
  embeddings:
    image: embeddings-service:latest
    ports:
      - "8000:8000"
    environment:
      EMBEDDING_MODEL_ID: sentence-transformers/all-MiniLM-L6-v2
      DEVICE: cpu
```

`POST /embed`
Request JSON:
```json
{ "texts": ["text1", "text2"] }
```

Response JSON:

```json
{ "vectors": [[...], [...]] }
```

- The model is pre-downloaded during the build, so first startup is fast.
- If a GPU is available, set `DEVICE=cuda` to utilize it.
- Larger models require more RAM and longer initialization time.
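The request/response shapes documented above can be exercised with a small stdlib-only client sketch. The URL assumes the container from the run examples is listening on `localhost:8000`; the helper names (`build_request`, `embed`) are illustrative, not part of the service:

```python
import json
import urllib.request

EMBED_URL = "http://localhost:8000/embed"  # assumes the local docker run above

def build_request(texts, url=EMBED_URL):
    """Build an HTTP POST matching the documented /embed JSON schema."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def embed(texts, url=EMBED_URL):
    """Send texts to the service and return the list of embedding vectors."""
    with urllib.request.urlopen(build_request(texts, url)) as resp:
        return json.loads(resp.read())["vectors"]
```

For a quick smoke test against a running container: `embed(["text1", "text2"])` should return one vector per input text.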