- Models from Anthropic, Ollama, OpenAI
- Vector database: ChromaDB
- RAG (retrieval-augmented generation)
- LangChain
Option 1: run everything with Docker Compose.
Create a .env file in the root with:
OLLAMA_ROOT=ollama
$ docker compose up
Note that on Mac, Docker does not have access to the GPU, so model responses will be very slow.
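If startup seems stuck, the Ollama service logs show model download progress (assuming the Compose service is named ollama, as the OLLAMA_ROOT value suggests):
$ docker compose logs -f ollama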
Option 2: run Ollama on the host and everything else with Docker Compose.
Install Ollama, then pull the models and start the server:
$ ollama pull all-minilm
$ ollama pull mistral
$ ollama serve
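Ollama listens on port 11434 by default; to verify the server is up and both models are available:
$ curl http://localhost:11434/api/tags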
In .env, set:
OLLAMA_ROOT=host.docker.internal
$ docker compose -f compose-no-ollama.yaml up
Option 3: run the app locally.
Install and run Ollama as per Option 2. In .env, set:
OLLAMA_ROOT=localhost
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ uvicorn app.main:app --env-file .env --reload
Note that --env-file .env makes uvicorn load the environment variables automatically, so load_dotenv() is not needed in the app.
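Since the app reads its configuration from environment variables, they can also be set inline instead of via --env-file (a sketch assuming OLLAMA_ROOT is the only required variable):
$ OLLAMA_ROOT=localhost uvicorn app.main:app --reload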
Endpoints are accessible at http://localhost:8000
GET /chat or GET /chat/langchain with JSON body:
{
  "prompt": "Who are you?"
}
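For example, with curl (curl switches to POST when --data is given, so GET is forced explicitly here, assuming the route reads a body on GET as documented above):
$ curl -X GET http://localhost:8000/chat \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Who are you?"}'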
POST /admin/collection/document/add with JSON body:
{
  "content": "This is a document"
}
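For example:
$ curl -X POST http://localhost:8000/admin/collection/document/add \
    -H "Content-Type: application/json" \
    -d '{"content": "This is a document"}'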
GET /admin/collection/count
GET /admin/collection/reset
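Both can be exercised directly with curl, which defaults to GET:
$ curl http://localhost:8000/admin/collection/count
$ curl http://localhost:8000/admin/collection/reset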
$ cd infra
$ terraform apply
$ cd ../
$ ./deploy.sh
Get the app URI:
$ kubectl get svc
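The app URI appears in the EXTERNAL-IP column. With a known service name (app here is an assumption; use the name printed by kubectl get svc), the address can be extracted directly:
$ kubectl get svc app -o jsonpath='{.status.loadBalancer.ingress[0].ip}'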
$ kubectl delete svc --all
$ kubectl delete deployment --all
Wait a few minutes, then...
$ cd infra
$ terraform destroy