This setup runs Allycat RAG with models hosted on cloud services.
- API keys for the services we will be using. For example, to use Nebius AI, we will need `NEBIUS_API_KEY`.
| Component | Functionality | Runtime |
|---|---|---|
| Milvus embedded | Vector database | Locally or remotely |
| Models | LLM runtime | Remotely (Nebius, Replicate, etc.) |
# Substitute appropriate repo URL
git clone https://github.com/The-AI-Alliance/allycat/
cd allycat/rag-remote

Follow the Python dev env setup guide, then activate your Python environment as shown below.
## if using uv
uv sync
## if using python venv
source .venv/bin/activate
pip install -r requirements.txt
## If using conda
conda activate allycat-1  # or whatever your env is named
pip install -r requirements.txt

A sample `env.sample.txt` file is provided. Copy it to `.env`:

cp env.sample.txt .env

Then edit the `.env` file to make your changes.
1) To use Nebius AI:
- Get a `NEBIUS_API_KEY` from Nebius
- Add the `NEBIUS_API_KEY` to the `.env` file:

NEBIUS_API_KEY = "your key goes here"
The default models are:
- LLM: `nebius/Qwen/Qwen3-30B-A3B-Instruct-2507`
- Embedding: `nebius/Qwen/Qwen3-Embedding-8B`
Optionally, you can configure the models to use in the `.env` file. Find the available models at Nebius Token Factory:
EMBEDDING_MODEL = Qwen/Qwen3-Embedding-8B
EMBEDDING_LENGTH = 384
LLM_MODEL = nebius/Qwen/Qwen3-30B-A3B-Instruct-2507
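How Allycat actually loads these settings is inside its scripts (typically via a library such as python-dotenv); purely as an illustration of what reading a `.env` file like the one above involves, here is a minimal stdlib-only sketch. The `load_env` function and the demo filename are hypothetical, not part of the project:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser sketch: puts KEY = "value" lines into os.environ.
    Real projects typically use python-dotenv instead."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# Demo: write a sample .env-style file and load it
with open(".env.demo", "w") as f:
    f.write("LLM_MODEL = nebius/Qwen/Qwen3-30B-A3B-Instruct-2507\n")
    f.write('NEBIUS_API_KEY = "your key goes here"\n')

load_env(".env.demo")
print(os.environ["LLM_MODEL"])       # nebius/Qwen/Qwen3-30B-A3B-Instruct-2507
print(os.environ["NEBIUS_API_KEY"])  # your key goes here
```

Keys loaded this way are then available to the pipeline scripts through the process environment.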
This step crawls a site and downloads the website content into the `workspace/crawled` directory.
Code: `1_crawl_site.py`
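The `--max-downloads` and `--depth` parameters bound a breadth-first traversal of the site. The actual implementation lives in `1_crawl_site.py`; the sketch below (the `crawl` function and the injectable `fetch_links` callback are illustrative inventions, and the real script also saves each page to disk) shows the idea:

```python
from collections import deque

def crawl(start_url, fetch_links, max_downloads=100, max_depth=5):
    """Breadth-first crawl sketch. `fetch_links(url)` returns the links
    found on a page; a real crawler would fetch and save each page."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    downloaded = []
    while queue and len(downloaded) < max_downloads:
        url, depth = queue.popleft()
        downloaded.append(url)  # real code: download and save the page here
        if depth >= max_depth:
            continue  # don't follow links beyond the depth limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return downloaded

# Toy site: page "a" links to "b" and "c"; "b" links to "d"
site = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
pages = crawl("a", lambda u: site.get(u, []), max_downloads=3, max_depth=5)
print(pages)  # ['a', 'b', 'c']
```

With `max_downloads=3` the crawl stops before reaching "d", mirroring how the real flags cap the download count and link depth.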
# default settings
## if using uv
uv run python 1_crawl_site.py --url https://thealliance.ai
# or
python 1_crawl_site.py --url https://thealliance.ai
# or specify parameters
uv run python 1_crawl_site.py --url https://thealliance.ai --max-downloads 100 --depth 5
# or
python 1_crawl_site.py --url https://thealliance.ai --max-downloads 100 --depth 5

We will process the downloaded files (HTML / PDF) and extract the text as markdown. The output will be saved in the `workspace/processed` directory.
We use Docling to process downloaded files. It will convert the files into markdown format for easy digestion.
- Use the Python script: `2_process_files.py`
- or (for debugging) run the notebook: `2_process_files.ipynb`
uv run python 2_process_files.py
# or
python 2_process_files.py
In this step we:
- create chunks from cleaned documents
- create embeddings (embedding models may be downloaded at runtime)
- save the chunks + embeddings into a vector database
We currently use Milvus as the vector database. We use the embedded version, so there is no setup required!
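The exact chunking settings are defined in `3_save_to_vector_db.py`; as a generic illustration of the sliding-window chunking idea (the `chunk_text` function and its sizes are illustrative, not the project's actual parameters), a sketch:

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into overlapping character windows -- a simplified
    stand-in for the chunking done before embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # consecutive chunks share `overlap` chars
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last window already covers the end of the text
    return chunks

doc = "x" * 1000
chunks = chunk_text(doc, chunk_size=512, overlap=64)
print(len(chunks))     # 3
print(len(chunks[0]))  # 512
```

Each chunk is then embedded and stored alongside its vector in Milvus; the overlap helps keep sentences that straddle a chunk boundary retrievable from at least one chunk.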
- Run the Python script: `3_save_to_vector_db.py`
- or (for debugging) run the notebook: `3_save_to_vector_db.ipynb`
uv run python 3_save_to_vector_db.py
# or
python 3_save_to_vector_db.py

To query the indexed documents:
- Run the Python script: `4_query.py`
- or (for debugging) use the notebook: `4_query.ipynb`
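Under the hood, querying means embedding the question and retrieving the nearest stored chunks; Milvus does this at scale. As a toy in-memory illustration of the retrieval idea (the `top_k` function, the 2-d "embeddings", and the sample texts are all invented for this sketch), using cosine similarity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """index: list of (chunk_text, embedding). Returns the k best chunks."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

# Toy 2-d "embeddings" standing in for real model output
index = [
    ("about the alliance", [1.0, 0.0]),
    ("membership info",    [0.9, 0.1]),
    ("unrelated recipe",   [0.0, 1.0]),
]
print(top_k([1.0, 0.05], index, k=2))  # ['about the alliance', 'membership info']
```

The retrieved chunks are then passed to the LLM as context to ground its answer.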
uv run python 4_query.py
# or
python 4_query.py

Option 1: Flask UI

python app_flask.py

Go to http://localhost:8080 and start chatting!
Option 2: Chainlit UI
uv run chainlit run app_chainlit.py --port 8090
# or
chainlit run app_chainlit.py --port 8090

Go to http://localhost:8090 and start chatting!
MCP server code
MCP client code
See mcp.md for more.
We will create a Docker image of the app, packaging up the code and data.
Note: Be sure to run the docker command from the root of the project.
docker build -t allycat-remote .

Let's start the Docker container in 'dev' mode:
docker run -it --rm -p 8090:8090 -p 8080:8080 allycat-remote deploy
# docker run -it --rm -p 8090:8090 -v allycat-vol1:/allycat/workspace sujee/allycat

The `deploy` option starts the web UI.
When uv dependencies are updated, run this command to regenerate `requirements.txt`:
uv export --frozen --no-hashes --no-emit-project --no-default-groups --output-file=requirements.txt