Project setup guide for ceo-chatbot, an LLM chatbot for CEO
ceo-chatbot/
├── .venv/ # uv python virtual environment directory (not tracked; created on first `uv sync` or `uv run`)
├── conf/ # Configuration files for script arguments (YAML, JSON)
│ ├── base/ # Global/shared configuration files tracked in repo
│ │ ├── prompts.yml # Prompt configuration
│ │ └── rag_config.yml # RAG configuration: models, vectorstore, prompts...
│ └── local/ # Personal/local configs (excluded from version control)
│
├── data/ # Project data directory (not tracked in repo)
│
├── demo/ # Try ceo-chatbot in a Streamlit app
│
├── notebooks/ # Jupyter notebooks for development, prototyping, demos
│
├── src/ # Source code (modules and utilities imported by scripts) and scripts
│ └── ceo_chatbot/
│ ├── __init__.py
│ ├── config.py # Utilities for loading configs from conf/base/
│ │
│ ├── ingest/ # Data ingestion pipeline: loaders.py, chunking.py, embeddings.py -> index_builder.py
│ │ ├── __init__.py
│ │ ├── chunking.py # Utilities for splitting documents into chunks
│ │ ├── embeddings.py # Utilities for defining embedding model
│ │ ├── index_builder.py # Define data ingestion pipeline
│ │ └── loaders.py # Utilities for loading dataset
│ │
│ └── rag/ # RAG pipeline: llm.py, retriever.py -> pipeline.py
│ ├── __init__.py
│ ├── llm.py # Utilities for defining reader model
│ ├── pipeline.py # Define RAG pipeline
│ └── retriever.py # Utilities for doc retrieval and similarity search
│
├── scripts/
│ ├── build_index.py # Data ingestion pipeline runner (offline); chunk, embed, build vector store
│ └── ...
│
├── tests/ # Unit and integration tests
│
├── .gitignore
├── .python-version # Python version used by uv environment manager
├── pyproject.toml # Metadata about the project (for uv)
├── uv.lock # Locked dependency versions
└── README.md
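The files in conf/base/ drive the scripts. As an illustration of how a script might consume them, here is a minimal loader sketch; the function name and shape are assumptions for illustration, and the real logic lives in src/ceo_chatbot/config.py and may differ.

```python
# Minimal sketch of loading a YAML config from conf/base/ (hypothetical;
# see src/ceo_chatbot/config.py for the project's actual loader).
from pathlib import Path

import yaml  # PyYAML


def load_config(name: str) -> dict:
    """Read conf/base/<name>.yml into a plain dict."""
    path = Path("conf/base") / f"{name}.yml"
    with path.open() as f:
        return yaml.safe_load(f)


rag_config = load_config("rag_config")  # models, vectorstore, prompts...
prompts = load_config("prompts")        # prompt templates
```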
SSH:
git clone git@github.com:sig-gis/ceo-chatbot.git
HTTPS:
git clone https://github.com/sig-gis/ceo-chatbot.git
This app uses uv for dependency management. Read more about uv in the docs.
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
See the uv installation docs for instructions on Windows.
Also see CONTRIBUTING.md for more detail on developing with uv in this project.
Install the ceo-chatbot package locally in editable mode.
From the project root:
uv pip install -e .
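To confirm the editable install resolves to the local source tree, an optional quick check (run inside the uv environment, e.g. via `uv run python`):

```python
# Optional sanity check for the editable install.
import ceo_chatbot

print(ceo_chatbot.__file__)  # should resolve into src/ceo_chatbot/
```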
For local development, the knowledge corpus must be set up once.
If using a gated model such as google/embeddinggemma-300m:
- Request access to the model on its Hugging Face model page (access is granted instantly)
- Generate an access token: Profile > Settings > Access Tokens > + Create new token > Token type Read > Create token
- Run the following `hf` CLI command in your terminal and paste your HF access token when prompted:
uv run hf auth login
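Once logged in, downloads of the gated model should succeed. A quick hedged check, assuming the project loads the model through sentence-transformers (the repo's embeddings.py may do this differently):

```python
# Sanity check: load the gated embedding model after `hf auth login`.
# Assumes sentence-transformers is installed; embeddings.py may differ.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # gated: requires HF access
print(model.encode(["hello world"]).shape)  # e.g. (1, 768)
```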
Authenticate with gcloud for access to the GCS bucket specified in conf/base/extract_docs_config.yml:
gcloud auth login
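If the Python client libraries complain about credentials, application-default credentials (`gcloud auth application-default login`) may also be needed. A quick hedged check that the bucket is reachable; "ceo-docs-bucket" below is a placeholder for the bucket named in conf/base/extract_docs_config.yml:

```python
# Verify authenticated access to the GCS corpus bucket.
# "ceo-docs-bucket" is a placeholder; use the bucket from
# conf/base/extract_docs_config.yml.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("ceo-docs-bucket", max_results=5):
    print(blob.name)
```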
To initially set up the CEO docs corpus, or to manually re-upload updates to the GCS bucket specified in conf/base/extract_docs_config.yml, run the CEO docs extraction pipeline:
NOTE: this step is not required once the corpus exists in GCS; in that case, skip to Build the vector DB below.
uv run scripts/extract_docs.py
Build the vector DB:
uv run scripts/build_index.py
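Conceptually, build_index.py runs the ingest pipeline: load documents, chunk them, embed the chunks, and persist a vector store. The sketch below shows that shape only; the chunking strategy, model, and vector store library are assumptions, not the repo's actual implementation:

```python
# Illustrative shape of the ingestion pipeline (loaders -> chunking ->
# embeddings -> index). Library choices here are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer


def build_index(docs: list[str], persist_dir: str = "data/index") -> None:
    # 1. Chunk: naive fixed-size splitting stands in for chunking.py.
    chunks = [d[i:i + 1000] for d in docs for i in range(0, len(d), 1000)]
    # 2. Embed: stands in for embeddings.py.
    model = SentenceTransformer("google/embeddinggemma-300m")
    vectors = model.encode(chunks)
    # 3. Index: persist chunks + vectors in a local vector store.
    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection("ceo_docs")
    collection.add(
        ids=[str(i) for i in range(len(chunks))],
        documents=chunks,
        embeddings=vectors.tolist(),
    )
```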
Stay tuned! A planned future version will automate uploading the corpus to GCS and building the vector DB.
Launch a basic Streamlit application to demo ceo-chatbot in a chat UI.
uv run streamlit run demo/chat_app.py
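For reference, the core of a Streamlit chat UI looks roughly like this. This is a minimal sketch, not the repo's demo/chat_app.py, and `answer()` is a hypothetical stand-in for the RAG pipeline in src/ceo_chatbot/rag/pipeline.py:

```python
# Minimal Streamlit chat loop (illustrative; demo/chat_app.py may differ).
import streamlit as st


def answer(question: str) -> str:
    # Hypothetical stand-in for the RAG pipeline (rag/pipeline.py).
    return f"(stub) You asked: {question}"


st.title("ceo-chatbot demo")
if "history" not in st.session_state:
    st.session_state.history = []

# Replay prior turns so the conversation persists across reruns.
for role, text in st.session_state.history:
    st.chat_message(role).write(text)

if question := st.chat_input("Ask about CEO..."):
    st.chat_message("user").write(question)
    reply = answer(question)
    st.chat_message("assistant").write(reply)
    st.session_state.history += [("user", question), ("assistant", reply)]
```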