This is the backend for the Chatterbox project. It uses FastAPI as the web framework and SQLAlchemy as the ORM.
We use direnv to manage environment variables; it can be installed here.

```bash
cp .envrc.example .envrc
```

Fill in the environment variables in the `.envrc` file, then run:

```bash
direnv allow .
```
- Install uv for dependency management
- Use Python 3.12; install it if you don't have it:

```bash
uv python install 3.12.9
```
- Create and activate the venv:

```bash
uv venv
source .venv/bin/activate
```
- Run `uv sync` within the virtual environment to sync the dependencies from the `uv.lock` file into your virtual environment.
- You should always be in the virtual environment when developing, e.g. `(cortex) $` should be present in your terminal prompt.
- Activate the virtual environment if you are not already in it:

```bash
source .venv/bin/activate
```

- Load the environment variables using:

```bash
direnv allow .
```
Start the services and the dev server:

```bash
docker compose up -d
fastapi dev app/main.py
```

Open another terminal and run the following command to start the Temporal server:

```bash
temporal server start-dev
```

Make the Temporal worker script executable and run it:

```bash
chmod +x app/temporal/run_worker.sh
app/temporal/run_worker.sh
```

- Use the following to get a Cognito access token to simulate a user login to access authenticated endpoints:
```bash
aws cognito-idp initiate-auth \
  --auth-flow USER_PASSWORD_AUTH \
  --client-id ${COGNITO_CLIENT_ID} \
  --auth-parameters USERNAME=${username},PASSWORD=${password} \
  --query 'AuthenticationResult.AccessToken' \
  --output text
```
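As a hedged usage example, the token can then be sent as a bearer token to an authenticated endpoint; the `/users/me` path below is an assumption for illustration, not necessarily a real route in this project:

```python
# Hypothetical usage of the Cognito access token against the local dev server.
import httpx

token = "<AccessToken printed by the command above>"

resp = httpx.get(
    "http://localhost:8000/users/me",  # assumed endpoint path
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.status_code, resp.json())
```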
- Activate the virtual environment if you are not already in it:

```bash
source .venv/bin/activate
```

- Run `uv pip install -e ".[dev]"`
- Run `pytest`. The tests that currently pass are get user profile and create chatbot (valid and invalid).
- User uploads a file
- The file is uploaded to S3, and success is returned to the client
- A Temporal workflow starts that:
  - Generates a presigned S3 URL to pass to the Mistral OCR API to parse the PDF
  - Converts the parsed PDF text into chunks
  - Generates embeddings for each chunk
  - Stores the embeddings in the vector store
  - Updates the sync status of the document to `Synced`

If there are exceptions, e.g. a Mistral API rate limit, the retry policy is carried out by the Temporal server; see the sketch below. Failed retry video
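Below is a minimal sketch of what this workflow could look like with the Temporal Python SDK; the activity names, payloads, and retry values are assumptions for illustration, not the project's actual code:

```python
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@activity.defn
async def parse_pdf(s3_key: str) -> str:
    # Hypothetical: generate a presigned S3 URL for the object and send it
    # to the Mistral OCR API; return the extracted text.
    raise NotImplementedError


@activity.defn
async def chunk_text(text: str) -> list[str]:
    # Naive fixed-size chunking for illustration; the project likely uses
    # a LlamaIndex text splitter instead.
    size = 1_000
    return [text[i : i + size] for i in range(0, len(text), size)]


@activity.defn
async def embed_and_store(chunks: list[str]) -> None:
    # Hypothetical: embed each chunk and upsert it into the vector store.
    raise NotImplementedError


@activity.defn
async def mark_synced(document_id: str) -> None:
    # Hypothetical: set the document's sync status to Synced in Postgres.
    raise NotImplementedError


@workflow.defn
class DocumentSyncWorkflow:
    @workflow.run
    async def run(self, document_id: str, s3_key: str) -> None:
        # If an activity raises (e.g. a Mistral rate-limit error), the
        # Temporal server retries it according to this policy.
        retry = RetryPolicy(
            initial_interval=timedelta(seconds=5),
            backoff_coefficient=2.0,
            maximum_attempts=5,
        )
        opts: dict = {
            "start_to_close_timeout": timedelta(minutes=5),
            "retry_policy": retry,
        }

        text = await workflow.execute_activity(parse_pdf, s3_key, **opts)
        chunks = await workflow.execute_activity(chunk_text, text, **opts)
        await workflow.execute_activity(embed_and_store, chunks, **opts)
        await workflow.execute_activity(mark_synced, document_id, **opts)
```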
- User submits a question
- The question gets passed to the agent workflow
- The agent workflow has its tools (`search_info_from_documents`), and the agent (a function-calling LLM) does the following:
  - Decomposes complex queries into multiple single queries
  - Routes each query to the right tool to answer the question
  - Invokes the tools with the question based on the function arguments
  - Keeps doing this (invoke a tool with arguments, decide which tool to invoke next based on the answer and the question) until the LLM decides the tool responses can answer the question; see the sketch after this list
- The answer and tool call responses are stored in the chat store for the conversation
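A hedged sketch of this loop using LlamaIndex's ReAct agent follows; the tool body, model name, and question are illustrative assumptions (only the tool name `search_info_from_documents` comes from the project):

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


def search_info_from_documents(query: str) -> str:
    """Search the vector store for chunks relevant to a single, simple query."""
    # Hypothetical body: embed the query, run a similarity search against
    # pgvector, and return the top matching chunks as context.
    raise NotImplementedError


tool = FunctionTool.from_defaults(fn=search_info_from_documents)

# The ReAct loop: the LLM decomposes the question, picks a tool per sub-query,
# and keeps invoking tools until it judges the responses answer the question.
agent = ReActAgent.from_tools([tool], llm=OpenAI(model="gpt-4o"), verbose=True)

response = agent.chat("Compare the refund policies in the two uploaded contracts")
print(response)
```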
A rate limit of 2 requests per minute at the chat API endpoint level is handled by SlowAPI.
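A minimal sketch of how such a limit is typically wired up with SlowAPI on a FastAPI route; the `/chat` path and the handler body are assumptions, not the project's actual code:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Key requests by client IP; SlowAPI returns HTTP 429 once a limit is exceeded.
limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


@app.post("/chat")  # assumed endpoint path
@limiter.limit("2/minute")
async def chat(request: Request) -> dict:
    # Hypothetical handler; the real endpoint passes the question on to the
    # agent workflow described above.
    return {"answer": "..."}
```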
Frontend
- TypeScript
- Next.js
Backend
- Python
- FastAPI (Web server)
- SQLAlchemy (ORM)
- Llama Index (RAG Framework)
- Temporal (workflow orchestration): handles syncing uploaded documents into the vector store and makes retry policies easy to configure when activities fail, e.g. Mistral OCR API rate limits or an unavailable vector store
Database
- Postgres + pgvector (for storing vector embeddings)
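As a hedged illustration of how SQLAlchemy and pgvector fit together, a chunk table might look like the following; the table name, columns, and embedding dimension are assumptions:

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class DocumentChunk(Base):
    __tablename__ = "document_chunks"  # assumed table name

    id: Mapped[int] = mapped_column(primary_key=True)
    text: Mapped[str]
    # 1536 matches OpenAI's text-embedding-3-small; the real dimension may differ.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```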
Infrastructure
- AWS Cognito
- AWS S3
- Terraform (for provisioning resources)
Third party
- OpenAI (function-calling LLM for the ReAct agent that powers the chat: it decomposes queries and invokes the tool to search information from the vector store)
- Mistral OCR API