Building lasting habits is challenging without understanding the why behind them. The Habit Builder AI Agent bridges this gap by drawing on the Huberman Lab podcast archive—a top 5 podcast on Apple and Spotify with over 7 million YouTube subscribers—to transform complex scientific insights into actionable habits supported by the latest research.
This AI agent acts as a personalized coach: it surfaces relevant knowledge from the podcast's extensive archive, searches the web for current research, and, drawing on both, recommends actionable takeaways grounded in expert interviews and scientific evidence.
Audio files are downloaded via RSS, transcribed with Faster Whisper, chunked with a sliding window, and embedded with Hugging Face's Sentence Transformers model all-mpnet-base-v2. Qdrant stores the embeddings, and Streamlit provides the interface for interacting with the agent. You can run the local Streamlit version by replicating this project, or visit the Streamlit Cloud version, shown below:
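The sliding-window chunking step can be pictured with a short sketch. This is illustrative only: the window and stride sizes below are made up for the example and are not the repository's actual settings.

```python
# Illustrative sketch of sliding-window chunking over a transcript.
# Window/stride sizes are example values, not the repo's real settings.

def sliding_window_chunks(words: list[str], window: int = 200, stride: int = 100) -> list[str]:
    """Return overlapping chunks so context isn't cut at hard boundaries."""
    if len(words) <= window:
        return [" ".join(words)]
    chunks = []
    for start in range(0, len(words) - window + 1, stride):
        chunks.append(" ".join(words[start:start + window]))
    # Cover the tail of the transcript if the stride doesn't land on it exactly.
    if (len(words) - window) % stride != 0:
        chunks.append(" ".join(words[-window:]))
    return chunks
```

Because consecutive windows overlap by `window - stride` words, a sentence that straddles a chunk boundary still appears intact in at least one chunk, which helps retrieval quality.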
The agent is implemented with Pydantic's BaseModel for strict Python data validation and PydanticAI's Agent class for structured output and agent tooling. OpenAI's gpt-4o-mini powers the reasoning, and the agent's tools include: searching the knowledge base, retrieving recent research articles, and summarizing the current state of research on a requested topic.
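The agent-plus-tools wiring can be pictured with a plain-Python stand-in. This is a conceptual sketch, not the repository's code: the tool names mirror the description above, but the decorator-based registration and direct dispatch only mimic the shape of what PydanticAI does (in the real agent, the LLM decides which tool to call).

```python
# Conceptual sketch (NOT the repository's actual PydanticAI code):
# an agent that exposes named tools and dispatches queries to them.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class HabitAgent:
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def tool(self, fn):
        """Register a function as a tool, keyed by its name."""
        self.tools[fn.__name__] = fn
        return fn

    def run(self, tool_name: str, query: str) -> str:
        # A real agent lets the LLM choose the tool; here we dispatch directly.
        return self.tools[tool_name](query)

agent = HabitAgent()

@agent.tool
def search_knowledge_base(query: str) -> str:
    # Stand-in for a vector search over the transcript embeddings.
    return f"top transcript chunks for: {query}"

@agent.tool
def search_web(query: str) -> str:
    # Stand-in for the Brave-powered web search.
    return f"recent articles about: {query}"
```

In PydanticAI the same shape appears as an `Agent` configured with a model name, with tools registered via decorators and outputs validated against Pydantic models.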
Testing and evaluation combine vibe checks, unit tests (via the pytest framework), and an LLM judge. Logging/monitoring can be done locally or with Pydantic Logfire.
The diagram below outlines the development flow and supporting services.
- Clone this repo:

  ```shell
  git clone https://github.com/vtdinh13/habit-builder-ai-agent.git
  ```
- `uv` manages Python packages and the virtual environment. To replicate the project you can use either `uv` or `pip`, but using `uv` will match this repository's workflow most closely. Choose Option 1 or 2, but not both.

  **Option 1: Manage with uv**

  - Install `uv` if it is not already on your system. See the Astral documentation for installation steps.
  - Run `uv sync` to install all required packages and start uv's managed virtual environment.

  **Option 2: Manage with pip**

  - Run `pip install -r requirements.txt`.
- Docker Desktop runs the Docker Engine daemon.

  - Download Docker Desktop if needed (refer to the Docker Desktop docs).
  - Start Docker Desktop so the Docker Engine daemon is available.
  - Run `docker-compose up` to start every service, or `docker-compose up -d` to run them in detached mode so the containers stay in the background while you continue working in the terminal.
- API keys are required.

  **Required keys**

  - The agent uses OpenAI models. Sign up for an OpenAI API key if you don't already have one.
  - The Brave API powers the web search tool. Register for a Brave API key.

  **Optional keys**

  - The local vector database is sufficient, but if you want to upload embeddings to Qdrant Cloud, generate an API key from Qdrant.
- API keys are managed via `direnv`. Keys live in a `.env` file, and `.envrc` contains `dotenv` so the values load automatically. Example:

  ```shell
  OPENAI_API_KEY=openai_api_key_value
  BRAVE_API_KEY=brave_api_key_value
  ```

- If you want to skip `direnv` and `.env` entirely, export the keys and their values:

  ```shell
  export OPENAI_API_KEY="openai_api_key_value"
  export BRAVE_API_KEY="brave_api_key_value"
  ```

  The API key values are now available in your current working environment. Exports apply only to the current shell session; you'll have to export them again when you revisit this project.
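Whichever method you use, a small fail-fast check like the sketch below can confirm the keys are actually visible to Python before the agent runs. The helper name is an illustration, not something defined in this repository.

```python
# Sketch: fail fast if required API keys are missing from the environment.
# `missing_keys` is a hypothetical helper, not part of this repository.
import os

REQUIRED = ["OPENAI_API_KEY", "BRAVE_API_KEY"]

def missing_keys(env=os.environ) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]

if missing_keys():
    print(f"Missing keys: {missing_keys()}")
```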
- Downloading and transcribing audio files is a project on its own. A Parquet file containing transcripts is provided to avoid this step. See Ingestion if you'd like to replicate the transcription process yourself.
- Make sure Docker Desktop is running.
- Start the Qdrant service:

  ```shell
  docker-compose up qdrant -d
  ```

- The speed of chunking, embedding, and upserting into Qdrant depends on your processor. Adjust the Parquet and embedding batch sizes according to the capability of your machine; the default is 128. Chunking and uploading embeddings to the local Qdrant vector database takes ~2 hours.

  ```shell
  uv run python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local
  ```
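The two batch-size flags can be pictured with a generic batching helper: rows are read from the Parquet file in fixed-size batches and embedded in batches before being upserted. This is a sketch of the idea, not the ingestion script's actual code.

```python
# Sketch of fixed-size batching, as controlled by --parquet-batch-size and
# --embedding-batch-size above. Not the ingestion script's actual code.

def batches(items, batch_size=128):
    """Yield successive fixed-size batches; the last one may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Smaller batches reduce peak memory at the cost of more round trips; larger batches do the opposite, which is why the flags are worth tuning to your machine.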
- Recommendation: add the `--limit` argument to process only a sample of transcripts (each row corresponds to one episode/transcript). For example, `--limit 100` chunks and uploads the first 100 transcripts, cutting the processing time to about 40 minutes.

  ```shell
  uv run python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local \
    --limit 100
  ```
- If you are using pip to manage your packages, run the following instead:

  ```shell
  python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local \
    --limit 100
  ```
- Optional: you can inspect your data in the Qdrant dashboard: http://localhost:6333/dashboard.
- Keep the Qdrant service up to run the agent. Shut it down when it is no longer necessary: `docker-compose down qdrant`.
- The Qdrant database is ready for querying. Note that the Qdrant service has to be running via Docker, and your API keys must be available in your current working environment. Testing on the CLI lets you ask only one question at a time; follow-up questions are not supported.
- With uv:

  ```shell
  uv run habit_agent_run.py
  ```

- With pip:

  ```shell
  python habit_agent_run.py
  ```
- You can also run the agent locally on Streamlit. This option supports streamed responses and follow-up conversation. Run the following command on the CLI:

  - With uv:

    ```shell
    uv run streamlit run qdrant_app_no_logfire.py
    ```

  - With pip:

    ```shell
    python -m streamlit run qdrant_app_no_logfire.py
    ```

- A window should pop up in your browser giving you access to the Streamlit app. If it doesn't, paste this link into your browser: http://localhost:8505/.
- There is also a Streamlit Cloud version, so you can interact with the agent without replicating the repo.
- Install all development dependencies:

  ```shell
  uv sync --group dev
  ```
- Choose between Option 1 or 2. Option 1 runs tests via the Makefile on the CLI; Option 2 runs tests in VS Code.

  **Option 1: Run tests on the CLI**

  **Option 2: Run tests in VS Code**

  Note: tests may fail if the entire knowledge base has not been upserted to Qdrant.
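Not every check needs the full knowledge base. As a sketch of a self-contained, unit-test-style check, here is the cosine similarity underlying the collection's `--distance cosine` setting; under pytest, assertions like these would live in `test_*` functions.

```python
# Self-contained sketch of a unit-test-style check: cosine similarity,
# the metric the Qdrant collection is configured with (--distance cosine).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical vectors are maximally similar; orthogonal ones score zero.
assert abs(cosine_similarity([1.0, 2.0], [1.0, 2.0]) - 1.0) < 1e-9
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0
```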
- Running the evaluation takes ~5 minutes.

  ```shell
  uv run python -m evaluation.eval_orchestrator \
    --csv evaluation/gt_sample.csv \
    --agent-model gpt-4o-mini \
    --judge-model gpt-5-nano \
    --concurrency 2
  ```

  You should get a version of the following:
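The evaluation pairs agent answers with verdicts from an LLM judge. A minimal sketch of how such verdicts might be aggregated into a pass rate; the `judge_pass` field name is an assumption for illustration, not the repository's actual schema.

```python
# Sketch of aggregating LLM-judge verdicts after an eval run.
# The "judge_pass" field name is an assumption, not the repo's real schema.

def pass_rate(verdicts: list[dict]) -> float:
    """Fraction of judged answers marked as passing; 0.0 for an empty run."""
    if not verdicts:
        return 0.0
    return sum(v["judge_pass"] for v in verdicts) / len(verdicts)
```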
- User interactions are logged with Pydantic Logfire. You have to create an account to use it.
- Create a new project.
- Enter a desired project name, then click "Create Project".
- Authenticate your local environment on the command line with `uv run logfire auth`. You will be asked about your region and prompted for your password.
- Once authentication is complete, run `uv run habit_agent_logfire.py`. The script invokes the agent.
- A `logs` folder will also be generated in your current directory to record user interactions.



