Building lasting habits is challenging without understanding the why behind them. The Habit Builder AI Agent bridges this gap by drawing on the Huberman Lab podcast archive—a top 5 podcast on Apple and Spotify with over 7 million YouTube subscribers—to transform complex scientific insights into actionable habits supported by the latest research.
This AI agent acts as a personalized coach: it surfaces relevant knowledge from the podcast's extensive archive, searches the web for current research, and, drawing on both, recommends actionable takeaways grounded in expert interviews and scientific evidence.
Audio files are downloaded via RSS, transcribed with Faster Whisper, chunked with a sliding window, and embedded with Hugging Face's Sentence Transformers model all-mpnet-base-v2. Qdrant stores the embeddings, and Streamlit provides the interface for interacting with the agent. You can run the local Streamlit version by replicating this project, or visit the Streamlit Cloud version, shown below:
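The sliding-window chunking step can be pictured with a short sketch. This is illustrative only: the window and stride sizes below are made up for the example and are not the repository's actual settings.

```python
# Illustrative sketch of sliding-window chunking over a transcript.
# Window/stride sizes are example values, not the repo's real settings.

def sliding_window_chunks(words: list[str], window: int = 200, stride: int = 100) -> list[str]:
    """Return overlapping chunks so context isn't cut at hard boundaries."""
    if len(words) <= window:
        return [" ".join(words)]
    chunks = []
    for start in range(0, len(words) - window + 1, stride):
        chunks.append(" ".join(words[start:start + window]))
    # Cover the tail of the transcript if the stride doesn't land on it exactly.
    if (len(words) - window) % stride != 0:
        chunks.append(" ".join(words[-window:]))
    return chunks
```

Because consecutive windows overlap by `window - stride` words, a sentence that straddles a chunk boundary still appears intact in at least one chunk, which helps retrieval quality.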
The agent is implemented with Pydantic's BaseModel for strict Python data validation and PydanticAI's Agent class for structured output and agent tooling. OpenAI's gpt-4o-mini powers the reasoning, and the agent's tools include: searching the knowledge base, retrieving recent research articles, and summarizing the current state of research on a requested topic.
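The agent-plus-tools wiring can be pictured with a plain-Python stand-in. This is a conceptual sketch, not the repository's code: the tool names mirror the description above, but the decorator-based registration and direct dispatch only mimic the shape of what PydanticAI does (in the real agent, the LLM decides which tool to call).

```python
# Conceptual sketch (NOT the repository's actual PydanticAI code):
# an agent that exposes named tools and dispatches queries to them.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class HabitAgent:
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def tool(self, fn):
        """Register a function as a tool, keyed by its name."""
        self.tools[fn.__name__] = fn
        return fn

    def run(self, tool_name: str, query: str) -> str:
        # A real agent lets the LLM choose the tool; here we dispatch directly.
        return self.tools[tool_name](query)

agent = HabitAgent()

@agent.tool
def search_knowledge_base(query: str) -> str:
    # Stand-in for a vector search over the transcript embeddings.
    return f"top transcript chunks for: {query}"

@agent.tool
def search_web(query: str) -> str:
    # Stand-in for the Brave-powered web search.
    return f"recent articles about: {query}"
```

In PydanticAI the same shape appears as an `Agent` configured with a model name, with tools registered via decorators and outputs validated against Pydantic models.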
Testing and evaluation combine vibe checks, unit tests (via the pytest framework), and an LLM judge. Logging/monitoring can be done locally or with Pydantic Logfire.
The diagram below outlines the development flow and supporting services.
- Clone this repo:

  ```shell
  git clone https://github.com/vtdinh13/habit-builder-ai-agent.git
  ```
- `uv` manages Python packages and the virtual environment. To replicate the project you can use either `uv` or `pip`, but using `uv` will match this repository's workflow most closely. Choose Option 1 or 2, but not both.

  **Option 1: Manage with uv**

  - Install `uv` if it is not already on your system. See the Astral documentation for installation steps.
  - Run `uv sync` to install all required packages and start uv's managed virtual environment.

  **Option 2: Manage with pip**

  - Run `pip install -r requirements.txt`.
- Docker Desktop runs the Docker Engine daemon.

  - Download Docker Desktop if needed (refer to the Docker Desktop docs).
  - Start Docker Desktop so the Docker Engine daemon is available.
  - Run `docker-compose up` to start every service, or `docker-compose up -d` to run them in detached mode so the containers stay in the background while you continue working in the terminal.
- API keys are required.

  **Required keys**

  - The agent uses OpenAI models. Sign up for an OpenAI API key if you don't already have one.
  - The Brave API powers the web search tool. Register for a Brave API key.

  **Optional keys**

  - The local vector database is sufficient, but if you want to upload embeddings to Qdrant Cloud, generate an API key from Qdrant.
- API keys are managed via `direnv`. Keys live in a `.env` file, and `.envrc` contains `dotenv` so the values load automatically. Example:

  ```shell
  OPENAI_API_KEY=openai_api_key_value
  BRAVE_API_KEY=brave_api_key_value
  ```

- If you want to skip `direnv` and `.env` entirely, export the keys and their values:

  ```shell
  export OPENAI_API_KEY="openai_api_key_value"
  export BRAVE_API_KEY="brave_api_key_value"
  ```

  The API key values are now available in your current working environment. Exports apply only to the current shell session; you'll have to export them again when you revisit this project.
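Whichever method you use, a small fail-fast check like the sketch below can confirm the keys are actually visible to Python before the agent runs. The helper name is an illustration, not something defined in this repository.

```python
# Sketch: fail fast if required API keys are missing from the environment.
# `missing_keys` is a hypothetical helper, not part of this repository.
import os

REQUIRED = ["OPENAI_API_KEY", "BRAVE_API_KEY"]

def missing_keys(env=os.environ) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]

if missing_keys():
    print(f"Missing keys: {missing_keys()}")
```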
- Downloading and transcribing audio files is a project on its own. A Parquet file containing transcripts is provided to avoid this step. See Ingestion if you'd like to replicate the transcription process yourself.
- Make sure Docker Desktop is running.
- Start the Qdrant service:

  ```shell
  docker-compose up qdrant -d
  ```

- The speed of chunking, embedding, and upserting into Qdrant depends on your processor. Adjust the Parquet and embedding batch sizes according to the capability of your machine; the default is 128. Chunking and uploading embeddings to the local Qdrant vector database takes ~2 hours.

  ```shell
  uv run python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local
  ```
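The two batch-size flags can be pictured with a generic batching helper: rows are read from the Parquet file in fixed-size batches and embedded in batches before being upserted. This is a sketch of the idea, not the ingestion script's actual code.

```python
# Sketch of fixed-size batching, as controlled by --parquet-batch-size and
# --embedding-batch-size above. Not the ingestion script's actual code.

def batches(items, batch_size=128):
    """Yield successive fixed-size batches; the last one may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Smaller batches reduce peak memory at the cost of more round trips; larger batches do the opposite, which is why the flags are worth tuning to your machine.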
- Recommendation: add the `--limit` argument to process only a sample of transcripts (each row corresponds to one episode/transcript). For example, `--limit 100` chunks and uploads the first 100 transcripts, cutting the processing time to about 40 minutes.

  ```shell
  uv run python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local \
    --limit 100
  ```
- If you are using pip to manage your packages, run the following instead:

  ```shell
  python ingestion/ingest_qdrant.py \
    --parquet-path transcripts/transcripts.parquet \
    --collection-name transcripts \
    --distance cosine \
    --parquet-batch-size 128 \
    --embedding-batch-size 128 \
    --target local \
    --limit 100
  ```
- Optional: you can inspect your data in the Qdrant dashboard: http://localhost:6333/dashboard.
- Keep the Qdrant service up to run the agent. Shut it down when it is no longer necessary: `docker-compose down qdrant`.
- The Qdrant database is ready for querying. Note that the Qdrant service has to be running via Docker, and your API keys must be available in your current working environment. Testing on the CLI lets you ask only one question at a time; follow-up questions are not supported.
- With uv:

  ```shell
  uv run habit_agent_run.py
  ```

- With pip:

  ```shell
  python habit_agent_run.py
  ```
- You can also run the agent locally on Streamlit. This option supports streamed responses and follow-up conversation. Run the following command on the CLI:

  - With uv:

    ```shell
    uv run streamlit run qdrant_app_no_logfire.py
    ```

  - With pip:

    ```shell
    python -m streamlit run qdrant_app_no_logfire.py
    ```

- A window should pop up in your browser giving you access to the Streamlit app. If it doesn't, paste this link into your browser: http://localhost:8505/.
- There is also a Streamlit Cloud version, so you can interact with the agent without replicating the repo.
- Install all development dependencies:

  ```shell
  uv sync --group dev
  ```
- Choose between Option 1 or 2. Option 1 runs tests via the Makefile on the CLI; Option 2 runs tests in VS Code.

  **Option 1: Run tests on the CLI**

  **Option 2: Run tests in VS Code**

  Note: tests may fail if the entire knowledge base has not been upserted to Qdrant.
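Not every check needs the full knowledge base. As a sketch of a self-contained, unit-test-style check, here is the cosine similarity underlying the collection's `--distance cosine` setting; under pytest, assertions like these would live in `test_*` functions.

```python
# Self-contained sketch of a unit-test-style check: cosine similarity,
# the metric the Qdrant collection is configured with (--distance cosine).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical vectors are maximally similar; orthogonal ones score zero.
assert abs(cosine_similarity([1.0, 2.0], [1.0, 2.0]) - 1.0) < 1e-9
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0
```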
- Running the evaluation takes ~5 minutes.

  ```shell
  uv run python -m evaluation.eval_orchestrator \
    --csv evaluation/gt_sample.csv \
    --agent-model gpt-4o-mini \
    --judge-model gpt-5-nano \
    --concurrency 2
  ```

  You should get a version of the following:
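The evaluation pairs agent answers with verdicts from an LLM judge. A minimal sketch of how such verdicts might be aggregated into a pass rate; the `judge_pass` field name is an assumption for illustration, not the repository's actual schema.

```python
# Sketch of aggregating LLM-judge verdicts after an eval run.
# The "judge_pass" field name is an assumption, not the repo's real schema.

def pass_rate(verdicts: list[dict]) -> float:
    """Fraction of judged answers marked as passing; 0.0 for an empty run."""
    if not verdicts:
        return 0.0
    return sum(v["judge_pass"] for v in verdicts) / len(verdicts)
```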
- User interactions are logged with Pydantic Logfire. You have to create an account to use it.
- Create a new project.
- Enter a desired project name, then click "Create Project".
- Authenticate your local environment on the command line with `uv run logfire auth`. You will be asked about your region and prompted for your password.
- Once authentication is complete, run `uv run habit_agent_logfire.py`. The script invokes the agent.
- A `logs` folder will also be generated in your current directory to record user interactions.



