Commit d68fb40

updates
1 parent 46ec5fa commit d68fb40

28 files changed, +1721 -884 lines changed

.claude/settings.local.json

Lines changed: 3 additions & 2 deletions
@@ -9,8 +9,9 @@
 "Bash(python3:*)",
 "Bash(git add:*)",
 "Bash(git commit:*)",
-"Bash(git push:*)"
+"Bash(git push:*)",
+"Bash(done)"
 ],
 "deny": []
 }
-}
+}

_drafts/2025-12-31-vaultagent-voice-powered-life-os.md

Lines changed: 27 additions & 80 deletions
@@ -2,117 +2,64 @@
 layout: post
 title: "VaultAgent: Building a Voice-Powered Personal Life OS"
 date: 2025-12-31
-excerpt: "I got tired of typing daily notes, so I'm building an AI that interviews me while I make coffee."
+excerpt: "I keep abandoning productivity systems because typing my thoughts feels like homework. So I'm building an AI that interviews me while I make coffee."
 categories: [ai, maker]
 tags: [voice, obsidian, llm, personal-productivity, python]
 comments: true
 source_note: ~/obsidian/Projects/VaultAgent.md
 status: draft
 ---
 
-<!-- OUTLINE -->
-## Outline
+I've tried every productivity system -- Notion, Roam, bullet journals, sticky notes on my monitor. The problem was never the system. It was me. I'd build beautiful dashboards and abandon them within a week because typing out my thoughts felt like homework.
 
-**Target Length**: 800-1000 words
-**Categories**: [ai, maker]
-**Tags**: [voice, obsidian, llm, personal-productivity, python]
-
-**Hook**: "I've tried every productivity system—Notion, Roam, bullet journals, even sticky notes on my monitor. The problem was never the system. It was me. I'd build beautiful dashboards and then abandon them within a week because typing out my thoughts felt like homework."
-
-**Section 1: The Friction Problem**
-- Daily journaling/tracking requires typing
-- Speaking is 3-4x faster than typing
-- We naturally share more when talking
-- Morning routine already has hands-free time (coffee, getting ready)
-
-**Section 2: The VaultAgent Concept**
-- AI voice agent that conducts daily "interviews"
-- Integrated with sun-up/sun-down routines
-- Asks tailored questions based on vault context
-- Writes structured updates to Obsidian files
-
-**Section 3: Architecture Decisions**
-- Whisper for speech-to-text (fast, accurate)
-- OpenCode/Claude as the "brain" with vault context
-- ElevenLabs for natural-sounding responses
-- Direct filesystem access to Obsidian vault
-
-**Section 4: Conversation Design**
-- One question at a time
-- Follow-up on vague answers
-- Context-aware (knows yesterday's tasks, current projects)
-- 5-10 minute morning and evening flows
-
-**Section 5: The Cost Reality**
-- ~$15-30/month for daily use
-- Could be free with local Whisper
-- Worth it for consistency?
-
-**Conclusion**: "The goal isn't to build another productivity tool—it's to make reflection as natural as conversation. If I can answer a few questions while making coffee, maybe I'll actually stick with it."
-
-**Links to include**: OpenCode, Obsidian, ElevenLabs, Whisper
-
----
-
-<!-- FIRST DRAFT -->
-## Draft
-
-I've tried every productivity system—Notion, Roam, bullet journals, even sticky notes on my monitor. The problem was never the system. It was me. I'd build beautiful dashboards and then abandon them within a week because typing out my thoughts felt like homework.
+Then I realized something obvious: I talk to myself constantly. In the shower, in the car, while making coffee. The barrier to reflection isn't time or discipline -- it's the keyboard.
 
 <!--more-->
 
-Then I realized something obvious: I talk to myself constantly. In the shower, in the car, while making coffee. The barrier to reflection isn't time or discipline—it's the keyboard. So I'm building VaultAgent, an AI that interviews me every morning and evening, then updates my Obsidian vault automatically.
-
-### The Friction Problem
-
-Here's the thing about daily tracking: it works. The data is valuable. Knowing how I slept, what I accomplished, what's blocking me—that context makes better decisions. But the act of opening a text editor and typing "Today I woke up feeling..." kills me. It feels performative and slow.
-
-Speaking is different. I can dump 300 words of stream-of-consciousness in the time it takes to type 50. When someone asks how my day went, I don't give them a bulleted list. I tell a story. That's the interaction model I want with my personal system.
-
-The morning routine is perfect for this. I'm already hands-free—making coffee, getting dressed, walking around. If an AI can ask me three smart questions during that time, I get my daily reflection without changing anything about my routine.
+So I'm building VaultAgent, an AI that interviews me every morning and evening and writes the results straight into my Obsidian vault.
 
-### The Design
+## The friction problem
 
-VaultAgent is essentially a voice-activated pipeline: my speech goes to OpenAI's Whisper for transcription, then to Claude (via OpenCode) for processing, then to ElevenLabs for natural-sounding responses. The AI has full context on my vault—yesterday's notes, active projects, health tracking—so it can ask relevant questions.
+Here's the thing about daily tracking: it works. The data is valuable. Knowing how I slept, what I accomplished, what's blocking me -- that context makes better decisions. But opening a text editor and typing "Today I woke up feeling..." kills me. It's slow and it feels performative.
 
-The conversation design matters more than the tech. The agent asks one question at a time (don't overwhelm), follows up on vague answers ("tell me more about that"), and keeps the whole interaction under 10 minutes. Morning might be: How'd you sleep? What's your main focus today? Any blockers? Evening might be: What'd you get done? Any wins? What's on your mind for tomorrow?
+Speaking is different. I can dump 300 words of stream-of-consciousness in the time it takes to type 50. When someone asks how my day went, I don't give them a bulleted list. I tell a story. That's the interaction model I want.
 
-Every response maps to a structured update. "I slept about 7 hours" becomes `sleep: 7` in today's daily note. "Need to work on the ClassCheck prototype" creates a task and links to the project file. The AI isn't just taking notes—it's maintaining the system.
+The morning routine is perfect for this. I'm already hands-free -- making coffee, getting dressed, walking around. If an AI asks me three smart questions during that time, I get daily reflection without changing anything about my routine.
 
-### The Architecture
+## The architecture
 
-The tech stack is deliberately simple:
+The pipeline is deliberately simple:
 
 ```
-Voice Whisper (STT) Claude via OpenCode ElevenLabs (TTS) Speaker
-
-Obsidian Vault
+Voice --> Whisper (STT) --> Claude via OpenCode --> ElevenLabs (TTS) --> Speaker
+|
+Obsidian Vault
 ```
 
-**Whisper** handles speech-to-text. The API version is fast and accurate. I could run it locally for free, but that requires GPU and adds latency.
+Whisper handles speech-to-text. Claude (via OpenCode) is the brain -- it has full context on my vault, knows yesterday's notes, understands my active projects. ElevenLabs makes responses sound human, which matters more than you'd think. A natural voice turns it into a conversation instead of dictation. And the agent reads and writes directly to the filesystem, so there are no plugins or sync issues -- just markdown files following my existing conventions.
 
-**OpenCode** is the brain. It runs Claude with full vault context—reading existing notes, understanding my projects, knowing what I tracked yesterday. The system prompt defines conversation flows and the format for vault updates.
+## Conversation design matters more than tech
 
-**ElevenLabs** makes the responses sound human. There's something about a natural voice that makes the interaction feel like a conversation instead of dictation. Budget is maybe $5-20/month depending on usage.
+The agent asks one question at a time. No overwhelming. It follows up on vague answers ("tell me more about that"). It keeps the whole interaction under ten minutes.
 
-**Direct filesystem access** to Obsidian means no plugins or sync issues. The agent reads and writes markdown files following my existing conventions—frontmatter for structured data, wiki-links for connections.
+Morning flow might be: How'd you sleep? What's your main focus today? Any blockers? Evening: What'd you get done? Any wins? What's on your mind for tomorrow?
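
Not part of this commit: a minimal Python sketch of the one-question-at-a-time flow the draft describes, with a follow-up on vague answers. Everything here is hypothetical, since the post says no code exists yet. `run_flow`, `is_vague`, and the four-word threshold are illustrative assumptions, and `ask` is a stand-in callable for the real speak-and-listen round trip (TTS, mic capture, STT).

```python
MORNING_QUESTIONS = [
    "How'd you sleep?",
    "What's your main focus today?",
    "Any blockers?",
]

FOLLOW_UP = "Tell me more about that."
MIN_WORDS = 4  # assumed cutoff for a "vague" answer; would need tuning


def is_vague(answer, min_words=MIN_WORDS):
    """Treat very short answers as vague (a crude word-count heuristic)."""
    return len(answer.split()) < min_words


def run_flow(questions, ask, max_follow_ups=1):
    """Ask one question at a time and follow up on vague answers.

    `ask` is any callable mapping a prompt string to the answer text.
    Returns the transcript as a list of (prompt, answer) pairs.
    """
    transcript = []
    for question in questions:
        answer = ask(question)
        transcript.append((question, answer))
        follow_ups = 0
        # Cap follow-ups so the session stays short.
        while is_vague(answer) and follow_ups < max_follow_ups:
            answer = ask(FOLLOW_UP)
            transcript.append((FOLLOW_UP, answer))
            follow_ups += 1
    return transcript
```

Passing `ask` in as a parameter means the flow can be exercised with scripted answers long before any audio pipeline exists. In the design the draft describes, Claude would presumably decide what counts as vague, so the word-count check is only a placeholder.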
 
-### The Honest Parts
+Every response maps to a structured update. "I slept about 7 hours" becomes `sleep: 7` in today's daily note. "Need to work on the ClassCheck prototype" creates a task and links to the project file. The AI isn't just transcribing -- it's maintaining the system.
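
Not part of this commit: a minimal Python sketch of the `sleep: 7` mapping described above, assuming the daily note keeps structured data in YAML frontmatter (the earlier version of the draft mentions frontmatter for structured data). The function names are hypothetical. The regex stands in for extraction that Claude would handle in the described design, and the frontmatter handling is deliberately naive (flat `key: value` lines only).

```python
import re

# Hypothetical pattern: matches "slept 7 hours", "slept about 7.5 hours", etc.
SLEEP_PATTERN = re.compile(
    r"slept\s+(?:about\s+|around\s+)?(\d+(?:\.\d+)?)\s*hours?",
    re.IGNORECASE,
)


def extract_sleep_hours(utterance):
    """Return the hours slept mentioned in the utterance, or None."""
    match = SLEEP_PATTERN.search(utterance)
    if match is None:
        return None
    hours = float(match.group(1))
    return int(hours) if hours.is_integer() else hours


def set_frontmatter_field(note, key, value):
    """Set `key: value` in a note's frontmatter, creating the block if absent."""
    lines = note.split("\n")
    new_line = f"{key}: {value}"
    if lines and lines[0] == "---" and "---" in lines[1:]:
        end = lines.index("---", 1)  # index of the closing delimiter
        for i in range(1, end):
            if lines[i].startswith(f"{key}:"):
                lines[i] = new_line  # overwrite an existing value
                break
        else:
            lines.insert(end, new_line)  # append just before the closing ---
        return "\n".join(lines)
    # No frontmatter yet: prepend a new block.
    return "\n".join(["---", new_line, "---", *lines])
```

For example, `set_frontmatter_field(note, "sleep", extract_sleep_hours("I slept about 7 hours"))` adds `sleep: 7` to the note's frontmatter and leaves the body untouched. A real version would likely use a YAML parser to handle nested or quoted values.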
 
-This is still in planning. I haven't written a line of code yet—just the spec. Here's what I'm uncertain about:
+## The honest parts
 
-**Push-to-talk vs wake word vs always listening?** Wake words are annoying ("Hey VaultAgent"), push-to-talk is friction, always listening is creepy. Probably starting with push-to-talk and iterating.
+I haven't written a line of code yet. Just the spec. Here's what I'm still uncertain about:
 
-**Will the cost be worth it?** ~$15-30/month for daily STT and TTS. That's not nothing. But if it's the difference between maintaining the system and abandoning it, it's trivial.
+**Push-to-talk vs wake word vs always listening.** Wake words are annoying ("Hey VaultAgent"), push-to-talk adds friction, always listening is creepy. Probably starting with push-to-talk and iterating.
 
-**Am I just procrastinating on actually journaling?** Building the tool that helps you do the thing is a classic trap. But the spec is done. Next step is prototype code.
+**Will the cost be worth it?** Roughly $15-30/month for daily STT and TTS. That's not nothing. But if it's the difference between maintaining the system and abandoning it, it's trivial.
 
-### Why This Might Work
+**Am I just procrastinating on actually journaling?** Building the tool that helps you do the thing is a classic trap. But the spec is done, and I'm starting with the audio pipeline: mic capture, Whisper transcription, ElevenLabs playback. Just prove the voice interaction feels natural, then add Claude and vault operations.
 
-The bet is simple: I'm more likely to answer questions out loud than type answers. If that's true, VaultAgent solves my consistency problem. If it's not, I've wasted a weekend on an interesting Python project.
+## Why this might work
 
-The deeper motivation is making the vault actually useful. Right now it's a pile of notes I rarely review. If an AI can synthesize context—"You've been working on ClassCheck for two weeks, you mentioned feeling stuck on WebSocket reconnection, your fast is going well"—the system becomes genuinely intelligent.
+The bet is simple: I'm more likely to answer questions out loud than type answers. If that's true, VaultAgent solves my consistency problem. If not, I've wasted a weekend on an interesting Python project.
 
-I'm starting with the audio pipeline this week: mic capture, Whisper transcription, ElevenLabs playback. Just prove the voice interaction feels natural. Then I'll add Claude and vault operations. The goal is a working morning flow within a few sessions.
+The deeper motivation is making the vault actually useful. Right now it's a pile of notes I rarely review. If an AI can synthesize context across my projects -- what I've been working on, where I'm stuck, how my habits are going -- the system becomes genuinely intelligent instead of just a filing cabinet.
 
-If you're curious about voice-first interfaces or have thoughts on the architecture, I'd love to hear about it. This feels like the kind of project that gets better with outside perspective.
+This is a personal tool, not one of the [five MVPs I'm building in parallel](/articles/2026-02/building-five-mvps-with-ai). But it feeds into everything else -- a working VaultAgent means better daily tracking, which means better decision-making across all five products.
