
How I Use AI

An Illustrated Tour of My Agent Setup

This Substack isn't meant to be a how-to or productivity blog. I've said that before, and I mean it. That being said, a lot of people have asked me to illustrate how I actually use AI agents day-to-day, and have been somewhat surprised by how extensive my setup is. So here it is.

I push it to the max—not because I'm a zealot, but because I want to fully understand what the capabilities and limits are. I also don't think humans should do what is easily automated. But there's a more important reason: it prevents what I'll call the NYU Professor-Uber Problem.

You may recall the NYU professor who publicly criticized Uber without ever having used the service. Kudos to him for admitting it, though whether it was a mea culpa or a somewhat snooty statement about preferring the subway is debatable. Still, you can't really know how disruptive a technology is until you try it.

For example, a lot of people are surprised when I tell them that Waymos are both more expensive and take longer to hail than an Uber in San Francisco. The reason? Well, tourists for one—but also some people refuse to take Ubers anymore. People who have gotten into car accidents. Women who have had bad experiences with drivers. People who are just very nervous and uncomfortable with small talk. There's an archetype that you wouldn't have rationally sketched out on paper—a market that is opened by a new technology. You wouldn't know that unless you used Waymo regularly.

The same is true for AI agents. You wouldn't understand what's possible, what's infuriating, and what's quietly transformative unless you were actually using them.

We Are Unquestionably Here

Intelligent agents are considered by many to be the ultimate goal of AI. The classic textbook by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of artificial intelligence research as "the study and design of rational agents."

We are unquestionably here. We may not have AGI, but we definitely have AI—which is a bit funny to say, but I make the distinction because so many things we now call "AI," including aspects of classical machine learning, have been around for a while. What's new is that agents can now take actions in the real world, not just generate text in a chatbox. As I wrote in "The Boring Phase of AI," this shift from chatting to doing is both less flashy and ultimately much more important.

The Foundation: Context and Iteration

Before I get into the specific things I use AI for, there's a core principle that underpins everything: I iteratively improve both the prompts AND the context, and I do it semi-automatically.

What that means in practice: I ask the AI agent to evaluate the diff between its outputs and what I actually ended up using, and then add more documentation based on those differences—similar to how you'd refine a codebase over time. Most of this context lives in DEVONthink for me, because an MCP (Model Context Protocol—essentially a way for AI to talk to other applications) connector exists for it, I like the platform, and it's easier to back up, protect, and sync across devices than just raw files.

Is it perfect? No. Sometimes it overwrites past guidance. But as long as I'm careful to read the changes and make sure nothing important gets lost (even if some of the new guidance doesn't make as much sense to me as a human), I'm OK with it and keep iterating. Everything builds on what came before. That iteration is a huge part of what ultimately makes the system useful.
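A minimal sketch of that refinement loop, in Python with hypothetical function names (my real setup routes this through the DEVONthink MCP connector rather than local files):

```python
import difflib
from pathlib import Path

def build_refinement_prompt(agent_draft: str, final_version: str) -> str:
    """Turn the gap between the agent's output and what I actually shipped
    into a prompt asking the model to update its own guidance."""
    diff = "\n".join(difflib.unified_diff(
        agent_draft.splitlines(),
        final_version.splitlines(),
        fromfile="agent_draft",
        tofile="final_version",
        lineterm="",
    ))
    return (
        "Here is a diff between your draft and the version I actually used.\n"
        "Infer what guidance was missing and propose additions to the style\n"
        "guide. Do not delete existing rules.\n\n" + diff
    )

def append_guidance(guidelines_path: Path, new_guidance: str) -> None:
    """Append model-proposed guidance; existing rules are never overwritten."""
    with guidelines_path.open("a") as f:
        f.write("\n" + new_guidance.strip() + "\n")
```

The append-only write is the key design choice: it trades some redundancy in the guidelines file for a guarantee that past guidance survives each round.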

For example, even less structured context helps enormously. I have all of my Substack pieces stored in DEVONthink, categorized and tagged. For a talk, I can simply provide some bullets on what I want to cover, ask Claude to pull my relevant published pieces, and have it write up notes for me. This is like having an assistant—it's obviously still me giving the talk. I gave the bullets. I wrote the original Substack pieces. I ultimately decide whether to use the sketch or not. But I get leverage and some time savings. That only happens from organization and context. It's essentially using RAG (retrieval-augmented generation), but hyper-specialized for my own material.
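As an illustration of the retrieval step, here is a deliberately naive lexical version; the real thing is DEVONthink's search exposed over MCP, and the function and field names here are invented:

```python
def retrieve_relevant(pieces, bullets, top_k=3):
    """Naive lexical retrieval over my own archive: a stand-in for what
    the DEVONthink MCP search actually does.

    `pieces`: list of dicts with "title" and "text" keys (assumed shape).
    `bullets`: the talk bullets I'd hand to the agent.
    """
    def score(piece):
        words = set(piece["text"].lower().split())
        # Count bullet words that appear in the piece's text.
        return sum(1 for b in bullets for w in b.lower().split() if w in words)
    return sorted(pieces, key=score, reverse=True)[:top_k]
```

The point isn't the ranking algorithm; it's that retrieval runs over a curated personal canon rather than the open web.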

The Morning Briefing

Every morning, a cron job (well, launchd—the Mac-native equivalent, because of auth token handling) runs and dumps a daily briefing into my Day One journal and DEVONthink.

[IMAGE: morning_briefing] My morning briefing, generated automatically and deposited into Day One.

[IMAGE: briefing_capture_structure] The folder structure for the briefing automation.

(Why Day One and DEVONthink instead of just email? Because for some reason, the default Gmail MCP connector can only draft emails—it can't send them to me. I understand why, given prompt injection risks, which I'll get to later. But it somewhat defeats its utility as a notification mechanism.)
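For reference, a launchd job of this sort is defined by a property list in ~/Library/LaunchAgents. The label, script path, and schedule below are illustrative placeholders, not my actual configuration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.morning-briefing</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/briefing.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>6</integer>
        <key>Minute</key>
        <integer>30</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/briefing.log</string>
</dict>
</plist>
```

Load it once with `launchctl load ~/Library/LaunchAgents/com.example.morning-briefing.plist` and launchd handles the schedule from there, which also behaves better around sleep than cron does.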

Writing: Research and Junior Drafting

This is the use case I get asked about the most. I've written about using AI as a writing aid before in "AI as a Partner, Not a Replacement," where I was still figuring things out. I credit some inspiration to @Alejandro Piad Morffís, though having done it myself and helped others set it up, I can say it's really quite personalized. And it requires multiple iterations and lots of context and examples to be any good.

[IMAGE: writing_claude_md] The CLAUDE.md file—a comprehensive set of writing guidelines that the AI references for every draft.

[IMAGE: writing_claude_md_james] The guidelines include specifics about my voice, tone, common phrases, and what to avoid.

Here's how it works: I still need to write the outline of what I want, with the broad notes to hit. I still need to go through and substantially edit—asking for different charts and data, checking math (which is often wrong), probing claims, and changing a decent amount of the text for my own tone. It's more akin to an intern helping do a first draft than a ghostwriter.

Is it useful? Yeah, broadly: it helps get stuff on paper, and it helps with research, even if I have to check it. There's no real question as to who the author is in any meaningful sense, though. Still, it genuinely saves me time and mental energy compared to earlier generations of these tools, which were totally useless for this.

Really, though, it's augmented by a lot of the research snippets I keep in DEVONthink. I have my own "canon" that I direct it to draw from instead of random internet data. I also still form the basis of the piece through my initial outline.

It also seems to think a critical part of my writing is to say "Let's be clear" and "The thing is" a lot. I mean, it's obviously a tic, ok? But I would argue it isn't a core part of my writing—I assume, anyway, it isn't the reason you're reading this.

(I plan to make a GitHub repo showing the process and all of the drafts so people can click through it and see that it's mainly helpful in the one step of taking scattered notes and organizing them into a coherent flow—but not necessarily a huge amount more.)

Meeting Notes

I use a Sony ICD-UX570 recorder for meetings. I have detailed instructions in a directory for Claude Code to take the Deepgram transcript, ask me who each speaker is (with context from what was said), classify the notes, and auto-generate a summary in both Markdown (for me) and PDF (to share with others).
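A rough sketch of the speaker-labeling step, assuming a simplified transcript shape (the real pipeline works from Deepgram's diarized output and far more detailed instructions):

```python
def label_transcript(utterances, speaker_names):
    """Replace numeric speaker IDs (as diarization typically emits them)
    with human names and render a Markdown transcript.

    `utterances`: list of (speaker_id, text) pairs — an assumed shape.
    `speaker_names`: the mapping I supply after the agent asks me who's who.
    """
    lines = ["# Meeting Notes", ""]
    for speaker_id, text in utterances:
        # Fall back to a generic label for speakers I haven't identified.
        name = speaker_names.get(speaker_id, f"Speaker {speaker_id}")
        lines.append(f"**{name}:** {text}")
    return "\n".join(lines)
```

The Markdown output can then be converted to a shareable PDF with a tool like Pandoc.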

Knowledge Management and Web Capture

DEVONthink, OmniFocus, and Day One all connect to my AI setup via MCP passthrough. (I use Google OAuth to make some of these available on Claude's web interface as well, along with my own other MCPs.)

One of the more niche things I've built is a DEVONthink web capture workflow using a tmux-based Claude Code session that I can connect to with Blink.sh at any time, even from my phone. It opens the Substack reader, resizes the browser window to 650px wide to hide the sidebar, and captures the page as a Web Archive in DEVONthink's "To Read" inbox.

[IMAGE: devonthink_capture_tmux] The tmux-based capture session, accessible remotely.

[IMAGE: devonthink_captured] A captured article in DEVONthink.

(Claude just released "remote control," which means most people won't need to do what I'm doing with tmux—though mine still has advantages in control and flexibility. But remote control is probably easier for most people. See: https://code.claude.com/docs/en/remote-control)

Claude Code in folders doesn't have to be about coding, by the way. It can just be structures for automation. Most of what I've described here isn't writing code in any traditional sense—it's setting up context and instructions for the agent to follow.

On Model Choice

Ironically, I use Opus for almost everything, even when it's overkill. The one place I don't is Claude Code—almost all implementation work (note: not planning or my "main" chat, but the actual execution) runs on Sonnet or Haiku.

I think it tells you something that the most "permissive" case for tolerating errors and dumbness is code. Code can be validated against tests, and I'm generally going to review each pull request anyway. That perhaps says something about why AI adoption is happening so quickly in the coding realm.

Security: Don't Be Reckless

I cannot emphasize this enough: set your permissions and isolation before you start, not as you go. The AI will do a lot of things and will quickly overwhelm you with approval requests if you try to gate each one individually. Don't fall into that trap. Pre-set what it's allowed to do and use isolation to ensure it only does what you want.

Prompt injection (see @Simon Willison's excellent pieces on this) is a real thing. Generally speaking, I only allow my own MCPs, ones I've personally vetted, or ones that are more broadly trustworthy (e.g., Google Calendar). However, note that you could even have prompt injection issues with something as benign as Google Calendar—someone could put malicious instructions in a calendar invite. In that case, you should be somewhat choosy about what your LLM is allowed to "phone out" for. It's a pretty good idea to forbid external web calls except for what you have explicitly allowed. From my experience, this doesn't hugely impede most use cases.
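As a concrete illustration, Claude Code supports an allow/deny permissions list in its settings file. The paths and domains below are placeholders, and you should check the current documentation for the exact rule syntax:

```json
{
  "permissions": {
    "allow": [
      "Read(~/notes/**)",
      "Bash(git diff:*)",
      "WebFetch(domain:code.claude.com)"
    ],
    "deny": [
      "WebFetch",
      "WebSearch",
      "Read(~/.ssh/**)",
      "Read(~/.aws/**)"
    ]
  }
}
```

The shape matters more than the specifics: deny external web access by default, allow only the domains you've vetted, and keep credential directories off-limits entirely.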

I should also call out data exfiltration (of your passwords, secret keys, or bank account info, for example) as a key consideration and a reason this shouldn't be done lightly or without thought. Companies will inevitably solve this. But some current approaches are essentially YOLO on this point: powerful, but also likely to create disaster if you're not careful.

What Does This Mean?

Block (previously Square) just laid off roughly half their workforce. Citrini Research published a thought piece that essentially crashed various stocks by painting a people-less future as AI takes over (which I disagree with). At the same time, a refrain on social media is "AI adds nothing in productivity—it's been shown in studies."

My last piece directly rebuts the "nothing in productivity" point. Studies don't show that. Even in fairly conservative cases (GitHub Copilot benchmarks from over a year ago), we clearly see productivity gains across different studies. This also fits what I wrote in "The Boring Phase of AI"—AI that does real tasks is both less flashy and likely much more important than the headline-grabbing model releases. That future is already here. We're just getting started.

And I also think it's highly unlikely to cause a dystopian scenario of persistent mass unemployment. (Note: I didn't say it won't cause disruption, especially in the short term.)


Thanks for reading!

I hope you enjoyed this article. If you'd like to learn more about AI's past, present, and future in an easy-to-understand way, I've published a book titled What You Need to Know About AI.

You can order the book on Amazon, Barnes and Noble, Bookshop, or pick up a copy in-person at a local bookstore.