Julia's high school yearbook quote was "there's no such thing as a free lunch." Weird choice for a 17-year-old, but that's another story. Turns out it applies pretty well to AI agents, too. Everyone (including us) is talking about how amazing they are and how they're life-changing, but there's way less discourse on what agents actually cost to run. They're typically cheaper than the alternatives, but they're certainly not free. So, this week, we're pulling back the curtain on the costs underneath the agents.
Let’s get into it.
Russell: Different Models for Different Reasons
AI costs are usage-based. The more someone uses their agent, the more it costs. But the whole point is that people should use their agents more, not less. So model optimization isn't just an engineering problem. It's a business model opportunity.
Here's how we think about it: never use a smarter model than the task requires. We're building an analysis tool for a client right now, and instead of running everything through one model, we tier it. Haiku (the cheapest) handles high-volume simple stuff like classification and matching. Sonnet handles the real analysis: pattern detection, insights, summaries. Opus, the most expensive, only touches the one flagship deliverable. We’ve even written a Model Tiering Policy into the project, so this logic gets applied to every AI model call.

Unprompted’s (unofficial) governing laws over model usage.
The difference between running everything through one model and tiering it like this can easily be 10x in cost without any loss in quality. And you can still charge the same premium for your new AI feature!
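For the curious, here is a minimal sketch of how a tiering policy like this could be enforced in code. The task names and the `pick_tier` helper are hypothetical and for illustration only; the only part taken from the description above is the Haiku / Sonnet / Opus split.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # cheapest: high-volume, simple work
    SONNET = "sonnet"  # mid-tier: the real analysis
    OPUS = "opus"      # most expensive: flagship deliverable only


# Hypothetical task names, mapped to the cheapest tier that can handle them.
TIER_POLICY = {
    "classification": Tier.HAIKU,
    "matching": Tier.HAIKU,
    "pattern_detection": Tier.SONNET,
    "insights": Tier.SONNET,
    "summary": Tier.SONNET,
    "flagship_deliverable": Tier.OPUS,
}


def pick_tier(task_type: str) -> Tier:
    """Look up the tier assigned to a task type.

    Unknown task types raise instead of silently falling back to a
    default model, so every new kind of call needs a deliberate choice.
    """
    try:
        return TIER_POLICY[task_type]
    except KeyError:
        raise ValueError(
            f"No tier assigned for task type {task_type!r}; add it to TIER_POLICY"
        ) from None
```

Every model call would go through `pick_tier` first. Raising on unknown task types is a design choice in this sketch: it keeps new features from quietly landing on the most expensive model.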
Julia: More Costs More
And here's what happens when you don't optimize. Russell (responsibly) audited our API costs this week and flagged that my agent Athena was costing ~$48 a week, while his agent Archie was costing ~$3 a week. While I will defend Athena with every ounce of my being, she definitely… needs improvements.
The problem: I had background jobs (email checks, daily briefings) running with a caching setting designed for live conversations. In a conversation, caching saves ~90% because the context gets reused. But each background job spins up a fresh session, so the cache from the last one is useless. Every 30 minutes, Athena was writing a fresh cache that nothing ever read. Cache writes typically cost more than regular input, so on those jobs caching was adding cost instead of saving it. She’s diligent, but it was wasteful.
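The arithmetic behind this can be sketched in a few lines. The multipliers below are assumptions for illustration (a cache write billed at 1.25x normal input, a cache read at 0.10x), not figures from our invoice, so check your provider's current pricing before relying on them.

```python
# Assumed billing multipliers, relative to sending the context uncached once.
WRITE_MULT = 1.25  # assumption: cache writes cost a bit more than normal input
READ_MULT = 0.10   # assumption: cache reads cost a fraction of normal input


def context_cost(calls: int, caching: bool, cache_survives: bool) -> float:
    """Cost of sending the same context `calls` times.

    The result is in units of "one uncached send of that context".
    `cache_survives` is True when later calls can reuse the cache
    (a live conversation) and False when each call starts a fresh session.
    """
    if calls <= 0:
        return 0.0
    if not caching:
        return float(calls)
    if cache_survives:
        # One write up front, then every later call is a cheap read.
        return WRITE_MULT + (calls - 1) * READ_MULT
    # Fresh session each time: every call pays for a write nobody reads.
    return calls * WRITE_MULT


# 48 calls is one day of jobs running every 30 minutes.
live_chat = context_cost(48, caching=True, cache_survives=True)
background_cached = context_cost(48, caching=True, cache_survives=False)
background_plain = context_cost(48, caching=False, cache_survives=False)
```

Under these assumptions, a 48-turn live conversation costs about 6 units instead of 48, which is where the ~90% saving comes from. The same 48 calls as fresh background sessions cost 60 units with caching on versus 48 with it off.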
The fix: turn off caching on background jobs, switch them to Haiku, keep caching on for live conversations only. This one change can recover ~$45/week. (That’s about 6.5 coffees in NYC!)
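As a rough sketch, the fix boils down to choosing call settings by job type. The `CallSettings` type, the `settings_for` helper, and the job-kind strings are hypothetical names for illustration, not a real library API. The live model tier is left for the caller to pass in, since which one to use is a separate decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSettings:
    model_tier: str         # e.g. "haiku", "sonnet", "opus"
    use_prompt_cache: bool  # whether to request prompt caching for this call


def settings_for(job_kind: str, live_model_tier: str) -> CallSettings:
    """Pick model tier and caching behavior by job kind (hypothetical helper)."""
    if job_kind == "background":
        # Fresh session every run, so a cache would be written and never read.
        return CallSettings(model_tier="haiku", use_prompt_cache=False)
    if job_kind == "live":
        # Context is reused turn after turn, so caching pays for itself.
        return CallSettings(model_tier=live_model_tier, use_prompt_cache=True)
    raise ValueError(f"Unknown job kind: {job_kind!r}")
```

For example, `settings_for("background", live_model_tier="sonnet")` returns Haiku with caching off, while `settings_for("live", live_model_tier="sonnet")` keeps caching on.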

A sneak peek of what Julia and Athena gab about all day, and why she costs more $$.
In sum, if you're running agents, audit your costs. For us, the gap between optimized and not was ~$3 versus ~$48 a week, for one agent.
Stay curious,
Julia & Russell
