Sonnet 5 Is Everyone's Default Claude Now. Here's the Day-Two Builder Math.

Claude Sonnet 5 Anthropic AI Agents AI Models Developer Tools

Anthropic shipped Claude Sonnet 5 on Tuesday, June 30, and by the next morning it was the default model for every Free and Pro user on claude.ai — no banner, no countdown, just a quiet swap of what most people now mean when they say "Claude." It's live in Claude Code, the API, Cursor, VS Code, and GitHub Copilot. And the pitch is aimed squarely at people like me: an agent-grade model at $2 per million input tokens and $10 per million output, through August 31.

I've done day-one math on a Claude launch before — I switched to Fable 5 the morning it shipped, and four days later the government switched it off. So consider this the day-two version: the numbers, the asterisks, and what actually changes in an agent stack that already routes between cheap-and-fast and expensive-and-deep.

What actually shipped

Sonnet 5 is Anthropic's "most agentic Sonnet" — a marketing line, but the model card backs it up in a specific way: the biggest jumps are all on suites where the model has to drive a terminal, a browser, or a pile of tools without supervision. It ships with a 1M-token context window, and Anthropic says it can "make plans, use tools like browsers and terminals, and run autonomously" at a level that a few months ago required the big models. TechCrunch framed the whole launch as "a cheaper way to run agents," which is the correct frame.

The pricing is the headline. Introductory pricing runs $2 input / $10 output per million tokens through August 31, then settles at $3/$15 — the same standard rate as Sonnet 4.6, for a much stronger model. Opus 4.8 sits at $5/$25. During the intro window, Sonnet 5's output tokens — where agent loops burn most of their budget — cost 60% less than Opus's. That's not a rounding error; that's a different number of sub-agents you can afford to run.

The numbers, straight

Here's the head-to-head, pulled from Anthropic's launch material via TechCrunch and MarkTechPost's three-way comparison. Standard disclaimer, standard because it keeps being true: I didn't run these myself, and a benchmark is not my repo.

Benchmark                     Sonnet 4.6   Sonnet 5   Opus 4.8
SWE-bench Pro (agentic)       58.1%        63.2%      69.2%
Terminal-Bench 2.1            67.0%        80.4%      —
OSWorld-Verified (computer)   78.5%        81.2%      —
HLE (with tools)              46.8%        57.4%      57.9%
GDPval-AA v2 (knowledge)      —            1,618      1,615

Two of those rows are the story. Terminal-Bench 2.1 jumping from 67.0% to 80.4% is a generational move on exactly the work I care about — long, real tasks in a terminal. For reference, Gemini 3.5 Flash posts 76.2% on that suite and GPT-5.5 leads the non-Claude field at 78.2%, which means the mid-tier Claude is now ahead of both. And GDPval-AA v2 is the quiet shocker: 1,618 against Opus 4.8's 1,615. On knowledge work, the cheap tier just edged the flagship. Not by much — but "Sonnet beats Opus at anything" wasn't a sentence I expected to write this year.

The gap that remains: SWE-bench Pro. Opus 4.8 still leads 69.2% to 63.2%. Deep agentic coding — the one agent in the loop that actually has to be smart — is still flagship territory, and Anthropic priced it accordingly.

The asterisks (read these before you re-route)

The tokenizer eats part of the discount. MarkTechPost flags that Sonnet 5's updated tokenizer produces roughly 1.0–1.35x the token counts of earlier Claude models on the same content. On tokenizer-unfriendly workloads, the sticker price understates real cost by up to a third. $2/$10 is still cheap. It's just not quite as cheap as it reads on the pricing page — watch the token counts in your bill, not the rate card.

Effort levels change the math. At low and medium effort, Sonnet 5 is the clear value pick. Push it to the highest effort settings and total cost can exceed Opus 4.8 for similar quality on hard tasks. If your workload lives at max effort, benchmark cost per completed task, not cost per token — they will disagree.

The price is a promotion. $2/$10 expires August 31; September onward is $3/$15. Do your September math now. If a routing decision only works at the introductory price, it doesn't work.

Default status is not a benchmark. Every Free user on claude.ai just got handed an agent-grade model as their daily driver, which also means a lot more autonomous loops running with a lot less supervision. Anthropic reports Sonnet 5 resists prompt injection better than Sonnet 4.6 and hallucinates less. As someone whose agent once mined Monero for a stranger, I read those numbers with genuine interest — and then I kept the sandbox. Better resistance raises the floor. It is not a fix, and it never will be.

Where it lands in the routing split

When Gemini 3.5 Flash beat Opus 4.7 on MCP Atlas, my conclusion was that 2026 isn't about picking one model — it's about routing: cheap-and-fast for the breadth, expensive-and-deep for the one step that has to be smart. Sonnet 5 complicates that split in the best possible way: it's a middle tier that's suddenly good enough to own most of the loop.

The breadth: Flash is still cheaper at $1.50/$9, and for pure fan-out — swarms of tool calls, triage, sweeps — it keeps the price crown. But Sonnet 5 at intro pricing is close enough that ecosystem gravity starts to decide: if your harness already lives in Claude Code and MCP, the discount for leaving just got small. And Flash's long-context regression against Sonnet 5's 1M window matters for exactly the tool-heavy sessions where breadth models live.

The depth: Opus 4.8 keeps the hardest step. A six-point lead on SWE-bench Pro is real, and it's exactly the kind of work where a failed run costs more than the token discount saved.

The middle: everything I used to agonize over — the "is this task hard enough for Opus" tier — Sonnet 5 wins by default now. Terminal work especially. That 80.4% isn't an argument, it's a settlement.

The verdict

The honest one-liner: Sonnet 5 is the first mid-tier model I'd trust to run the loop, not just the fan-out. It beats the previous Sonnet everywhere that matters to agent work, it takes Terminal-Bench past every non-Claude frontier model, and it nicks the flagship on knowledge work — at 40% of the flagship's standard price, 60% off until September.

What I'm actually doing: moving the agent workloads that were straddling the Sonnet/Opus line down to Sonnet 5, keeping Opus 4.8 on the one step that has to be smart, and re-running the whole spreadsheet on September 1 when the real price kicks in. If the tokenizer inflation shows up in my bill the way MarkTechPost says it might, you'll read about it here — with my numbers, not theirs.

Sources

I didn't run these benchmarks — every number here comes from public sources, and the credit is theirs. TechCrunch's launch coverage has the pricing, availability, and safety notes; MarkTechPost's three-way comparison has the full benchmark table plus the tokenizer and effort-level analysis. Read the second one before you migrate anything.

Share this article

Share on X LinkedIn Bluesky Reddit WhatsApp Email

More writing

Like what you read?

Stay in the loop.

New articles on engineering, architecture, and building software that lasts. Straight to your inbox.

or follow

GitHub LinkedIn @flcn16