I use Claude Opus 4.7 every day. It's my daily driver for coding, writing, analysis, and the kind of extended back-and-forth that other models still struggle to hold together over a long session. Today, Anthropic gave me access to Opus 4.8. I've been running it for a few hours now, and I'm going to be honest with you about what I've seen — and what I expect to see once I've had a proper week with it.
The short version: I'm not expecting a revolution. And I mean that as a compliment.
What the Numbers Actually Say
Here's where 4.8 lands against 4.7 across the benchmarks that tell you something useful:
| Benchmark | What it measures | Opus 4.8 | Opus 4.7 |
|---|---|---|---|
| SWE-Bench Pro | Agentic coding — real issues | 69.2% | 64.3% |
| SWE-bench Verified | Bug fixing in open-source repos | 88.6% | 87.6% |
| Terminal-Bench 2.1 | Terminal-based coding | 74.6% | 66.1% |
| OSWorld-Verified | Autonomous computer use | 83.4% | 82.8% |
| Online-Mind2Web | Browser agent tasks | 84% | — |
| Humanity's Last Exam | Multidisciplinary reasoning (with tools) | 57.9% | 54.7% |
On every benchmark where the table has a 4.7 score to compare against, 4.8 is better (Online-Mind2Web has no 4.7 figure here). Not by orders of magnitude, but by the kind of meaningful margin that compounds over time. Which is exactly the point.
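To put numbers on "meaningful margin", here is a small sketch that computes the percentage-point gain on each row of the table. The scores are copied from the table; the function and variable names are my own, and Online-Mind2Web is left out because there is no 4.7 score to subtract.

```python
# Scores copied from the table above, as (Opus 4.8, Opus 4.7) in percent.
# Online-Mind2Web is omitted: the table lists no Opus 4.7 score for it.
scores = {
    "SWE-Bench Pro": (69.2, 64.3),
    "SWE-bench Verified": (88.6, 87.6),
    "Terminal-Bench 2.1": (74.6, 66.1),
    "OSWorld-Verified": (83.4, 82.8),
    "Humanity's Last Exam": (57.9, 54.7),
}


def point_gains(table):
    """Return {benchmark: gain in percentage points}, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (new, old) in table.items()}


gains = point_gains(scores)

# Print the largest gains first.
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22} +{gain:.1f} pts")
```

The spread runs from 0.6 points on OSWorld-Verified to 8.5 points on Terminal-Bench 2.1, so "better across the board" covers both near-ties and real jumps.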
From 4.6 to 4.7, I Noticed a Small Improvement. I Expect the Same Here.
When Anthropic released Opus 4.7, I didn't have a eureka moment. I noticed the AI felt a little sharper. Responses that used to require a follow-up question were landing right on the first attempt slightly more often. The model held context better in long coding sessions. Nothing that would make me write a tweet about it, but real — the kind of improvement you feel after a week, not after one interaction.
I expect Opus 4.8 to be the same. I've been using it for a few hours today, and so far my experience tracks that expectation: it feels right. No obvious regressions, a handful of moments where it handled something cleanly that would have taken a nudge on 4.7. Nothing that blew me away. In a week or ten days, I'll come back here and tell you what I've actually found after sustained daily use.
I'm not expecting a night-and-day difference. I'm expecting Anthropic to do what they always do: move the bar forward by a few meaningful degrees, at the same price.
The Compound Interest Argument
Here's the frame I keep coming back to when I think about how AI models improve: compound interest. Not the dramatic kind, where a single event transforms your situation overnight. The quiet kind, where small, consistent gains accumulate into something significant over a long enough timeline.
From 4.5 to 4.6 to 4.7 to 4.8, each step has been incremental. And incrementally, Claude has gone from a model that impressed me to a model that I genuinely cannot imagine working without. The version I'm using today is dramatically better than what existed 18 months ago — not because of any single leap, but because of many small improvements that stacked on top of each other, each one refining something that the previous version got almost right.
That's the real benchmark. Not "is 4.8 better than 4.7 on SWE-Bench Pro." But "where will this model be in another year if this cadence continues." The answer is somewhere I can barely imagine from where I'm standing.
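If you want the compound interest frame as arithmetic, here is a minimal sketch. The 5% per-release gain is a made-up illustrative number, not a measurement of any Claude model, and the two-month cadence is just the gap between 4.7 and 4.8 stretched over 18 months.

```python
def compounded(per_release_gain, releases):
    """Total multiplier after `releases` steps that each improve by `per_release_gain`."""
    return (1 + per_release_gain) ** releases


# Hypothetical inputs: a 5% gain per release, one release every two months,
# over 18 months. Neither number is a measurement of any real model.
releases = 18 // 2
total = compounded(0.05, releases)

print(f"{releases} releases at 5% each -> {total:.2f}x overall")
```

Under those assumptions, nine small steps come out to roughly a 1.55x improvement, more than the 45% you would get by simply adding the steps up. That gap is the whole argument.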
Anthropic Is Running the Agile Playbook — And It's Working
There's a methodology in software development that the best engineering teams live by: ship the product, then refine it in sprints. Don't wait for perfection before releasing. Get something real into the hands of real users, listen to what breaks and what doesn't, and iterate fast. This is what agile development actually looks like in practice — not a planning framework, but a discipline of continuous, user-facing improvement.
Anthropic is running that playbook with Claude, and in a world moving as fast as AI is right now, it's exactly the right call. They didn't hold Opus 4.8 until it was a generational leap. They shipped it when it was measurably better than what came before, at the same price, with some meaningful new capabilities attached. Two months after 4.7. And, if the cadence holds, about two months before whatever comes next.
In an industry full of companies announcing models they haven't shipped and roadmaps they can't execute, there's something almost refreshing about a team that just… keeps improving the product. Regularly. Reliably. Without drama.
What's Actually New in 4.8
Fast mode got a serious upgrade
Opus 4.8's fast mode — toggled with /fast in Claude Code — is roughly 2.5 times quicker than before and costs a third of what it used to. For anyone running Opus at volume, this changes the economics meaningfully. The case for routing more of your traffic through Opus instead of falling back to Sonnet just got stronger.
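Here is a rough sketch of what "2.5 times quicker at a third of the cost" does to a single job. The two ratios come from the paragraph above; the 10-minute, $30 baseline is a hypothetical workload I picked for round numbers, not a real bill.

```python
SPEEDUP = 2.5      # "roughly 2.5 times quicker", per the text above
COST_DIVISOR = 3   # "costs a third of what it used to"


def fast_mode_after(old_minutes, old_cost_usd):
    """Scale an old fast-mode job's duration and cost by the stated ratios."""
    return old_minutes / SPEEDUP, old_cost_usd / COST_DIVISOR


# Hypothetical baseline: a job that took 10 minutes and cost $30 before.
new_minutes, new_cost = fast_mode_after(old_minutes=10.0, old_cost_usd=30.0)

print(f"{new_minutes:.1f} min, ${new_cost:.2f}")
```

Swap in your own baseline; since "roughly" is doing some work in the speed claim, treat the result as an estimate.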
Dynamic Workflows in Claude Code (research preview)
This is the most structurally interesting addition. For large tasks — migrations spanning hundreds of files, refactors across multiple services — Claude now generates a plan, spins up parallel subagents to execute different parts simultaneously, then verifies results before handing anything back. You describe the problem; you get a result rather than a process to manage. It's in research preview, which means Anthropic is being appropriately cautious about calling it production-ready. But the direction is clear.
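The plan, fan out, verify shape is easy to picture in code. What follows is a toy sketch of that pattern using Python's asyncio, not Claude Code's implementation: every function name here is hypothetical, and the "subagent" only renames file extensions so the example stays runnable.

```python
import asyncio


def make_plan(files, chunk_size):
    """Split the file list into independent chunks, one per subagent."""
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


async def run_subagent(chunk):
    """Stand-in for one subagent: 'migrates' each file name in its chunk."""
    await asyncio.sleep(0)  # yield control, as real I/O-bound work would
    return {path: path.replace(".js", ".ts") for path in chunk}


def verify(files, results):
    """Check that every input file was handled and now ends in .ts."""
    return set(results) == set(files) and all(
        new.endswith(".ts") for new in results.values()
    )


async def run_workflow(files, chunk_size=2):
    plan = make_plan(files, chunk_size)
    # Run all chunks concurrently and wait for every one to finish.
    partials = await asyncio.gather(*(run_subagent(c) for c in plan))
    merged = {}
    for part in partials:
        merged.update(part)
    if not verify(files, merged):
        raise RuntimeError("verification failed; nothing is handed back")
    return merged


files = ["a.js", "b.js", "c.js", "d.js", "e.js"]
result = asyncio.run(run_workflow(files))
print(result)
```

The part worth noticing is the last step of `run_workflow`: the caller gets either a verified result or an error, never a half-finished batch to babysit.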
Adaptive reasoning and an effort slider
4.8 reasons only when it judges the task warrants it — answering directly on simple queries, thinking before responding on complex ones. Less wasted effort, same quality. Alongside this, claude.ai now exposes an effort slider: higher for depth, lower for speed. This used to require prompt engineering. Now it's a setting.
Mid-conversation system messages via API
Builders will find this useful: you can now update Claude's instructions mid-session without rewriting the entire system prompt. Cache hits are preserved, so you're not paying for those tokens again on a long agentic loop. A small change with real cost implications at scale.
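Why would appending a message preserve cache hits when rewriting the system prompt would not? A toy model helps, assuming the cache works on the unchanged leading portion of the conversation. This is not the Anthropic API: the message tuples and the `shared_prefix_len` helper are purely illustrative.

```python
def shared_prefix_len(old, new):
    """Number of leading messages that are identical in both lists."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n


history = [
    ("system", "You are a migration assistant."),
    ("user", "Start with the auth service."),
    ("assistant", "Done. Moving on to billing."),
]

# Option A: rewrite the system prompt at the top of the conversation.
rewritten = [("system", "You are a migration assistant. Skip tests.")] + history[1:]

# Option B: append a new system message mid-conversation.
appended = history + [("system", "From here on, skip tests.")]

print(shared_prefix_len(history, rewritten))  # 0: the very first message changed
print(shared_prefix_len(history, appended))   # 3: the whole history is still a prefix
```

In this model, option A leaves nothing reusable, while option B keeps everything that came before intact. On a long agentic loop, that is the difference between paying for the history once and paying for it again.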
More honest about uncertainty
Anthropic's alignment evaluations show substantially lower rates of deceptive behavior in 4.8 versus 4.7 — specifically, the model is more willing to flag uncertainty about its own progress rather than asserting it's done when it isn't. For autonomous work where you're not checking every step, this matters more than any benchmark.
Pricing: Still $5 Input, $25 Output. Nothing Changed.
I always appreciate when I can write this sentence. Opus 4.8 costs exactly the same as 4.7 via the API: $5 per million input tokens, $25 per million output tokens. For Claude.ai subscribers, nothing changes. You get a better model for what you were already paying.
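For anyone budgeting against those rates, the arithmetic is short. This sketch uses only the two prices quoted above; the function name and the example token counts are mine, and it ignores anything not mentioned here, such as caching or fast-mode pricing.

```python
INPUT_USD_PER_MTOK = 5.00    # $5 per million input tokens
OUTPUT_USD_PER_MTOK = 25.00  # $25 per million output tokens


def api_cost_usd(input_tokens, output_tokens):
    """Cost in USD for a given number of input and output tokens."""
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Example workload (made up): 2M input tokens and 400k output tokens.
cost = api_cost_usd(2_000_000, 400_000)
print(f"${cost:.2f}")  # $10 of input plus $10 of output
```

Output tokens cost five times as much as input tokens, so a workload that is only one-sixth output by token count still splits its bill evenly between the two.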
Ask Me Again in a Week
I'm writing this on day one. I've had a few hours with 4.8 and my impressions so far are positive but unsurprising — which, given the track record, is probably the right expectation to carry in. Anthropic delivers small improvements consistently. That's the product. That's the bet.
In a week or ten days, I'll have enough real usage — real coding sessions, real analysis, real extended conversations — to tell you whether anything genuinely shifted. Whether there's a specific type of task where 4.8 handles something cleanly that 4.7 needed a nudge on. Whether the compound effect is showing up yet in my daily workflow.
Come back then. I'll give you the honest verdict.
For now: if you're already on Opus, there's nothing to do; you're on 4.8. If you've been holding off on upgrading from Sonnet, the combination of better agentic performance and cheaper fast mode makes the argument cleaner than it was last week. And if you're building something that depends on Claude working autonomously for long stretches, the dynamic workflows preview is worth keeping an eye on. That's the kind of capability shift that tends to look obvious in hindsight.
Same price. Better model. Steady pace. That's what Anthropic does.
