22nd May 2025
I’m at Anthropic’s Code with Claude event, where they are launching Claude 4. I’ll be live blogging the keynote here.
09:29 The keynote is just about to start. The schedule for the day is available on this site. Mike Krieger is on stage.
09:33 Dario Amodei is on. "I'm not one to hype things up" he says... and announces that "as of exactly this moment" they are releasing Claude 4 Opus and Claude 4 Sonnet.
09:34 "We haven't had an Opus model in a while, so as a reminder Opus is the most capable model and Sonnet is the good balance between intelligence and efficiency".
09:36 Claude 4 Opus has state of the art performance on SWE-bench, but "the benchmarks don't fully do justice to it". Anthropic's most senior engineers have been surprised at how much more productive it has made them. Claude 4 Sonnet is "a strict improvement on 3.7" at the same cost.
09:38 The WiFi at the event is feeling a bit shaky.
09:39 Mike is back, talking about Opus 4 specifically for advanced code. Sonnet 4 is good at code too and has great performance. Everything should be live right now in the Claude apps, Claude Code and the API.
09:40 "I know the term agents gets thrown around a lot recently" - Mike has a joke about how long you can go in a meeting at Anthropic without the word coming up; the current record is 17 minutes.
09:41 When founding Instagram, Mike's team had to make a bunch of very tough prioritization decisions: video vs. core platform, app improvements vs. focusing on the apps. With AI agents, startups can pursue more streams at once.
09:43 Mike was blown away by GitHub Copilot way back in 2021. "I got an even stronger feeling last summer when we launched Artifacts".
09:43 (Oops, I forgot to enable polling for the live blog - I've turned that on now.)
09:44 Mike hasn't actually defined agents. I don't think he's going to.
09:46 New code execution tool! They're finally running code on their own servers - previously their coding tool ran JavaScript in the user's browser. This brings it in line with ChatGPT Code Interpreter.
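(For reference, here's a rough sketch of what opting in to a server-side code execution tool via the Messages API might look like. I haven't checked the docs yet - the tool type string, beta header, and model id below are my assumptions, not confirmed values. This just builds the request payload without sending it.)

```python
# Hedged sketch: enabling a server-side code execution tool in an
# Anthropic Messages API request. The tool "type", the beta header value,
# and the model id are ASSUMPTIONS based on Anthropic's naming patterns.
payload = {
    "model": "claude-opus-4-20250514",  # assumed model id
    "max_tokens": 1024,
    # Assumed tool spec - the dated type string is a guess
    "tools": [{"type": "code_execution_20250522", "name": "code_execution"}],
    "messages": [
        {"role": "user", "content": "Calculate the mean of [2, 4, 6, 8]"}
    ],
}
headers = {
    "x-api-key": "YOUR_API_KEY",          # placeholder, not a real key
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "code-execution-2025-05-22",  # assumed beta flag
}
# Inspect the payload rather than sending it (no network call here)
print(sorted(payload.keys()))
```

The key difference from the old in-browser approach is that the sandbox lives on Anthropic's side, so the model can run code mid-conversation without the client doing anything.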
09:46 The new models can run autonomously for hours - they've seen Opus 4 run for seven hours without losing its thread. I wonder if the context length is longer?