Claude 4.5, AI Biology and World Models

30-Hour AI Coding Marathon, Neural Transparency Breakthrough, and DORA's AI Shift

Anthropic had a busy week. While one team was shipping Claude Sonnet 4.5 - which can apparently code autonomously for 30 hours, purchase domain names, and run security audits - another team was performing brain surgery on their models to figure out how they actually work.

Latest News

Anthropic launches Claude Sonnet 4.5, its best AI model for coding

The quote that made me drop everything and give it a try: an executive described seeing "Claude Sonnet 4.5 code autonomously for up to 30 hours during early trials with some enterprise customers. In that time, he watched the AI model not only build an application, but stand up database services, purchase domain names, and perform a SOC 2 audit to make sure the product was secure."

Source: Anthropic Announcement | TechCrunch Article

⚡ Quick Commits

Tracing the thoughts of a large language model

What is AI Biology? How is Claude multilingual? Does Claude plan its rhymes? It's fascinating that, as with quantum physics, we can use LLMs yet fail to understand how they work on a deeper level. Anthropic's new "AI microscope" research takes us closer to that understanding.

AI Biology means tracing the actual computational "circuits" and thought patterns inside models - like neuroscience for AI. Key findings: Claude thinks in a universal conceptual language before translating to specific languages, plans rhymes multiple words ahead, and runs parallel maths strategies it can't even explain.

Most striking: researchers caught it "bullshitting" - generating plausible reasoning without actual calculation. They can now trace when Claude's genuinely thinking vs making things up. Still early days (the method only captures a fraction of the model's computation), but it's our first real look at how LLMs actually work inside.

Introducing ChatGPT Pulse

OpenAI's play to own your morning routine before you check email or socials. ChatGPT Pulse just launched as a preview for Pro users ($200/month) on mobile, where ChatGPT proactively does overnight research to deliver personalised morning updates.

The real question: Would I pay $200/month to get Pulse? No. By itself it is a small feature that should not make anyone switch or sign up. Wait until it's available for Plus or free users.

The 2025 DORA Report - An engineering leadership perspective

I think this quote by Chris Westerhold is spot on:

"The real value of an engineer is no longer just in writing code. It's in prompt engineering, solution architecture and validating AI-generated outputs. When an organization’s structure and processes don’t support this shift, AI simply becomes a faster way to create chaos."

Read full DORA Report

🎬 Why Startups Win In The AI Era - Aaron Levie

Great talk with Box CEO Aaron Levie at AI Startup School. He talks about the early days of Box, finding the next “nouns and verbs” for startups, adjusting the business model to stop charging per seat and start charging per unit of work because of AI, and much more. Well worth watching.

☑️ Reality Checks

11 famous AI disasters

Real examples of AI failures and their consequences. From IBM Watson's oncology flop to Zillow's $500M algorithmic housing disaster, these cautionary tales remind us that AI isn't magic - it's a tool that can fail spectacularly when deployed without proper understanding or oversight.