This is a post from Robin Sloan’s lab blog & notebook. You can visit the blog’s homepage, or learn more about me.
October 17, 2025

Thinking about the capabilities of multimodal AI models, I am currently
- uninterested in “model as agent”,
- sour on “model as media generator” (though I do seem to keep experimenting with this, so, perhaps the blogger doth protest too much), and
- totally bullish on “model as universal perceptor”: to which you can hand all sorts of media and ask questions about it, even if that media is messy, organic, ambiguous, etc.
This is a really flexible and valuable capability, and from here on out it will be an assumed feature of computer systems — an essential tool in the toolbox.
Valuable enough to merit the yottabuck investments currently flying … ? Probably not. Who cares! We’ll carry our universal perceptors out of the wreckage, into the future.
P.S. I believe Gemini 2.5 Flash is presently the overall best universal perceptor, in terms of the whole package: capability/speed/price. In the spirit of Jack Clark’s “capability overhang”, if all AI work were halted today, and all other models destroyed, the world could still very usefully put Gemini 2.5 Flash to work for many years to come.


