Ask HN: Do coding assistants "see" attached images?

I've been using Cursor and I'm genuinely curious about something.

When you paste a screenshot of a broken UI and it immediately spots the misaligned div or padding issue, is it actually doing visual analysis, or just pattern-matching against common UI bugs from its training data?

The response feels almost too fast for real vision processing. And it seems to understand spatial relationships and layout in a way that goes beyond just describing an image.

Are these tools using standard vision models, or is there some preprocessing step first? How much of the answer comes from the image vs. the surrounding code context?

Anyone know the technical details of what's actually happening under the hood?
