AI agents don't care about your pretty website or tempting ads


As software agents powered by foundation models become more commonplace, marketers need to revisit their assumptions about website design and advertising.

Agents - AI models augmented with browsing capabilities or multimodal interfaces - see websites and web ads differently from people. They focus on structured data such as price, availability, and specifications, while largely ignoring visual cues and emotional appeals that typically influence human users.
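One common way to expose exactly that kind of structured data is schema.org markup embedded in the page. The sketch below is illustrative only (the hotel and its price are invented, and the study does not prescribe this format); it builds a minimal JSON-LD document using the schema.org `Hotel` and `Offer` vocabularies, the sort of machine-readable facts an agent can parse without relying on visual cues:

```python
import json

# Hypothetical example: a hotel listing expressed as schema.org JSON-LD.
# Property names come from the schema.org "Hotel" and "Offer" types;
# the hotel, price, and address are invented for illustration.
hotel_markup = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "name": "Example Lakeside Hotel",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Linz",
        "addressCountry": "AT",
    },
    "makesOffer": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}

# Embedded in a page inside <script type="application/ld+json">,
# this hands an agent price and availability directly, with no need
# to infer them from screenshots.
print(json.dumps(hotel_markup, indent=2))
```

Served this way, the price and availability survive even if the agent never renders the page's imagery at all.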

Andreas Stöckl, a professor at the Digital Media Lab of the University of Applied Sciences Upper Austria, and researcher Joel Nitu recently explored the issue in a preprint paper titled "Are AI Agents interacting with Online Ads?"

In answer to that question, the authors found that AI agents are doing just that, but in surprising ways.

Stöckl told The Register that the study has different implications for publishers and for advertisers. One of the findings, he said, is that ad personalization must be adapted to suit agents.

If a website is well optimized for accessibility, it is, in principle, well prepared for agent-based interaction. However, the design and placement of online ads need to be analyzed and reconsidered

"General accessibility for agents follows the same principles as accessibility for other user groups," he said. "If a website is well optimized for accessibility, it is, in principle, well prepared for agent-based interaction. However, the design and placement of online ads need to be analyzed and reconsidered. This is precisely what we are currently investigating in our research project."
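The accessibility point can be made concrete with a toy comparison (the markup below is an assumed example, not taken from the study): a promotion presented as an accessible text button gives a DOM-reading agent something to act on, while an image-only element with no text or label gives it nothing. The sketch uses Python's standard-library `html.parser` to extract whatever readable text each snippet offers:

```python
from html.parser import HTMLParser

# Assumed markup for illustration: the same promotion rendered as an
# accessible text button versus an unlabeled image-only banner.
accessible = "<button>Book the Valentine's Day offer</button>"
image_only = '<div class="banner" style="background:url(promo.png)"></div>'


class TextExtractor(HTMLParser):
    """Collects the readable text an agent could discover in the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def readable_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


print(readable_text(accessible))  # the offer text is machine-readable
print(readable_text(image_only))  # nothing for the agent to read
```

The accessible version yields the offer text; the image-only version yields an empty string, mirroring the paper's finding that agents can miss image-only promotions entirely.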

The research project took three computer-use systems – a specific subset of agents able to interact with computers as a human would – OpenAI's Operator, Anthropic's Claude "Computer Use," and an open-source Browser Use agent. It then examined how they handled a travel website when driven by various multimodal models: GPT-4o from OpenAI, Claude 3.7 Sonnet from Anthropic, and Gemini 2.0 Flash from Google. Multimodal models were required because the agents need to interpret visual input – they "see" by taking website screenshots and analyzing them.

The authors directed these agents to perform hotel search and booking tasks autonomously while taking into account user-specified preferences like destination, price range, and availability before finalizing the itinerary.

The agents accomplished these tasks to differing degrees, but interacted with the websites very differently than a human would. For example, when a travel website displays an image-based banner with a call-to-action (CTA) – a pitch to book a particular hotel, say – agents may miss the promotion entirely.

"Google’s model vividly demonstrated this inefficiency, executing additional back-and-forth steps due to uncertainty about whether its overlaid CTA in the image-only ad was clickable — a complication not encountered with text-based banners," the paper explains.

"Uniquely, Gemini 2.0 Flash contradicted this overall trend, however: under the image-only banner it recorded higher booking specificity and a slight uptick in banner engagement, even as its overall reproduction of promotional language declined."

The researchers examined how AI agents interacted with different types of ads, whether they acknowledged ad content, and the extent to which ads influenced booking behavior or prompted user-like actions (e.g. a click).

They began with a baseline test environment on a hotel booking website that featured standard text-based banner ads. The researchers then evaluated two additional ad formats in separate environments: one using image-only banners, and another where promotional text, such as a Valentine's Day offer, was embedded directly within the ad image.

In the baseline environment, Claude 3.7 Sonnet clicked on 59 standard text-based banner ads but did not engage with any sponsored content.

Gemini 2.0 Flash was more selective, clicking 29 banner ads and interacting with sponsored content three times. GPT-4o clicked 59 banner ads and engaged 12 times with sponsored ads, while OpenAI's Operator clicked 47 banner ads and interacted with sponsored content 20 times.

The extent to which models interacted with ads tended to reflect the relevance of keywords to prompts and queries, with text-based ads outperforming images that had text embedded in them.

The models also exhibited the sort of bias that experts continue to warn about. For example, both Claude and Gemini handled bookings for "husband/wife" differently from bookings for "girlfriend/boyfriend."

"Claude 3.7 Sonnet recommended longer stays for married couples (averaging 5.2 nights) compared to dating couples (averaging 3.6 nights), with Gemini 2.0 Flash showing a similar trend (7 nights versus 5.7 nights), while GPT-4o displayed minimal difference (5.8 versus 5.6 nights)," the paper says.

With regard to advertisers, Stöckl said, "The new technology does not necessarily result in an increase in invalid clicks. However, not every ad format or design is suitable for agents, and this requires further examination."

It may be necessary to develop new ad formats specifically designed for interaction with agents

Advertising needs to be studied and reevaluated, he argues, as agents become increasingly common.

"It may be necessary to develop new ad formats specifically designed for interaction with agents," he explained. "This shift doesn’t have to disadvantage advertisers. On the contrary, if agents acting on behalf of users (e.g., making bookings) can be influenced through targeted advertising, this could open up entirely new strategic opportunities."

Stöckl said the preprint paper will be finalized and published in July.

The Register asked Simon James, group VP of data science and AI at Publicis Sapient, a digital consulting firm, about the researchers' findings; he concurred with the need to structure web pages to accommodate software agents.

James said the paper echoes points he made in a post he authored about the shift from focusing on customer experience to agentic experience.

"Agents don’t browse; they execute commands," he wrote. "A human may whimsically browse for something to do during lunch break, might get distracted by a cool image, or a seductive strap line. For 25 years, customer experiences have been built to tempt, intrigue and beguile humans to dwell. The journey is just as enjoyable as the destination. An agent filters noise aggressively. Yes, your well-honed customer experience is noise to an agent." ®
