The Old Rules Are Dead


Whether it’s the new masters of the internet like OpenAI and Anthropic, or the old masters like Google or Microsoft operating under a new playbook — the internet doesn’t work like it used to. That’s not inherently bad, but let’s not forget the second part of “move fast and break things” involves breaking things. It’s a five-word phrase so you’d think remembering all five should be very doable… but that’s where we’re at. We should consider that in 2025, the “things” that get broken are a lot more critical to people’s lives than in the old days of the internet.

In the latest edition of this, here’s a reminder that prompts aren’t as private as you think they are!

Juliana Jackson, author of the excellent Beyond the Mean Substack, noticed some weird searches showing up in her Google Search Console. Things that had nothing to do with her site and seemed like ChatGPT prompts… Check it out:

This is my more technical write-up; check out Juliana’s article for more on *why* this is a problem.

There are about 30 different prompts so far, including this epically bizarre brain dump that I feel a bit guilty about sharing, but it’s necessary to show what kind of stuff can get leaked:

Wew. But what the heck? How is ChatGPT leaking into Juliana’s Google Search Console?

With the help of fellow internet sleuth Slobodan Manić we figured it out.

It’s because her Substack ranks well in Google Search for the search:
https://openai.com/index/chatgpt/

Largely due to an article she wrote a few months ago about ChatGPT chats being indexed in Google. She doesn’t use that URL in the article verbatim, but you can see how Google’s tokenization turns that search into openai + index + chatgpt, which is what this article: https://julianajackson.substack.com/p/chatgpt-indexed-conversations is all about! Oh, the irony…
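To make that tokenization idea concrete, here is a rough sketch of how a URL typed as a search query can collapse into plain keywords. This is NOT Google’s actual tokenizer (that isn’t public); the `naive_terms` function and its little noise list are made up purely for illustration.

```python
import re

def naive_terms(query: str) -> list[str]:
    """Split a query into lowercase word-like terms and drop URL plumbing.

    Hypothetical stand-in for a search engine tokenizer, not Google's real one.
    """
    # Keep only runs of letters and digits; punctuation like ':' and '/' is discarded.
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # Assumed noise list: scheme and domain boilerplate that carries no topical meaning.
    noise = {"https", "http", "www", "com"}
    return [t for t in tokens if t not in noise]

print(naive_terms("https://openai.com/index/chatgpt/"))
# ['openai', 'index', 'chatgpt']
```

Under those assumptions, an article about OpenAI, indexing, and ChatGPT is a pretty natural match for the leftover terms.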

Don’t get confused though, this is a new & completely different ChatGPT screw-up from the earlier one where Google indexed chats that were never meant to be public. Weirder, if not as serious.

It’s been known for a few months now that OpenAI directly scrapes Google Search. This article is more proof of that questionable behavior. In fact, as far as we’re aware, this is the first definitive proof that OpenAI directly scrapes Google Search with actual user prompts. Not just that OpenAI is scraping SERPs in general to acquire data, but that user prompts themselves are getting sent to Google Search.

Why does OpenAI scrape Google? Google does have APIs and licensing agreements, for example they license search data to Kagi for its search, and I’m going to guess that with a $500B valuation and 700M plus users OpenAI has a bit more cash on hand than tiny startup Kagi with its 59,052 paying members. Did Google not want to allow access to OpenAI for competitive reasons? Quite possibly, but whatever the reason OpenAI said “screw it, let’s just scrape them”.

The choice to scrape instead of using a private API means prompts from OpenAI that use Google Search show up in users’ Search Console data. Did OpenAI go so fast that they didn’t consider the privacy implications of this, or did they just not care?

The ChatGPT prompt box on that particular page, https://openai.com/index/chatgpt/, has a bug in it which causes the URL of that page to be prepended to the prompt. So whatever you type in there gets the text “https://openai.com/index/chatgpt/” added to the front of it.

Normally ChatGPT 5 will choose to do a web search whenever it thinks it needs to, and is more likely to do that with an esoteric or recency-requiring prompt. But this bugged prompt box also passes the query parameter “hints=search”, which causes it to basically always do a search:
https://chatgpt.com/?hints=search&openaicom_referred=true&model=gpt-5
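If you want to see exactly what that link carries, you can pull the query string apart with Python’s standard library. This only decodes the URL shown above; what ChatGPT actually does with each parameter on its side is our inference, not something this snippet can prove.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://chatgpt.com/?hints=search&openaicom_referred=true&model=gpt-5"

# urlsplit isolates the query string; parse_qs turns it into a dict of value lists.
params = parse_qs(urlsplit(url).query)

print(params)
# {'hints': ['search'], 'openaicom_referred': ['true'], 'model': ['gpt-5']}
```

So three things ride along with the prompt: a search hint, a flag saying the visit came from openai.com, and a model name.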

But we know it MUST have used Google Search for those searches to show in Juliana’s GSC logs, and doubly we know it must have scraped those rather than using an API or some kind of private connection — because those other options don’t show inside GSC. Meaning that OpenAI is sharing any prompt that requires a Google Search with both Google and whoever is doing their scraping. And then also with whoever’s site shows up in the search results! Yikes.

To be clear, this data leakage happens with ALL ChatGPT prompts that use Google Search… It’s just this particular odd set of circumstances that shows the leak in action.

Ok, to recap, here are the steps required for this wackiness to ensue:

Step 1: Multiple posts on Juliana’s site that include the right content to show up for a Google Search on [ https://openai.com/index/chatgpt/ ]

Step 2: A ChatGPT prompt that calls Google Search and returns Juliana’s site. In addition to the buggy prompt box that Slobodan discovered, there are other ways this could happen too. I originally thought this was probably from users putting their ChatGPT prompts into the Chrome address bar. While you should never underestimate users’ ability to break whatever UI you put in front of them, that’s probably not what’s happening.

Step 3: This prompt shows up in Google Search Console as a search impression! 
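If you want to check your own site for the same thing, the steps above suggest a simple filter: look for Search Console queries that start with the OpenAI URL and have extra text after it. Here is a minimal sketch, assuming you have exported your queries as CSV with a `query` column. The column names and the sample rows below are invented for illustration; they are not real leaked prompts, and your export may be laid out differently.

```python
import csv
import io

URL_PREFIX = "https://openai.com/index/chatgpt/"

# Made-up sample rows standing in for a Search Console query export.
SAMPLE_EXPORT = """query,impressions
beyond the mean substack,40
https://openai.com/index/chatgpt/ help me plan a week of dinners for two,1
chatgpt indexed conversations,12
"""

def find_leaked_prompts(csv_text: str, prefix: str = URL_PREFIX) -> list[str]:
    """Return the prompt part of every query that starts with the URL prefix."""
    leaked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = row["query"].strip()
        # A bare URL with nothing after it is an ordinary search, not a leaked prompt.
        if query.lower().startswith(prefix) and len(query) > len(prefix):
            leaked.append(query[len(prefix):].strip())
    return leaked

print(find_leaked_prompts(SAMPLE_EXPORT))
# ['help me plan a week of dinners for two']
```

This only catches prompts that came through the buggy prompt box, since those are the ones carrying the telltale URL. Prompts leaked through ordinary ChatGPT searches would look like any other long query and need a fuzzier heuristic.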

If this kind of leakage can happen accidentally, it’s worrying to think about cases with people actively trying to exfiltrate user prompts!

Especially with the latest release of OpenAI’s Atlas browser, we have to remember that user privacy and “playing by the rules” are not a part of big AI’s playbook. Whether it’s chats getting indexed by Google, LLM bots scraping anything not tied down, web searches being leaked into GSC, or your entire ChatGPT history being stored indefinitely (but only temporarily, if that makes any sense at all!), this is a new set of norms and standards being worked out in real time in some of the most chaotic conditions possible.
