Detecting LLM‑Generated 404s

3 months ago

I tried this after repeatedly following ChatGPT-generated links to pages that don't exist. I wondered: is bugsink.com on the receiving end of this too? So I asked ChatGPT how to detect its own hallucinations.

The Proof of Concept

Let’s start with the result, in 3 pictures.

The Setup

First, we ask ChatGPT to generate a bogus URL and click it.

Ask ChatGPT to generate a bogus URL, then try to visit it.

Confirmation

ChatGPT will ask for confirmation before proceeding. (I've seen this box for links that were not specifically generated to be bogus too, but I suppose it's some kind of indication that ChatGPT “knows” what's up.)

ChatGPT's interface asks for a final confirmation.

The Result

Finally, we land on a 404 page that detects that the visitor came from an LLM and displays a custom message.

The result: a custom 404 page that detects LLM-generated URLs

Code

Example in Django; put this through your favorite LLM to convert to your framework of choice.

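```python
# Register in your root URLconf with:
#     handler404 = "myapp.views.page_not_found"   ("myapp" is a placeholder)
from urllib.parse import urlparse

from django.shortcuts import render
from django.views.decorators.csrf import requires_csrf_token

# Hostnames of LLM chat interfaces that may appear in the Referer header.
AI_DOMAINS = (
    "chat.openai.com",
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)

# utm_source values that mark a link as coming from an LLM interface.
LLM_UTMS = (
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)


@requires_csrf_token
def page_not_found(request, exception, template_name="404.html"):
    is_llm_referral = False

    # Signal 1: an explicit ?utm_source=... on the requested URL.
    utm = request.GET.get("utm_source", "").lower()
    if utm in LLM_UTMS:
        is_llm_referral = True

    # Signal 2: the Referer header points at a known LLM host.
    ref = request.META.get("HTTP_REFERER", "")
    if ref:
        try:
            # .hostname is lowercased and has no port; it may be None.
            host = urlparse(ref).hostname or ""
        except ValueError:
            # urlparse raises ValueError on malformed input (e.g. bad IPv6).
            host = ""
        # Match the domain or a subdomain, but not e.g. "notchatgpt.com".
        if any(host == d or host.endswith("." + d) for d in AI_DOMAINS):
            is_llm_referral = True

    return render(
        request,
        template_name,
        {"is_llm_referral": is_llm_referral},
        status=404,
    )
```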
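The detection itself doesn't depend on Django. As a possible starting point for porting it, here is a framework-agnostic sketch that uses only Python's standard library. The function names `is_llm_referral` and `_matches_domain`, and the choice to pass the full request URL and the Referer value in as plain strings, are assumptions of this sketch rather than bugsink's actual code. It can only classify what the request carries; whether a given LLM interface actually sends a Referer or a utm_source is outside its control.

```python
from urllib.parse import parse_qs, urlsplit

# Hostnames of LLM chat interfaces (same list as the Django example).
AI_DOMAINS = (
    "chat.openai.com",
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)

# utm_source values treated as LLM referrals.
LLM_UTMS = (
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "bard.google.com",
    "copilot.microsoft.com",
    "claude.ai",
    "mistral.ai",
)


def _matches_domain(host: str) -> bool:
    """Hypothetical helper: True if host is an AI domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in AI_DOMAINS)


def is_llm_referral(url: str, referer: str = "") -> bool:
    """Hypothetical helper: classify a request by its URL and Referer header."""
    # Signal 1: ?utm_source=... on the requested URL. Like Django's
    # QueryDict.get, take the last value if the parameter is repeated.
    query = parse_qs(urlsplit(url).query)
    utm = query.get("utm_source", [""])[-1].lower()
    if utm in LLM_UTMS:
        return True

    # Signal 2: the Referer header names a known LLM host.
    if referer:
        try:
            host = urlsplit(referer).hostname or ""
        except ValueError:
            # Malformed Referer (e.g. an unbalanced IPv6 bracket).
            return False
        return _matches_domain(host)
    return False
```

In your framework's 404 handler, pass the full request URL and the Referer header (or an empty string when it is absent), then branch on the result to pick the message.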

Try it yourself

https://bugsink.com/clearly-made-up-url-by-chatgpt?utm_source=chatgpt.com

The above link is “cheating”, of course, since it has the utm_source parameter set to chatgpt.com. Better to follow the screenshots above instead (ask ChatGPT to generate a bogus URL, then try to visit it).

This write-up was also ChatGPT-assisted, for maximum irony. I scrubbed it for bogus content, which was not limited to URLs.