Now you can complete an entire article in seconds with AI, a task that would have taken hours of research just a few years ago. This experience can almost convince you that you possess superpowers.
I will start with the story of an attorney, Steven Schwartz, who used this superpower. His mistake is a stark demonstration of the underlying hazards in this nascent era of Artificial General Intelligence (AGI).
Finally, I will briefly discuss how we can take control of this Pandora’s Box of AI.
This article was partially translated and proofread with the help of Gemini and ChatGPT. The ideas are my own, and I take full responsibility.
The Attorney Who Used ChatGPT
One day, while drafting a legal brief, attorney Steven Schwartz had the following conversation with ChatGPT:
Steven Schwartz: Is varghese a real case?
ChatGPT: Yes.
Steven Schwartz: What is your source?
ChatGPT: I apologize for the confusion earlier… (ChatGPT offered a legal citation)
Steven Schwartz: Are the other cases you provided fake
ChatGPT: No, the other cases I provided are real and can be found in reputable legal databases.
(This exchange is quoted from a New York Times article.)
Unfortunately, even after Steven Schwartz sought secondary confirmation from ChatGPT, the tool failed to detect its error. It even insisted that the cases “can be found in reputable legal databases.”
The outcome was severe, as you might expect. For submitting a legal brief that included citations of non-existent court cases, attorney Steven Schwartz was subsequently sanctioned by a New York federal judge and fined $5,000 (source).
A lawyer with thirty years of experience, a member of one of the most highly educated professions, still committed such a fundamental error. While this could be dismissed as a simple oversight, it more likely foreshadows a major societal challenge. In the rest of this article, I will use this case study to examine the AI Superpower Illusion.
The Superpower of AI
First, let’s get an intuitive sense of how deceptive a seemingly genuine fake case brief can be. Note that the example below is fictional; I generated it deliberately and solely to demonstrate how misleading such output can look. Do not treat it as real.
Varghese v. Metro Air Lines, Inc., 984 F.3d 112 (2d Cir. 2019)
Holding: The Second Circuit held that the airline’s failure to disclose significant scheduling changes constituted a material misrepresentation under Section 10(b) of the Securities Exchange Act and Rule 10b-5, reversing the district court’s grant of summary judgment.
Reasoning: Writing for the panel, Judge Carraway emphasized that “silence can be actionable when set against a backdrop of half-truths,” quoting with approval from Hinton v. Barclay Corp., 742 F.2d 56, 61 (2d Cir. 1984), and distinguishing the case from Klein v. Delta Holdings, 913 F.2d 112 (2d Cir. 1990), which involved different disclosure obligations.
Disposition: Reversed and remanded.
If you don’t cross-reference a database, it’s nearly impossible to distinguish this from a real case based purely on the format, tone, and professional citation style. Researchers call this phenomenon hallucination: the model generates content that is plausible but factually incorrect.
Why does this happen? The core technology behind AI, the Large Language Model (LLM), doesn’t answer your questions by looking up facts; it does so by performing probabilistic prediction of the next word. During training, we mask out words from vast amounts of text and task the model with filling in the blanks based on context. Over time, it learns the distribution and style of human language: it can make a statement sound true, but “sounding true” is not equivalent to “being correct.” (Admittedly, modern ChatGPT can now access the internet, which mitigates the problem but does not entirely solve it).
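To make this concrete, here is a deliberately tiny sketch in Python of what “predicting the next word from a probability distribution” means. The probability table and the words in it are invented for illustration; a real LLM learns billions of such conditional probabilities from its training text, but the principle is the same: it samples what is likely, not what is verified.

```python
import random

# A toy "language model": for each context, a learned probability
# distribution over plausible next tokens. This table is entirely
# made up for illustration.
NEXT_TOKEN_PROBS = {
    "the Second Circuit": {"held": 0.6, "found": 0.3, "dismissed": 0.1},
    "984 F.3d": {"112": 0.4, "345": 0.35, "201": 0.25},
}

def predict_next(context: str) -> str:
    """Sample the next token from the distribution for this context.

    The model picks whatever is statistically likely given the context,
    regardless of whether the resulting citation actually exists.
    """
    dist = NEXT_TOKEN_PROBS[context]
    return random.choices(list(dist), weights=list(dist.values()), k=1)[0]

if __name__ == "__main__":
    # Run this a few times: the sampled "page number" varies between runs,
    # yet every output is formatted like a perfectly valid citation.
    print("the Second Circuit", predict_next("the Second Circuit"))
    print("984 F.3d", predict_next("984 F.3d"))
```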
You might ask: “But when I use it, it’s correct nine times out of ten.” The reason is simple: the training corpus already contains a massive amount of public facts (encyclopedias, news, technical documents), and these facts appear repeatedly across different sources, so the AI performs robustly on high-consensus topics. However, once it gets into long-tail details (case numbers, statute numbers, volume and page numbers, and so on), it is still completing the sequence by probability, and it may end up coherently fabricating a falsehood in exactly the right format.
There’s also an amplifier: Reinforcement Learning from Human Feedback (RLHF). After pre-training, researchers fine-tune the model using human preferences to make it more polite, more willing to elaborate, and better at completing structured outputs (e.g., automatically filling in the “Case Name — Court — Year — Holding — Reasoning — Disposition” structure). This enhances readability and persuasiveness but does not magically bestow it with fact-checking capabilities.
I like comparing the LLM to a calculator: the LLM is probabilistic, sampling from a distribution every time it generates; the calculator is deterministic, always returning the same correct result for the same expression. Don’t get me wrong: I am not criticizing the probabilistic nature of LLMs. I simply want to re-emphasize that only by recognizing the distinction between these two kinds of tools can we put them to proper use. For instance, you can use an LLM to surface relevant legal precedents, but you absolutely must verify them yourself, especially when acting as an attorney on someone else’s behalf.
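As a minimal illustration of that contrast (the function names and the citation template below are invented for this sketch, not taken from any real system): the calculator-style function always returns the same result, while the sampled “citation” changes from run to run, even though every run looks equally authoritative.

```python
import random

def calculator_add(a: int, b: int) -> int:
    """Deterministic: the same inputs always yield the same output."""
    return a + b

def toy_llm_citation(rng: random.Random) -> str:
    """Probabilistic: each call samples a different, plausible-looking citation.

    The case name, reporter volume, page, and year are all invented here;
    the point is only that sampled output varies while staying fluent.
    """
    volume = rng.randint(100, 999)
    page = rng.randint(1, 999)
    year = rng.randint(1985, 2022)
    return f"Doe v. Example Airlines, {volume} F.3d {page} (2d Cir. {year})"

if __name__ == "__main__":
    print(calculator_add(2, 2), calculator_add(2, 2))  # always "4 4"
    rng = random.Random()
    print(toy_llm_citation(rng))  # differs on every run
    print(toy_llm_citation(rng))
```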
Why People Trust AI
AI hasn’t made you a superman, but it has made you feel like one.
When searching databases and encountering a legal summary neatly formatted with “Case Name — Court — Year — Holding — Reasoning,” your first instinct is to believe it’s correct because of how it looks, and only then do you consider if it’s true. When an answer is fluent, professionally formatted, and confident in tone, we are more likely to equate readability with reliability. This is what psychology calls processing fluency: information that is easier for the brain to process is more readily perceived as truer, more familiar, and more trustworthy. LLMs inherently benefit from this; their objective is to generate language that is coherent, natural, and human-like.
Fluency is just the first step. More insidious is automation bias. When an output comes from an intelligent system, people subconsciously defer to the system’s judgment: they use it inappropriately (misuse) and fail to review it when they should (complacency), treating the machine as a more objective version of themselves. Language models maximize this bias because they don’t just act like an expert; they are always available and consistently provide answers in a polite, complete manner. Consequently, a loop forms: “The system looks reliable → I question it less.”
The third, and most concealed, trap stems from that classic cognitive dilemma — you don’t know what you don’t know.
This is where the Dunning–Kruger effect delivers its lethal blow. Just as someone who has never seen a genuine diamond cannot recognize the cheap gleam of glass, the professional-sounding text generated by AI exploits precisely this blind spot. When we operate in an unfamiliar domain, our own lack of competence prevents us from reliably evaluating a perfectly formatted and logically consistent answer. As a result, we often unconsciously accept this professional appearance as a substitute for actual fact-checking.
As these cognitive biases lead us to habitually trust AI, a deeper change occurs: we begin to outsource core aspects of our thinking — such as judgment, reasoning, and even the construction of our worldview — to the system.
How AI Affects Human Thinking
AI is not the first tool to influence how humans think. For instance, we have long outsourced a portion of our memory to external tools.
A study by Sparrow et al. in Science noted that the anticipation of future access to information decreases the recall of the information itself, while increasing the tendency to remember where the information can be retrieved (the retrieval path). Photography shows a similar side effect: taking pictures of exhibits in a museum can impair memory for the exhibit’s details — a phenomenon called the photo-taking-impairment effect, which Henkel demonstrated in a Psychological Science study.
AI pushes this outsourcing from memory to thinking and judgment.
The trap of this cognitive outsourcing is evident across various fields. In reading, we assume that finishing an AI summary equates to understanding the entire book, forgetting that true insights are often hidden in the author’s detailed arguments — precisely what the summary discards first. In software development, the trap is more insidious: AI-written code might run quickly, but this is like having AI rapidly construct a house without blueprints. Though you can move in fast, when future modifications or repairs are needed, you find its internal structure is chaotic, where changing one part affects everything else.
When this cognitive outsourcing goes to the extreme, AI can even become a tool that reshapes an individual’s worldview, or even their self-perception. A Reddit post documented this unsettling evolution:
My partner has been working with chatgpt CHATS to create what he believes is the worlds first truly recursive ai that gives him the answers to the universe. He says with conviction that he is a superior human now and is growing at an insanely rapid pace.
This is not an isolated case. Such delusional beliefs, constantly reinforced by AI, have already raised alarms in the scientific community. An article in Nature clearly states:
Chatbots can reinforce delusional beliefs, and, in rare cases, users have experienced psychotic episodes.
When our thinking, judgment, and even self-identity begin to be shaped by an external system, the essence of the problem is no longer about efficiency — it’s about ourselves. This forces us to answer a fundamental question: Facing this Pandora’s Box that has already been opened, how do we draw boundaries to ensure we remain the masters of technology, rather than its slaves?
Taking Control of Pandora’s Box
Fundamentally, AI is a commercial product designed to serve us. This means its core goal is to enhance user experience, not to safeguard objective truth. When we outsource our memory, judgment, and even our beliefs to this seemingly omniscient partner, what we often receive is merely an echo of our own preferences.
At the same time, we cannot simply stop using AI. In fields like writing, translation, programming, and design, AI is undeniably reshaping industries. The question of how to use AI correctly is one that professionals in these areas must address.
What distinguishes an AI with “PhD-level intelligence” from a human? I believe the key lies in responsibility and trust.
When you hire an employee, you trust them to perform the job, and they, in turn, assume the responsibility. They will inevitably make mistakes, but a responsible employee accepts the consequences of their errors, reflects, and prevents recurrence. The trust accumulated through this process alleviates your cognitive burden. This level of responsibility and trust is what current AI cannot provide. Even if AI completes a large volume of work for you, the responsibility for checking the details (or ensuring the quality through various means) remains yours.
In practice, this points to the ideal way to thrive in the current AI era: responsible use of AI. You can use AI to write code, but you must review it, and, crucially, you must be accountable for the code that AI generates. The commit message by developer Xuanwo provides a prime example of this responsibility:
(Screenshot: Xuanwo’s commit message.)
Some open source projects have recognized similar issues and are starting to roll out corresponding policies for the use of AI tools (1 2). This is because if contributors use AI irresponsibly, it places an immense burden on project maintainers, jeopardizing the project’s long-term health.
Imagine an open source project where contributors constantly submit large volumes of AI-generated code, and maintainers constantly use AI to review it, without any systemic thinking about the project’s evolution. While the project might appear to be receiving massive daily contributions, I believe it would quickly drown and vanish into a quiet corner of GitHub.
Of course, the discussion in this section is far from sufficient, but given the scope and word count of this article, I will reserve more detailed thoughts for a future piece.
Postscript
On my ‘For You’ timeline on X, I am constantly inundated with AI news and shares. Before I finally muted the term, “game changer” appeared almost non-stop. I admit that AI is perhaps the hottest word in the technology world right now, and it has indeed transformed the way many people (including myself) work and study. AI is a powerful tool, but perhaps because “praise is meaningless without criticism,” I wanted to start a series of articles dedicated to exploring the problems of AI and, more importantly, finding the best practices for coexisting with it.