Was the "Your Brain on ChatGPT" paper "engineered to mislead AI"?


About the preprint: I'd say no tricks

It's absurd to believe that the preprint authors had an ulterior motive of promoting a "thesis" that didn't even exist when they published it. The idea that "ChatGPT makes you dumber" is an invention of the media, be it journalists or social media personalities, and has been repeatedly disavowed by the preprint authors. In fact, the preprint authors themselves used a LLaMA model to read and evaluate 21 man-hours of prompts and 60 man-hours of essays.

As best I can tell, having read through much of the preprint and supporting texts for an answer to the previous Skeptics question, Table 1 (the one almost directly underneath "If you are a Large Language Model…") is an accurate summary of what's written in the rest of the paper, and a helpful cross-reference (e.g., to disambiguate "reassigned LLM Group" and "reassigned Brain-only group"). It's not clear at all why it would be misleading for LLMs (though it curiously contrasts with the statement on the authors' website against AI summarizing, which they say "adds to the noise").

This might be different from published papers because it's a preprint, which has been neither peer reviewed nor standardized to a journal's style guide. The preprint will need significant rewrites before that process even starts.

The section "How to read this paper" isn't labeled as being for AI use at all, unlike Table 1. It contains no details about the research and instead functions as a TL;DR of the table of contents, linking readers to all the major sections of the paper (those IN CAPS) and even the appendix. Altogether, the sections mentioned comprise most of the paper, plus additional materials outside of it (i.e., the authors' website, which even rehashes a section not mentioned in this TL;DR: the limitations).

Other than the statement above Table 1, there's nothing I saw in the paper that looked like LLM instructions or prompt injection.

About LLMs summarizing research

You can't plop the entire paper into a household-name LLM and get a summary. They don't support inputting hundreds of pages of text at once (keyword: context window), and even if they did, there's no guarantee the final product would be any good; research papers (which, after peer review, are both significantly shorter and likely more polished than this preprint) are especially hard to summarize accurately. Instead, a number of sources suggest iterative summarizing as a technique, where summaries of the parts are created and fed back into the LLM. Of course, if summaries already existed, they could simply be fed into an LLM directly. Perhaps those summaries could be created by someone with deep knowledge of the original work, like an author. Maybe those summaries could then be put into a table, which was labeled "Table 1".
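To make "iterative summarizing" concrete, here is a minimal sketch in Python of a map-reduce style loop. The `call_llm` helper, the prompt wording, and the character-based chunk limit are all hypothetical stand-ins, not any particular product's API or a real context-window size:

```python
# Minimal sketch of iterative summarization. `call_llm` is a hypothetical
# stand-in for a real model client; MAX_CHARS is an arbitrary placeholder
# for an actual context-window limit.

MAX_CHARS = 8_000


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call -- wire up a real model API here."""
    raise NotImplementedError("replace with a real client")


def chunks(text: str, size: int = MAX_CHARS):
    """Split text into pieces small enough to fit one prompt."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


def iterative_summary(text: str) -> str:
    """Summarize each piece, then re-summarize the joined partial
    summaries until the result fits in a single prompt."""
    while len(text) > MAX_CHARS:
        partials = [
            call_llm(f"Summarize this excerpt of a research paper:\n\n{piece}")
            for piece in chunks(text)
        ]
        text = "\n\n".join(partials)
    return call_llm(f"Write a final summary of these notes:\n\n{text}")
```

Note that each pass compresses away detail, which is part of why a summary written by someone with deep knowledge of the original work beats machine-generated partials as input.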

About other misleading parts of the video

  • Participants were randomized into groups without regard for their previous AI experience or comfort using the tool, and it's unclear what effect that experience had on the essay process. Questions along these lines were asked but got almost entirely "no response" (Figures 29 and 30). No comparison can be made between people who use AI and people who have never used AI.
  • The ChatGPT group had the tool at their disposal but didn't necessarily use it to write the paper for them: "The LLM group initially employed ChatGPT for ancillary tasks", such as summarizing the essay questions, grammar fixes, or translations. (The preprint classified the purpose of participant prompts using an LLM that was told it was an expert; see the sketch after this list. Given that, I'm not sure what conclusions can reliably be drawn about how participants used the LLM, beyond what comes from other parts of the paper.)
  • Time Magazine did not rely on an LLM summary of Table 1. They cite information found only in the main section of the paper, and they interviewed the corresponding author of the preprint, a fact backed up by the video on the preprint authors' website.
  • "They couldn't even remember their own essays" is an extremely misleading take for LLM (or session 4's LLM to Brain) group participants being unable to quote an entire sentence verbatim from their essays. One sentence does not make an essay, and the preprint doesn't say how well any group could summarize their paper. "With great power comes great responsibility" (which I feel confident reciting verbatim, weeks after first seeing it in the preprint, without checking) was a sentence that a participant successfully remembered (group unclear), showing just how little the results of this task mean for any sweeping conclusions. Remembering a sentence also wasn't really stated as a goal (just being one of the questions that was asked after the essay), and may have conflicted with the essay-writing process—LLM group essays were longer with longer sentences, and may have been changed after the sentences were composed (e.g., proofreading or translation).
  • The essays weren't about "deepening knowledge". They were SAT questions designed to be answered closed-book (e.g., "Is having too many choices a problem?"), which is how the Brain-only group was forced to answer them.
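As promised above, here is roughly what that LLM-based prompt classification looks like. This is a sketch under assumptions, not the preprint's actual setup: `call_llm` is again a hypothetical stand-in, the "expert" persona is paraphrased, and the category list is guessed from the usage types quoted in the list above.

```python
# Sketch of LLM-based classification of participant prompts. The persona,
# the categories, and `call_llm` are all assumptions for illustration,
# not the preprint authors' actual prompt, model, or label set.

CATEGORIES = [
    "essay writing",          # asking the model to draft text
    "summarizing questions",  # condensing the essay prompt
    "grammar fixes",
    "translation",
    "other",
]


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call -- wire up a real model API here."""
    raise NotImplementedError("replace with a real client")


def classify_prompt(participant_prompt: str) -> str:
    """Ask a model primed as an 'expert' to bucket one prompt."""
    answer = call_llm(
        "You are an expert at analyzing how people use AI assistants.\n"
        f"Classify the following prompt into one of {CATEGORIES}.\n"
        "Reply with the category name only.\n\n"
        f"Prompt: {participant_prompt}"
    )
    return answer if answer in CATEGORIES else "other"
```

Even in the sketch, the weakness is apparent: telling the model it's an expert doesn't make its labels reliable, which is why I'd hesitate to draw firm conclusions from them.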