The Pot, the Kettle, and the Elephant


This week, Reddit sued Perplexity. Reddit is a community platform with over 100,000 topic-focused “subreddits,” where real people talk about all kinds of real and unreal subjects. Perplexity, meanwhile, is one of many AI chatbot vendors.

To make its chatbot useful, Perplexity needs a lot of data. There is no shortage of data freely available on the internet, of course. But quality data? That’s a different story. Some studies suggest almost 75% of new web pages already contain, or consist entirely of, AI-generated writing. This content is often passable but rarely groundbreaking. The noise is slowly drowning out the signal when it comes to information. So what is a company building a large language model supposed to do?

In Perplexity’s case, the company allegedly decided to raid Reddit, one of the last bastions of almost entirely human-generated content. Reddit claims Perplexity hired third parties who, disguised as normal user accounts, scraped data off Reddit’s platform en masse, then fed it to Perplexity’s AI. Mmm, millions of messages from real humans, yummy!

Reddit believes this infringes on their rights. The platform is free to browse in theory, but if you want to use large swaths of posts, you’d first have to strike a licensing agreement with the company. Ergo, lawsuit time.

In what’s undoubtedly a clever PR move, Perplexity was swift to issue a response to the breaking news—on Reddit, no less. They claim their models don’t train on the content. They merely summarize it and then provide the user with links to the original discussions.

In the comments, people quickly argued for both sides. Perhaps Reddit should be open to all, like most of the internet? Maybe Perplexity needs to respect it when site owners assert they don’t want their website crawled by its bots? “Hope you guys win against Reddit!” one user cheered Perplexity on. In the thread underneath, an interesting discussion unfolded.

Reddit, as it turns out, is also an AI company. They have their own Answers feature which, lo and behold, generates answers to questions using Reddit’s millions of users’ posts. What’s more, Reddit has long been selling access to its data to the highest bidder—including to AI model builders. Like Google, for example, which shells out some $60 million per year for what Perplexity supposedly took for free.

A user with the on-point name of “Nonchalant_Demon” hit the nail on the head: It’s “a classic case of the pot calling the kettle black.” But, actually, it’s more than that. Because in most of the discussion, the elephant in the room remains unaddressed: The only folks with true rights to the data are the real people posting their ideas, thoughts, and opinions on Reddit. Without humans who create good data, Reddit has nothing to sell, and Perplexity has nothing to allegedly steal.

So, really, the situation is more akin to a pot and a kettle arguing in front of an elephant. While the two discuss who gets to ride the elephant first, if it rose up, the elephant could simply walk over both of them and be on its way. And while the elephant hasn’t done just that so far, in the past, it has at least twitched a little.

In 2023, Reddit decided that, after 15 years of free access, it would start charging for its application programming interfaces, or APIs. The move forced many apps to shut down and caused a massive backlash in the community. Over 7,000 subreddits temporarily shut down their forums or set them to “not safe for work” in order to make them unsuitable for ads and thus hurt Reddit’s revenue.

Still, two years later, Reddit makes more money than ever from its members’ content. Both daily and weekly user numbers have almost doubled, the latter totaling over 400 million people worldwide. And although they’d never admit it, this partially goes back to the very kind of deal Perplexity refused to make: After Google agreed to pay Reddit for using its data, Reddit posts started showing up much more frequently at the top of Google’s search results. Because if you’re paying someone for their information, why not help them generate more of it? You scratch my back, I scratch yours, right?

The only question that remains? Who scratches the elephant’s back? Because so far, it’s us, the human writers, creators, and everyday social media users, who’ve received neither a proper request for consent, nor credit, let alone compensation from all the steaming AI pots and kettles out there. If internet history thus far is any indication, no matter who sues whom, we likely won’t receive any of the three anytime soon.

Perhaps it’s time for the elephant to pull a Perplexity: Don’t ask. Just move.
