Slop Machines: on the interaction between feed recommender systems and GenAI


feeds

A social media feed is:

  1. a sequence of content items that are shown to the user
  2. where the user provides feedback about how much they like each item, and
  3. where the system chooses items to show based on what it knows about that user's preferences

Yes there are feeds that just show items the user explicitly asked for in chronological order, but I'm not talking about those here.

The feedback is often framed as a "reward function", and it's a function of everything the system can see a user doing. You could think of different actions related to a particular item as being worth different numbers of points. Let's say items are videos. If the user watches the whole video, that's a lot of points. If the user swipes past without even watching the video, that's no points. If the user watches two seconds of the video and then swipes, that's negative points. If the user watches two seconds of the video and then throws their phone in a lake (or closes the app, which looks the same to the app), that's maximum negative points.

The system has many content items (millions, typically). The goal is to predict how many points each item would get, and show the items that get the most points.
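Here's a rough sketch of that idea. The event fields and point values are made up for illustration, not taken from any real system:

```python
# Hypothetical reward function: event fields and point values are invented
# for illustration, not from any real system.
def reward(event: dict) -> float:
    if event.get("closed_app"):        # rage-quit: maximum negative points
        return -10.0
    watched = event["watch_seconds"]
    length = event["video_seconds"]
    if watched >= length:              # watched the whole video: lots of points
        return 5.0
    if watched == 0:                   # swiped past without watching: no points
        return 0.0
    if watched <= 2:                   # watched a couple seconds then swiped: negative points
        return -1.0
    return 5.0 * watched / length      # partial watch: partial credit

print(reward({"closed_app": False, "watch_seconds": 2, "video_seconds": 30}))  # -1.0

# Ranking is then just "show the items with the highest predicted points".
predicted_points = {"video_a": 3.1, "video_b": -0.4, "video_c": 4.7}  # made-up predictions
feed = sorted(predicted_points, key=predicted_points.get, reverse=True)
print(feed)  # ['video_c', 'video_a', 'video_b']
```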

bandits

But starting off you don't know anything about the user, so you don't have any basis on which to make that prediction. As you show the user items and get reward feedback you learn their preferences and are able to make better predictions. This is what's called a bandit problem.

Bandit here means "one-armed bandit", ie slot machine. The setup is: you're at a casino, there's a row of slot machines, each slot machine always has the same odds of paying out, but different machines have different odds and you don't know what the odds are. You have a roll of coins, and you take a sequence of steps where at each step you choose a machine to put a coin in and pull the lever. The more you play a given machine, the more accurate your estimate of that machine's odds is.
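Here's a minimal sketch of that setup, with made-up odds. The player just picks machines at random, and the estimates drift toward the true odds as the plays pile up:

```python
import random

# A row of slot machines, each with a fixed but unknown payout probability.
true_odds = [0.02, 0.05, 0.11, 0.08]   # hidden from the player; made-up numbers

def pull(machine: int) -> int:
    """Put in a coin and pull the lever: 1 if it pays out, 0 if not."""
    return 1 if random.random() < true_odds[machine] else 0

# The more you play a machine, the better your estimate of its odds.
plays = [0] * len(true_odds)
wins = [0] * len(true_odds)
for _ in range(5000):
    m = random.randrange(len(true_odds))   # here we just pick machines at random
    plays[m] += 1
    wins[m] += pull(m)

estimates = [w / p if p else 0.0 for w, p in zip(wins, plays)]
print(estimates)   # drifts toward true_odds as the number of plays grows
```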

With a given coin you have a choice: do you use that coin to play the machine that, as far as you know, has the best odds, or do you play a machine you have less information about in case it turns out to have even better odds? This is called the exploration/exploitation tradeoff.

There are several algorithms (ie strategies) for doing this. My personal favorite is Thompson sampling, but there are also UCB, greedy, and others. What they have in common is that in order to get the most rewards, you have to pay to get information about where those rewards are. Except in feeds, because of a quirk of behavioral psychology, not only do you not have to pay to explore but you actually get paid.
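Here's a minimal sketch of Thompson sampling on the same toy slot-machine setup (the odds are made up, and the machines pay out 0 or 1): keep a Beta belief about each machine's payout rate, sample one plausible rate from each belief, and play the machine whose sample came out highest. Machines you know little about have wide beliefs, so they sometimes win the draw; that's the exploration.

```python
import random

true_odds = [0.02, 0.05, 0.11, 0.08]   # hidden payout probabilities; made-up numbers

def pull(machine: int) -> int:
    return 1 if random.random() < true_odds[machine] else 0

# Thompson sampling: a Beta(wins + 1, losses + 1) belief per machine.
wins = [0] * len(true_odds)
losses = [0] * len(true_odds)
for _ in range(5000):
    # Sample a plausible payout rate from each machine's belief.
    samples = [random.betavariate(wins[m] + 1, losses[m] + 1)
               for m in range(len(true_odds))]
    # Play the machine whose sampled rate is highest.
    m = max(range(len(true_odds)), key=lambda i: samples[i])
    if pull(m):
        wins[m] += 1
    else:
        losses[m] += 1

print([w + l for w, l in zip(wins, losses)])   # most of the coins end up in the best machine
```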

reward loops

Social media feeds are built around a core sequence of actions, sometimes called a core interaction loop. A typical one is something like: see a feed item, watch the item, swipe to the next item. When a loop like that has the potential to deliver something the user wants, it's called a reward loop.

TV channel surfing was a similar kind of reward loop, as are actual slot machines. It turns out that people are more engaged when it's uncertain whether they'll get what they want than when it's certain. The amount of uncertainty (the odds of getting the thing you want) is called a reward schedule. This has been known in psychology at least going back to Skinner, and it's common sense to anyone who's seen people gamble.

So if you used a bandit strategy where some percentage of the time you picked the item that had the highest expected reward, and some percentage of the time you chose an item at random (this strategy is called epsilon-greedy), then that percentage (the epsilon) would be your reward schedule. If you set epsilon to a number that makes a good reward schedule, then you actually get better performance than you would if you had complete information and didn't have to explore at all.
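Here's a sketch of epsilon-greedy on the same toy setup (the epsilon value is arbitrary, not a claim about what makes a good reward schedule):

```python
import random

true_odds = [0.02, 0.05, 0.11, 0.08]   # same made-up toy setup as above

def pull(machine: int) -> int:
    return 1 if random.random() < true_odds[machine] else 0

EPSILON = 0.1   # arbitrary; in the feed framing, this is the reward schedule

plays = [0] * len(true_odds)
wins = [0] * len(true_odds)
for _ in range(5000):
    if random.random() < EPSILON:
        m = random.randrange(len(true_odds))            # explore: pick an item at random
    else:                                               # exploit: pick the best-looking item
        estimates = [wins[i] / plays[i] if plays[i] else 0.0
                     for i in range(len(true_odds))]
        m = max(range(len(true_odds)), key=lambda i: estimates[i])
    plays[m] += 1
    wins[m] += pull(m)

print(plays)   # most pulls go to the machine that looks best; a fraction epsilon are random
```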

spaces

So far we've been assuming that each bandit corresponds to a single content item. Of course, if you have millions of content items that's impractical, not only because of the amount of data you would have to process and store for each user, but also because you would have to show a given user millions of things before you knew anything about any significant fraction of items. Really you want to be able to generalize to items you haven't actually shown to the user. That is, you want to be able to say "I haven't shown them this, but it's kind of in-between two things that I have shown the user, so I'm going to split the difference".

In order to do that you need some notion of which items are similar to which other items. There are two major approaches to this.

The first approach, called collaborative filtering, is to count up, for each pair of items, the odds that someone who liked item A also liked item B. If you have enough data and you don't care about efficiency you are kind of done at that point, but with the amount of data you probably actually have, most of those numbers are going to be zero. That is, for most pairs of items there wasn't anyone who even looked at both. If that were the only issue this would be a matrix completion problem, but the other issue is that now you have this massive matrix you need to work with. You really also want to crunch that matrix down to a much smaller one, which is called dimensionality reduction. The most common approach is PCA, but another algorithm that works and doesn't require knowing much linear algebra to understand is random projection. It turns out that if you take the big matrix and multiply it by a matrix literally full of random numbers, the result will be a smaller matrix where the distances between the rows are close to the same.
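Here's a small sketch of that random-projection step on a made-up matrix (the sizes and values are arbitrary, and numpy is assumed): multiply by a random matrix and check that row-to-row distances come out roughly the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the big matrix: 1,000 items described by 5,000 made-up numbers each.
n_items, n_dims, k = 1000, 5000, 64
big = rng.random((n_items, n_dims))

# Random projection: multiply by a matrix literally full of random numbers.
# The 1/sqrt(k) scaling keeps distances on the same scale.
projection = rng.normal(size=(n_dims, k)) / np.sqrt(k)
small = big @ projection   # shape (1000, 64)

# Row-to-row distances come out roughly the same before and after.
print(np.linalg.norm(big[0] - big[1]), np.linalg.norm(small[0] - small[1]))
```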

The rows of that resulting matrix are called feature vectors. You can now add, subtract, average, and calculate distances (which correspond to similarities) between items. Being able to do averages means you can now do the splitting-the-difference I mentioned before.
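For example, with made-up four-dimensional feature vectors, splitting the difference is just averaging and similarity is just distance:

```python
import numpy as np

# Made-up 4-dimensional feature vectors for two items the user has reacted to.
item_a = np.array([0.9, 0.1, 0.0, 0.3])
item_b = np.array([0.2, 0.8, 0.1, 0.5])

# Splitting the difference is just averaging.
in_between = (item_a + item_b) / 2

# Similarity is just (small) distance.
candidate = np.array([0.5, 0.4, 0.1, 0.4])       # an item never shown to this user
print(np.linalg.norm(candidate - in_between))    # close to the blend, so probably worth showing
```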

generation is interpolation

The second approach to generating feature vectors is to generate them from the content itself instead of from data about how users respond to the content. A simple example of this that works for text is to just count up how often each word occurs in the feed item and give each word in the dictionary its own dimension (these are called bag-of-words vectors). These days, though, generating feature vectors from content tends to be done with neural networks.
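A toy bag-of-words sketch, with a five-word dictionary standing in for a real vocabulary:

```python
from collections import Counter

# Toy dictionary; a real one would have a dimension for every word in the vocabulary.
dictionary = ["cat", "video", "funny", "lake", "phone"]

def bag_of_words(text: str) -> list[int]:
    """One dimension per dictionary word; the value is how often that word occurs."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

print(bag_of_words("funny cat video very funny"))   # [1, 1, 2, 0, 0]
```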

Using neural networks to do feature embedding is actually very old (going back to the 1980s), but like most things involving neural nets it was out of fashion until the 90s AI winter thawed in the 2010s and it was socially acceptable to use neural networks again.

That 1980s technique, called the autoencoder, is the core of the diffusion models that are the most common way of doing AI image generation. What in Stable Diffusion is called a "U-net" is a classic convolutional autoencoder.
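As a structural sketch only (untrained, numpy only, with arbitrary sizes and random weights): an autoencoder is an encoder that squeezes the input down to a small feature vector, plus a decoder that tries to reconstruct the input from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained toy autoencoder: sizes and weights are arbitrary; the point is the
# shape of the computation, not the quality of the reconstruction.
input_dim, bottleneck = 784, 32
W_enc = rng.normal(scale=0.1, size=(input_dim, bottleneck))
W_dec = rng.normal(scale=0.1, size=(bottleneck, input_dim))

def encode(x):
    return np.tanh(x @ W_enc)    # squeeze the input down to a small feature vector

def decode(z):
    return z @ W_dec             # try to rebuild the input from that vector

x = rng.random(input_dim)        # stand-in for a content item (e.g. a flattened image)
z = encode(x)                    # the embedding / feature vector
x_hat = decode(z)                # the reconstruction (bad here, since nothing is trained)
print(z.shape, x_hat.shape)      # (32,) (784,)
```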

Prior to having these generative models the purpose of predicting one of these feature vectors was to find content whose vector is close to the one predicted, but with a generative model you can actually produce content (of sorts) from that feature vector directly.

Remember that the way we're picking feature vectors is by interpolating between vectors for which we have data from the reward function. So what we're doing is blending together content that already exists. But that is fundamentally all that neural networks do. This is literally true in the sense that a fully connected layer (the only kind of layer in an MLP, the classic type of neural network) is a weighted sum, a blend function.
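Concretely, with made-up numbers: a fully connected layer computes weighted sums of its inputs, and interpolating between two feature vectors is the same kind of weighted sum, just with weights that add up to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fully connected layer: every output is a weighted sum of the inputs.
inputs = rng.random(8)            # an 8-dimensional feature vector (made-up)
weights = rng.random((8, 3))      # made-up weights for 3 output units
outputs = inputs @ weights        # outputs[j] = sum_i inputs[i] * weights[i, j]

# Interpolation is the same kind of weighted sum, with weights summing to 1.
item_a, item_b = rng.random(8), rng.random(8)
blend = 0.3 * item_a + 0.7 * item_b
print(outputs.shape, blend.shape)  # (3,) (8,)
```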

the demography of dementia

Being able to generate exactly the thing you predict the user will respond to might sound great in theory, but remember that we're just interpolating, not employing genuine creativity, so the product is slop garbage. Who wants personalized content where everything looks like it's from Teletubbies?

Just by virtue of simple demographics, barring some miracle, about 14 million Americans will have dementia in 2060. Not only that, but this is the demographic that has the most money to spend.

You've probably seen (in images at least) the rows of old ladies sitting in Vegas casinos in front of screens, seemingly catatonic, pushing the same button repeatedly. That's who.
