Social media feeds 'misaligned' when viewed through AI safety framework


Results add to doubts about whether corporations can be expected to voluntarily align powerful future AI systems when they do not align their existing algorithms.


In a study published September 17, a group of researchers from the University of Michigan, Stanford University, and the Massachusetts Institute of Technology (MIT) showed that one of the most widely used social media feeds, Twitter/X, owned by the company xAI, is recognizably misaligned with the values of its users, preferentially showing them posts that rank highly for the values of 'stimulation' and 'hedonism' over collective values like 'caring' and 'universal concern.'

Social media feeds that curate lists of posts for users to scroll through are among the most influential modern computing technologies. While they are not always recognized as a form of AI, they are built with machine learning algorithms similar or identical to those behind more commonly recognized AI programs, such as chatbots built on language models.

While numerous detrimental effects of social media feeds have been documented, such as the spread of misinformation and the promotion of extremist content, this study is one of the first to examine such feeds through the lens of their alignment (or misalignment) with human values. That lens gained popularity in the mid-2010s as a way of framing safety concerns about more general forms of AI, often referred to as AGI, as well as approaches to mitigating them. Alignment proponents often envision the risks of AGI as existential in nature.

"The early AI safety community really placed an effort on distinguishing those future concerns from everyday concerns," said the researcher Dylan Hadfield-Menell of MIT, who was not involved in the study, in an interview with Foom. "My personal position is and has always been that they're basically variants of the same problem."

The two co-lead authors of the study, in a separate interview with the outlet, described themselves as motivated both by the under-application of the alignment framework to existing problems and by its usefulness for investigating them.

"Social media platforms don't have a language or mechanism for reasoning about values, but they should, right?" said Farnaz Jahanbakhsh of the University of Michigan Ann Arbor. "In that sense, there is a design limitation that needs to be addressed."

To interrogate the relative alignment of an existing social media feed like Twitter/X's, the researchers sought to compare it to a feed that was, by construction, intentionally aligned to the values of a user. In particular, they set out to create a version of the Twitter/X feed that would display posts ranked from highest to lowest on one or more values.

While corporations typically do not disclose the algorithms behind their platforms, Twitter/X shared a version of its post recommendation algorithm in March 2023. That version preferentially displayed posts that were algorithmically predicted to maximize user engagement, which is to say, the number of scrolls, likes, clicks, shares, and other interactions between user and interface. Such a feed is referred to as optimized for engagement or engagement-optimized.

Creating a value-optimized feed posed several problems the researchers had to overcome. First, they needed a set of values against which posts could be ranked for how strongly they demonstrated, or manifested, them.

They found a plausible set of values in a framework called 'basic human values,' studied and validated by psychologists as a way of explaining where human behaviors come from. It consists of 19 distinct values, including recognizable concepts like 'hedonism,' 'tradition,' and 'preservation of nature,' but also values that are opposed, or in tension, such as 'dominance' versus 'tolerance.' The idea is that the broader and better defined the set of values, the better it can explain human behavior.

The researchers recruited social media users from the internet and surveyed them on their relative prioritizations of these values. These users also installed a browser extension that recorded a sequence of 4600 posts shown to them from a Twitter/X feed.

The researchers used a large language model (OpenAI's GPT-4o) to score each of the thousands of posts from each user's feed on the degree to which it manifested each of the nineteen values. These automated value scores were then validated against scores obtained from human annotators.
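To make the method concrete, here is a minimal sketch of how a language model could be asked to score a post for a single value. The prompt wording, the 1-to-5 scale, and the helper function are illustrative assumptions, not the study's actual protocol; only the use of GPT-4o via the OpenAI API comes from the article.

```python
# Illustrative sketch only: the prompt and 1-5 scale are assumptions, not the study's protocol.
# Requires the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def score_post(post_text: str, value: str) -> int:
    """Ask GPT-4o how strongly a post manifests the named value (assumed 1-5 scale)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rate how strongly the post manifests the named human value, "
                        "from 1 (not at all) to 5 (very strongly). Reply with the number only."},
            {"role": "user", "content": f"Value: {value}\nPost: {post_text}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

# Example: score one post against a few of the nineteen values.
post = "Spent the weekend volunteering at the local food bank."
scores = {v: score_post(post, v) for v in ["caring", "universal concern", "hedonism", "stimulation"]}
print(scores)
```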

"What's unique to ... the technological advancements we have now is we can take these value constructs and actually operationalize them," said the second co-lead author, Dora Zhao of Stanford University.

Using these scores, the researchers created custom feeds for each user, optimized for one or more values. They found that users could consistently distinguish feeds optimized for a single value from engagement-optimized feeds.
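Once per-post scores exist, re-ranking a feed amounts to a sort. The sketch below contrasts an engagement-optimized ordering with a value-optimized one; the field names, the toy data, and the simple averaging across multiple values are assumptions for illustration, not the study's implementation.

```python
# Illustrative only: field names, toy data, and averaging across values are assumptions.
from statistics import mean

posts = [
    {"id": 1, "engagement_pred": 0.91, "values": {"hedonism": 5, "caring": 1}},
    {"id": 2, "engagement_pred": 0.40, "values": {"hedonism": 1, "caring": 5}},
    {"id": 3, "engagement_pred": 0.65, "values": {"hedonism": 3, "caring": 3}},
]

def engagement_feed(posts):
    """Engagement-optimized order: highest predicted engagement first."""
    return sorted(posts, key=lambda p: p["engagement_pred"], reverse=True)

def value_feed(posts, values):
    """Value-optimized order: highest (mean) score on the chosen value(s) first."""
    return sorted(posts, key=lambda p: mean(p["values"][v] for v in values), reverse=True)

print([p["id"] for p in engagement_feed(posts)])         # [1, 3, 2]
print([p["id"] for p in value_feed(posts, ["caring"])])  # [2, 3, 1]
```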

When they compared the order in which posts were displayed in value-ranked feeds versus engagement-optimized feeds, they found the two orderings were statistically uncorrelated: the engagement-optimized and value-optimized feeds shared little in common in how they ordered posts.
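One standard way to quantify how little two orderings share is a rank correlation such as Spearman's rho. The article does not say which statistic the study used, so the following is an illustrative sketch on made-up data.

```python
# Illustrative sketch: the study's actual statistic is not specified in the article.
from scipy.stats import spearmanr

# Hypothetical positions of the same five posts (by id) in the two feeds.
engagement_order = [1, 3, 5, 2, 4]   # post ids, engagement-optimized order
value_order      = [5, 1, 4, 2, 3]   # post ids, value-optimized order

def ranks(order):
    """Map each post id to its rank (position) within an ordering."""
    return {post_id: rank for rank, post_id in enumerate(order)}

eng_ranks, val_ranks = ranks(engagement_order), ranks(value_order)
post_ids = sorted(eng_ranks)
rho, p_value = spearmanr([eng_ranks[i] for i in post_ids],
                         [val_ranks[i] for i in post_ids])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.2f}")  # rho near 0: orderings share little
```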

Further, comparing engagement-optimized and value-optimized feeds, they saw that the value-optimized feeds differed most in boosting values representing collective interests, such as 'caring' and 'universal concern,' whereas the Twitter/X algorithm preferentially boosted values representing individual interests, such as 'dominance,' 'stimulation,' and 'hedonism.'

The results provide a proof of concept: with existing technology, it should in theory be relatively easy to optimize feeds for user values; social media companies like Twitter/X simply do not do so.

Twitter/X also markets a much more general-purpose AI program, the language model Grok, which is now developed by the corporation xAI, the company that subsumed X in March 2025. Grok has received much attention for well-publicized instances of problematic misalignment, such as providing users with detailed assassination instructions or taking on a persona it described as 'MechaHitler.'

Other corporations developing highly capable language models, such as Google and Meta, also operate feeds for their widely used platforms, YouTube and Instagram, respectively.

But if these corporations do not align their feeds, which are much easier to control than general-purpose language models, or if they align them only to their own motives, it casts further doubt on whether they can be expected to voluntarily align more powerful AI or AGI systems, even when they may believe the risks of those systems to be existential.


Author's note: I pledge to be transparent about the use of AI; no AI was used in writing or editing this post.
