Is TikTok’s “For You” really for you?
What if the algorithm isn’t just serving what you like, but shaping what you will like? The endless scroll feels personal, but behind the illusion of customization lies something more deliberate. These systems don’t merely reflect your tastes; they manufacture them, fine-tuning your emotions and beliefs with every swipe. And when the “recommendations” start aligning a little too neatly with certain narratives, one has to ask: are we discovering content, or being quietly directed toward it?
TikTok’s algorithm is distorting the playing field in New York City’s mayoral race. Our early analysis suggests that content favoring Zohran Mamdani is being amplified, while videos supporting Andrew Cuomo are being suppressed, a pattern that could meaningfully influence public perception and voter behavior.
We’re sharing these preliminary findings now because the issue is timely and urgent. Early evidence points to algorithmic influence that may be shaping voter perception in the New York elections.
Reverse Engineering Recommendation Systems
Our journey into TikTok began with deep technical research into the algorithm itself: reverse engineering the app, analyzing its communication protocols, and studying leaked documents and publications from TikTok and other social networks.
Two sources proved particularly valuable: a leaked onboarding document explaining TikTok’s recommender system to new engineers, and Twitter’s 2023 algorithm release. These helped us understand how different components work together to decide which video appears next in a user’s feed.
Measuring What Users Actually See
The conventional way to study social media content has been through scraping — capturing everything that’s been posted on a platform. But knowing what exists online isn’t the same as knowing what people actually see. That gap has always mattered, and it’s becoming crucial now that content consumption is driven less by user intent and more by algorithmic recommendation. In other words, we no longer search for content — it finds us (think YouTube search versus TikTok’s “For You” feed).
The algorithm that selects which videos reach which users isn’t just determining what goes viral. It’s shaping what millions of people understand to be true about the world.
This insight led us to develop a fundamentally different data collection approach. Instead of scraping “what exists,” we collect what the system actually delivers to users with specific profiles. We capture interactions from real and synthetic users, focusing on what the algorithm recommends based on each user’s characteristics.
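To make this concrete, here is a minimal sketch of the kind of record such a collection pipeline might store for every recommendation served. The schema and field names are illustrative simplifications for this post, not TikTok’s actual data model or our full capture format.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Impression:
    """One video actually delivered to one profile by the "For You" feed.

    All fields are illustrative; a real capture schema would be richer.
    """
    profile_id: str                  # real or synthetic viewer profile
    video_id: str
    served_at: datetime              # when the feed delivered the video
    feed_position: int               # rank within the session's feed
    # Snapshot of the video's metadata at serve time:
    author_id: str = ""
    description: str = ""
    hashtags: list[str] = field(default_factory=list)
    view_count: int = 0
    like_count: int = 0
    comment_count: int = 0
    share_count: int = 0
    video_age_hours: float = 0.0
```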
Detecting Algorithmic Manipulation Through Predictive Modelling
Based on our in-depth understanding of recommendation systems and millions of interactions with videos, we developed an AI regression model that predicts a video’s view count using its metadata. Our analysis shows that the model achieves strong predictive accuracy for videos with more than 1,000 views, as illustrated in the figure below.
[Figure: predicted vs. actual view counts, showing strong accuracy above 1,000 views]
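For readers who want a concrete picture, the sketch below shows one way such a regressor could be set up: gradient-boosted trees predicting log views, which tames the heavy-tailed view distribution. This is a simplified illustration rather than our production model, and the feature list follows the illustrative schema above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative metadata features drawn from the Impression schema above.
FEATURES = ["like_count", "comment_count", "share_count",
            "video_age_hours", "feed_position"]

def fit_view_regressor(df: pd.DataFrame):
    """Fit a regressor for log(1 + views) and report held-out R^2."""
    df = df[df["view_count"] > 1_000]   # accuracy claim applies above 1,000 views
    X, y = df[FEATURES], np.log1p(df["view_count"])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = HistGradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    return model, r2_score(y_te, model.predict(X_te))
```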
With an R² value of 0.928 for the regression, the model’s predictions serve as a reliable baseline: videos whose actual views differ significantly from the predicted value can be flagged as suspected of being promoted or demoted. Building on that, we trained a second model, this time a classifier, to predict directly from a video’s metadata whether it is being promoted. We call it the “Excessive Publicity” classification model.
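Continuing the sketch above, the residual logic and the second-stage classifier could look like the following; the promotion cutoff here is an arbitrary illustrative value, not our calibrated threshold.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

RESIDUAL_CUTOFF = 1.0  # illustrative: ~e^1 ≈ 2.7x more views than predicted

def label_excessive_publicity(df, view_model):
    """Flag videos whose actual views far exceed the regression's prediction."""
    residual = np.log1p(df["view_count"]) - view_model.predict(df[FEATURES])
    return (residual > RESIDUAL_CUTOFF).astype(int)

def fit_publicity_classifier(df, labels):
    """Learn to predict the flag directly from metadata, so new videos can be
    scored without waiting for their view counts to diverge."""
    return HistGradientBoostingClassifier(random_state=0).fit(df[FEATURES], labels)
```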
Given the statistical nature of our methods, we focused on analyzing behavior across video groups rather than individual cases. By strategically grouping videos, we could construct a more robust and semantically meaningful measurement of algorithmic manipulation.
The Paid Promotion Test
To validate our methods, we tested the “Excessive Publicity” classification model on a group of videos that are certain to be promoted: commercially promoted videos. Our algorithm classified 76.4% of the videos labelled as ads as non-organically “promoted”.
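Expressed in terms of the sketch above, this validation step reduces to scoring the ad-labelled subset and measuring the positive rate; `is_ad` is an assumed column name standing in for the platform’s own ad label.

```python
def ad_recall(df, publicity_clf):
    """Fraction of known ads that the classifier flags as non-organic."""
    ads = df[df["is_ad"]]                               # ground truth: labelled ads
    return publicity_clf.predict(ads[FEATURES]).mean()  # we observed 76.4%
```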
This strengthened our confidence that the tool successfully identifies “non-organic” amplification of videos, even in their early stages, based on engagement data observed by our real and synthetic users.
Defining the Baseline of Excessive Publicity
After removing all advertisement videos from our dataset, we began clustering the remaining videos into narratives. Our goal was to identify narratives that exhibit unusually high or low proportions of videos classified as receiving “Excessive Publicity”, and to determine what rate of “Excessive Publicity” should be considered normal or baseline.
Our initial clustering method was simple: we analyzed a range of trending keywords and grouped videos based on whether each keyword appeared in their descriptions (either as hashtags or otherwise). Across the full dataset, 17% of videos were labelled as having “Excessive Publicity.” Most keyword-based groups clustered around this baseline — an expected outcome if the keyword itself is not correlated with “Excessive Publicity.”
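In the same simplified terms, the keyword grouping and per-group rates can be computed as below; the keyword list is a small illustrative sample, and `excessive_publicity` holds the classifier’s 0/1 flag.

```python
KEYWORDS = ["music", "dance", "tech", "mamdani", "cuomo"]  # illustrative sample

def publicity_rate_by_keyword(df):
    """Rate of flagged videos among those whose description mentions each keyword."""
    baseline = df["excessive_publicity"].mean()  # ~17% across the full dataset
    rates = {
        kw: df.loc[df["description"].str.contains(kw, case=False, na=False),
                   "excessive_publicity"].mean()
        for kw in KEYWORDS
    }
    return baseline, rates
```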
While some keyword groups naturally correspond to more popular or niche topics — with videos receiving higher or lower average views — the rate of “Excessive Publicity” remains consistently close to the 17% baseline. This distinction highlights that popularity and amplification are not the same: “Excessive Publicity” measures how much additional, non-organic promotion a narrative receives, independent of its general engagement level and popularity.
Political Content Amplification
Having established that most keywords follow the natural 17% baseline, we turned our attention to finding exceptions: which keywords show consistently higher rates of “Excessive Publicity”?
We grouped keywords by topic and calculated the percentage of videos with “Excessive Publicity” in each category. Most topics remained close to baseline: Music & Dance (19.4%), Technology (22.8%), and Education & Productivity (25%) all showed modest variation.
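The topic-level numbers come from the same computation, one level up; the keyword-to-topic mapping below is a small illustrative slice, not the full taxonomy we used.

```python
TOPIC_KEYWORDS = {  # illustrative slice, not the full taxonomy
    "Music & Dance": ["music", "dance"],
    "Technology":    ["tech", "gadgets"],
    "Politics":      ["mamdani", "cuomo", "election"],
}

def publicity_rate_by_topic(df):
    """Share of flagged videos among videos matching each topic's keywords."""
    return {
        topic: df.loc[df["description"].str.contains("|".join(kws),
                                                     case=False, na=False),
                      "excessive_publicity"].mean()
        for topic, kws in TOPIC_KEYWORDS.items()
    }
```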
Politics was the clear outlier, with 55% of videos receiving “Excessive Publicity” — more than triple the baseline and double any other category. This dramatic anomaly led us to investigate which political narratives were being amplified.
Within political keywords, two names dominated. “Mamdani” and “Cuomo”, the two leading candidates in New York’s mayoral race, had the highest percentages of videos receiving “Excessive Publicity.”
This discovery led us to investigate the New York election more deeply.
Separating Support from Opposition
We started by clustering New York election-related videos by narrative. Since a single hashtag can appear in videos that either support or oppose a candidate, we used a model to classify each video as supporting or opposing each candidate, and then manually reviewed the results to ensure classification accuracy.
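We won’t detail the classifier here, but as an illustration, an off-the-shelf zero-shot NLI model can produce the same four stance groups from video descriptions. The model choice and label phrasing below are illustrative assumptions, and in practice every label was checked by the manual review described above.

```python
from transformers import pipeline

STANCES = ["supports Mamdani", "opposes Mamdani",
           "supports Cuomo", "opposes Cuomo"]

# Illustrative model choice; any NLI-based zero-shot classifier works the same way.
stance_clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_stance(description: str) -> str:
    """Return the highest-scoring stance label for one video description."""
    result = stance_clf(description, candidate_labels=STANCES)
    return result["labels"][0]  # labels come back sorted by descending score
```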
We analyzed these four groups and checked the proportion of videos showing non-organic promotion based on the “Excessive Publicity” model.
The results were striking.
[Figure: rates of “Excessive Publicity” for videos supporting and opposing each candidate]
Content favoring Mamdani, both videos supporting him and those opposing Cuomo, received significantly higher rates of “Excessive Publicity” compared to content favoring Cuomo. While both sides showed evidence of non-organic promotion, the disparity is clear: pro-Mamdani content consistently received more amplification than pro-Cuomo content.
To isolate this disparity within the political space, we normalized our measurements against the 55% political baseline. Pro-Mamdani content substantially exceeded this already-elevated baseline, while pro-Cuomo content fell below it — suggesting not just relative disadvantage but active suppression.
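The normalization can be done several ways; in its simplest form it is just a ratio of each stance group’s rate to the 55% political baseline, with values above 1.0 indicating amplification relative to political content generally and values below 1.0 indicating suppression.

```python
POLITICAL_BASELINE = 0.55  # share of political videos flagged overall

def normalized_publicity(group_rate: float) -> float:
    """>1.0: amplified relative to political content; <1.0: suppressed."""
    return group_rate / POLITICAL_BASELINE
```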
Some might argue this pattern simply reflects TikTok’s younger user base favoring Mamdani. But that misses the point. We’re not measuring how much content exists for each candidate, or even how popular that content is. We’re measuring deviation from expected organic reach. These videos received amplification beyond what their engagement patterns predicted, regardless of overall popularity or platform demographics.
Conclusion: The Urgency of Algorithmic Transparency
Given the time-sensitive nature of these findings, we’ve chosen to share them now, even in their early form. The evidence we’ve uncovered raises urgent questions about the neutrality of social media algorithms and their potential to shape public perception. Our work suggests that what users see, and by extension what they believe, may be influenced by forces beyond organic engagement.
We’ll continue refining our models, broadening our dataset, and publishing updates as we deepen our understanding of algorithmic manipulation.
If you’d like to stay updated on our findings or suggest new narratives for analysis, subscribe to updates at this link.
Interested in collaborating or learning more about our methods? Reach out directly to us at [email protected]