Today I’m sharing how we’ve rebuilt Substack’s recommendation system from the ground up, moving from static models to sequential architectures that understand your reading as a journey.
When you open the Substack app, there’s an immediate challenge: there are thousands of posts, publications, notes, and creators that could be interesting to you right now. Some might be perfect for this moment; others might be perfect for you, just not yet. The feed’s job is to cut through that abundance and surface what matters most to you in this session—not just what you generally like, but what fits where you are right now.
When we first launched the Feed in March 2023, we started with a heuristic, rules-driven system. It was essentially a SQL query that identified valid inventory (posts, notes, publications, and users you might want to follow) and then ran a matching algorithm on top. It worked, but it was rigid. There was no learning, no adaptation to how individual readers actually behaved.
In February 2024, we took our first major step forward by migrating to a deep learning two-tower model for retrieval. By late summer 2024, we’d added a ranking layer on top. This system served us well, using the same architecture that powered early personalization at YouTube, Instagram, and TikTok. We’ve seen tremendous growth in feed usage.
But earlier this year, we did a deep dive into the research literature and decided to take a leap on modernizing the model architectures: moving to sequential models that understand your reading not as isolated clicks, but as a journey unfolding over time.
Recently, we’ve successfully cut over all of our core retrieval tasks from the original two-tower models to their new sequential counterparts. In the coming months, we will continue the migration and update our ranking models to benefit from this new architecture as well.
Here’s why we’re making this change, what it means for your feed, and how it fits into a broader industry transformation.
Before diving into two-tower and sequential models, it’s worth stepping back to understand how we got here. The modern internet runs on recommendations: Netflix suggests what to watch next; Spotify curates your Daily Mix; Amazon shows you products you didn’t know you needed; and TikTok’s (infamous) For You page seems to read your mind. These systems have become so ubiquitous that we barely notice them, but they represent decades of evolution in how computers understand human taste.
The earliest recommendation engines were remarkably simple. In the 1990s, when Amazon was just starting out, recommendations often meant ranking items by overall popularity: bestseller lists, most-viewed pages, top-rated products. Everyone saw roughly the same suggestions. It worked because the alternative (i.e., no suggestions at all) was worse, but it wasn’t personalized.
The first real breakthrough came with collaborative filtering. The insight was elegant: if you and I both liked the same five books, and I also liked a sixth book you haven’t read, you’d probably like that sixth book too. “Customers who bought this also bought that” became a mantra. Systems like Amazon’s early recommendation engine and Netflix’s original DVD suggestions relied heavily on finding these patterns of shared taste across users.
Collaborative filtering proved that personalization was valuable, but it had clear limits. It struggled with new items that nobody had rated yet (the “cold start” problem). It couldn’t easily explain why it made a particular suggestion. And as catalogs grew to millions of items and user bases expanded to hundreds of millions of people, the computational demands became staggering.
The next wave came with content-based filtering and hybrid approaches. Instead of just looking at who liked what, these systems examined the properties of items themselves. If you watched three documentaries about space exploration, maybe you’d like this new astronomy series—not because other users connected them, but because the content was similar. This worked especially well for media: analyzing movie genres, song lyrics, article topics, image features.
The real transformation arrived with deep learning in the 2010s. Neural networks could learn rich, nuanced representations of both users and items from massive amounts of data, without requiring hand-crafted rules about what features mattered. They could incorporate countless signals—your viewing history, the time of day, what device you’re on, how you scrolled, how long you watched, whether you shared something—and find patterns that humans would never spot.
This is where two-tower architectures emerged as a dominant paradigm.
The two-tower architecture has been the workhorse of modern recommendation systems, and for good reason: it’s elegant, scalable, and remarkably effective at learning patterns from massive amounts of data. Here’s how it worked for us once we deployed it in early 2024.
The system had two neural networks working in parallel. The user tower processed your entire history (who you follow, which publications you subscribe to, what you click on, and more) and compressed all of that information into a single dense vector. Think of this vector as coordinates in a high-dimensional space, a mathematical representation of “you as a reader.”
Meanwhile, the item tower did similar work on the other side. It took in features about each piece of content (whether it’s a post, a note, a publication to explore, or a user you might want to follow) and encoded those into vectors in that same space. Topic, author, recency, past engagement, all of it got folded into these item representations.
The matching step was straightforward: we’d find items whose vectors were geometrically close to your user vector. Content near you in this abstract space was likely to be relevant to you. We could precompute all the item embeddings ahead of time and use fast nearest neighbor search to efficiently serve recommendations from millions of possible items.
This approach let us handle the complexity of Substack’s inventory: not just posts, but notes appearing in the feed, publications you might want to explore, creators you should follow, all within a single unified framework.
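To make that concrete, here’s a deliberately simplified sketch of the idea in Python. The dimensions, random projections, and brute-force dot product are all toy stand-ins: the real towers are trained neural networks over many features, and production retrieval uses an approximate nearest-neighbor index rather than scoring every item.

```python
import numpy as np

EMBED_DIM = 64  # toy dimension chosen for illustration

def user_tower(user_features: np.ndarray) -> np.ndarray:
    """Stand-in for the user tower: compresses a reader's features into one dense vector."""
    rng = np.random.default_rng(0)  # a real tower is a trained network, not a random projection
    projection = rng.normal(size=(user_features.shape[0], EMBED_DIM))
    vec = user_features @ projection
    return vec / np.linalg.norm(vec)

def item_tower(item_features: np.ndarray) -> np.ndarray:
    """Stand-in for the item tower: maps each item's features into the same vector space."""
    rng = np.random.default_rng(1)
    projection = rng.normal(size=(item_features.shape[1], EMBED_DIM))
    vecs = item_features @ projection
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Item embeddings are precomputed ahead of time for the whole corpus...
item_embeddings = item_tower(np.random.rand(10_000, 32))

# ...so at request time we only embed the user and look up the nearest items.
user_embedding = user_tower(np.random.rand(32))
scores = item_embeddings @ user_embedding   # geometric closeness as relevance
candidate_ids = np.argsort(-scores)[:200]   # candidate set handed to the ranking stage
```

The important property is that the expensive per-item work happens offline, so serving is just one user-tower pass plus a fast vector search.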
For all its strengths, the two-tower system had three fundamental limitations:
First, it treated your preferences as static. Your user vector was a fixed snapshot based on your cumulative history. It might accurately capture that you love both poetry and climate journalism, but it couldn’t understand the temporal rhythm of your reading. It didn’t know that you tend to read poetry in the evening to wind down, or that you dive into climate analysis on Sunday mornings with coffee.
Second, the model had a fairly simplistic approach to computing user representations, mostly just “averaging” over your subscriptions and follows, and treating them all with roughly equal importance. Pooling them together made it difficult to understand which were core to your identity versus passing interests.
Third, the model didn’t scale well to larger numbers of features. We could only include a few dozen subscriptions per user in the model, and attempts to increase this limit led to minimal performance gains but large increases in training time. The sequential model we’ve deployed currently includes hundreds of historical interactions, equating to about 10x more signal than our previous model. This dramatically richer context, combined with attention mechanisms that can weigh different interactions differently, gives the model a more nuanced understanding of each reader.
Sequential models change the fundamental question we’re asking. Instead of “Who is this reader?” they ask “Where is this reader in their journey right now, and what comes next?”
These models borrow architectures from the world of language modeling (Transformers, RNNs, LSTMs) and apply them to the problem of understanding reading behavior. Advances in language modeling have quickly spilled over into recommendation systems, and most large internet companies, from Meta to Netflix to Pinterest, have moved toward these architectures. Just as a language model like ChatGPT or Claude predicts the next word in a sentence by understanding the words that came before, a sequential recommendation model predicts the next thing you’ll want to read by understanding the sequence of things you’ve already engaged with.
The key difference is that sequential models maintain a dynamic representation of your current state. As you move through your feed, clicking on posts, subscribing to publications, engaging with notes, the model updates its understanding of where you are. It’s not just updating a long-term profile of your tastes. It’s tracking the momentum and direction of your current session.
Recent interactions carry more weight than older ones, but not via a simple decay function. The model learns rich patterns through attention mechanisms that can weigh different types of interactions differently. A long-standing subscription to a political newsletter might carry persistent weight, while a cluster of recent note likes on poetry signals a current interest spike. The model doesn’t just track chronology; it learns which signals are most predictive of what you’ll want next.
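As a rough intuition for how attention turns a sequence into a session-aware representation, here is a toy, single-head sketch in numpy. The recency term, dimensions, and random data are invented for illustration; the production model is a Transformer-style network whose weights are learned from real interaction sequences.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def session_state(interaction_embeddings: np.ndarray, recency_bias: float = 0.1) -> np.ndarray:
    """Toy single-head attention over one reader's interaction sequence (oldest first).

    Returns a single vector summarizing "where this reader is right now."
    """
    seq_len, dim = interaction_embeddings.shape
    query = interaction_embeddings[-1]                  # the latest interaction queries the history
    attn_logits = interaction_embeddings @ query / np.sqrt(dim)
    attn_logits = attn_logits + recency_bias * np.arange(seq_len)  # invented positional nudge;
    weights = softmax(attn_logits)                      # the real model learns this, it isn't hard-coded
    return weights @ interaction_embeddings             # weighted blend of past interactions

rng = np.random.default_rng(0)
history = rng.normal(size=(300, 64))     # hundreds of past interactions, not a few dozen
candidates = rng.normal(size=(500, 64))  # items from the retrieval stage
state = session_state(history)
next_item_scores = candidates @ state    # "what comes next?" rather than "who is this reader?"
```

The output isn’t a fixed profile: feed it a different recent history and the same reader gets a different state, which is exactly the behavior described above.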
The difference becomes clear in how the feed responds to your behavior. Let’s say you click on a note about climate policy. In the old two-tower system, this single interaction would contribute minimally to your overall profile, especially if climate wasn’t already one of your core subscription topics. The model’s simplistic averaging approach meant individual signals often got lost in the noise. But in a sequential system, the effect is immediate and contextual.
The feed leans into the momentum you’ve created. It surfaces more climate-related content, but not just any climate content: it looks for pieces that make sense as a natural continuation of what you just read. Maybe deeper analysis from the same publication, or related perspectives from creators you haven’t discovered yet. The model understands this isn’t just about “user likes climate,” it’s about “user is in a climate-focused session right now.”
Or consider what happens when you subscribe to a new sci-fi newsletter. In the two-tower world, this subscription would eventually influence your long-term embedding, but the feed wouldn’t respond much in the immediate session. With sequential modeling, the response is instant. The model recognizes this as a significant signal about your current interests and immediately adjusts. You start seeing more narrative-focused posts, more speculative fiction, more content from the literary corner of Substack, even before the system has had time to retrain your long-term profile.
Perhaps most interesting is what happens when you bounce between genres. You read a poem, then a tech analysis piece, then a political essay. The old system would see this as noise or indecision. But a sequential model can infer something more nuanced: you’re in an exploratory mood. Rather than trying to pin you down to a single category, it surfaces a balanced, diverse mix with some novelty thrown in. It understands that the pattern of switching itself is meaningful signal.
To understand where we are in this transition, it helps to know a little more about how modern recommendation systems are structured. They typically work in two stages, and each stage poses different computational and modeling challenges.
The first stage is retrieval. This is where we need to efficiently narrow down millions of possible items (posts, notes, publications, users to follow) into a manageable candidate set of a few hundred. Retrieval needs to be fast, because we’re scanning massive inventory. We can’t afford to run expensive models on every single item. The goal here is recall: making sure the truly relevant items make it into the candidate set, even if there’s some noise mixed in.
The second stage is ranking. Once we have our few hundred candidates, we can afford to be more careful and computationally intensive. Ranking models use richer features and more complex architectures, and make fine-grained predictions about how relevant each item is. They also explicitly handle concerns like diversity, freshness, and avoiding over-concentration in a single topic. The goal here is precision: getting the ordering exactly right so the best items rise to the top of your feed.
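Put together, the two stages look roughly like this. It’s an illustrative sketch with random embeddings and a fake freshness feature; our real ranker is a much richer model, and the full-corpus scan would be an approximate nearest-neighbor lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
ITEM_EMBEDDINGS = rng.normal(size=(100_000, 64))  # precomputed offline for the whole corpus

def retrieve(user_embedding: np.ndarray, k: int = 500) -> np.ndarray:
    """Stage 1: cast a wide, cheap net over the full corpus; optimize for recall."""
    scores = ITEM_EMBEDDINGS @ user_embedding     # in production: approximate nearest neighbors
    return np.argsort(-scores)[:k]

def rank(user_embedding: np.ndarray, candidate_ids: np.ndarray, k: int = 30) -> np.ndarray:
    """Stage 2: spend more compute on a few hundred candidates; optimize for precision."""
    candidates = ITEM_EMBEDDINGS[candidate_ids]
    freshness = rng.random(len(candidate_ids))    # stand-in for the ranker's richer feature set
    scores = candidates @ user_embedding + 0.1 * freshness
    return candidate_ids[np.argsort(-scores)[:k]]

user_embedding = rng.normal(size=64)
feed_item_ids = rank(user_embedding, retrieve(user_embedding))
```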
Our plan is to migrate both stages to sequential approaches, but we’re doing it carefully and incrementally.
Over the past couple of months, we’ve been running experiments comparing the new sequential retrieval model against the old one. Overall, we are seeing meaningful improvements across a host of our key metrics, with room to grow as we continue tuning. Today, all retrieval on the feed is powered by the new model. When we pull that initial candidate set from millions of items, we’re now doing it with an understanding of your recent session context. If you’ve been reading a run of political content, the retrieval model knows to cast a wider net in that space. If you’ve been bouncing around, it knows to pull a more diverse set. This contextual retrieval means the candidate pool itself is already shaped by where you are in your reading journey.
Sequential ranking is the next major milestone. Right now, our ranking layer uses sophisticated models, but they’re still primarily working with static user representations. Soon, we’ll integrate the rich sequential embeddings into ranking as well. The sequential approach brings two major advantages: first, the final ordering will be tuned to the flow and momentum of your current session, understanding which posts make sense as the next step in your reading journey. But perhaps more importantly, the user representation itself will be much smarter, incorporating attention mechanisms and 10x more features than before, making the ranker better at determining which posts are good matches for you in general, not just in this moment.
Aligning both stages around sequential understanding will make the entire pipeline more coherent and effective.
A natural concern with sequential modeling is overfitting to short-term patterns. If you’re on a poetry streak, will the system trap you in poetry forever? Will it create a narrowing spiral where you only see more of what you just clicked?
We’ve designed against this in several ways, and actually, sequential models are often better at avoiding filter bubbles than static systems once you build in the right safeguards.
First, we still preserve your long-term reader embedding alongside the sequential state. The two-tower components haven’t disappeared—they’ve been integrated into a richer model. So even if you’re deep in a poetry session, the system still knows about your enduring love of sports journalism, your subscriptions to tech newsletters, your history of engaging with climate content. Those long-term signals ensure that other parts of your interest graph stay in circulation. The feed doesn’t abandon your broader identity just because you’re focused on one thing right now.
Second, our ranking layer explicitly balances relevance with freshness, and we do some mixing to ensure the same author doesn’t appear multiple times in a row (a simplified sketch of this kind of mixing appears below). The ranking model isn’t just picking the top N most relevant items—it’s building a feed with some intentional variety.
Third, sequential models respond quickly to new signals. They’re not trying to build a stable, slowly-updating profile. They’re tracking your current trajectory. So if you’ve had your fill of poetry and pivot back to sports (maybe you click on a game recap or open a sports publication), the model follows within a few interactions. It doesn’t take days or weeks to “unlearn” the poetry phase. The sequential state just shifts.
In some ways, sequential models are actually superior at preventing rabbit holes compared to static embeddings. They can detect saturation. If you’ve been reading the same type of content for a while and your engagement starts to wane (you’re scrolling past items you would have clicked earlier in the session), the model can infer that you’ve had enough. It can proactively diversify, like a good dinner party host who senses the conversation has run its course and smoothly introduces a new topic.
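As an aside on the author-mixing safeguard mentioned above, here is a simplified, greedy version of the idea in Python. It’s an illustration of the technique, not our production mixer, which also weighs scores, freshness, and other constraints.

```python
def avoid_back_to_back(ranked_items: list[dict]) -> list[dict]:
    """Greedy re-rank: keep score order where possible, but don't show the
    same author twice in a row. Illustrative sketch, not the production mixer."""
    result, deferred = [], []
    for item in ranked_items:
        if result and result[-1]["author"] == item["author"]:
            deferred.append(item)           # hold it instead of stacking the same author
            continue
        result.append(item)
        remaining = []
        for held in deferred:               # see if any held item fits now
            if result[-1]["author"] != held["author"]:
                result.append(held)
            else:
                remaining.append(held)
        deferred = remaining
    return result + deferred                # anything still held goes at the end

ranked = [
    {"title": "Post A", "author": "alice"},
    {"title": "Post B", "author": "alice"},
    {"title": "Post C", "author": "bob"},
    {"title": "Post D", "author": "carol"},
]
print([i["title"] for i in avoid_back_to_back(ranked)])
# ['Post A', 'Post C', 'Post B', 'Post D']
```

Deferred items aren’t dropped; they simply slide down until a different author breaks up the run.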
Our evolution toward sequential recommendations isn’t happening in isolation. It’s part of a wider transformation across the tech industry as companies recognize the limitations of static user models and embrace temporal, session-based approaches.
Meta has moved away from their earlier static ad models, like DLRM, toward sequence-based learning for personalized ads, finding substantial improvements in both relevance metrics and infrastructure efficiency. Netflix has invested heavily in what they call “foundation models for personalization”—deep temporal models layered on top of their existing embedding infrastructure. Pinterest published their PinnerFormer model and follow-ups TransAct and TransAct V2, iterating on how to model user action sequences.
The pattern is the same: the most sophisticated platforms are treating user behavior as sequences, not snapshots, recognizing that static embeddings miss the temporal dynamics and session context of how people actually use these products.
For readers, this evolution means a feed that feels more alive and responsive. It’s less like looking at a mirror reflecting who you generally are, and more like having a conversation with a system that’s paying attention to what you want right now. The feed adapts to your mood, your context, the flow of your day. It doesn’t box you into a fixed set of interests. It follows you as you explore.
For creators, the implications are significant as well. Your work can now find readers not just because it matches their long-term profile, but because it fits the moment they’re in. Someone who’s just discovered your niche might see your latest post immediately surfaced because the sequential model recognizes them as being in a discovery phase for your topic. Someone who’s been reading deeply in your area all week might see your work as a natural continuation of their current journey. Your content reaches people when they’re most receptive to it.
For Substack as a platform, this shift enables faster onboarding, richer discovery, and a feed experience that’s both more responsive and more serendipitous. New users find their footing quickly. Longtime readers don’t feel trapped by their past behavior. The system balances familiarity and novelty, depth and breadth, in ways that static models simply couldn’t achieve.
We’re far from done. The migration to sequential models is a major step, but there are several more pieces of the puzzle we’re actively working on. Completing the sequential ranking migration is the immediate next priority. Bringing sequential embeddings fully into the ranking layer will complete the pipeline and ensure that every stage of recommendation is working from the same understanding of where you are in your reading journey.
If you’ve read this far and this resonates with you, please take a look at the Substack jobs page and consider joining in the fun.