Building a TikTok-Style Video Feed for 100M Users

An average person spends more than 2 hours daily consuming video content. We are all hooked on various video streaming platforms such as Instagram Reels, YouTube Shorts, TikTok, etc.

Video streaming platforms have mastered the art of capturing our attention. They can keep us engaged for hours without us realising it.

But do you know what goes behind the scenes? How are different people shown diverse videos? How is the transition so smooth between the videos? How does the platform scale for millions or billions of users? 🤔

In this article, we will answer these questions by building a video feed platform from the ground up. We will take a system design approach, tackle various challenges and explain the trade-offs.

The article will not only help you prepare for system design interviews but also spark your curiosity about the engineering behind your video feed. With that, let's begin.

Problem statement

We will design a product that lets the users view their feed in the form of videos. Users must be able to interact with the video through likes, shares or comments.

Further, they must be able to scroll infinitely, as most video apps allow. The product should recommend videos in real time based on the user's engagement.

We would leverage an ML/AI-based service, but the article would treat it as a black box. For the scope of this article, we will exclude the internals of ML/AI for feed generation.

Requirements

At a high-level, our product would consist of:

  1. Client — Mobile devices, browsers to view the user's video feed. The client device would also gather the user's engagement and send it to the backend servers.
  2. Server — The server would be responsible for intelligent feed generation. It would manage and process the user's engagement with the different videos.

Now that you understand the problem, let's formulate the problem statement in terms of the functional and non-functional requirements.

Functional Requirements (FRs)

  1. FR-1: The system provides an interface for the clients to fetch the video feed.
  2. FR-2: The system must allow the users to scroll through the different videos in their feed (similar to Instagram Reels or YouTube Shorts).
  3. FR-3: It must gather the different user signals such as watch time, likes, comments, shares, etc.

Let's now define the non-functional requirements in the context of each functional requirement.

Non-functional Requirements (NFRs)

  1. NFR-1: a) Feed generation must be performant, with the feed rendering within 500 ms. b) It must prioritise fresh content to keep the users engaged.
  2. NFR-2: a) Videos should be rendered quickly while scrolling. b) The transition must be seamless while scrolling (no buffering).
  3. NFR-3: a) Gathering signals must be done in a cost-efficient manner. b) The feed must be updated in real time based on the signals collected.

While designing social feeds, remember that different users have different usage patterns. For example, one person might watch videos for 4–5 hours daily, while another may not watch at all.

Such constraints pose significant challenges and influence key design decisions.

Now that you have a clear understanding of the functional and non-functional requirements of the system, let's identify the key actors in the system.

Entities

User

The user is the central entity of the design. Every user would be served a custom feed. The feed would be influenced by factors such as platform engagement, watched videos, etc.

Video

The videos would be uploaded by the creators. The platform would store the video metadata along with the actual videos. (We will not dive into how the system would handle and process uploaded videos.)

Interactions

This includes metrics such as the video watch time, whether the user liked the video, count of video shares, comments, etc.

The diagram below shows the entities and the interactions between them.

[Figure: Entities]

Based on the above, the following diagram shows a basic data model for the different entities.

[Figure: Data model for the different entities]

Let's now establish the communication interface between the client and the server with APIs.

APIs

Let's iterate through our functional requirements and define the APIs.

FR-1: Fetch video feed

Query parameters:

  • limit (optional): Number of videos to fetch (default: 10).

Video feed API response:
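
The response payload itself is not reproduced in this text. As a rough illustration only, a feed response could be modelled along the following lines. Every field name here (`video_id`, `creator_id`, `manifest_url`, and so on) and the `build_feed_response` helper are assumptions made for this sketch, not the article's actual schema; only the `limit` parameter and its default of 10 come from the section above.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class FeedVideo:
    # Hypothetical fields; the article does not specify the schema.
    video_id: str
    creator_id: str
    manifest_url: str       # where the client would fetch the video from
    duration_seconds: int
    like_count: int


@dataclass
class FeedResponse:
    videos: List[FeedVideo]


def build_feed_response(ranked: List[FeedVideo], limit: int = 10) -> dict:
    """Return at most `limit` videos as a JSON-serialisable dict."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return asdict(FeedResponse(videos=ranked[:limit]))
```

The response carries metadata and a URL rather than video bytes, which matches the later point that video files are fetched from blob storage and not returned by the API.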

FR-2: Scroll Through Videos

Would we fetch a new video every time we scroll? Or would we cache it? 🤔

Let's avoid the confusion for now and assume we have an API endpoint for it. We will revisit this during the course of the design.

In interviews, it is sometimes difficult to decide quickly between a brute-force and an optimized approach (API vs. cache in this case). In such cases, go with the brute-force approach and let the interviewer know that you will revisit it later.

Query parameters:

  • current_video: The ID of the video the user just watched or is currently watching.
  • direction: next or previous (default: next); supports backward scroll too.

Response

The API would return a single video object (similar to FR-1's response).

FR-3: Gather User Engagement

Request body:

Response:
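
The request and response bodies are not reproduced in this text. The sketch below shows one plausible shape, assuming the client posts a list of events built from the signals named in FR-3 (watch time, likes, shares, comments) and the server acknowledges how many it accepted. The field names and the `handle_engagement` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngagementEvent:
    # Hypothetical fields derived from the signals listed in FR-3.
    video_id: str
    watch_time_ms: int
    liked: bool = False
    shared: bool = False
    commented: bool = False


def handle_engagement(request_body: dict) -> dict:
    """Parse the posted events and acknowledge the valid ones."""
    events: List[EngagementEvent] = [
        EngagementEvent(**raw) for raw in request_body.get("events", [])
    ]
    # Reject events that cannot be valid: empty ids or negative watch time.
    accepted = [e for e in events if e.video_id and e.watch_time_ms >= 0]
    return {"accepted": len(accepted), "rejected": len(events) - len(accepted)}
```

Accepting a list rather than a single event keeps the same endpoint usable whether the client sends signals one at a time or in batches.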

We now have a good understanding of the system and the APIs. So, let's move on to the high-level system design.

High-level Design

Our system will comprise the following three building blocks:

  1. Client: This represents the mobile devices, browsers and other surfaces where the users would view and interact with the videos.
  2. Video feed service: This service would process the user input, gather data from multiple sources, perform privacy checks and render the feed. It would also collect the engagement data from the users.
  3. Feed ranking service: The ranking service would be responsible for running the different ML models and generating the user's video feed. It would also train the models in real time and tune their parameters to ensure feed freshness.
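
To make the division of responsibilities concrete, here is a minimal sketch of how the video feed service might assemble a feed. The ranking service, the privacy check and the metadata store are passed in as plain callables and a dictionary; these stand-ins, the `generate_feed` name and the over-fetch factor of 2 are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def generate_feed(
    user_id: str,
    rank: Callable[[str, int], List[str]],    # stand-in for the feed ranking service
    is_visible: Callable[[str, str], bool],   # stand-in for the privacy check
    metadata: Dict[str, dict],                # stand-in for the video metadata store
    limit: int = 10,
) -> List[dict]:
    """Ask for ranked video ids, filter them, and attach metadata."""
    # Over-fetch so that videos dropped by the privacy check can be replaced.
    candidates = rank(user_id, limit * 2)
    visible = [
        vid for vid in candidates
        if vid in metadata and is_visible(user_id, vid)
    ]
    return [metadata[vid] for vid in visible[:limit]]
```

The ranking service stays a black box here, as in the rest of the article: the feed service only needs an ordered list of video ids from it.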

The diagram below captures our high-level design. It shows the different components along with their APIs.

[Figure: High-level architecture]

Do you think the above architecture would scale for millions of users?

The first high-level design is usually basic and rarely meets all your design objectives. Whether it's an interview or the real world, you must iterate and refine the design.

So, let's dive deep into each of the aspects and identify opportunities for optimization and improvements.

Bottlenecks, Improvements and Optimizations

The high-level design meets your functional requirements. But it may not meet all your non-functional requirements.

So, let's go through each non-functional requirement and evaluate whether the design satisfies them or not.

Video scroll (NFR-2)

We had defined the following two constraints:

  • Quick rendering during scroll.
  • Seamless transitions between videos while scrolling.

In our design, we fetch a video every time the user scrolls. Also, the video files wouldn't be returned by the APIs. They would have to be fetched from a blob store such as S3.

Further, the user's network is unpredictable and may slow down intermittently. If a high-resolution video is rendered on a slow connection, it would lead to buffering.

So, our design doesn't meet our goals. How do we tackle these challenges? Think for a moment, before reading on.

Pagination and Caching

We introduce the following optimizations in the APIs:

  1. Pagination: The server would fetch a fixed number of videos and return them to the client as the feed.
  2. Caching: Clients would fetch and cache the video feed. As the user scrolls, the videos would be served from the cache.

Here's what the updated architecture would look like.

[Figure: High-level design with pagination and caching]

Initially, we had designed an API endpoint for the scroll feature. But since each scroll is now served from the cache, we can eliminate the scroll API and rely only on the feed API.

To improve the user experience, the feed API would be called:

  1. App startup: Every time the user starts the app, an async request would be sent to fetch the feed.
  2. Cache exhaustion: Once the user watches a video, it would be removed or marked as watched in the cache. If the count of unwatched videos falls below a threshold (3–4), the client would fetch more videos from the Video Feed Service.
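
The two triggers above can be sketched as a small client-side cache. This is a simplified, synchronous illustration: the `ClientFeedCache` class, the `fetch_feed` callable (standing in for a call to the feed API) and the default values are assumptions, and a real client would issue the refill asynchronously and de-duplicate the results.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class ClientFeedCache:
    def __init__(
        self,
        fetch_feed: Callable[[int], List[str]],  # hypothetical feed API call
        low_watermark: int = 3,
        page_size: int = 10,
    ) -> None:
        self._fetch_feed = fetch_feed
        self._low_watermark = low_watermark
        self._page_size = page_size
        self._unwatched: Deque[str] = deque()

    def start(self) -> None:
        """Trigger 1: fetch the feed on app startup."""
        self._refill()

    def next_video(self) -> Optional[str]:
        """Serve the next video from the cache; refill when running low."""
        if not self._unwatched:
            self._refill()
        video = self._unwatched.popleft() if self._unwatched else None
        # Trigger 2: too few unwatched videos remain, so fetch more.
        if len(self._unwatched) < self._low_watermark:
            self._refill()
        return video

    def _refill(self) -> None:
        self._unwatched.extend(self._fetch_feed(self._page_size))
```

Refilling before the cache is empty is what hides the network round trip from the user: the next few scrolls are always served locally.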

Video rendering

The videos would be served from a store like S3. The system would use a CDN (Content Delivery Network) such as AWS CloudFront to optimize the video fetching process.

Further, it would store each video in several resolutions and break the videos into segments of equal length. The client would pre-fetch the initial segments of the upcoming videos in its cache.

Depending on the network speed, the clients would adapt and intelligently fetch videos of appropriate resolutions (144p, 240p, 480p, etc).
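
A minimal sketch of the resolution choice, assuming the client keeps a running estimate of its bandwidth. The bitrate thresholds, the safety factor and the `pick_rendition` name are illustrative assumptions and not values from any particular player or streaming standard.

```python
from typing import List, Tuple

# (label, minimum sustained bandwidth in kbit/s), ordered from lowest to highest.
# The thresholds are illustrative assumptions.
RENDITIONS: List[Tuple[str, int]] = [
    ("144p", 150),
    ("240p", 400),
    ("480p", 1200),
]


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Pick the highest resolution that fits within the bandwidth budget."""
    # Leave headroom so a small dip in throughput does not cause buffering.
    budget = measured_kbps * safety_factor
    chosen = RENDITIONS[0][0]  # always fall back to the lowest resolution
    for label, required_kbps in RENDITIONS:
        if required_kbps <= budget:
            chosen = label
    return chosen
```

Because the videos are split into equal-length segments, the client can re-run this choice for every segment and switch resolution mid-video as the network changes.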

[Figure: Optimizations in video rendering]

Smart readers would have noticed that there's a heavy dependency on the client. The client now has a cache, collects the user engagements, etc.

Many engineers (especially backend ones) treat the client as a black box. This is often a mistake in system design interviews.

The clients must be treated as an integral part of the system. Client-side limitations such as memory and storage influence critical backend decisions such as the API response limit.

With these improvements, our design would meet NFR-2.

Let's now identify other areas where our system would need to be refined.

Video freshness (NFR-1)

Our system is fast enough to fetch the video feed within 500 ms, but would the feed be fresh and relevant?

As discussed before, usage patterns vary. Some users would watch for a few minutes, while others would watch for hours.

What do you think would be the bottleneck here? Think for a moment and then read further.

Currently, the pagination API returns a fixed number of videos to the client. If we increase the number of videos, it benefits the active users, while less engaged users might see a stale feed.

Similarly, decreasing the number of videos would help the less engaged users but would increase the backend calls for active users.

To solve this, the count of videos could be a function of the user's activity on the app. Less active users would get fewer videos, while active users would get more. This technique is known as dynamic pagination, where the page size is determined at run time.
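
As a rough sketch of dynamic pagination, the page size could be derived from a single activity measure. The linear mapping below (one extra video per 10 minutes of average daily watch time, clamped between 5 and 30) and the `page_size_for` name are assumptions chosen for illustration; a production system would tune this against real usage data.

```python
def page_size_for(
    avg_daily_watch_minutes: float,
    min_size: int = 5,
    max_size: int = 30,
) -> int:
    """Map a user's activity level to the number of videos per feed page."""
    # Assumed rule: one extra video for every 10 minutes of daily watch time.
    extra = int(avg_daily_watch_minutes // 10)
    # Clamp so light users still get a usable page and heavy users
    # do not receive more than the client can reasonably cache.
    return max(min_size, min(max_size, min_size + extra))
```

For example, a user averaging 45 minutes a day would get 9 videos per page, while a user averaging 5 hours would be capped at 30.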

Additionally, videos held in a server-side cache could also become stale. Hence, we can remove the dependency on the server-side cache and fetch the feed directly from the Feed Ranking Service. This would also remove the cost and maintenance overhead of the server-side cache.

Dynamic pagination strikes a balance between compute costs and feed freshness.

[Figure: Dynamic pagination. The number of videos per page is determined based on the user's engagement]

While the above technique helps to keep the feed fresh, how does the Feed Ranking Service receive the user signals? Should those be sent with every scroll or once in a while?

Let's understand how we can tackle this problem.

Collecting user engagement/signals (NFR-3)

We have the following two approaches to gather the user engagement:

  1. Send engagement on every scroll.
  2. Batch and send engagement after 5–10 scrolls.

The table below compares the pros and cons of each approach.

[Table: User Engagement Transmission Trade-off Table]

As seen from the above table, both approaches have their own pros and cons. So, which method is the most suitable for our use case?

We get the best of both worlds by combining the two solutions. Here's how hybrid batching would work:

  1. Buffer 3–5 interactions, and flush on a scroll pause, on a timeout, or when the app goes to the background or closes.
  2. Use local persistence to avoid permanent data loss.

Hybrid batching allows near-real-time personalization while optimizing resource use.
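
The hybrid approach can be sketched as a small client-side buffer. The `EngagementBuffer` class, its method names and the default thresholds are assumptions for illustration, and `send` stands in for a call to the engagement API. Local persistence (step 2) is left out of the sketch to keep it short.

```python
import time
from typing import Callable, List, Optional


class EngagementBuffer:
    def __init__(
        self,
        send: Callable[[List[dict]], None],   # hypothetical engagement API call
        max_events: int = 5,
        max_age_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending: List[dict] = []
        self._oldest: Optional[float] = None  # time of the oldest buffered event

    def record(self, event: dict) -> None:
        """Buffer one interaction and flush once the batch is full or stale."""
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(event)
        if len(self._pending) >= self._max_events or self._is_stale():
            self.flush()

    def tick(self) -> None:
        """Called periodically by a timer to enforce the timeout."""
        if self._is_stale():
            self.flush()

    def on_scroll_pause(self) -> None:
        self.flush()

    def on_background(self) -> None:
        self.flush()

    def _is_stale(self) -> bool:
        return (
            self._oldest is not None
            and self._clock() - self._oldest >= self._max_age
        )

    def flush(self) -> None:
        """Send everything buffered so far as one batch."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._oldest = None
        self._send(batch)
```

The size limit bounds how many requests an active user generates, while the timeout and lifecycle hooks bound how long a signal can sit on the device before the ranking service sees it.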

Conclusion

Designing a scalable video feed platform requires careful consideration of multiple dimensions — performance, user experience, data processing, and infrastructure management.

In this article, we learnt how to build a video app from the ground up. We started with the most naive solution, and then iterated and improved the solution focusing on technical efficiency and user engagement.

We addressed three main challenges of video feed platforms through the following strategic technical choices:

  1. Handling diverse user engagement patterns — Use of dynamic pagination that adjusts content delivery based on the user engagement.
  2. Feed freshness and relevance — Batching approach for collecting user signals that balances real-time personalization with optimal resource usage.
  3. Seamless user experience — Client-side optimizations such as pre-fetching, adaptive resolution selection and segment-based video loading.

The patterns used in this design extend beyond video platforms. These fundamental techniques apply to many internet products ranging from e-commerce products to dating apps.

The next time you find yourself scrolling through a video feed, you will appreciate the architecture that powers that seamless user experience.

What other high-scale systems would you like to see designed using similar principles? Share your thoughts in the comments below.
