Brands are a core part of the Zepto experience. A rich and diverse catalog helps us serve a wide range of customer preferences while giving brands the reach they need across cities. Every day, we process millions of data points, from product views to city-level sales, offering deep visibility into how products perform across regions.
With so much data, the challenge isn’t access, it’s actionability. That’s where Brand Analytics at Zepto comes in. Built to deliver fast, intuitive insights, it helps brands identify trends, optimize performance, and make smarter decisions in real-time. In a fast-moving space like quick commerce, this kind of agility can make all the difference.
“How can we give our partner brands real-time, actionable insights — without making them wait or oversimplifying the data?”
That’s the question we set out to solve when we first launched Brand Analytics — a dashboard designed to empower our brands with rich data like:
- Sales trends
- Inventory levels
- Search trends and user conversions
- Subcategory-level performance
Like any good startup, we began lean. Our first version of Brand Analytics was built using PostgreSQL — a familiar, reliable choice that helped us launch quickly and start collecting feedback from a small group of early adopter brands.
We had GMV, unit sales, and similar metrics at a brand, city, and product level. At that point, our core sales table had just a few million rows. PostgreSQL handled it just fine.
Our analytics team uses Databricks to manage and maintain data tables. It combines the flexibility of notebooks with scalable big data processing, making it easy to run complex queries and build efficient pipelines. Databricks also integrates with Delta Lake, an optimized storage layer that powers lakehouse architecture.
Our initial architecture was simple: multiple tables in Databricks (orders, impressions, product info, etc.) were joined to create a unified Sales table, following a star schema-like structure (reference). A daily cron job synced this data to Postgres. The goal was speed of delivery — we prioritized a quick proof of concept to gather brand feedback, and daily syncs were enough for our early needs.
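As a rough illustration of the shape of that job (table and column names below are placeholders, not our actual schema), a daily build of such a unified Sales table in Databricks SQL might look like this:

-- Illustrative only: join fact and dimension tables into one Sales table
CREATE OR REPLACE TABLE analytics.brand_sales AS
SELECT
    o.order_date,
    o.city,
    p.brand_id,
    p.product_id,
    p.subcategory,
    SUM(o.units)       AS units_sold,
    SUM(o.gmv)         AS gmv,
    SUM(i.impressions) AS impressions
FROM fact_orders o
JOIN dim_products p
    ON o.product_id = p.product_id
LEFT JOIN fact_impressions i
    ON i.product_id = o.product_id
    AND i.city = o.city
    AND i.event_date = o.order_date
GROUP BY o.order_date, o.city, p.brand_id, p.product_id, p.subcategory;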
Directly powering the UI from Databricks wasn’t an option — response times were too high for an end-user experience, and concurrency was limited. Postgres gave us the responsiveness we needed, even if only as a temporary solution.
In a short amount of time, things changed dramatically:
- More brands joined our platform
- Adoption skyrocketed as partners relied on the analytics dashboard for business decisions
- Product catalogs exploded with more SKUs and categories
We were sitting on 200+ million rows of transactional data — and counting. Our good old friend Postgres started struggling.
Designed primarily for OLTP workloads, it simply wasn't built for the kind of complex, large-scale analytical queries our brand partners needed to run.
It was clear we had to shift gears — we needed a purpose-built OLAP system designed for fast, analytical queries over massive datasets.
We knew we had to evolve. So we drew up a list of non-negotiables:
- Support for complex joins without forcing us to denormalize everything
- Sub-second response times for external user-facing dashboards
- Seamless integration with Kafka and Databricks (our upstream data pipelines)
We benchmarked a few popular OLAP databases. Each had its strengths:
- ClickHouse is fast but can struggle with joins
- Apache Pinot excels at low-latency ingestion but had limitations with the complex queries our use case required
- StarRocks? It ticked all the boxes — and more
Here’s what made StarRocks stand out:
- Lightning-fast joins: We were impressed with StarRocks’ ability to handle joins efficiently. Their approach to join optimization made it a clear winner for our workloads. Here’s a great deep dive into this.
- P99 under 500ms on 300M+ rows: We ran extensive benchmarks simulating our current use cases. StarRocks consistently delivered sub-second performance, even at scale.
- Native Kafka + S3 (Parquet) ingestion: StarRocks made it easy to plug into our existing data ecosystem. Their Routine Load for Kafka and S3/Parquet support helped us go from prototype to production with minimal, easy-to-implement changes in our pipeline.
Choosing StarRocks was just the first step. To truly unlock its power, we had to make a few critical architectural decisions.
StarRocks supports two primary storage architectures:
- Shared-Nothing: StarRocks stores its own data locally
- Shared-Data: StarRocks queries data directly from object storage (like S3)
After evaluating both, we went with the shared-nothing architecture, where StarRocks owns and manages storage locally, for two main reasons:
- Performance is critical — Since our analytics dashboard is exposed to external brands, latency was a key factor. Local data meant faster query responses.
- Our data volume is still manageable — with under a few tens of terabytes, we didn’t run into any storage or scaling challenges using local storage.
If you’re dealing with petabytes of data and are willing to trade off some latency for elasticity and separation of compute/storage, the shared-data architecture might be a better fit.
To power our analytics dashboards, we needed to move data from multiple sources into StarRocks. Our ingestion strategy evolved alongside our product maturity, and today we use two main ingestion pipelines:
When we started, our data flow was relatively simple: a daily batch sync from our source Databricks table to Postgres. Replicating that flow for StarRocks was extremely easy using Pipe Load, which continuously scans a specified S3 path and automatically loads any new files. For context, Databricks stores its data as Parquet files in folders on S3.
Here’s what a typical CREATE PIPE command looks like:
CREATE PIPE <pipe_name>
PROPERTIES (
    "AUTO_INGEST" = "TRUE"
)
AS
INSERT INTO <tablename>
SELECT * FROM FILES (
    "path" = "s3://<bucket-name>/<folder-name>/*.parquet",
    "format" = "parquet",
    "aws.s3.region" = "<region>",
    "aws.s3.access_key" = "<>",
    "aws.s3.secret_key" = "<>"
);
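Once a pipe is running, you can also keep an eye on it from SQL. A quick sketch, assuming a recent StarRocks release where pipe state is exposed via SHOW PIPES and the information_schema.pipe_files view:

-- List pipes with their state and progress
SHOW PIPES;

-- Inspect per-file load status for a given pipe
SELECT * FROM information_schema.pipe_files WHERE pipe_name = '<pipe_name>';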
💡 Pro Tips for Using PIPEs:
- Set “AUTO_INGEST” = “TRUE” for continuous syncing as new Parquet files land in S3
- Deletes are not synced: use a soft delete pattern (is_deleted flag) for removals
- Choose your primary keys wisely — StarRocks performs upserts on primary key tables (see the sketch below)
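To illustrate the last two tips, here is a minimal, hypothetical table sketch (not our actual schema): the primary key uniquely identifies one row of the aggregate so that re-ingested files upsert cleanly, and an is_deleted flag handles removals as soft deletes.

-- Hypothetical example: simplified daily sales table with a primary key and a soft-delete flag
CREATE TABLE brand_sales_daily (
    sale_date   DATE    NOT NULL,
    city_id     INT     NOT NULL,
    product_id  BIGINT  NOT NULL,
    brand_id    BIGINT,
    units_sold  BIGINT,
    gmv         DECIMAL(18, 2),
    is_deleted  BOOLEAN  -- filtered out in queries instead of physically deleting rows
)
PRIMARY KEY (sale_date, city_id, product_id)
DISTRIBUTED BY HASH (product_id) BUCKETS 8;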
To provide some context about the scale, our primary sales table contains over 300 million rows, and most queries involve joins across 2 to 3 tables.
More here: Pipe Load Docs
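For a flavor of what those queries look like, here is a hypothetical dashboard query (reusing the illustrative brand_sales_daily table from the sketch above, plus made-up product and search-impression tables; the brand ID is a placeholder):

-- Hypothetical query: last 30 days of GMV and units for one brand, by city and subcategory
SELECT
    s.city_id,
    p.subcategory,
    SUM(s.gmv)         AS gmv,
    SUM(s.units_sold)  AS units_sold,
    SUM(i.impressions) AS impressions
FROM brand_sales_daily s
JOIN dim_products p
    ON s.product_id = p.product_id
LEFT JOIN search_impressions_daily i
    ON i.product_id = s.product_id
    AND i.city_id = s.city_id
    AND i.event_date = s.sale_date
WHERE s.brand_id = 42
  AND s.is_deleted = FALSE
  AND s.sale_date >= date_sub(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY s.city_id, p.subcategory
ORDER BY gmv DESC;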
As our product matured, so did user expectations.
We realized that daily batch updates weren’t enough. Brands wanted real-time insights — for example, viewing today’s sales and impression data up to the current time to make timely decisions and respond faster to market trends.
This is where StarRocks truly shined.
By using Routine Load, we integrated directly with our Kafka streams — pushing updates into StarRocks in near real-time.
Why Routine Load Made a Difference
- Exactly-once ingestion guarantee — ensures clean, consistent analytics
- Minimal config — easy to get started with Kafka topics
- Native support — no need for third-party connectors or pipelines
Routine Load lets us stream changes from Kafka into StarRocks, enabling dashboards that are nearly real-time without compromising performance.
We currently ingest over 30 million rows per day into a single StarRocks table using Kafka Routine Load — and that’s just one of several high-volume tables, with more on the way.
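Setting up a job takes only a few lines. Here is a rough sketch of a CREATE ROUTINE LOAD statement (database, table, topic, and broker names are placeholders), assuming the Kafka messages are flat JSON whose keys match the destination table's columns:

-- Illustrative Routine Load job streaming a Kafka topic into a StarRocks table
CREATE ROUTINE LOAD analytics_db.sales_events_load ON sales_events
PROPERTIES (
    "format" = "json",
    "desired_concurrent_number" = "3",
    "max_error_number" = "1000"
)
FROM KAFKA (
    "kafka_broker_list" = "<broker-1>:9092,<broker-2>:9092",
    "kafka_topic" = "<aggregated-events-topic>",
    "property.kafka_default_offsets" = "OFFSET_END"
);

Routine Load advances its Kafka offsets as part of the load transaction itself, which is what backs the exactly-once guarantee mentioned above.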
To deliver fast, reliable insights to our brand partners, we built a streaming data pipeline. Here’s how it all comes together:
Step 1: Event Ingestion
It all begins with data — and lots of it. We ingest over 60,000 events per second from multiple Kafka topics. The impressions topic leads the pack in volume by far, followed by product and order delivery events.
Our platform team has built a highly scalable system, used across the entire company, to power the events pipeline. To learn more about some of their amazing work, check out this blog post: Voyager
Step 2: Real-Time Processing with Apache Flink
Next, Apache Flink takes over to process these events in real time.
- It filters out unnecessary columns
- It aggregates metrics over a configurable window (currently 5 minutes) for optimized storage and querying
- These aggregated events are then written to a destination Kafka topic for StarRocks to consume (a simplified sketch of such a job follows below)
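As a simplified sketch of that kind of job (not our production code; topic names, fields, and formats are assumptions), a Flink SQL tumbling-window aggregation from one Kafka topic into another could look like this:

-- Source: raw impression events from Kafka; only the needed fields are declared
CREATE TABLE impressions_raw (
    product_id  BIGINT,
    city_id     INT,
    event_time  TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = '<impressions-topic>',
    'properties.bootstrap.servers' = '<broker>:9092',
    'format' = 'json',
    'scan.startup.mode' = 'latest-offset'
);

-- Sink: 5-minute aggregates, written to the topic that StarRocks consumes
CREATE TABLE impressions_agg (
    window_start  TIMESTAMP(3),
    product_id    BIGINT,
    city_id       INT,
    impressions   BIGINT
) WITH (
    'connector' = 'kafka',
    'topic' = '<aggregated-events-topic>',
    'properties.bootstrap.servers' = '<broker>:9092',
    'format' = 'json'
);

-- Tumbling 5-minute window: count impressions per product and city
INSERT INTO impressions_agg
SELECT window_start, product_id, city_id, COUNT(*) AS impressions
FROM TABLE(
    TUMBLE(TABLE impressions_raw, DESCRIPTOR(event_time), INTERVAL '5' MINUTES)
)
GROUP BY window_start, window_end, product_id, city_id;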
Step 3: Ingestion into StarRocks
The processed event streams flow into StarRocks through the routine load mechanism mentioned above. Within seconds, this data becomes available for querying — no lag, no batch jobs, just real-time readiness.
Step 4: Instant Insights for Brands
With fresh data in StarRocks, our Brand Analytics Service powers lightning-fast queries across SKUs, brands, and regions. Metrics update in near real-time, enabling our partners to make smarter decisions, respond to trends faster, and stay ahead of the curve.
📚 Documentation: Routine Load with Kafka
✨ Impact:
Thanks to StarRocks, Kafka, and Flink, we transitioned from daily insights to near real-time brand analytics — allowing our partners to react faster, plan better, and grow smarter.
Building a robust and scalable brand analytics platform came with its fair share of challenges — but StarRocks made the journey smoother, faster, and more efficient.
Here’s what we achieved:
- Moved from Postgres MVP to a production-grade analytics stack as our data and user base grew
- Enabled sub-second query performance for external, user-facing dashboards
- Ingested data seamlessly from both Databricks (via S3 Pipes) and Kafka streams (via Routine Load)
- Streamed over 30 million rows per day into StarRocks tables
- Chose the shared-nothing architecture to keep latency low and performance high
By combining the power of StarRocks with our event pipeline, we’ve gone from daily batch analytics to near real-time insights — empowering our brand partners to make smarter decisions, faster.
In our upcoming posts, we’ll dive deeper into the following:
- A benchmarking comparison between StarRocks and other databases, exploring their performance across different workloads.
- The key challenges we faced during this process and how we successfully overcame them to improve data workflows and efficiency.
Stay tuned for more insights on optimizing data pipelines and working with large-scale data systems!
A special thanks to Syed Shah, Rajendra Bera, Ashutosh Gupta, Harshit Gupta and Deepak Jain for their tireless efforts and for helping productionize this critical and impactful feature.