Performance Killers in Axum, Tokio, Diesel, WebRTC, and Reqwest

This month, I have been hunting mysterious performance issues in the AutoExplore stream functionality, and what I found surprised me.

I’m writing this article to share my findings with the community to make software better for everyone.

It all started innocently enough…

The Setup

I had built a custom screencasting pipeline:
Chromium WebRTC (video producer) → Rust WebRTC fan-out server → WebRTC browser (viewer).
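To make the rest of the story easier to follow, here is a minimal sketch of what the fan-out core looks like with the webrtc crate (webrtc-rs). This is illustrative, not the production code: the VP8 codec, the track IDs, and the `attach_fanout` helper are my own choices, I assume a single video track from the producer, and the exact `on_track` callback signature differs between crate versions.

```rust
use std::sync::Arc;

use webrtc::api::media_engine::MIME_TYPE_VP8;
use webrtc::peer_connection::RTCPeerConnection;
use webrtc::rtp_transceiver::rtp_codec::RTCRtpCodecCapability;
use webrtc::track::track_local::track_local_static_rtp::TrackLocalStaticRTP;
use webrtc::track::track_local::TrackLocalWriter;

/// Copy the producer's RTP packets into one shared local track.
/// Every viewer peer connection that adds this track receives the same stream.
fn attach_fanout(producer_pc: &RTCPeerConnection) -> Arc<TrackLocalStaticRTP> {
    let fanout_track = Arc::new(TrackLocalStaticRTP::new(
        RTCRtpCodecCapability {
            mime_type: MIME_TYPE_VP8.to_owned(),
            ..Default::default()
        },
        "video".to_owned(),
        "autoexplore-fanout".to_owned(),
    ));

    let writer = Arc::clone(&fanout_track);
    producer_pc.on_track(Box::new(move |remote_track, _receiver, _transceiver| {
        let writer = Arc::clone(&writer);
        Box::pin(async move {
            // Read RTP from the producer and write it straight into the shared track.
            while let Ok((packet, _attributes)) = remote_track.read_rtp().await {
                let _ = writer.write_rtp(&packet).await;
            }
        })
    }));

    fanout_track
}
```

Each viewer peer connection then just adds a clone of that shared track with `add_track`.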

It worked beautifully.
Compared to Chromium’s default image-based streaming, latency dropped from seconds to milliseconds.
Perfect, right?

Well… almost.

The First Symptom: Random Black Screens

Everything looked smooth until the WebRTC stream started disconnecting randomly.
To users, this appeared as an occasional black screen.

Digging into the issue, I discovered that WebRTC's RTCP feedback includes NACK and PLI messages. The idea behind PLI is simple:
When a client loses a keyframe, it sends a Picture Loss Indication (PLI) to the producer, asking for a new one.

I added PLI support end-to-end (viewer → server → producer), and it helped.
The black screen now recovered automatically.
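Forwarding the PLI upstream on the Rust side is only a few lines. The sketch below follows the pattern from webrtc-rs's broadcast example; `producer_pc` and `media_ssrc` are placeholders for the producer peer connection and the SSRC of its video track.

```rust
use std::sync::Arc;

use webrtc::peer_connection::RTCPeerConnection;
use webrtc::rtcp::payload_feedbacks::picture_loss_indication::PictureLossIndication;

/// Ask the producer for a fresh keyframe by sending a PLI upstream.
/// `media_ssrc` is the SSRC of the producer's video track as seen by the server.
async fn request_keyframe(producer_pc: &Arc<RTCPeerConnection>, media_ssrc: u32) {
    let _ = producer_pc
        .write_rtcp(&[Box::new(PictureLossIndication {
            sender_ssrc: 0,
            media_ssrc,
        })])
        .await;
}
```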

But I wasn’t satisfied. I wanted to eliminate the black screens entirely; even brief ones in the middle of a stream make for a poor user experience.

Scaling Up… and Breaking Again

As I increased the number of viewers, the problem returned, worse than before.
This time, even PLIs couldn’t recover the stream.

The Chromium producer logs revealed this line:

Timeout: No RTCP RR received.

That led me to Chromium’s source (https://source.chromium.org/chromium/chromium/src/+/refs/tags/142.0.7393.8:third_party/webrtc/modules/rtp_rtcp/source/rtcp_receiver.cc;l=294-300), where I learned that RTCP Receiver Reports (RRs) are crucial feedback packets: they tell the sender what data has actually been received.

At first, I blamed my custom Chromium producer; maybe it wasn’t reading these packets properly.
But after some tests, I realized the real issue was on the Rust server side.

The Culprit: `tokio::time::interval`

The Rust WebRTC receiver was missing ticks and sending Receiver Reports (RRs) too late, in bursts.
That behavior came from `tokio::time::interval`, which defaults to `MissedTickBehavior::Burst`.
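If you hit the same thing in your own interval loops, the fix is a one-liner. A minimal sketch (the 500 ms period is illustrative, not webrtc-rs's actual RR cadence):

```rust
use std::time::Duration;
use tokio::time::{self, MissedTickBehavior};

async fn receiver_report_loop() {
    let mut report_interval = time::interval(Duration::from_millis(500));
    // The default is MissedTickBehavior::Burst: after a stall, every missed tick
    // fires back-to-back, which is exactly the late, bursty RR pattern I saw.
    report_interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

    loop {
        report_interval.tick().await;
        // Build and send the RTCP Receiver Report here.
    }
}
```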

After switching to `MissedTickBehavior::Skip`, the bursts disappeared.
I sent a PR to webrtc-rs to patch it. 🎉 https://github.com/webrtc-rs/webrtc/pull/745

Progress! But still, under load, black screens persisted.

Deep Dive: Network Layer & Cryptography

I started by optimizing the WebRTC layer (see the sketch after this list):

- Switched to 1-to-1 UDP4 connections where possible
- Avoided unnecessary STUN/ICE servers
- Predefined codec negotiations
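A sketch of that peer-connection setup with the webrtc crate; the module paths and the `anyhow` error handling are my additions and may vary slightly between versions:

```rust
use webrtc::api::media_engine::MediaEngine;
use webrtc::api::setting_engine::SettingEngine;
use webrtc::api::APIBuilder;
use webrtc::ice::network_type::NetworkType;
use webrtc::peer_connection::configuration::RTCConfiguration;
use webrtc::peer_connection::RTCPeerConnection;

async fn new_internal_peer() -> anyhow::Result<RTCPeerConnection> {
    // Gather candidates only on plain UDP over IPv4: producer, server and
    // viewers can all reach each other directly, so nothing else is needed.
    let mut setting_engine = SettingEngine::default();
    setting_engine.set_network_types(vec![NetworkType::Udp4]);

    // Register codecs up front so offer/answer negotiation stays predictable.
    let mut media_engine = MediaEngine::default();
    media_engine.register_default_codecs()?;

    let api = APIBuilder::new()
        .with_setting_engine(setting_engine)
        .with_media_engine(media_engine)
        .build();

    // No STUN/TURN servers: host candidates are enough for 1-to-1 links.
    let config = RTCConfiguration {
        ice_servers: vec![],
        ..Default::default()
    };
    Ok(api.new_peer_connection(config).await?)
}
```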

Still, no luck.

So I turned to profiling using perf + hotspot and valgrind + kcachegrind.
A huge chunk of time was being spent on HTTP/2 handshakes.

Well… that was sort of expected.

Even though this traffic was all internal (Azure VNET), it was encrypted due to zero-trust policies.
When I fed the traces to GPT-5, it suggested:
👉 Try Elliptic Curve (EC) certificates instead of RSA.

I replaced RSA-2048 with EC P-256, and boom: handshakes were 50% faster. Thanks, GPT-5. 😄 (EC certificates are faster than RSA but aren’t supported by some older clients.)

But sadly, the black screens persisted under load.

Mystery of the Slow Reqwest Client

Profiling again showed the reqwest client eating CPU cycles, even though it wasn’t part of the streaming logic!

It turned out I was creating a new client in a few places to handle custom certificate authentication and trust.
After reading the reqwest source code, I realized that `Client` is just an `Arc<ClientRef>` internally: cheap to clone, expensive to create.

I refactored to use a single shared client, cloned wherever needed.
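The refactor boils down to building the client once and handing out clones. A sketch (the builder options are examples, and `ca_pem` stands in for whatever custom certificate material you load):

```rust
use std::time::Duration;

use reqwest::{Certificate, Client};

/// Build the one shared client at startup. `Client` is an Arc-backed handle,
/// so every clone reuses the same connection pool and TLS configuration.
fn build_shared_client(ca_pem: &[u8]) -> reqwest::Result<Client> {
    Client::builder()
        .add_root_certificate(Certificate::from_pem(ca_pem)?)
        .timeout(Duration::from_secs(30))
        .build()
}
```

Clones of this `Client` then go into Axum state or agent tasks as needed.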

Result: CPU spikes were gone, handshakes were halved, and the app felt faster.
But… black screens? Still there.

The Plot Twist: It Wasn’t WebRTC at All

At this point, I suspected Azure networking. Maybe packet loss? Maybe VM throttling?
To isolate it, I ran everything offline locally, and it worked flawlessly…
Zero delay. Zero black screens. Even with many viewers.

Then, one late night, out of pure frustration, I started hammering F5 to refresh the browser UI while the stream was playing perfectly.
And suddenly — BOOM! Black screens appeared locally.

That’s when I caught it in the profiler.

Everything looked familiar — CPU spikes, TLS handshakes, encryption…
But one new name stood out:
`diesel::pg::connection::result::PgResult::get`

Wait — the database?

The True Villain Revealed

The streaming logic barely touched the database after startup.
But I realized that whenever the DB was busy auto-vacuuming, analyzing, or otherwise under load, everything else slowed down.

AutoExplore is a write-heavy system; AI agents are constantly generating data.
And Diesel’s blocking database queries were freezing the Tokio worker threads.

That single mistake caused a domino effect:

1. Diesel blocked a worker thread.
2. The Axum server couldn’t serve new requests.
3. The agents’ Reqwest clients tried new handshakes.
4. More handshakes meant more encryption and more CPU time.
5. More database connections sat waiting.
6. Tokio’s worker pool jammed → WebRTC ticks were missed → black screens appeared.

All from synchronous DB calls inside an async runtime.
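For concreteness, this is the shape of the problem (heavily simplified; `DbPool`, `db_health`, and the trivial query are placeholders for the real handlers and Diesel queries):

```rust
use axum::{extract::State, Json};
use diesel::pg::PgConnection;
use diesel::prelude::*;
use diesel::r2d2::{ConnectionManager, Pool};

type DbPool = Pool<ConnectionManager<PgConnection>>;

// A synchronous Diesel query inside an async Axum handler: while Postgres is
// busy, this parks the whole Tokio worker thread, not just this one request.
async fn db_health(State(pool): State<DbPool>) -> Json<usize> {
    let mut conn = pool.get().expect("pool exhausted"); // blocks the worker thread
    let rows = diesel::sql_query("SELECT 1")
        .execute(&mut conn) // blocks again for the query's full duration
        .expect("query failed");
    Json(rows)
}
```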

The Fix: diesel_async

I switched to `diesel_async`, refactored all DB queries and connection handling, and it changed everything.
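A sketch of the async shape, using `diesel_async` with its bb8 pool integration (the pool choice, `anyhow`, and the trivial query are again my own placeholders):

```rust
use diesel_async::pooled_connection::bb8::Pool;
use diesel_async::pooled_connection::AsyncDieselConnectionManager;
use diesel_async::{AsyncPgConnection, RunQueryDsl};

async fn make_pool(database_url: &str) -> anyhow::Result<Pool<AsyncPgConnection>> {
    let manager = AsyncDieselConnectionManager::<AsyncPgConnection>::new(database_url);
    Ok(Pool::builder().build(manager).await?)
}

// The same query, fully async: while Postgres is slow, only this task waits;
// the Tokio worker thread keeps serving Axum requests and WebRTC timers.
async fn db_health(pool: &Pool<AsyncPgConnection>) -> anyhow::Result<usize> {
    let mut conn = pool.get().await?;
    Ok(diesel::sql_query("SELECT 1").execute(&mut conn).await?)
}
```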

Now, when a query stalls, only that task waits.
The server stays fully responsive.
No more black screens. No more lag. No more mystery.

Results

My custom WebRTC solution now streams smoothly, with millisecond latency and no hiccups, even under heavy load.

And the side effect?
The whole server became twice as fast.

TL;DR — Lessons Learned

✅ Use one shared Reqwest client — it’s cheap to clone, expensive to create
✅ Prefer EC certificates (e.g., P-256) for faster internal HTTPS/TLS
✅ Never block Tokio threads — use diesel_async or equivalent for async DB access

What started as a simple black screen bug turned into a deep dive across the Rust async stack —
from WebRTC internals to cryptography, HTTP handshakes, and database drivers.

Sometimes, fixing performance killers feels like solving a detective mystery, one trace at a time. 😄
