Layer 7 reverse proxies gain their strength from a deep understanding of HTTP. With that knowledge, they can route based on paths, enforce rate limits, authenticate users, block abuse, cache content, and compress responses. Developers often take advantage of this flexibility, and in the short term it works—systems appear simpler, state is reduced, and the user experience improves. But parsing HTTP at the edge is rarely straightforward. Browsers and user agents interpret the protocol differently, backend servers have their own assumptions, and the proxy must reconcile them all. When interpretations diverge, requests may be dropped or handled inconsistently. At scale, these small mismatches grow into outages or vulnerabilities, and attackers are usually the first to notice.
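To make that concrete, here is a minimal sketch of path-based routing at a Layer 7 proxy using Go's standard library. The listen address and backend URLs are placeholders, and a real deployment would add timeouts, TLS, health checks, and the policy layers described above.

```go
// Minimal path-based routing at a Layer 7 reverse proxy (illustrative only).
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func newProxy(rawURL string) *httputil.ReverseProxy {
	target, err := url.Parse(rawURL)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	api := newProxy("http://127.0.0.1:9001") // hypothetical API backend
	web := newProxy("http://127.0.0.1:9002") // hypothetical web backend

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// The routing decision requires parsing the request line; this is
		// where the proxy's HTTP knowledge, and its parsing cost, lives.
		if strings.HasPrefix(r.URL.Path, "/api/") {
			api.ServeHTTP(w, r)
			return
		}
		web.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```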
Challenges in Practice
Oversized URLs were a recurring challenge in our partner-facing APIs. Batch operations let thousands of campaigns or profiles be fetched in a single call, and additional parameters to customize the result ballooned URLs to tens of kilobytes, beyond proxy defaults. Raising limits kept them working, but at a steep cost: higher parsing overhead, more memory use, and lower efficiency. What started as a convenience ended up complicating capacity planning and load balancing.
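One mitigation is a hard cap at the edge rather than ever-larger buffers. A rough sketch, assuming an arbitrary 8 KB limit rather than the value our proxies actually used:

```go
package main

import "net/http"

// limitURILength rejects requests whose request line alone would strain the
// proxy, returning 414 instead of buffering an arbitrarily long URL.
func limitURILength(maxBytes int, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if len(r.RequestURI) > maxBytes {
			http.Error(w, "request URI too long", http.StatusRequestURITooLong)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Rejecting loudly, instead of quietly raising limits, at least makes the outlier callers visible rather than letting them shape the fleet's limits.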
Cookies introduce similar problems. By embedding state in the browser, developers offload complexity from their servers. But cookies spread quickly—between domains, across teams, and even into logs. They leak information, bypass security checks, and present wildly inconsistent behavior across client libraries. Worse, once developers treat cookies as harmless key-value stores, they start using them as caches for sensitive data, creating risks that are difficult to contain.
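One containment pattern is to treat the edge as a cookie allowlist, so only the cookies a backend is known to need survive the hop. A sketch, with invented cookie names:

```go
package main

import "net/http"

// Cookies the backend explicitly needs; these names are assumptions.
var forwardedCookies = map[string]bool{
	"session_id": true,
	"locale":     true,
}

// filterCookies parses the Cookie header once, drops it, and re-attaches
// only allowlisted cookies before the request leaves the edge.
func filterCookies(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		kept := r.Cookies()    // single parse of the Cookie header
		r.Header.Del("Cookie") // drop everything the client sent
		for _, c := range kept {
			if forwardedCookies[c.Name] {
				r.AddCookie(c)
			}
		}
		next.ServeHTTP(w, r)
	})
}
```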
Headers also become a source of trouble. User-Agent strings look like reliable signals until a regex parser chokes on a crafted input. The X-Forwarded-For header is trusted for hop validation, yet it is trivial to spoof. In both cases, assumptions about ‘routine’ inputs open easy paths for attackers.
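The only reliable value for X-Forwarded-For is the one the proxy writes itself. A minimal sketch that discards whatever the client sent and records the observed peer address instead:

```go
package main

import (
	"net"
	"net/http"
)

// setForwardedFor overwrites any client-supplied X-Forwarded-For with the
// address actually seen on the connection, so downstream hops never act on
// an attacker-controlled value.
func setForwardedFor(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // no port present; use the raw address
		}
		r.Header.Set("X-Forwarded-For", ip)
		next.ServeHTTP(w, r)
	})
}
```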
Even when security is the goal, business needs can erode it. Sites often require authentication, yet still want visibility in search engines (SEO). To accommodate crawlers, access checks are relaxed based on weak identifiers such as a User-Agent claiming to be “Googlebot.” Inevitably, attackers adopt the same identifiers, and the proxy ends up maintaining complex, brittle bot-detection logic to recover from the shortcut.
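A sturdier alternative to trusting the User-Agent string is the reverse-then-forward DNS check Google documents for verifying Googlebot. A sketch, with no caching or timeout handling:

```go
package main

import (
	"net"
	"strings"
)

// isVerifiedGooglebot reverse-resolves the caller's IP, checks that the
// hostname belongs to Google, then forward-resolves the hostname and
// confirms it maps back to the same IP.
func isVerifiedGooglebot(ip string) bool {
	names, err := net.LookupAddr(ip)
	if err != nil {
		return false
	}
	for _, name := range names {
		host := strings.TrimSuffix(name, ".")
		if !strings.HasSuffix(host, ".googlebot.com") && !strings.HasSuffix(host, ".google.com") {
			continue
		}
		addrs, err := net.LookupHost(host)
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if a == ip {
				return true
			}
		}
	}
	return false
}
```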
All of these issues stem from a familiar set of assumptions:
- This is for internal use only; no one else will use or abuse it.
- Following the spec is enough, so guardrails aren’t necessary.
- A little extra parsing cost is fine if it makes the client’s job easier.
At scale, none of these assumptions hold, and design shortcuts turn into systemic risks.
What We Learned the Hard Way
Never trust input. Be fanatically defensive and sanitize everything. We learned that even an invalid HTTP version string like 0.5 could trigger crashes in parts of the stack. Trusting a header to mark “safe” upstreams once allowed apps in Azure to be misclassified as internal requests. Failing to reset fields like X-Forwarded-For or UUIDs let bogus values pollute logs and break correlation. The lesson was clear: always reset request metadata at the edge, and pass through only what the backend explicitly needs. If cookies influence policy, encode them consistently, check for tampering, and cross-validate them against other signals. Routine fields often become the easiest vectors to exploit.
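For cookies that do influence policy, the tamper check can be as simple as an HMAC over the payload. A sketch assuming the value is stored as base64url(payload).base64url(signature), which is an illustrative format, not the one we actually shipped:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"strings"
)

// verifySignedCookie returns the payload and true only if the signature
// matches; any malformed or tampered value is rejected.
func verifySignedCookie(value string, key []byte) (string, bool) {
	parts := strings.SplitN(value, ".", 2)
	if len(parts) != 2 {
		return "", false
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", false
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return "", false // tampered, or signed with a different key
	}
	return string(payload), true
}
```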
Fix outliers, don’t excuse them. Some of the most expensive problems came not from the general workload but from a handful of services with unusual patterns. Some APIs that embedded entire queries in GET requests created URLs hundreds of kilobytes long. For a while, we raised limits to keep them working, but that only increased memory pressure and parsing cost. Eventually we forced a redesign, and while it was painful for the product team, it cut the proxy’s footprint significantly. Raw GraphQL queries posed a similar challenge: easy for developers, but dangerous for the proxy tier. Replacing them with vetted query IDs preserved the flexibility and removed the risk of unbounded input. Long-running requests were another case. We shifted them to return a queryable ID immediately, with results fetched through a separate call. That redesign limited how long connections stayed open and freed capacity for other traffic. In each case, redesigning the outlier service gave the whole fleet a more stable baseline, while global exceptions only made things worse.
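The long-running-request redesign looks roughly like the sketch below: the first call registers a job and returns its ID immediately, and a second endpoint serves the result once it exists. The endpoint names, the in-memory store, and the fixed sleep are all illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

type jobStore struct {
	mu      sync.Mutex
	nextID  int
	results map[string]*string // nil value means "still running"
}

func (s *jobStore) start() string {
	s.mu.Lock()
	s.nextID++
	id := fmt.Sprintf("job-%d", s.nextID)
	s.results[id] = nil
	s.mu.Unlock()

	go func() {
		time.Sleep(5 * time.Second) // stand-in for the real long-running work
		out := "report ready"
		s.mu.Lock()
		s.results[id] = &out
		s.mu.Unlock()
	}()
	return id
}

func main() {
	store := &jobStore{results: map[string]*string{}}

	// Kicking off the work returns in milliseconds, so the proxy connection
	// is not held open for the duration of the computation.
	http.HandleFunc("/reports", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
		json.NewEncoder(w).Encode(map[string]string{"job_id": store.start()})
	})

	// Results are fetched through a separate call; 202 means "not done yet".
	http.HandleFunc("/reports/result", func(w http.ResponseWriter, r *http.Request) {
		store.mu.Lock()
		out, ok := store.results[r.URL.Query().Get("id")]
		store.mu.Unlock()
		switch {
		case !ok:
			http.NotFound(w, r)
		case out == nil:
			w.WriteHeader(http.StatusAccepted)
		default:
			fmt.Fprintln(w, *out)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```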
Headers and cookies aren’t just harmful; they’re costly. At scale, parsing and updating them consumed a large share of CPU. Every rewrite or adjustment added overhead. We learned to parse once, reuse results, and update only when absolutely necessary. In practice, header and cookie handling turned out to be one of the most expensive operations in the proxy. The risks were real too. A bad User-Agent regex once caused stack overflows under crafted input. Cookie quirks between iOS and Android WebViews created login loops that only appeared under load. Over time, we stopped treating headers and cookies as routine fields. We began treating them as costly and fragile, something to minimize, cache, or strip whenever possible.
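In Go, "parse once, reuse" can look like the sketch below: the Cookie header is parsed a single time at the edge and the result is carried on the request context, so later stages never re-walk the raw header. The context key and helper are invented for illustration.

```go
package main

import (
	"context"
	"net/http"
)

type cookieKey struct{}

// cacheCookies parses cookies once and stashes the slice on the context.
func cacheCookies(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		parsed := r.Cookies() // one parse of the Cookie header
		ctx := context.WithValue(r.Context(), cookieKey{}, parsed)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// cookiesFrom returns the cached cookies, falling back to a fresh parse if
// the middleware was not installed.
func cookiesFrom(r *http.Request) []*http.Cookie {
	if c, ok := r.Context().Value(cookieKey{}).([]*http.Cookie); ok {
		return c
	}
	return r.Cookies()
}
```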
Keep the stack current. HTTP may look stable, but the protocol and its implementations keep evolving. Staying current was not just maintenance—it was a defensive measure. Several major vulnerabilities only became visible after updates, and we avoided them simply by keeping our stack up to date. Examples included the HTTP/2 Rapid Reset Attack, the HTTP/2 Resource Loop, and the HTTP/2 Continuation Flood. In each case, running a current version closed the door on exploits before they became incidents.