"Everyone complains about Datadog but no one leaves"

4 months ago

Over the last few weeks, I've been hearing from a bunch of founders and senior infra engineers through our network, Rappo. One recurring theme: everyone complains about Datadog… but no one leaves.

Here’s what stood out:

Pricing unpredictability: dynamic host-based APM billing, custom metrics cardinality, and log ingestion cost spikes.
Migration inertia: dashboards, alert configs, and integrations are too tightly coupled to Datadog. Some estimate a full switch would take 3–4 sprints minimum.
Tooling comfort: engineers know Datadog; it “just works” during incidents.
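
To make the custom metrics cardinality point concrete, here is a rough sketch of why one extra tag can blow up a bill. It assumes a custom metric is counted per unique combination of metric name and tag values, and it computes only a worst-case upper bound. The tag names and counts are made up for illustration, not taken from anyone's real setup.

```python
from math import prod


def worst_case_series(tag_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric name:
    the product of how many values each tag can take."""
    return prod(tag_cardinalities.values())


# Hypothetical tags on a single latency metric.
before = {"service": 20, "endpoint": 50, "status_code": 5}
# Same metric after someone adds a per-customer tag.
after = {**before, "customer_id": 1_000}

print(worst_case_series(before))  # 5000
print(worst_case_series(after))   # 5000000
```

Real numbers are usually lower, since not every combination occurs, but the multiplication is why a single high-cardinality tag tends to show up as a surprise on the invoice.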

Common workarounds:

Downsampling + log filtering at source (via OpenTelemetry collectors or Vector)
Host affinity hacks (fewer hosts with more services to reduce APM charges)
Sending logs to S3/ClickHouse for post-hoc queries, avoiding Datadog indexing
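
As a rough illustration of the filter-at-source and archive-everything ideas, here is a minimal Python sketch of the routing logic. In practice this would live in an OpenTelemetry Collector or Vector pipeline rather than in application code. Everything named here (`route`, the `archive` and `indexed` destinations, the `level` and `trace_id` fields, the 10% sample rate) is a hypothetical choice for the example, not any tool's real API.

```python
import hashlib

ALWAYS_INDEX = {"ERROR", "CRITICAL"}  # assumed: always worth paying to index
NEVER_INDEX = {"DEBUG"}               # assumed: archive only


def keep_sample(key: str, rate: float) -> bool:
    """Deterministic sampling: the same key always gets the same decision,
    so all logs sharing a trace_id are kept or dropped together."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer and compare against rate * 2**64.
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def route(record: dict, info_sample_rate: float = 0.1) -> list[str]:
    """Return the destinations for one log record.

    Every record goes to the cheap archive (e.g. S3 or ClickHouse);
    only a subset is forwarded to the indexed, per-GB-billed backend.
    """
    destinations = ["archive"]
    level = str(record.get("level", "INFO")).upper()
    if level in ALWAYS_INDEX:
        destinations.append("indexed")
    elif level not in NEVER_INDEX and keep_sample(
        str(record.get("trace_id", "")), info_sample_rate
    ):
        destinations.append("indexed")
    return destinations


print(route({"level": "error", "trace_id": "abc"}))  # ['archive', 'indexed']
print(route({"level": "debug", "trace_id": "abc"}))  # ['archive']
```

Sampling on a hash of the trace ID rather than a random number keeps related log lines together, which matters when you later try to reconstruct an incident from the indexed subset.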

Why teams stay anyway:

It's the "default": hiring new engineers is easier when your stack uses tools they’ve seen before.
Alert fatigue mitigation: for most teams, Datadog carries a lower cognitive load on incident days.

Some folks are testing newer players (Chronosphere, HyperDX, SigNoz), but most still keep a Datadog safety net.

What’s your team’s strategy? Stick with Datadog and optimize? Full migration to OSS? Or hybrid via telemetry pipelines?
