"Everyone complains about Datadog but no one leaves"


Over the last few weeks, I've been hearing from a bunch of founders and senior infra engineers through our network, Rappo. One recurring theme: everyone complains about Datadog… but no one leaves.

Here’s what stood out:

  • Pricing unpredictability: dynamic host-based APM billing, custom metrics cardinality, and log ingestion cost spikes.

  • Migration inertia: dashboards, alert configs, and integrations are too tightly coupled. Some estimate a full switch would take 3–4 sprints minimum.

  • Tooling comfort: engineers know Datadog; it “just works” during incidents.

  • Downsampling + log filtering at source (via OpenTelemetry collectors or vector)

  • Host affinity hacks (fewer hosts with more services to reduce APM charges)

  • Sending logs to S3/ClickHouse for post-hoc queries, avoiding Datadog indexing

  • It's the "default": hiring new engineers is easier when your stack uses tools they’ve seen before.

  • Alert fatigue mitigation: for most teams, Datadog keeps cognitive load lower on incident days.
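
To make the custom-metrics cardinality point concrete: a single metric name fans out into one time series per unique combination of tag values, so the worst case is the product of the tag cardinalities. Here is a rough sketch of that arithmetic. The tag names and counts are hypothetical, picked for illustration, and are not figures anyone shared with us.

```python
from math import prod


def worst_case_series(tag_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric name."""
    return prod(tag_cardinalities.values())


# Hypothetical tag cardinalities, for illustration only.
tags = {"host": 50, "endpoint": 20, "status_code": 5}
before = worst_case_series(tags)  # 50 * 20 * 5

# Adding one high-cardinality tag (say, a customer ID) multiplies everything.
tags["customer_id"] = 1000
after = worst_case_series(tags)

print(before, after)  # 5000 5000000
```

Real counts are usually lower than this bound because not every combination actually occurs, but the multiplication is why one innocent-looking tag can blow up a bill.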

Some folks are testing newer players (Chronosphere, HyperDX, SigNoz), but most still keep a Datadog safety net.

What’s your team’s strategy? Stick with Datadog and optimize? Full migration to OSS? Or hybrid via telemetry pipelines?
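
For anyone weighing the hybrid option, here is roughly the routing logic people described: drop noise at the source, archive everything else cheaply, and only index what you need during incidents. This is a plain-Python stand-in for illustration, not OpenTelemetry Collector or Vector configuration. The level sets, the 10% sample rate, and the destination names "archive" and "indexed" are all assumptions.

```python
import random
from typing import Callable

KEEP_LEVELS = {"ERROR", "WARN"}  # assumption: always worth indexing
DROP_LEVELS = {"DEBUG"}          # assumption: never leaves the host
INFO_SAMPLE_RATE = 0.1           # assumption: index ~10% of the rest


def route(record: dict, rand: Callable[[], float] = random.random) -> list[str]:
    """Return the destinations for one log record.

    "archive" stands in for cheap storage (S3/ClickHouse),
    "indexed" for the paid, searchable vendor index.
    """
    level = record.get("level", "INFO").upper()
    if level in DROP_LEVELS:
        return []  # filtered at source
    if level in KEEP_LEVELS:
        return ["archive", "indexed"]
    # Everything else is archived in full but only sampled into the index.
    if rand() < INFO_SAMPLE_RATE:
        return ["archive", "indexed"]
    return ["archive"]


print(route({"level": "error", "msg": "db timeout"}))  # ['archive', 'indexed']
print(route({"level": "debug", "msg": "cache miss"}))  # []
```

In a real setup this decision would live in the pipeline's own config rather than in application code; the sketch is only meant to show the shape of the trade-off.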
