Monitoring Claude Code with OTel / Datadog


This week, I noticed a tweet announcing OpenTelemetry (OTel) support in Claude Code - Anthropic’s AI-powered coding assistant that lets you delegate tasks directly from your terminal. This new OTel integration allows you to stream token, session, and command metrics to any backend that speaks the OpenTelemetry Protocol (OTLP). Curiously, this feature hasn’t landed in the official changelog yet. Below, we’ll explore three different approaches to wiring Claude Code metrics into Datadog, the trade-offs between each method, and how to leverage these metrics effectively.

Datadog Dashboard excerpt showing Claude Code metrics

Greg ships the most 🚀🔥

Why instrument Claude Code?

Okay, so first, why might you want to track these metrics? Here are some of the first ideas that came to mind for me:

  • Velocity analytics: Correlate AI adoption (or model changes!) with velocity, e.g. commits via the GitHub integration, closed tickets via the Jira integration, or deployments
  • Incident correlation: Be confident your increased velocity isn’t compromising availability by intersecting usage data with DDOG Incident Management or the PagerDuty integration
  • Cost governance: Alert on anomalous token usage or simply roll it up alongside your cloud costs
  • Because data is cool!

Let’s Get Started

You’ll need a few things before we dive in:

  • Claude Code (duh)
  • A Datadog account
  • Something that can ingest OTel metrics - a local Datadog Agent, a remote Agent, or Datadog’s agent-less OTLP intake API (the three options covered below)

Configuring Claude Code

No matter which option you choose, you’ll need to tell Claude Code to emit metrics. You can set the environment variables directly (e.g. in .zshrc), via ~/.claude/settings.json, or through the MDM approach with /Library/Application Support/ClaudeCode/managed-settings.json. See the Claude Code config docs for more information on these.

| Setup | OTEL_EXPORTER_OTLP_ENDPOINT | OTEL_EXPORTER_OTLP_PROTOCOL | OTEL_EXPORTER_OTLP_HEADERS |
|---|---|---|---|
| #1 Local Agent | http://127.0.0.1:4317 | grpc | Not required |
| #2 Remote Agent via tunnel | http://127.0.0.1:14317 | grpc | Not required |
| #3 Agent-less API | https://api.datadoghq.com/api/v1/otlp | http/protobuf | DD-API-KEY=<YOUR_DATADOG_API_KEY> |

Copy these values into your chosen Claude Code config location (~/.claude/settings.json, environment variables, or MDM). For example, assuming you’re going with a local agent:

{ "env": { "CLAUDE_CODE_ENABLE_TELEMETRY": "1", "OTEL_METRICS_EXPORTER": "otlp", "OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:4317", "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc", "OTEL_METRIC_EXPORT_INTERVAL": "10000" } }

Next, let’s make sure something is listening for those metrics and walk through the available options.

Option 1: Local Datadog Agent

If you’re already running a local agent, this is the easiest option. If you aren’t, note that this machine will count as an additional host on your DDOG bill, and you’ll need to provision an API key as part of the installation.

  1. Enable OTLP ingest by adding this to your Datadog agent config:
# /opt/homebrew/etc/datadog-agent/datadog.yaml or /etc/datadog-agent/datadog.yaml
otlp_config:
  metrics:
    enabled: true
  receiver:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

Then restart the agent; for non-macOS incantations, see agent commands.

brew services restart datadog-agent
  2. Verify your metrics by running a quick prompt like claude -p "hello", then check your Datadog Metrics Summary and filter for claude_code.* (a command-line sanity check is sketched below).
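If you’d rather verify from the command line, Datadog’s search API can confirm the metric names have been indexed (assuming you have an API key and application key handy - DD_API_KEY and DD_APP_KEY below are placeholders):

# Ask Datadog which claude_code.* metrics it has seen
curl -s "https://api.datadoghq.com/api/v1/search?q=metrics:claude_code" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"

An empty results list usually just means the first export interval hasn’t fired yet; give it a minute or two.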

Option 2: Remote Agent

If you aren’t already running an agent locally, installing one just for OTLP metrics is a poor value proposition. A remote (shareable) centralized agent is a simple, handy, and cost-effective approach.

  1. Configure the remote agent with the same OTLP settings as Option 1.

  2. Set up your SSH tunnel (optional) by adding to your ~/.ssh/config:

Host metrics-gw
  User yourusername
  # Forward local 14317 to remote 4317
  LocalForward 14317 localhost:4317
  ProxyCommand /opt/homebrew/bin/cloudflared access ssh --hostname gw.example.com

I’m using Cloudflare Zero Trust here, which provides a nice secure gateway without having to open ports on your server. If you have a host that’s protected by VPN, you can likely skip the tunnel.
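Before wiring up anything permanent, it’s worth a quick smoke test that the tunnel actually forwards (nc ships with macOS):

# Open the tunnel in the background...
ssh -N metrics-gw &

# ...then confirm the forwarded port accepts connections
nc -vz 127.0.0.1 14317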

  3. Keep the tunnel alive with autossh - create a launchd plist for macOS:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.user.autossh</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/autossh</string>
    <string>-M</string>
    <string>0</string>
    <string>-N</string>
    <string>metrics-gw</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>

Save this to ~/Library/LaunchAgents/com.user.autossh.plist and load it with:

launchctl load ~/Library/LaunchAgents/com.user.autossh.plist
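You can confirm launchd actually started the job - it should show up with a PID, and the forwarded port should be listening:

# A PID in the first column means the job is running
launchctl list | grep com.user.autossh

# And the tunnel should be bound locally
lsof -i :14317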

Option 3: Agent-less OTLP Intake API

This is the shiniest and lightest-weight approach - no agent to maintain, no tunnels to establish. You talk directly to Datadog’s API. However:

  1. Every dev machine needs its own API key (and Datadog API keys lack scoping capabilities)
  2. The OTLP intake endpoint is still in preview as of May 2025, so you must fill out a form for access

Sample configuration values:

{ "env": { "CLAUDE_CODE_ENABLE_TELEMETRY": "1", "OTEL_METRICS_EXPORTER": "otlp", "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf", "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.datadoghq.com/api/v1/otlp", "OTEL_EXPORTER_OTLP_HEADERS": "DD-API-KEY=YOUR_DATADOG_API_KEY", "OTEL_METRIC_EXPORT_INTERVAL": "10000" } }

Troubleshooting Tips

If you’re not seeing metrics, here are some troubleshooting steps:

  • Console output: Enable console output by setting OTEL_METRICS_EXPORTER to "console" (or "otlp,console" to keep exporting while you debug).
    • If you don’t see console output, it’s very likely the JSON in your config file is invalid or you’re running an old version of Claude Code
  • Check TCP connections: Run lsof -i :14317 (or :4317 for a local agent) to confirm your SSH tunnel or Datadog agent is listening on the right port
  • Verify agent config: Run sudo datadog-agent status | grep -A10 OTLP to check that the OTLP receiver is properly configured
  • tcpdump to the rescue: Try sudo tcpdump -i any port 14317 -s0 -X -v to monitor traffic
  • Catching DDOG API errors: If you’re using the agent-less approach, here’s a handy script to watch errors from the OTLP intake endpoint

A good output from the debug script looks like this:

Debug script output showing successful connection to Datadog

Successful connection to Datadog OTLP intake

Assuming all is well, you should now see your metrics appearing in the Datadog UI 🎉

Datadog Metrics Summary page showing Claude Code metrics

Metrics successfully appearing in Datadog

Key Metrics

Claude Code emits several useful metrics that you can monitor:

| Metric Name | Type | Description | Unique Tags |
|---|---|---|---|
| claude_code.commit.count | Counter | Number of git commits created | - |
| claude_code.cost.usage | Counter | Cost of Claude Code usage | model |
| claude_code.lines_of_code.count | Counter | Lines of code added/removed | type [added/removed] |
| claude_code.pull_request.count | Counter | Number of pull requests created | - |
| claude_code.session.count | Counter | Number of CLI sessions started | - |
| claude_code.token.usage | Counter | Number of tokens consumed | type [cachecreation/cacheread/input/output], model |

All metrics include the following common tags:
organization.id • service • session.id • user.account_uuid • user.email • user.id • version • host
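These tags make ad-hoc slicing easy. As a sketch, here’s a timeseries query through Datadog’s v1 query API breaking token usage down by model (standard query syntax; the window and grouping are illustrative):

# Token usage for the last hour, grouped by model
curl -s -G "https://api.datadoghq.com/api/v1/query" \
  --data-urlencode "from=$(($(date +%s) - 3600))" \
  --data-urlencode "to=$(date +%s)" \
  --data-urlencode "query=sum:claude_code.token.usage{*} by {model}.as_count()" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"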

Here’s what some of the tag pairs look like in practice:

Datadog Metrics Tags interface showing Claude Code metric tags

Claude Code metric tags as shown in the Datadog UI

Mitigating Security Concerns

Having unscoped API keys spread across hundreds of developer machines presents challenges ranging from metric integrity to data exfiltration to potential cost catastrophes. Let’s look at some ways to mitigate these without compromising visibility.

Best Practices

First, establish defensive monitoring. Create a monitor on overall metrics usage via datadog.estimated_usage.metrics.custom to catch runaway cardinality at a global level. For a more granular approach, you can monitor individual metrics via datadog.estimated_usage.metrics.custom.by_metric.

The sample anomaly monitor below will trigger if any metric hits 3 standard deviations above its weekly baseline and is emitting over 1000 unique time series. When tuned to your organization’s scale, this multi-condition approach prevents alert fatigue while still catching expensive outliers.
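As a rough sketch, the anomaly half of that monitor could be created via the API like so - the algorithm, window, and thresholds are illustrative and should be tuned, and the 1000-series condition would be layered on as a composite monitor:

# Illustrative anomaly monitor on per-metric custom metrics usage
curl -s -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Anomalous custom metrics usage",
    "type": "query alert",
    "query": "avg(last_4h):anomalies(sum:datadog.estimated_usage.metrics.custom.by_metric{*} by {metric_name}, '"'"'agile'"'"', 3) >= 1",
    "message": "Custom metric usage is 3+ std devs above its baseline. @your-team"
  }'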

Datadog anomaly detection monitor configuration for metrics usage

Anomaly detection monitor for custom metrics usage

Next, get surgical with Metrics without Limits to specify exactly which tags matter for each metric. Claude Code does have some configurability here via OTEL_METRICS_INCLUDE_SESSION_ID, OTEL_METRICS_INCLUDE_VERSION and OTEL_METRICS_INCLUDE_ACCOUNT_UUID, but enforcing it at the ingest level is safer. A minimal client-side example follows.
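For reference, those client-side knobs are just environment variables. Here’s a minimal example that drops the highest-cardinality tags at the source (boolean values assumed, per the usual convention):

# Drop high-cardinality tags before they ever leave the laptop
export OTEL_METRICS_INCLUDE_SESSION_ID=false
export OTEL_METRICS_INCLUDE_ACCOUNT_UUID=false
# version is low-cardinality and handy for correlating behavior changes
export OTEL_METRICS_INCLUDE_VERSION=true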

Datadog Metrics Without Limits configuration interface

Metrics Without Limits configuration to control which tags are indexed

Vector

Vector is close to offering an elegant solution for OpenTelemetry metrics ingestion. This lightweight, Rust-based open-source tool already acts as a powerful observability pipeline, with existing support for OTLP logs and traces. The missing piece - OTLP metrics support - is currently in review. Once merged, Vector will provide a complete metrics gateway solution. This positions Vector as a particularly efficient option for managing observability data flows across your infrastructure.

Here’s a proposed sample Vector pipeline that should address these concerns once that support lands:

sources:
  otlp:
    type: "opentelemetry"
    grpc:
      address: "0.0.0.0:4317"
    http:
      address: "0.0.0.0:4318"

transforms:
  only_claude:
    type: "filter"
    inputs: ["otlp.metrics"]
    condition: 'starts_with!(.name, "claude_code.")'

  limit_tags:
    type: "tag_cardinality_limit"
    inputs: ["only_claude"]
    value_limit: 50
    mode: "exact"
    limit_exceeded_action: "drop_tag"

sinks:
  datadog_metrics:
    type: "datadog_metrics"
    inputs: ["limit_tags"]
    default_api_key: "<YOUR_API_KEY>"
    site: "us5.datadoghq.com"

Once available, this minimal config will deliver filtering, cardinality control, and centralized API management. Deploy it on a single gateway server to reduce your attack surface by orders of magnitude. Until then, an OpenTelemetry Collector can serve as a temporary bridge.
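If you want to sketch that interim bridge, a minimal OpenTelemetry Collector config along the same lines might look like this - note it’s a sketch, assuming the contrib distribution (which bundles the datadog exporter) and the filter processor’s metric_names matching:

# Write a minimal collector config, then run it with the contrib build
cat > otel-collector.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  filter/only_claude:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - claude_code\..*

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: us5.datadoghq.com

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter/only_claude]
      exporters: [datadog]
EOF

otelcol-contrib --config otel-collector.yaml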

Future Improvements I’d Like to See

  • Claude Code support for adding custom tags (e.g., to decorate with project, repo, team, etc.)
    • I did notice an undocumented OTEL_RESOURCE_ATTRIBUTES reference in the CLI, but I couldn’t make it do anything useful
  • Improved debugging from the Claude Code CLI for telemetry issues (e.g. for invalid json, errors from OTLP endpoints, etc.)
  • Ability to scope API keys in Datadog to mitigate the blast radius of distributing keys to hundreds/thousands of engineers
    • Bonus points if you can also scope what metrics they can emit without Vector
  • Vector OTLP Metrics support soon!

Conclusion

You can’t improve what you can’t measure, and OTel support shows a strong commitment to the enterprise from Anthropic. Datadog’s agentless intake API is perfectly timed to capitalize on the movement towards more fragmented autonomy that will undoubtedly need to be observed.

Here’s a more comprehensive Claude Code metrics dashboard:

Datadog dashboard showing sample graphs of Claude Code metrics

Sample Claude Code Usage dashboard in Datadog

The JSON for easy import of this dashboard is available here.

Anything I missed? Questions, feedback, comments? Hit me up on X!

Thanks to Greg, Craig, & Kousha for sending over some metrics to my sandbox Datadog instance for prettier graphs :)
