Datadog vs Grafana Cloud vs New Relic — which observability platform?
Select an observability platform given current pricing models, cardinality limits, OpenTelemetry support quality, incident workflows, retention controls, and AI-assisted debugging features.
Blockers
- Datadog — dashboard JSON export via API only; no bulk historical data export: metrics, logs, and traces stored in Datadog are not bulk-exportable, and there is no API for a raw telemetry dump — Rating: none
- Grafana Cloud — PromQL remote_read API (metrics), LogQL API (logs), Jaeger-compatible query API (traces), dashboard JSON API; all signals exportable via open-standard APIs — Rating: full
- New Relic — Historical Data Export via NerdGraph to an S3 bucket: up to 200M data-point rows per export, the time range must end 12h+ in the past, and result files expire after 1 week; also Streaming Data Export via Kinesis Firehose using NRQL rules — Rating: partial
- Datadog lock-in — DogStatsD is a proprietary metric-submission protocol; Datadog-specific semantic conventions/extensions are added to OTel ingestion; DD-native SDK instrumentation requires rework to become vendor-neutral; Intelligent Retention and Metrics without Limits are proprietary controls
- New Relic lock-in — NRQL is a proprietary query language (SQL-like but NR-specific); dashboards and alerts built on NRQL require translation if migrating; the query and alerting layer stays proprietary despite first-class OTel ingestion
- Datadog exit path — re-instrument and forward to the new backend; the OTel Collector community Datadog receiver is a migration path, but historical data cannot be backfilled from Datadog, DD receiver coverage is not feature-complete, and switching from DD-native SDKs to OTel is required — Pain: high
- Datadog exit path (dual-ship) — re-instrument and dual-ship via OTel Collector; historical data cannot be backfilled, and DD-specific semantic conventions need rework — Pain: high
- Grafana Cloud exit path — export via open APIs (PromQL remote_read, LogQL, Jaeger API) and self-host the open-source stack; all protocols are open standards and no re-instrumentation is needed — Pain: low
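The Grafana Cloud exit path above is scriptable because the endpoints are standard. As a minimal sketch (the base URLs are placeholders; `/api/v1/query` and `/loki/api/v1/query_range` are the standard Prometheus and Loki HTTP API paths), an export loop might build its query URLs like this, leaving the actual HTTP calls and pagination to the caller:

```python
from urllib.parse import urlencode

def prom_query_url(base: str, promql: str, ts: int) -> str:
    """Instant PromQL query URL against the standard Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql, "time": ts})

def loki_range_url(base: str, logql: str, start_ns: int, end_ns: int,
                   limit: int = 5000) -> str:
    """LogQL range-query URL against the standard Loki HTTP API."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base}/loki/api/v1/query_range?" + urlencode(params)
```

A real exporter would walk the time range in windows and page through results, but the request shapes stay this simple because no proprietary protocol is involved.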
Who this is for
- high-scale
- low-ops
- cost-sensitive
- enterprise
- small-team
- microservices
- real-time
- compliance
Candidates
Datadog
Datadog is the most feature-dense managed option here, with mature APM, log/trace correlation, incident response, and AI features in one SaaS. As of 2026-03-14, its official pricing is still strongly SKU-based: the pricing page lists APM per host, APM ingestion per GB, indexed spans priced separately by retention window, and Incident Management as a separate seat-based SKU.
When to choose
Best for enterprise + low-ops or real-time + microservices environments where fast rollout, integrated incident handling, and deep Datadog-native workflows matter more than backend portability. It is especially strong when you want built-in incident declaration from monitors, rich third-party incident integrations, and AI-assisted alert investigation without assembling multiple products.
Tradeoffs
The platform breadth is excellent, and Datadog explicitly recommends its DDOT Collector path for OpenTelemetry while still supporting standalone OTel Collector and OTLP-based options. Cost modeling is the hardest part: custom metrics, APM hosts, ingestion, indexed spans, and adjacent products can all move the bill. Datadog also steers you toward Datadog-specific controls such as Intelligent Retention, custom retention filters, and Metrics without Limits.

Migration/reversibility: metrics, logs, and traces stored in Datadog are not bulk-exportable; there is no API for a historical dump of raw telemetry. Dashboards are exportable as JSON via API. DogStatsD is a proprietary metric-submission protocol, and while Datadog supports OpenTelemetry ingestion it adds Datadog-specific semantic conventions and extensions, so instrumentation built on DD-native SDKs requires rework to become vendor-neutral. The OTel Collector community Datadog receiver exists as a migration path out, but its coverage is not feature-complete.

SLA: Datadog guarantees 99.8% monthly availability; if availability falls below 99.8% for two consecutive months, customers may terminate without penalty, but no service credits are issued.

Support: Standard support (included) has <2h critical response 24x7 and <48h general response during business hours. Premier support (8% of monthly spend, $2,000/mo minimum, 1-year commitment) upgrades critical response to <30 min 24x7 and general response to <12h, and adds 5 days/year of elevated support with named engineers.
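Because the bill is driven by several independent SKUs, it helps to model it explicitly before committing. A rough additive sketch follows; every unit price here is an illustrative placeholder for you to replace with your negotiated rates, not Datadog's list prices:

```python
def datadog_monthly_estimate(apm_hosts: int, custom_metrics: int,
                             ingest_gb: float, indexed_spans_m: float,
                             price_per_host: float = 31.0,        # assumption
                             price_per_100_metrics: float = 1.0,  # assumption
                             price_per_gb: float = 0.10,          # assumption
                             price_per_m_spans: float = 1.70) -> float:  # assumption
    """Additive cost model across the SKUs named in the text; all default
    unit prices are placeholder assumptions, not quoted list prices."""
    return (apm_hosts * price_per_host
            + (custom_metrics / 100) * price_per_100_metrics
            + ingest_gb * price_per_gb
            + indexed_spans_m * price_per_m_spans)
```

Even a crude model like this makes it obvious which lever (hosts, custom metrics, ingest, or indexed spans) dominates your projected spend.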
Cautions
Watch cardinality and indexing closely. Datadog does not enforce a hard cardinality cap that drops data: the docs state custom metric submission has no enforced fixed rate limit, and high-cardinality metrics are simply billed at the observed volume once you exceed your allotment. Metrics without Limits requires careful tag allowlisting and exposes a cardinality estimator before saving; it lets you restrict queryable tag combinations post-ingestion, but the raw ingestion still counts toward billing. The Metrics Tags Cardinality Explorer surfaces high-cardinality tags, and Datadog recommends proactive allowlisting to prevent bill surprises.

For traces, ingestion and indexing are separate concerns: custom retention filters can change indexed-span usage, and Datadog's intelligent retention keeps representative traces automatically. Verify Bits AI availability for your Datadog site before standardizing, because the docs note site support can differ.

Operational risk: because there is no bulk historical export path, migrating away requires re-instrumenting and forwarding to the new backend going forward; you cannot backfill historical data from Datadog into a successor platform.
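Since billing tracks observed cardinality rather than enforcing a cap, it is worth estimating the blast radius of a new tag before shipping it. A sketch of the standard worst-case bound, assuming tag values combine as a full cross product (real cardinality is usually lower because not every combination occurs):

```python
from math import prod

def worst_case_custom_metrics(metric_names: list[str],
                              tag_value_counts: dict[str, int]) -> int:
    """Upper bound on distinct custom metrics: each metric name times the
    product of distinct values per tag key (full cross-product assumption)."""
    return len(metric_names) * prod(tag_value_counts.values())
```

For example, two metrics tagged with 3 environments and 50 pods can produce up to 300 distinct custom metrics; add a per-request tag and the bound explodes, which is exactly the pattern allowlisting is meant to catch.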
Sources
- www.datadoghq.com/pricing/list/
- docs.datadoghq.com/opentelemetry/setup/
- docs.datadoghq.com/integrations/otel/
- docs.datadoghq.com/incident_response/incident_management/
- docs.datadoghq.com/tracing/trace_pipeline/trace_retention/
- docs.datadoghq.com/metrics/metrics-without-limits/
- docs.datadoghq.com/metrics/custom_metrics/
- docs.datadoghq.com/bits_ai/
Grafana Cloud
Grafana Cloud is the strongest fit for teams that want an OpenTelemetry-first managed stack with clearer cost controls and less lock-in than a fully vendor-native platform. As of 2026-03-14, official pricing is transparent but split by product: metrics are billed by active series, logs/traces/profiles by GB, IRM by active user, Assistant by active user, and Application Observability now has a host-hour model that changed for new customers on February 13, 2026.
When to choose
Best for cost-sensitive + microservices or high-scale + compliance environments where OpenTelemetry and Prometheus compatibility matter, and where you want to keep collector architecture, sampling, and data-routing choices relatively open. It is also a strong fit for serverless + small-team setups because the Application Observability docs state serverless billing is telemetry-based instead of host-hour based.
Tradeoffs
Grafana Labs explicitly recommends Grafana Alloy or OpenTelemetry Collector patterns, and its OTLP endpoint is documented as the recommended OTLP target for metrics, logs, and traces. Cost governance is better surfaced than in many competitors through cardinality dashboards, Adaptive Metrics, Adaptive Logs, and Adaptive Traces. The tradeoff is that some higher-level workflows are separate products or add-ons, and retention is not one simple global knob across all signals.

Migration/reversibility: Prometheus metrics are exportable via the PromQL remote_read API, Loki logs via the LogQL API, and Tempo traces via the Jaeger-compatible query API; Grafana dashboards export as JSON via API. Lock-in is minimal: the entire stack uses open protocols (Prometheus remote_write, OTLP, Loki push API, Tempo/Jaeger API), and all backend components (Prometheus/Mimir, Loki, Tempo, Grafana) are open-source and self-hostable, giving a concrete ejection path to self-managed infrastructure without re-instrumentation.

SLA: Grafana Cloud guarantees 99.5% monthly availability for paid plans (Pro and above). Service credits are tiered: 10% for 99.0-99.5%, 20% for 98.0-99.0%, 50% for 97.0-98.0%, and 100% below 97.0%; credits must be claimed within 10 days. The free tier has no SLA, and the SLA excludes planned maintenance (24h notice required) and usage spikes exceeding 300% of baseline.

Support: the Pro tier includes 8x5 email support; the Enterprise tier adds premium support with custom SLAs and faster response times.
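The dashboards-as-JSON escape hatch is easy to script against Grafana's documented HTTP API (`GET /api/dashboards/uid/<uid>` is the stable endpoint; the stack URL and token below are placeholders). A sketch that builds the request specs without performing network I/O:

```python
def dashboard_export_requests(base_url: str, uids: list[str],
                              token: str) -> list[dict]:
    """Build one GET request spec per dashboard UID against the Grafana
    HTTP API; the caller performs the actual HTTP calls and writes the
    returned JSON to disk."""
    headers = {"Authorization": f"Bearer {token}"}
    return [{"method": "GET",
             "url": f"{base_url}/api/dashboards/uid/{uid}",
             "headers": headers}
            for uid in uids]
```

In practice you would first list UIDs via the search API, then fetch each dashboard and commit the JSON to version control, which doubles as an ongoing backup independent of any migration.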
Cautions
Be precise about the pricing model on your account. The Application Observability docs say new customers pay $0.025 per host hour plus separate telemetry charges, while existing customers from before February 13, 2026 remain on $0.04 per host hour with included telemetry credits. Trace retention is configurable per stack, but the invoice docs say the minimum retention period is 30 days and extra retention is purchased in 30-day increments. Grafana Assistant is not enabled by default; an administrator must accept terms first, and the docs say no data is sent to an AI provider until a user actively uses Assistant features.
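The two host-hour rates above compound over a month, so the arithmetic is worth doing per account cohort. A sketch using the documented rates (host-hour charge only; the separate telemetry charges for new customers and the included telemetry credits for existing ones are deliberately excluded here):

```python
def app_o11y_host_cost(hosts: int, hours: float,
                       new_customer: bool = True) -> float:
    """Grafana Cloud Application Observability host-hour charge:
    $0.025/host-hour for new customers (on or after 2026-02-13),
    $0.04/host-hour for existing customers. Telemetry billed separately."""
    rate = 0.025 if new_customer else 0.04
    return hosts * hours * rate
```

Ten hosts running a full 730-hour month come to $182.50 on the new model versus $292.00 on the legacy rate, before any telemetry charges or credits are applied.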
Sources
- grafana.com/pricing/
- grafana.com/docs/grafana-cloud/monitor-applications/application-observability/pricing/
- grafana.com/docs/grafana-cloud/alerting-and-irm/irm/get-started/
- grafana.com/docs/grafana-cloud/machine-learning/assistant/get-started/
- grafana.com/docs/grafana-cloud/cost-management-and-billing/analyze-costs/metrics-costs/prometheus-metrics-costs/cardinality-management/
- grafana.com/docs/grafana-cloud/cost-management-and-billing/manage-invoices/understand-your-invoice/traces-invoice/
- grafana.com/docs/grafana-cloud/send-data/traces/configure/sampling/
- grafana.com/docs/grafana-cloud/monitor-applications/application-observability/otlp-gateway/
New Relic
New Relic is the clearest alternative if you want usage-based full-stack observability without host counting. As of 2026-03-14, its official pricing is based on users plus data ingest or compute plus data ingest, with 100 GB per month free, original data at $0.40 per GB beyond the free tier, Data Plus at $0.60 per GB, optional EU storage at $0.05 per GB, and no separate host charges.
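The ingest side of that pricing is simple enough to model directly. A sketch using the figures above; note it assumes the optional $0.05/GB EU storage adder applies to billable (above-free-tier) gigabytes, which you should verify against your contract:

```python
def nr_ingest_cost(ingest_gb: float, data_plus: bool = False,
                   eu_storage: bool = False) -> float:
    """Monthly ingest charge: first 100 GB free, then $0.40/GB (original)
    or $0.60/GB (Data Plus); optional $0.05/GB EU storage adder is
    assumed here to apply to billable GB only."""
    billable = max(0.0, ingest_gb - 100.0)
    rate = 0.60 if data_plus else 0.40
    if eu_storage:
        rate += 0.05
    return billable * rate
```

So 600 GB/month of ingest is roughly $200 on the original rate or $300 on Data Plus, before user or compute charges.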
When to choose
Best for small-team + low-ops or enterprise + cost-sensitive environments where unlimited hosts, containers, functions, and Fargate tasks matter more than host-based APM packaging. It is a good fit when you want broad platform access, strong OTLP support, and AI-assisted troubleshooting, but you prefer to reason about ingest and user access instead of per-host observability SKUs.
Tradeoffs
New Relic states it aims for first-class OpenTelemetry support and recommends evaluating both New Relic instrumentation and OpenTelemetry depending on how much out-of-the-box integration versus flexibility you want. Its incident workflow model is centered on alert events, issue correlation via Incident Intelligence, and workflow-based notification/enrichment rather than the more visibly separate IRM product packaging Grafana uses. AI support is broad across chat, NRQL help, log explanation, error-stack analysis, alert-gap detection, and issue summaries.

Migration/reversibility: New Relic offers Historical Data Export via NerdGraph, which supports unlimited query duration and up to 200 million data-point rows per export, returned as JSON files to an S3 bucket; the time range must end at least 12 hours in the past, and result files expire after one week. Streaming Data Export sends data at ingestion time via Kinesis Firehose using custom NRQL rules, suitable for populating external data lakes. NRQL query results are also exportable programmatically via NerdGraph.

Lock-in: NRQL is a proprietary query language (SQL-like but New Relic-specific), so dashboards and alerts built on NRQL require translation if migrating. OpenTelemetry support is first-class for ingestion, but the query and alerting layer remains proprietary.

SLA and support: New Relic guarantees 99.8% monthly availability, and customers can request availability attainment reports by filing a support ticket. Support tiers are spend-based: Silver ($1-$9,999/yr), Gold ($10K-$99,999/yr), Platinum ($100K+/yr). The Enterprise edition includes priority ticket routing, 1-hour critical initial response, and support via forum, case, chat, phone, and Slack; the Pro edition has 2-hour critical response without phone or Slack access.
Cautions
High-cardinality metrics are a real operational constraint here, not just a billing smell: New Relic docs say metric cardinality limits are enforced at both the per-metric and account level and are evaluated over each UTC day. Retention is also plan-shaped rather than universally fixed; the pricing docs say the free tier has default retention of at least 8 days, Data Plus adds up to 90 days of extra retention over default levels, and longer retention can require add-ons. New Relic AI became billable under Compute Pricing on June 4, 2025, so teams adopting AI-assisted workflows should model Advanced Compute usage explicitly.
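Because the limits are evaluated over each UTC day, auditing your own series counts with local-time bucketing will mis-count around midnight. A sketch that buckets distinct series per UTC day, assuming timestamps are epoch seconds and a series is identified by metric name plus its tag set:

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_cardinality(points) -> dict[str, int]:
    """points: iterable of (epoch_seconds, metric_name, frozenset of
    (tag_key, tag_value) pairs). Returns distinct-series count per UTC day,
    mirroring the per-UTC-day evaluation window described in the docs."""
    days: dict[str, set] = defaultdict(set)
    for ts, name, tags in points:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        days[day].add((name, tags))
    return {day: len(series) for day, series in days.items()}
```

Running this over a day of exported samples tells you how close each metric sits to its per-metric limit before the platform starts enforcing it.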
Sources
- newrelic.com/pricing
- docs.newrelic.com/docs/opentelemetry/opentelemetry-introduction/
- docs.newrelic.com/docs/data-apis/ingest-apis/metric-api/nrql-high-cardinality-metrics/
- docs.newrelic.com/docs/alerts/get-notified/alert-event-workflows/
- docs.newrelic.com/docs/alerts/alert-event-management/response-intelligence-ai/
- docs.newrelic.com/docs/agentic-ai/new-relic-ai/
- docs.newrelic.com/whats-new/2025/06/whats-new-06-04-nrai/
Try with your AI agent
$ npm install -g pocketlantern
$ pocketlantern init
# Restart Claude Code, Cursor, or your MCP client, then ask:
# "Datadog vs Grafana Cloud vs New Relic — which observability platform?"