19 top distributed tracing tools to know about
Distributed tracing tools help development and DevOps teams monitor microservices applications and resolve performance issues. Here are details on 19 notable tools.
All distributed tracing tools do the same basic thing: monitor user requests as they flow through different services and components in a distributed system, such as a microservices-based application. If performance problems arise, the tools help development and IT operations teams pinpoint bottlenecks and identify which service is slowing down the overall process.
But the tools vary in how they function. There are also distinctions between them in areas such as pricing, deployment models and integrations with other types of tools. Furthermore, dozens of distributed tracing tools are available, which makes choosing the right one even tougher.
To help organizations navigate such a crowded technology market, this article discusses 19 prominent distributed tracing tools. It outlines their key features and capabilities and offers tips on when it does or doesn't make sense to use them.
How distributed tracing works
First, let's look more deeply at what these tools do, how they do it and why it's important. Distributed tracing measures how long each part of a system takes to process its portion of the request being traced. Applications often involve numerous services -- dozens or even more. The data generated by tracing streamlines efforts to determine the cause of a performance issue among the various services.
Distributed tracing tools work in a relatively straightforward way that includes the following steps:
- A tool issues a request to an application -- for example, by inputting data and asking the application to return a specific result. Alternatively, it could monitor a request from a real end user.
- The tool assigns a unique identifier, known as the trace ID, to the request. This ID enables the tool to track the request as it flows through the application.
- Using monitoring code instrumented in the application, the tool measures the processing time of all the services the request hits. This data is collected in real time, and each timed unit of work is called a span. A root span represents the entire request, and child spans are nested within it for individual actions.
- When the request is completed, the tool compiles the data from each span to create a full trace.
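The steps above can be illustrated with a minimal Python sketch. It is a toy model, not the implementation of any tool on this list: the `Span` and `Tracer` classes, the `handle_request` function and the service names are all hypothetical, and `time.sleep` stands in for real work done by each service.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start: float = 0.0
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000.0


class Tracer:
    """Toy tracer: one trace ID per request, one span per unit of work."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex   # step 2: assign a unique trace ID
        self.spans: List[Span] = []        # finished spans
        self._open: List[Span] = []        # stack of spans still running

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        # The innermost open span, if any, becomes the parent.
        parent_id = self._open[-1].span_id if self._open else None
        current = Span(self.trace_id, uuid.uuid4().hex[:16], parent_id, name)
        current.start = time.perf_counter()
        self._open.append(current)
        try:
            yield current
        finally:
            # step 3: record how long this unit of work took
            current.end = time.perf_counter()
            self._open.pop()
            self.spans.append(current)


def handle_request() -> List[Span]:
    """Simulate one request that passes through two hypothetical services."""
    tracer = Tracer()
    with tracer.span("GET /checkout"):       # root span for the whole request
        with tracer.span("auth-service"):    # child span
            time.sleep(0.01)
        with tracer.span("payment-service"):  # child span
            time.sleep(0.02)
    return tracer.spans                       # step 4: the compiled trace


if __name__ == "__main__":
    for s in handle_request():
        role = "root" if s.parent_id is None else "child"
        print(f"{s.trace_id[:8]} {role:<5} {s.name:<16} {s.duration_ms:.1f} ms")
```

Real tools do the same bookkeeping across process and network boundaries by passing the trace ID and parent span ID along with each outgoing call, which this single-process sketch leaves out.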
Tracing tools often present trace data in visualizations, such as bar charts or waterfall diagrams that display each span and its duration. However, traces can also be presented in textual form by listing the service names and their processing times.
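To make the waterfall idea concrete, the following Python sketch renders a handful of spans as text bars. The `render_waterfall` function, the row layout and the sample services and timings in `EXAMPLE` are invented for illustration and are not taken from any particular product.

```python
from typing import List, Tuple

# (service name, start offset in ms, duration in ms, nesting depth)
SpanRow = Tuple[str, float, float, int]


def render_waterfall(rows: List[SpanRow], width: int = 40) -> str:
    """Draw each span as a bar positioned and sized relative to the full trace."""
    total = max(start + dur for _, start, dur, _ in rows)
    lines = []
    for name, start, dur, depth in rows:
        # Scale the start offset and duration to the chart width.
        offset = min(width - 1, round(start / total * width))
        length = max(1, min(width - offset, round(dur / total * width)))
        label = ("  " * depth + name).ljust(22)  # indent child spans
        bar = (" " * offset + "#" * length).ljust(width)
        lines.append(f"{label}|{bar}| {dur:.0f} ms")
    return "\n".join(lines)


# Hypothetical trace: a root request with two child calls and one grandchild.
EXAMPLE: List[SpanRow] = [
    ("GET /checkout", 0, 100, 0),
    ("auth-service", 0, 25, 1),
    ("payment-service", 25, 50, 1),
    ("db-query", 30, 40, 2),
]

if __name__ == "__main__":
    print(render_waterfall(EXAMPLE))
```

In a chart like this, the longest bar under the root span is usually the first place to look for a bottleneck.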
Distributed traces are one type of telemetry data that development or DevOps teams can use to assess application health and performance. The other two primary types are application performance metrics -- which provide data on average response times, error rates, uptime and more -- and logs, which record user logins, error messages and other events. Metrics, logs and traces are commonly seen as the three pillars of observability initiatives in IT environments. However, distributed traces are especially valuable because they provide deep visibility into what's happening inside an application.
Distributed tracing tools to consider
Distributed tracing didn't start to become a common practice related to application performance monitoring (APM) until the early 2010s. Not coincidentally, organizations around that time began to adopt microservices architectures and cloud-native applications. Traces became a critical way to gain visibility into the internal workings of these complex distributed systems.
At a fundamental level, APM and distributed tracing are different processes. Early on, few APM tools supported tracing capabilities, but many APM and observability platforms now incorporate them. Standalone tracing tools are also available. The following list includes popular tools from both categories in alphabetical order and also combines commercial and open source technologies. It was compiled based on research of the available offerings and market reports from consulting firms such as Gartner.
1. Apache SkyWalking
Apache SkyWalking is an open source APM tool launched in 2015 and managed by the Apache Software Foundation. It supports distributed tracing as well as metrics aggregation and analysis, log management and visualization of service dependencies -- relationships between microservices that work collectively to process requests. Originally designed to generate traces for Java applications, the tool now also provides agents for Go, Node.js, PHP, Python and Rust.
Key features also include the following:
- High scalability. SkyWalking works well when dealing with large-scale systems, primarily due to its low resource overhead. This means it can generate traces across large numbers of services without consuming excessive resources.
- Flexible storage options. The tool supports various back-end databases for storing trace data, including Elasticsearch, MySQL, PostgreSQL and BanyanDB, its native APM database.
- Open source and free to use. SkyWalking is free to download, although organizations need to provide their own IT infrastructure to run the tool and store trace data.
2. AWS X-Ray
AWS X-Ray is a distributed tracing tool for analyzing and debugging applications in the AWS cloud. Its primary purpose is to help developers and cloud administrators troubleshoot performance problems in applications of any size, including complex microservices applications. Applications can be debugged in real time, and the tool can also be used to monitor application cost and performance metrics. X-Ray's pricing is based on how many traces users generate and how much trace data they store, while a free tier lets users record, retrieve or scan a base number of traces each month.
Key features also include the following:
- Native AWS service integration. X-Ray integrates automatically with other AWS services, such as Amazon Elastic Compute Cloud, Amazon Elastic Container Service, AWS Lambda and AWS Elastic Beanstalk.
- Service map. X-Ray generates service map visualizations to help development teams understand dependencies between different services in an application and pinpoint ones that aren't performing well.
- Data sampling, annotation and filtering. These capabilities enable users to focus on certain sets of trace data when diagnosing issues, which reduces overall data storage and can help developers home in on the data that matters most.
3. Azure Monitor Application Insights
Azure Monitor Application Insights is the APM tool within Azure Monitor, the main performance monitoring service built into the Microsoft Azure cloud. Application Insights can generate distributed traces and collect logs and metrics on Azure-based web applications. Like many other APM and distributed tracing tools, it supports the vendor-neutral OpenTelemetry data collection framework, although native Application Insights SDKs can also be used to set up traces. Microsoft's pricing to use the tool is based primarily on how much data it ingests.
Key features also include the following:
- Simple Azure integration. Azure Monitor Application Insights integrates natively with other Azure services, such as Azure Functions and Azure Kubernetes Service.
- Multiple views of trace data. Application Insights supports both a transaction diagnostics view for analyzing performance issues in individual transactions or requests and an application map to help teams identify performance bottlenecks or failure points.
- AI-powered anomaly detection. The tool provides AI-assisted anomaly detection capabilities that assess telemetry data and alert developers if the rate of failed requests in an application rises in an unusual way.
4. Cloud Trace
Cloud Trace, the native distributed tracing tool for Google Cloud, helps developers understand latency and performance issues in applications running on the cloud platform. It automatically collects and analyzes latency data for Google Cloud applications, with support for using a combination of OpenTelemetry and a built-in Cloud Trace API to send and retrieve the data. Part of the Google Cloud Observability suite, along with companion logging and performance monitoring services, Cloud Trace is enabled by default when applications are created. Pricing is based on the number of traces users generate.
Key features also include the following:
- Easy Google Cloud integration. Cloud Trace integrates automatically with several Google Cloud services, including Compute Engine, Google Kubernetes Engine, App Engine and Cloud Run.
- Automatic sampling. Cloud Trace can be configured to sample traces, which means it records trace data for only a subset of requests instead of all of them. This reduces processing overhead and costs.
- Detailed latency breakdown. The tool offers precise insights into where delays occur in distributed applications.
5. Datadog Trace Explorer
Trace Explorer is the distributed tracing feature in Datadog, a comprehensive observability platform that also processes application logs and metrics and supports infrastructure monitoring, digital experience monitoring (DEM) and other functions. Part of the cloud-native platform's APM module, Trace Explorer enables users to search collected spans based on applied tags and display aggregated results in list, table and time series views. Traces can be indexed and retained for 15 days using custom retention filters or 30 days with a default filter. Trace Explorer's pricing is based mostly on how many traces users generate.
Key features also include the following:
- Unified observability. By correlating traces with logs, metrics, database queries, network calls and UX data, Datadog provides holistic monitoring to accelerate root cause analysis (RCA) of performance problems.
- Live debugging of performance issues. For real-time issue detection, Trace Explorer supports searches of all the traces ingested in a rolling 15-minute window.
- AI-assisted performance insights. Watchdog, Datadog's AI engine, uses machine learning to detect anomalies and automate RCA based on traces and other telemetry data.
6. Dynatrace Distributed Tracing
Like Datadog, Dynatrace is a full-stack observability platform. As part of a broader platform overhaul, Dynatrace released Distributed Tracing in 2024 to replace its original tracing tool, which is also still available under the name Distributed Traces Classic. The new tool stores trace data in Grail, Dynatrace's data lakehouse platform, for up to 10 years and has a redesigned UI with expanded charting and data visualization capabilities. Pricing depends on factors such as the types of applications being monitored, how they're deployed and the total volume of traces.
Key features also include the following:
- Automatic tracing instrumentation. Distributed Tracing can automatically collect some trace data without requiring users to manually instrument their applications.
- AI-driven RCA. Davis AI, Dynatrace's AI engine, can predict and detect performance anomalies and automatically identify the root cause of issues.
- End-to-end observability and analytics. Trace data can be combined with logs, metrics and other information sources to track application performance from front-end user interactions to back-end services.
7. Elastic Observability
Built on the open source Elastic Stack, Elastic Observability supports APM plus log analytics, infrastructure monitoring, DEM and AIOps. Its distributed tracing tool is part of the platform's Elastic APM module. The tool collects trace data through built-in agents and OpenTelemetry APIs, and the data can be displayed in timeline visualizations. A Trace Explorer search tool, currently available in a technical preview, enables users to analyze trace data through custom or automatically generated queries.
Key features also include the following:
- Native integration with Elasticsearch. Elasticsearch, the core Elastic Stack technology, is a tool for searching and analyzing large volumes of data. Its integration with Elastic APM supports efficient trace analysis.
- High levels of scalability. Elastic Observability works well for generating traces and collecting other types of telemetry data in large-scale distributed systems.
- Free tier for production uses. Most features in Elastic Observability, including distributed tracing, are available free of charge without usage limits for on-premises deployments. Some advanced features require a paid license, and vendor Elastic offers serverless and hosted managed services in the cloud.
8. Grafana Tempo
Grafana Tempo is an open source distributed tracing back-end tool that integrates seamlessly with Grafana, a popular data visualization tool, and other observability technologies. Tempo is designed for efficient and scalable trace storage, and it enables users to feed trace data into Grafana for visualization and analysis using a built-in Tempo data source. The tool can ingest trace data through OpenTelemetry or the protocols for Jaeger and Zipkin, two other open source tracing tools.
Key features also include the following:
- No-index architecture. Tempo doesn't index traces in a database; instead, it stores them in lower-cost object storage, reducing storage costs and increasing scalability.
- Native integration with Prometheus and Loki. Tempo can be used alongside these open source tools to combine the collection and analysis of traces, application performance metrics and logs.
- Free and paid options. As open source software, Tempo is free to use. It's also available through vendor Grafana Labs as part of the Grafana Cloud observability managed service, with a free tier for up to 50 GB of trace data and two levels of paid subscriptions. A self-managed Grafana Enterprise version is offered, too.
9. Honeycomb
Like Datadog and Dynatrace, Honeycomb is an observability platform that runs in various clouds and supports distributed tracing alongside log and metrics analysis. The platform is often noted for its focus on scalability. In addition, its pricing is based mainly on the number of events users analyze rather than the data associated with those events, a model that can be more cost-effective when dealing with highly complex traces. Honeycomb's distributed tracing tool uses OpenTelemetry to collect trace data and a Query Builder function that's built into the platform to analyze traces.
Key features also include the following:
- Event-based observability. The platform's event-centric model facilitates advanced debugging of performance issues using high-cardinality data that includes many unique values, such as discrete events and other data elements associated with them.
- Fast query engine. Honeycomb is optimized for quick searches and analysis, which is useful when working with large numbers of traces.
- BubbleUp. This automated analysis feature highlights anomalies and outliers in trace data to help users pinpoint issues that might be causing slow processing of user requests.
10. Instana AutoTrace
Instana AutoTrace is the distributed tracing tool in the IBM Instana Observability platform, which offers usage-based or subscription pricing based on IBM's Managed Virtual Server licensing metric. Automation capabilities designed to simplify the observability process are a key focus of both the overall platform and Instana AutoTrace. As its name implies, the tool automatically deploys sensors in applications written in various languages to instrument tracing. This eliminates the need for manual instrumentation in those cases. AutoTrace also provides automatic discovery of service failures and changes to applications and the IT systems they run on.
Key features also include the following:
- End-to-end tracing. Instana AutoTrace captures end-to-end trace data for all the requests processed by an application, with no data sampling.
- Support for ingesting external trace data. In addition to the data collected by the tool, users can ingest traces created with Jaeger, Zipkin, OpenTelemetry and the latter's two predecessors, OpenTracing and OpenCensus. IBM also provides SDKs for manually adding trace data.
- Built-in analytics capabilities. The Instana platform uses a knowledge graph to help automate anomaly detection and RCA based on traces and other telemetry data. It also automatically creates dashboards for analyzing trace data.
11. Jaeger
Jaeger focuses exclusively on distributed tracing, although it can be used alongside other types of tools in APM initiatives. Originally developed by Uber in 2015 to help the ridesharing company gain insight into its own cloud-native microservices applications, Jaeger was released as open source software the following year. It became one of the first widely used open source tracing tools and remains popular. Jaeger v2, released in November 2024, replaced the tool's native collector for capturing trace data with the OpenTelemetry Collector; extensions to that technology support Jaeger's data storage and querying features.
Key features also include the following:
- Fully open source and free to use. Jaeger is a completely open tracing tool that's free to download and run.
- Support for small-scale and large-scale tracing. Jaeger can collect trace data by itself on a small scale, but it also operates efficiently in larger-scale environments when paired with tools such as Elasticsearch or OpenSearch to collect, store and search data.
- Multiple sampling options. Jaeger supports several approaches for sampling trace data, which its developers recommend doing to reduce overhead in applications and the cost of storing traces.
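One common sampling approach is probabilistic: keep a fixed fraction of traces and decide up front, based on the trace ID, whether a request is recorded. The Python sketch below illustrates the general idea only. It is not Jaeger's code, and the `should_sample` function and its hashing scheme are assumptions made for this example.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Decide whether to keep a trace, given a sampling rate from 0.0 to 1.0.

    Hashing the trace ID makes the decision deterministic, so every service
    that sees the same trace ID reaches the same answer without coordination.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < int(rate * 2**64)


if __name__ == "__main__":
    # Hypothetical trace IDs; roughly 10% should be kept at a rate of 0.1.
    ids = [f"trace-{i}" for i in range(10_000)]
    kept = sum(should_sample(t, 0.1) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")
```

A lower rate reduces instrumentation overhead and storage costs, at the price of a higher chance that a rare problem request is never recorded.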
12. Lumigo
Lumigo is a microservices observability platform that has gained a wide following among users since it was introduced in 2019. The Lumigo software supports end-to-end distributed tracing with full visibility of tracing payloads that contain detailed data about user requests, potentially reducing the need to check application logs when investigating issues. The tool also includes alerting capabilities for errors and other application events, plus a transaction view and an issues page designed to speed up troubleshooting and performance management. Lumigo offers a free basic version and two paid tiers with different usage limits and either monthly or annual subscriptions.
Key features also include the following:
- Dual focus on serverless and containerized applications. Lumigo initially provided tracing and observability only for AWS Lambda serverless functions. In 2022, though, it added support for tracing containerized applications through a Kubernetes operator or OpenTelemetry.
- Multiple OpenTelemetry options. To instrument traces in containerized applications, Lumigo offers OpenTelemetry distributions for Python, Node.js and Java and also supports external OpenTelemetry implementations.
- Automated data correlation. The tool correlates trace data with logs and metrics in real time to provide context on performance issues and help pinpoint their root causes.
13. New Relic
New Relic is another mainstay on the list of APM and observability platforms with distributed tracing capabilities. It was launched as a SaaS APM tool in 2009, before cloud-native computing became widespread. The cloud-based platform has since evolved to support full-stack observability in complex distributed systems. Distributed tracing is enabled by default in many of its individual products, and New Relic provides a built-in tracing UI to search for, view and analyze traces. A free version of the platform is available along with tiered paid versions priced on a combination of data ingested and users.
Key features also include the following:
- Flexible agent instrumentation. Multiple programming languages can be used to instrument trace data collection. New Relic also supports OpenTelemetry and includes a Trace API with support for an internal data format and the Zipkin format.
- Infinite Tracing. This fully managed tracing service can process more trace data than New Relic's standard technology because sampling decisions are made after the data has been collected instead of through upfront filtering.
- Dashboards with unified telemetry data. New Relic combines data on metrics, events, logs and traces into a single dashboard and offers an extensive set of prebuilt dashboard templates.
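Making sampling decisions after trace data has been collected, as described in the Infinite Tracing item above, is often called tail-based sampling. The Python sketch below shows the general technique under simple assumptions: keep every trace that had an error or ran slowly, plus a small random share of the rest. It does not reflect New Relic's implementation, and the `CompletedTrace` class, `tail_sample` function and threshold values are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CompletedTrace:
    trace_id: str
    duration_ms: float
    has_error: bool


def tail_sample(
    traces: Iterable[CompletedTrace],
    slow_ms: float = 500.0,        # assumed latency threshold
    baseline_rate: float = 0.05,   # assumed share of ordinary traces to keep
    rng: Optional[random.Random] = None,
) -> List[CompletedTrace]:
    """Choose which finished traces to retain once their outcome is known."""
    rng = rng or random.Random()
    kept = []
    for trace in traces:
        if trace.has_error or trace.duration_ms >= slow_ms:
            kept.append(trace)     # always keep the interesting traces
        elif rng.random() < baseline_rate:
            kept.append(trace)     # keep a sample of normal traffic
    return kept


if __name__ == "__main__":
    batch = [
        CompletedTrace("t1", 120.0, False),
        CompletedTrace("t2", 900.0, False),  # slow
        CompletedTrace("t3", 80.0, True),    # failed
    ]
    for t in tail_sample(batch, baseline_rate=0.0):
        print(t.trace_id, t.duration_ms, t.has_error)
```

The tradeoff compared with upfront filtering is that every trace must be collected and buffered before the decision is made, which is why this kind of sampling typically runs in a separate service rather than in the application itself.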
14. Sentry
Sentry was created mainly to help Python developers debug applications, but it has evolved into a broader APM platform that supports distributed tracing, error monitoring, performance analysis and other capabilities. Still aimed at developers, Sentry includes a Trace Explorer tool for examining span metrics and other trace data, plus a Trace View page for drilling down into the details of a single trace. Another feature, currently available in a free beta release, enables users to connect structured logs to traces. Individual developers can use a free version of the platform, and paid editions are available for a monthly or annual subscription.
Key features also include the following:
- Impact analysis. Sentry focuses not just on generating traces for application requests, but also on assessing how request failures or slow responses affect overall performance.
- Deep code-level insights. The software provides stack traces showing execution sequences in applications to help developers zero in on the code linked to performance issues.
- Developer-focused UI. As a tool built first and foremost for developers, Sentry caters to users seeking a no-nonsense interface free of "eye candy" data visualizations that add no real value.
15. SigNoz
SigNoz -- short for "signal vs. noise" -- aims to provide a lower-cost open source alternative to observability platforms such as Datadog, New Relic and Dynatrace for use by developers. Launched in 2021, SigNoz supports analysis of logs, metrics and traces, as well as infrastructure monitoring and tracking of application exceptions with detailed stack traces. The OpenTelemetry-based platform's distributed tracing capabilities include the ability to filter, aggregate and query trace data and to correlate traces and logs when debugging applications.
Key features also include the following:
- Different views of trace data. In addition to the default list view of trace data, SigNoz's Trace Explorer page supports viewing it by root span and in time series or table formats.
- ClickHouse-driven advanced analytics. SigNoz stores trace data in ClickHouse, an open source columnar database that enables users to write queries for advanced analytics use cases.
- Free and paid versions. SigNoz is available in a free community edition or paid versions for teams and enterprises. The latter are offered through SigNoz Cloud, a fully managed service that's priced based on the amount of data ingested. The enterprise edition can also be deployed as a self-hosted platform.
16. Splunk Observability Cloud
Splunk Observability Cloud is an end-to-end platform that supports distributed tracing in its Splunk APM module. Cisco, which acquired Splunk in 2024, has also added AppDynamics, another observability tool it already owned, to Splunk's product line. Observability Cloud is designed for monitoring cloud-native applications in microservices architectures, while AppDynamics is oriented to three-tier application architectures in hybrid cloud and on-premises environments. Observability Cloud also focuses more directly on distributed tracing: Trace and span data are the backbone of Splunk APM's monitoring capabilities. Pricing is based on the number of physical and virtual hosts reporting data to the platform.
Key features also include the following:
- AI-driven RCA. Splunk APM uses AI features to automatically detect and resolve application performance issues. While AI capabilities are now increasingly common in APM tools, Splunk was one of the first platforms to make them a core component.
- Multiple analytics options. Splunk APM includes a Traces page for examining specific traces and a Trace Analyzer function for researching unknown or new issues, plus a feature for tracking service-level performance through indexed span tags.
- Zero-code instrumentation. Through OpenTelemetry, Splunk Observability Cloud can automatically configure applications written in various languages to export telemetry data.
17. Turbo360
Born out of an earlier tool that monitored Microsoft Azure's application messaging service bus, Turbo360 provides a wide range of Azure monitoring, cost management and observability capabilities. It supports distributed tracing through a module named Business Activity Monitoring (BAM) that developer Kovai added to the platform in 2020. The BAM software tracks end-to-end message flows in Azure-based applications to help organizations identify and resolve bottlenecks affecting business transactions. Pricing is based on customized annual subscription plans.
Key features also include the following:
- Azure-centric design. Turbo360 focuses specifically on observability and management of Azure workloads. While this might be a drawback for businesses that also use other clouds, the platform offers deeper integration into Azure than most other APM and observability tools.
- Self-service portal. The Turbo360 BAM module is designed to enable business users and tech support teams to analyze and resolve performance issues themselves. Development teams can also use it to trace message flows and troubleshoot issues during the software testing process.
- Transaction modeling and mapping. The tool can model transactions as part of business processes and map them to underlying Azure integration services for tracing purposes.
18. Uptrace
Although Uptrace's name might imply that the product only supports tracing, it's an end-to-end observability platform that can also generate, monitor and analyze logs and metrics to provide visibility into microservices applications. Built on OpenTelemetry, it's fully open source, making it one of the few comprehensive APM and observability offerings available for free through a community edition. The paid versions of the tool, which is marketed as a lower-cost alternative to platforms such as Datadog and New Relic, include additional features and are priced based on the amount of data ingested.
Key features also include the following:
- Extensive grouping and filtering features. Uptrace offers a highly customizable approach to grouping and filtering traces, making it easy to work with trace data at scale.
- Support for large traces. The tool can handle distributed traces that include more than 100,000 spans.
- User-friendly tracing. Overall, Uptrace is a no-frills platform that focuses on displaying the most relevant data and making it accessible to software engineers through a built-in query language.
19. Zipkin
Created by Twitter and released publicly in 2012, Zipkin was one of the first open source distributed tracing tools to gain a wide following. Now developed by the OpenZipkin volunteer organization, it and Jaeger are what you might call the OGs of open source distributed tracing. Like Jaeger, Zipkin is fully open source and free to use. It arguably offers a simpler interface and tends to be considered more user-friendly than Jaeger. Trace data is collected through HTTP, Apache Kafka or other transports and can be stored in memory or in Apache Cassandra, Elasticsearch or MySQL databases.
Key features also include the following:
- Focus on traces. Zipkin supports only distributed tracing, making it an attractive option for teams that prefer to use other tools to handle the additional aspects of APM.
- Data sampling support. The tool offers multiple sampling options to reduce tracing overhead and data storage needs.
- Built-in visualizations. Zipkin provides native data visualization features to display traces, plus a dependency diagram that shows how many traced requests pass through applications to help identify error paths and other issues.
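A dependency diagram of the kind mentioned above can be derived from the parent-child links between spans. The following Python sketch counts calls between services in a set of spans. It is an illustration of the idea rather than Zipkin's own logic; the `dependency_counts` function, the record layout and the sample services are hypothetical, and span IDs are assumed to be unique across the sample.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# (span ID, parent span ID or None for a root span, service name)
SpanRecord = Tuple[str, Optional[str], str]


def dependency_counts(spans: Iterable[SpanRecord]) -> Counter:
    """Count caller -> callee edges between distinct services."""
    records = list(spans)
    service_of: Dict[str, str] = {span_id: svc for span_id, _, svc in records}
    edges: Counter = Counter()
    for _, parent_id, svc in records:
        parent_svc = service_of.get(parent_id) if parent_id else None
        # Skip root spans, unknown parents and calls within one service.
        if parent_svc and parent_svc != svc:
            edges[(parent_svc, svc)] += 1
    return edges


if __name__ == "__main__":
    sample = [
        ("a", None, "gateway"),
        ("b", "a", "orders"),
        ("c", "b", "db"),
        ("d", "a", "orders"),
        ("e", "d", "orders"),  # internal call, not a cross-service edge
    ]
    for (caller, callee), count in dependency_counts(sample).items():
        print(f"{caller} -> {callee}: {count}")
```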
Chris Tozzi is a freelance writer, research adviser and professor of IT and society. He has previously worked as a journalist and Linux systems administrator.