Microservices have gone from vague concept to mainstream phenomenon in a few short years, and no one's looking back. Microservices deliver flexibility and reduce costs, among other benefits. They also deliver complexity, and IT organizations must keep tabs on all that potential chaos.
When users access a service, they typically touch 10 to 15 microservices, and any one of those hits could slow down the application. IT plays a guessing game to determine the source of an error or latency: Maybe it's in a database, in a single service or in the interaction between multiple services. Distributed tracing, also called distributed request tracing, is an emerging microservices monitoring technique that helps IT and DevOps teams manage distributed applications.
Follow my lead
Distributed tracing is the practice of instrumenting code in various languages so that the monitoring team can follow and analyze paths that transactions take, Forrester analyst Charles Betz said. "It relies on the concept of 'distributed context propagation,' which injects metadata into API requests to be carried throughout complex distributed system interactions," he said.
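To make the idea of context propagation concrete, here is a minimal Python sketch of how trace metadata might ride along with requests from service to service. It is an illustration under stated assumptions, not any vendor's or standard's actual implementation: the header names, the `inject` and `extract` helpers and the three service names are all hypothetical, and plain dictionaries stand in for real HTTP requests.

```python
import uuid

# Hypothetical header names, chosen for illustration only.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


def new_id() -> str:
    """Return a random 16-character hex identifier."""
    return uuid.uuid4().hex[:16]


def inject(context: dict, headers: dict) -> dict:
    """Copy the caller's trace context into outgoing request headers."""
    out = dict(headers)
    out[TRACE_HEADER] = context["trace_id"]
    out[PARENT_HEADER] = context["span_id"]
    return out


def extract(headers: dict) -> dict:
    """Build the receiving service's context from incoming headers.

    A request that arrives without trace headers starts a new trace.
    """
    return {
        "trace_id": headers.get(TRACE_HEADER) or new_id(),
        "parent_id": headers.get(PARENT_HEADER),
        "span_id": new_id(),
    }


# Simulate one request passing through three hypothetical services.
frontend = extract({})                  # no headers, so a new trace begins
orders = extract(inject(frontend, {}))  # frontend calls orders
billing = extract(inject(orders, {}))   # orders calls billing
```

All three services end up sharing one trace ID, and each records its caller's span ID as its parent. That shared ID and parent chain are what would let a monitoring tool stitch separate hops back into a single transaction path.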
"Generically, tracing is a process of measuring the timings of each operation executed during an individual request," said Julie Levine, product manager at Datadog, a monitoring and analytics vendor that offers a distributed tracing capability. Tracing performed across multiple services or hosts becomes distributed tracing. Application performance monitoring (APM) vendors are building the capability into products. While distributed tracing provides a similar outcome to traditional APM, this technique for microservices monitoring requires code-level support, so developers must include and configure it as part of system construction.
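Levine's definition of tracing, measuring the timing of each operation executed during a single request, can be sketched in a few lines of Python. This is a simplified illustration rather than how Datadog or any other product does it: the `span` helper, the operation names and the `time.sleep` calls standing in for real work are all assumptions made for the example.

```python
import time
from contextlib import contextmanager

spans = []  # finished spans for one request, in completion order


@contextmanager
def span(name, parent=None):
    """Time the wrapped block and record the result as a span."""
    start = time.perf_counter()
    try:
        yield name
    finally:
        spans.append({
            "name": name,
            "parent": parent,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_request():
    """Hypothetical request handler with two timed operations."""
    with span("handle_request") as root:
        with span("query_database", parent=root):
            time.sleep(0.02)  # stand-in for a database call
        with span("render_response", parent=root):
            time.sleep(0.01)  # stand-in for building the response


handle_request()
for s in spans:
    print(f'{s["name"]}: {s["duration_ms"]:.1f} ms (parent: {s["parent"]})')
```

The developer has to wrap each operation by hand here, which is the code-level support the technique requires. An instrumented framework would do that wrapping on the developer's behalf.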
Beyond mapping transactions, other benefits of distributed tracing for microservices monitoring include performance analysis and forensics, Betz said. "I view it as a promising but limited effort to solve the age-old problem of contextual dependencies, with open source in the microservices world." It joins a long line of technologies attempting to do so, he added.
Trace data is universally useful, said Suman Karumuri, software engineer at Pinterest. "Ops can use it for performance debugging; devs can use it for general debugging and understanding what's going on," he said.
Tracing can be used on both monolithic and microservices architectures.
Nothing prevents distributed tracing from working with other application designs, provided it is designed in from the start. Mobile/back-end, web/back-end, multi-tiered/back-end services and other architectures that process a request across several systems use traces for monitoring. Tracing is especially appealing for microservices monitoring, though, according to Karumuri, because microservices tend to make following a request across multiple tiers more acutely challenging.
Where to get distributed tracing capabilities
The difficulty of implementing distributed tracing varies considerably, unless the framework you use is already instrumented, Karumuri said. He recommended the Zipkin or OpenTracing communities for advice on how to implement tracing in a framework.
The Cloud Native Computing Foundation adopted a notable standard called OpenTracing. Intriguingly, according to Betz, it seems to cover similar ground to an older standard: The Open Group's Application Response Measurement (ARM) standard. Developer resistance was reportedly one of the problems ARM encountered, Forrester analysts observed.
"The time is probably right for a more technically current approach, but the age of ARM -- 20 years -- indicates that this is not at all a new problem," Betz said. The problem still awaits a resolution.
Distributed tracing has gained enough traction that monitoring vendors and open source projects, including Zipkin, LightStep, Envoy Proxy and Datadog, now incorporate it to monitor microservices.
Established APM players must take note, Betz said. "Young, small, software-centric companies may opt for baking [distributed tracing] in, in which case it is difficult to see them rushing to a New Relic, [Cisco's] AppDynamics or Dynatrace," he said. More mature IT organizations running legacy technologies must investigate how they can use distributed tracing and what benefits they'll reap from it.