Enterprises are starting to adopt streaming data analytics to provide insight that adapts to new data, often in real time. This goes beyond traditional analytics that operate on batches of data to provide more reactive insight.
The technology is still in its early days, but enterprises are excited about providing more actionable insight to managers and frontline workers and better applications for users.
"Capturing, integrating, analyzing and archiving data streams is much more complex than creating pipelines for batch data at rest," said Torsten Volk, managing research director at Enterprise Management Associates.
Watching data in motion
Streaming data analytics is like trying to analyze how cars work by standing next to a highway watching traffic, rather than examining those same cars in a parking lot. Not only do you never get a good, long look at each car in traffic; you also need to constantly switch your attention from one car to another.
"Enterprise data analysts face similar challenges when it comes to integrating and synchronizing streaming data, often from highly dynamic sources such as application containers, IoT sensors or edge devices," Volk said.
Pulling off this challenge requires a whole new set of technologies and human skills: in addition to dealing with "moving cars," teams also need to accommodate the much more dynamic character of these data streams.
For example, containers can quickly spawn, move or shut down across different clouds or data center locations, while sensors can be added, enhanced, replaced or removed without notice. Edge devices, meanwhile, may have to cope with patchy connectivity.
"Building dynamic data pipelines to deal with this constant change is a nontrivial task," Volk said.
Volk sees containers, IoT and edge computing as three key drivers of streaming analytics, as all three disciplines continuously create increasingly broad streams of operational data. Capturing and correlating these data points often makes the difference between excellent and disastrous decisions or problem responses.
For example, with Kubernetes, the more complex container clusters become, the harder it is for IT admins to get to the root cause of issues through static log analytics, as logs often do not capture the early indicators of a problem. Streaming analytics catches these indicators by looking at everything that happens between and during the moments logs are created.
Streaming analytics would enable teams to discover that while the app shows no symptoms of stress at all, under the covers there might be conflicting policy parameters causing containers to needlessly spin up and down or move around. This might be fine under standard conditions but lead to big problems under stress, Volk said.
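The kind of early indicator Volk describes can be caught with a simple sliding-window count over container lifecycle events. The sketch below is a minimal, hypothetical illustration: the event shape, the "restart" action name and the thresholds are all assumptions of the example, not any particular platform's API.

```python
from collections import deque

class ChurnDetector:
    """Flag containers that restart too often within a sliding time window."""

    def __init__(self, window_seconds=60, max_restarts=3):
        self.window = window_seconds
        self.max_restarts = max_restarts
        self.events = {}  # container_id -> deque of restart timestamps

    def observe(self, timestamp, container_id, action):
        """Record a lifecycle event; return True if the churn threshold is exceeded."""
        if action != "restart":
            return False
        q = self.events.setdefault(container_id, deque())
        q.append(timestamp)
        # Drop restarts that have fallen out of the window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_restarts

detector = ChurnDetector(window_seconds=60, max_restarts=3)
alerts = [detector.observe(t, "web-1", "restart") for t in (0, 10, 20, 30, 40)]
# The fourth restart inside one window trips the alert.
```

A log file written after the fact would show only the restarts themselves; watching the stream lets the threshold fire while the churn is still happening.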
Enterprises are adopting streaming analytics in response to the need for faster decision-making.
"Organizations strive to reduce delays in analyzing data [at] rest by increasing their ability to analyze data in motion," said Carlton Sapp, senior director analyst at Gartner.
He also sees streaming analytics tools helping to reduce noisy data by processing only the data that is relevant to answering a business question or solving a challenge. Streaming analytics platforms have become sophisticated filters, cutting through an overall glut of data that is not useful to an application. These tools can also analyze IoT data closer to where it is collected and summarize it appropriately to drive deeper insight.
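As a toy illustration of the filtering Sapp describes, a stream processor might drop every event that cannot help answer the business question at hand. The event fields, topic names and severity scale below are invented for the sketch:

```python
def relevant(events, min_severity=3, topics=("payment", "checkout")):
    """Yield only the events that matter to the business question."""
    for event in events:
        if event["severity"] >= min_severity and event["topic"] in topics:
            yield event

stream = [
    {"topic": "payment", "severity": 4, "msg": "latency spike"},
    {"topic": "telemetry", "severity": 1, "msg": "heartbeat"},
    {"topic": "checkout", "severity": 5, "msg": "error rate up"},
]
kept = list(relevant(iter(stream)))  # the heartbeat noise is dropped
```

Because the filter is a generator, it can sit directly on an unbounded stream without buffering the glut of data it discards.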
Sapp believes that enterprises are increasingly looking for advanced analytics capabilities that use AI or machine learning on data in transit. But he noted that performing deep learning on streaming pipelines remains a mystery to many organizations.
Another emerging trend is the development of tools to improve contextualization of streaming data by connecting streams to various data stores at rest.
"Organizations want greater integration with traditional data management platforms," Sapp said.
Enterprises are also looking for greater visualization capabilities in the analytics of streaming data pipelines.
Beefing up data infrastructure
Sapp said that enterprises should also consider adapting streaming data analytics tools to improve various aspects of IT and data management infrastructure. For example, the same core streaming analytics technologies can also improve replication and change data capture.
"Streaming analytics can be a Swiss Army knife of capabilities for organizations, especially as it relates to enhancing traditional data processing methods," Sapp said.
Streaming analytics can also complement monolithic data storage in data warehouses, using streaming ETL to manage or reduce the total cost of data management. Sapp also expects streaming analytics to spark new use cases around the maintenance of IT systems.
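One simple way to picture the change data capture Sapp mentions is diffing two snapshots of a keyed table and emitting the differences as a stream of change events. Production CDC tools usually tail a database's transaction log instead; this snapshot diff is only a sketch, and the table contents are invented:

```python
def capture_changes(old, new):
    """Emit change events by diffing two snapshots of a keyed table."""
    for key, row in new.items():
        if key not in old:
            yield ("insert", key, row)
        elif old[key] != row:
            yield ("update", key, row)
    for key in old:
        if key not in new:
            yield ("delete", key, old[key])

before = {1: {"name": "Ada"}, 2: {"name": "Bo"}}
after = {1: {"name": "Ada L."}, 3: {"name": "Cy"}}
changes = sorted(capture_changes(before, after))
```

Downstream consumers such as replicas then apply only the change stream rather than reloading the whole table, which is where the cost savings come from.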
Enterprises face a variety of challenges in building the streaming data pipelines to make these programs successful. Sapp said these include:
- integrating time-series streaming analytics with data at rest;
- ingesting and integrating a variety of different data sources (images, text, audio) to support a use case;
- security, governance and privacy as methods for securing data in transit (this remains in its infancy); and
- rising cost of transporting highly volatile streams of data -- even though cloud providers offer elastic services, costs in throughput and payload can be unpredictable.
Hot data paths
A useful concept is to think about hot and cold data paths, and each system may have one or more of each, said Sean Werick, managing director of analytics at Sparkhound, a digital transformation consultancy.
A cold path processes data in a batch or data warehouse manner, which covers roughly 95% of reporting and analytics demand.
A hot path carries information needed immediately. Examples include a customer service representative looking up a customer's profile while speaking with them, or a manufacturer that wants to know how its equipment is doing.
"The critical difference is that hot data paths require immediate action once an event occurs, while cold data is typically analyzed ad hoc," Werick said.
He sees hot data paths showing up in diverse fields such as marketing and manufacturing. In marketing, a retailer might be running a campaign and monitoring social feeds in order to adjust the campaign in real time. An auto manufacturer may want to know how all their equipment is performing in real time.
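Werick's hot/cold distinction can be sketched as a router that archives every event for later batch analysis (cold path) but acts immediately on the ones that demand it (hot path). The machine names, temperature field and alert threshold here are invented for the example:

```python
HOT_ALERT_THRESHOLD = 90  # hypothetical over-temperature limit, in Celsius

cold_store = []  # stand-in for a data warehouse or object store
alerts = []      # stand-in for a pager or operator dashboard

def handle(event):
    """Send every event down the cold path; fire the hot path when action is needed now."""
    cold_store.append(event)                    # cold path: archive for batch reporting
    if event["temp_c"] >= HOT_ALERT_THRESHOLD:  # hot path: act on the event immediately
        alerts.append(event["machine"])

for event in [{"machine": "press-4", "temp_c": 72},
              {"machine": "press-7", "temp_c": 95}]:
    handle(event)
```

Note that every event still lands in the cold store; the hot path is an additional, immediate reaction, not a replacement for batch reporting.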
Create a streaming data analytics proof of concept
Toby Olshanetsky, co-founder and CEO of the proof-of-concept-as-a-service platform ProoV, said enterprises should look for key features like integration with other applications, data visualization dashboards, development tools, automation, compatibility with different data sources and real-time data analysis.
Enterprises can determine whether these features align with their technical infrastructure through a streaming data analytics proof of concept, which allows the simultaneous evaluation and comparison of multiple products under identical conditions and against identical benchmarks.
Olshanetsky said this process should include the following steps:
- Define the technical and business key performance indicators (KPIs) to be met by the streaming data analytics provider.
- Create mimicked data to be analyzed.
- Create the infrastructure to stream the mimicked data.
- Stream the data.
- Enable the deployment of vendor products in the evaluation environment.
- Analyze and compare the results from each vendor against the enterprise's benchmarks.
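The middle steps of that checklist, creating mimicked data and streaming it, might look like the following sketch. The event fields and the `sink` callable are assumptions of the example, standing in for whatever ingestion endpoint each vendor product exposes:

```python
import json
import random

def mimicked_events(n, seed=42):
    """Generate synthetic sensor readings shaped like production data."""
    rng = random.Random(seed)  # fixed seed so every vendor sees identical data
    for i in range(n):
        yield {
            "event_id": i,
            "sensor": f"sensor-{rng.randint(1, 5)}",
            "reading": round(rng.uniform(20.0, 80.0), 2),
        }

def stream(events, sink):
    """Push each mimicked event into the evaluation environment's sink."""
    for event in events:
        sink(json.dumps(event))

received = []
stream(mimicked_events(3), received.append)
```

Seeding the generator matters for the PoC: it lets every vendor product be benchmarked against an identical event stream, as the process above requires.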
A different mode of thinking
Werick cautioned that real-time analytics is not for every application and can add a lot of overhead in terms of infrastructure and management. Data analysts need to focus streaming data analytics on applications that need to update in real time as opposed to several times a day.
"The real-time data results in real-time decisions and are generally critical in nature," Werick said.
He comes across many clients who say they need real-time data, and when he asks what is being impacted in the next 30 minutes, they tend to rethink their decision.
"Streaming analytics is complex and requires a different mode of thinking," Werick said. It's a migration from traditional data warehousing and cold path data methodologies, he said.
It's also a major shift in architecture. A lot of Werick's clients don't know where to start. They know they want to capture the data, but they don't know how to go about doing it since it's not the way they've been doing things for 30 years.
Focus on the use case
Enterprises often focus too heavily on the technical challenges of setting up streaming analytics at the expense of the operational ones, ignoring vital questions of outcomes, value, processes and skills, said Ed Cuoco, vice president of analytics at PTC, a lifecycle management software provider.
Using streaming data analytics to optimize parts replacement or time to failure should prompt considerations of scale, data quality and what the analytical insight actually provides to the user.
Common use cases Cuoco sees for improving operations include:
- Operational intelligence. The combination and correlation of data into a real-time insights process improves decision-making, which leads to better performance, increased efficiency and enhanced predictive operational recommendations.
- Predictive maintenance. Proactively scheduling and performing maintenance based on real-world asset conditions and calculated predictions can minimize downtime, leading to increased productivity, higher quality, reduced costs and increased customer satisfaction.
- Asset/process monitoring. Identification of undesirable, unexpected or abnormal asset conditions for experts to more effectively triage and resolve potential issues.
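A bare-bones version of the predictive-maintenance idea above is to track each asset's running baseline and flag readings that drift too far from it. The smoothing factor and tolerance below are illustrative values, not field-calibrated ones:

```python
class DriftMonitor:
    """Flag an asset whose readings drift away from its running baseline."""

    def __init__(self, alpha=0.1, tolerance=10.0):
        self.alpha = alpha          # smoothing factor for the running average
        self.tolerance = tolerance  # allowed deviation before maintenance is flagged
        self.baseline = None

    def update(self, reading):
        """Return True when the reading deviates beyond tolerance."""
        if self.baseline is None:
            self.baseline = reading
            return False
        deviates = abs(reading - self.baseline) > self.tolerance
        # Exponentially weighted moving average of normal behavior.
        self.baseline = (1 - self.alpha) * self.baseline + self.alpha * reading
        return deviates

monitor = DriftMonitor(tolerance=10.0)
flags = [monitor.update(r) for r in (50.0, 51.0, 49.5, 75.0)]
# Only the final reading drifts far enough from the baseline to be flagged.
```

Real condition monitoring would weigh many signals and calculated predictions at once, but even this single-signal sketch shows why the analysis has to happen on the stream: by the time a batch job sees the drift, the downtime may already have occurred.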