PALO ALTO -- The open-source Apache Spark data processing framework could play a central role in businesses' ability to capitalize on the growing trove of data generated by the Internet of Things, said attendees at the IoT Data Analytics & Visualization conference.
Adoption of IoT technologies is among the fastest-growing areas of IT, but for the data from connected devices to be useful, enterprises need to be able to collect, process and analyze it. That's where Spark comes in. The distributed computing framework excels at processing huge volumes of data quickly, making it a natural choice for IoT data analytics.
"Spark is a perfect fit for IoT," said Ashok Srivastava, chief data scientist at Verizon. He heads a data science research team at the wireless carrier that is looking for ways to apply advanced analytics and machine learning to existing corporate data to create new revenue-producing areas of the business.
Spark handles diverse data types
For example, he and his team are analyzing past network traffic to predict future traffic levels. They're also working with Verizon's agricultural services division to help optimize crop planting and increase yields, using sensor data to see which types of plantings do well in certain areas and what conditions they need.
In each case, the data types are diverse, something Spark generally handles well. Across its projects, the team uses Spark's machine learning library, as well as its streaming analytics for lower-latency applications. Leveraging these diverse data types could unleash huge business value, Srivastava said.
"I think the more we can think about the heterogeneous data structures we have, there are so many opportunities ahead of us."
For Soundar Srinivasan, senior manager of engineering, data mining services and solutions at Robert Bosch LLC, the main draw of Spark is its ability to interface with a range of data systems. Bosch increasingly uses IoT technologies to monitor the manufacturing process for its products, including automotive components and power tools. By gathering data on assembly-line efficiency, the company can identify slowdowns and implement changes.
Predictive analytics cut manufacturing costs
For one hydraulic pump used in agricultural equipment, Srinivasan and his team identified redundant quality tests that were holding up the product's release to market. They did this by using predictive analytics to forecast the results of quality tests. Once they could accurately predict a test's result, the assembly line no longer needed to run every test. This shortened the testing and calibration phase of manufacturing the pump by 35% and saved the company half a million dollars annually.
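Bosch's actual models and data were not described at the conference. As a rough illustration of the idea only, the following sketch uses plain Python rather than Spark and assumes a hypothetical predictor that estimates each pump's probability of passing from measurements taken earlier on the line. The field names, coefficients and thresholds are all invented. A physical test is skipped only when the model has proven accurate against historical test results and is highly confident that the unit will pass; every other unit is still tested.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Unit:
    # Hypothetical measurements captured earlier on the assembly line.
    pressure_bar: float
    leak_rate_ml_min: float


# A predictor maps a unit to its estimated probability of passing the test.
Predictor = Callable[[Unit], float]


def toy_predictor(u: Unit) -> float:
    """Placeholder logistic score with made-up coefficients.
    A real model would be trained on historical test results."""
    z = 0.5 * (u.pressure_bar - 200.0) - 2.0 * (u.leak_rate_ml_min - 1.0)
    return 1.0 / (1.0 + math.exp(-z))


def validation_accuracy(predict: Predictor,
                        history: Sequence[Tuple[Unit, bool]],
                        threshold: float = 0.5) -> float:
    """Share of historical units whose pass/fail outcome the model got right."""
    if not history:
        return 0.0  # no evidence yet, so the model is not trusted
    correct = sum((predict(u) >= threshold) == passed for u, passed in history)
    return correct / len(history)


def plan_tests(predict: Predictor,
               units: Sequence[Unit],
               history: Sequence[Tuple[Unit, bool]],
               min_accuracy: float = 0.99,
               confidence: float = 0.95) -> List[int]:
    """Return the indices of units that still need the physical test."""
    if validation_accuracy(predict, history) < min_accuracy:
        return list(range(len(units)))  # model not trusted: test everything
    # Skip only confident predicted passes; uncertain or likely failures are tested.
    return [i for i, u in enumerate(units) if predict(u) < confidence]


if __name__ == "__main__":
    history = [(Unit(210.0, 0.5), True), (Unit(205.0, 0.8), True),
               (Unit(190.0, 3.0), False), (Unit(195.0, 2.0), False)]
    batch = [Unit(210.0, 0.5), Unit(201.0, 1.0), Unit(190.0, 3.0)]
    print(plan_tests(toy_predictor, batch, history))
```

Gating on historical accuracy mirrors the point that tests were dropped only once results could be predicted accurately; if the model's track record slips below the target, the sketch falls back to testing every unit.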
"What we tried to do was apply data collection and analytics to this conflict" of needing to get products to market quickly while still ensuring they meet certain quality standards, Srinivasan said.
He said Spark is a good platform for managing this process because his team uses a complex array of data management and analytics tools: Sqoop to ingest data, Hive to store it, R and Python to analyze the data and build predictive models, and Tableau to visualize the results. Spark sits in the middle of all this, effectively stitching the whole thing together.
Not everyone sold on Spark
Despite these advantages, not everyone at the conference was sold on Spark as an analytics engine for IoT. Emil Berthelsen, principal analyst at Machina Research, said Spark can certainly be useful for building an IoT data analytics platform today, but that ultimately it's not the best tool for supporting IoT data analytics applications.
The reason, Berthelsen said, is that Spark essentially builds on existing data processing and analytics technologies, which he sees as insufficient for the challenges of IoT. Traditional data management and analysis technologies from vendors such as SAS Institute and IBM are great at handling structured data collected at predetermined intervals, he said, but they don't hold up as well under the varied conditions typically seen in IoT applications. For those, he recommended more purpose-built tools such as Splunk.
"The challenge is these are build-ons rather than built for the domain," Berthelsen said. "IoT is different because it's less structured. You don't know what's coming."