Start artificial intelligence for IoT in bite-size pieces

IoT practitioners must follow clear steps to implement an AI analytics process if they want to create an AI application with IoT that improves their deployment's performance.

If organizations want to develop AI applications for IoT deployments, they don't necessarily need to know AI theory. They must consider the many components that create cohesive foundations for AI and IoT and implement them step by step.

With the rapid adoption of IoT, AI applications have made smart devices more practical for process optimizations. When organizations have thousands of devices, IT professionals cannot process the massive amount of data that the sensors create in a timely manner. Instead, organizations can invest in AI for IoT.

IoT practitioners and data scientists can use Michael Roshak's book, Artificial Intelligence for IoT Cookbook, to break down AI for IoT development into bite-size pieces. Roshak walks through IoT device selection and setup, data collection and specific machine learning use cases, including predictive maintenance and anomaly detection. After guiding readers through initial processes such as data collection and modeling, he digs into topics that can stall AI and IoT projects, such as natural language processing and computer vision.

In this excerpt from Chapter 4, "Deep Learning for Predictive Maintenance," senior staff enterprise architect Roshak spells out the steps needed to predict machine failures from IoT sensor data. His recipes include how to improve data models, identify patterns, determine when a signal means a machine failure and deploy models to web services.

Enhancing data using feature engineering

One of the best uses of time in improving models is feature engineering, and the IoT ecosystem has many tools that make it easier. Devices can be geographically or hierarchically connected with digital twins, graph frames, and GraphX. This can add features such as the degree of connectedness to other failing devices. Windowing can show how the current reading differs over a period of time. Streaming tools such as Kafka can merge different data streams, allowing you to combine data from other sources. Machines that are outdoors may be negatively affected by high temperatures or moisture, unlike machines in a climate-controlled building.

In this recipe, we are going to enhance our data with time-series features such as deltas, seasonality, and windowing. Slicing the data into meaningful features can greatly increase the accuracy of our models.
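
As a rough illustration of the delta and windowing ideas, the following pandas sketch computes a per-engine delta and a rolling mean on a toy table. The readings, the column names delta_s9 and rolling_mean_s9, and the window size of 3 are assumptions made for illustration; they are not taken from the recipe, which uses Spark.

```python
import pandas as pd

# Toy sensor readings for two engines (made-up values, for illustration only).
readings = pd.DataFrame({
    "engine_id": [1, 1, 1, 1, 2, 2, 2],
    "cycle":     [1, 2, 3, 4, 1, 2, 3],
    "s9":        [10.0, 12.0, 11.0, 15.0, 20.0, 22.0, 21.0],
})
readings = readings.sort_values(["engine_id", "cycle"]).reset_index(drop=True)

# Delta: current reading minus the previous one, computed within each engine.
# The first cycle of every engine has no previous reading, so it is NaN.
readings["delta_s9"] = readings.groupby("engine_id")["s9"].diff()

# Windowing: mean over the last 3 cycles of the same engine
# (fewer than 3 at the start of each engine because min_periods=1).
readings["rolling_mean_s9"] = (
    readings.groupby("engine_id")["s9"]
    .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)
```

Grouping by engine_id before taking the difference or the rolling mean keeps one engine's readings from leaking into another's features.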

Getting ready

In the Predictive maintenance with XGBoost recipe in the previous chapter, we used XGBoost to predict whether or not a machine needed maintenance. We imported the NASA Turbofan engine degradation simulation dataset, which can be found at https://data. In the rest of this chapter, we will continue to use that dataset. To get ready, you will need the dataset.

Then, if you have not already imported numpy, pandas, matplotlib, and seaborn into Databricks, do so now.

How to do it...

The following steps need to be observed to follow this recipe:

  1. First, import the required libraries. We will be using pyspark.sql, numpy, and pandas for data manipulation, and matplotlib and seaborn for visualization:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

  1. Next, we're going to import the data and apply a schema to it so that the data types can be correctly used. To do this we import the data file through the wizard and then apply our schema to it:

file_location = "/FileStore/tables/train_FD001.txt"
file_type = "csv"

from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType
schema = StructType([
StructField("engine_id", IntegerType()),
StructField("cycle", IntegerType()),
StructField("setting1", DoubleType()),
StructField("setting2", DoubleType()),
StructField("setting3", DoubleType()),
StructField("s1", DoubleType()),
StructField("s2", DoubleType()),
StructField("s3", DoubleType()),
StructField("s4", DoubleType()),
StructField("s5", DoubleType()),
StructField("s6", DoubleType()),
StructField("s7", DoubleType()),
StructField("s8", DoubleType()),
StructField("s9", DoubleType()),
StructField("s10", DoubleType()),
StructField("s11", DoubleType()),
StructField("s12", DoubleType()),
StructField("s13", DoubleType()),
StructField("s14", DoubleType()),
StructField("s15", DoubleType()),
StructField("s16", DoubleType()),
StructField("s17", IntegerType()),
StructField("s18", IntegerType()),
StructField("s19", DoubleType()),
StructField("s20", DoubleType()),
StructField("s21", DoubleType())
])

  1. Next, we read the file into a Spark DataFrame:

df = spark.read.option("delimiter", " ").csv(file_location, schema=schema)
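
For readers who want to try the same load outside Databricks, here is a minimal pandas sketch under stated assumptions. It rebuilds the column names from the schema and reads whitespace-delimited text; the two sample rows are made up and stand in for the real file, and local_df is a hypothetical name.

```python
import io
import pandas as pd

# Column names mirroring the schema: two identifiers, three settings, 21 sensors.
columns = (
    ["engine_id", "cycle"]
    + [f"setting{i}" for i in range(1, 4)]
    + [f"s{i}" for i in range(1, 22)]
)

# Two made-up rows standing in for the real file (values are illustrative).
sample = io.StringIO(
    "1 1 " + " ".join(["0.5"] * 24) + "\n"
    "1 2 " + " ".join(["0.6"] * 24) + "\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter.
local_df = pd.read_csv(sample, sep=r"\s+", header=None, names=columns)
```

To read an actual file, pass its path to pd.read_csv in place of the in-memory sample.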

  1. We then create a temporary view so that we can run a Spark SQL job on it:

df.createOrReplaceTempView("raw_engine")

  1. Next, we calculate the remaining useful life (RUL). Using the SQL magics, we create a table named engine from the raw_engine temp view we just created. We then use SQL to calculate the RUL:

%sql
drop table if exists engine;

create table engine as
(select e.*
, mc - e.cycle as rul
-- needs_maintenance is an assumed column name for the flag
, CASE WHEN mc - e.cycle < 14 THEN 1 ELSE 0 END as needs_maintenance
from raw_engine e
join (select max(cycle) mc, engine_id from raw_engine group by engine_id) m
on e.engine_id = m.engine_id)
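
The RUL logic in the SQL can also be sketched in pandas, which may help when checking the calculation on a small sample. The toy data below is made up, and the flag column's name, needs_maintenance, is a placeholder chosen for this sketch.

```python
import pandas as pd

# Toy cycles for two engines (made-up data):
# engine 1 runs for 20 cycles, engine 2 runs for 16.
engine = pd.DataFrame({
    "engine_id": [1] * 20 + [2] * 16,
    "cycle": list(range(1, 21)) + list(range(1, 17)),
})

# Last recorded cycle per engine, broadcast back onto every row of that engine.
max_cycle = engine.groupby("engine_id")["cycle"].transform("max")

# RUL: number of cycles left before the engine's final recorded cycle.
engine["rul"] = max_cycle - engine["cycle"]

# Flag rows with fewer than 14 cycles remaining, matching the CASE expression.
# The column name needs_maintenance is a placeholder for this sketch.
engine["needs_maintenance"] = (engine["rul"] < 14).astype(int)
```

Using transform("max") plays the role of the join against the per-engine maximum in the SQL: each row ends up next to its own engine's last cycle.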

  1. We then import the data into a Spark DataFrame:

df = spark.sql("select * from engine")

  1. Now we calculate the rate of change (ROC), which compares the current record with the previous record for the same engine. The calculation divides the previous cycle's reading by the current one, subtracts 1, and multiplies by 100 to express the change as a percentage:

my_window = Window.partitionBy('engine_id').orderBy("cycle")
df = df.withColumn("roc_s9",
((F.lag(df.s9).over(my_window)/df.s9) -1)*100)
df = df.withColumn("roc_s20",
((F.lag(df.s20).over(my_window)/df.s20) -1)*100)
df = df.withColumn("roc_s2",
((F.lag(df.s2).over(my_window)/df.s2) -1)*100)
df = df.withColumn("roc_s14",
((F.lag(df.s14).over(my_window)/df.s14) -1)*100)
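
To make the ROC arithmetic concrete, here is a small pandas sketch on made-up readings. It reproduces the formula from the Spark code, previous divided by current, and adds the more conventional percent change, current divided by previous, for comparison. The toy values and the column name pct_change_s9 are assumptions for illustration.

```python
import pandas as pd

# Toy readings (made-up values) for two engines.
toy = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "cycle":     [1, 2, 3, 1, 2],
    "s9":        [100.0, 80.0, 100.0, 50.0, 40.0],
})
toy = toy.sort_values(["engine_id", "cycle"]).reset_index(drop=True)

# Previous cycle's reading within each engine (NaN on each engine's first cycle),
# the pandas counterpart of F.lag(...).over(my_window).
previous = toy.groupby("engine_id")["s9"].shift(1)

# Same formula as the Spark code: (previous / current - 1) * 100.
toy["roc_s9"] = (previous / toy["s9"] - 1) * 100

# Conventional percent change, for comparison: (current / previous - 1) * 100.
toy["pct_change_s9"] = (toy["s9"] / previous - 1) * 100
```

The two columns differ in sign and magnitude: a drop from 100 to 80 comes out as roughly 25 under the recipe's formula and roughly -20 under the conventional one, so it is worth deciding which convention a downstream model expects.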
