In part one of this series, I discussed the mindset needed to get started on building an accurate and successful predictive analytics pipeline. Now, let’s dig into the steps we’ve found to be universal in building effective predictive analytics.
Step one: Get your data ready
Laying the infrastructural groundwork is always required to enable rapid deployment of new analytics, both now and in the future. This is a substantial effort (transforming data to prime the models, building scalable infrastructure that enables efficient orchestration of analytic workloads, aligning ingestion and egress pipelines to data profiles, and so forth), but we view it as a one-time effort that we undertake with our customers on their path to digitalization. For the sake of brevity, let’s assume that work has been completed and focus on building a single predictive analytic.
The first step is to ensure that the data you will use to build your models is actually usable. A data set may check the IDA (initial data analysis) boxes in terms of forensic quality, but if the data isn’t trustworthy or useful to your end user, it won’t be of any use to you either. To identify what is “useful,” seek end-user support in the form of quality assurance. Done correctly, this will also reduce effort in the data cleansing process.
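To make the usability check concrete, here is a minimal sketch of a first-pass data profile. Everything in it is an assumption for illustration: the function name `profile_readings`, the meter fields and the voltage range are hypothetical, and in practice the plausible ranges are exactly the kind of input you would ask end users to supply.

```python
from math import isnan


def profile_readings(rows, required_fields, valid_ranges):
    """Count basic quality problems in a list of dict records.

    Assumes every key in valid_ranges also appears in required_fields.
    """
    report = {"rows": len(rows), "missing": 0, "out_of_range": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        values = [row.get(field) for field in required_fields]
        # A record is "missing" if any required field is absent, None or NaN.
        if any(v is None or (isinstance(v, float) and isnan(v)) for v in values):
            report["missing"] += 1
            continue
        # Range checks encode what the end user considers plausible.
        for field, (low, high) in valid_ranges.items():
            if not (low <= row[field] <= high):
                report["out_of_range"] += 1
                break
        # Exact repeats of the required fields are counted as duplicates.
        key = tuple(values)
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report


# Hypothetical sample records, invented for this sketch.
sample_rows = [
    {"meter_id": "A1", "timestamp": "2024-01-01T00:00", "voltage": 239.8},
    {"meter_id": "A1", "timestamp": "2024-01-01T00:00", "voltage": 239.8},
    {"meter_id": "B2", "timestamp": "2024-01-01T00:00", "voltage": None},
    {"meter_id": "C3", "timestamp": "2024-01-01T00:00", "voltage": 1200.0},
]
sample_report = profile_readings(
    sample_rows,
    required_fields=["meter_id", "timestamp", "voltage"],
    valid_ranges={"voltage": (200.0, 260.0)},
)
print(sample_report)
```

A report like this is only a starting point for the conversation with end users; it says nothing about whether they actually trust the data.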
Step two: Identify your ‘problem’
Once the data is staged, evaluate the end users’ current needs. What specific problem are they trying to solve? Where will predictive capabilities enhance your stakeholders’ ability to “act” before a problem arises? And where will an analytic provide the greatest “improvement” over the current heuristic model?
These improvements can take the form of 1) accuracy or 2) efficiency of prediction. Make sure that the model seeks to improve performance on one or both of these vectors, and that the expected improvement is meaningful from the stakeholder’s perspective. For instance, while improving on a heuristic model by 33% is meaningful to a data scientist, no COO is going to approve investment in a new predictive model if it will likely reduce churn only from 0.6% to 0.4% (a 33% relative improvement, but just 0.2 percentage points) and the impact on the bottom line is minimal.
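The gap between those two views of the same result is easy to put in numbers. The sketch below uses the churn rates from the example; the customer count and the value per retained customer are assumed figures, invented purely for illustration.

```python
def improvement(baseline_rate, new_rate):
    """Return (absolute, relative) reduction between two rates."""
    absolute = baseline_rate - new_rate
    relative = absolute / baseline_rate
    return absolute, relative


def annual_value(absolute_reduction, customers, value_per_customer):
    """Translate an absolute rate reduction into a money figure."""
    return absolute_reduction * customers * value_per_customer


# Churn rates from the example in the text.
absolute, relative = improvement(baseline_rate=0.006, new_rate=0.004)

# Hypothetical business inputs (assumptions, not from the article).
customers = 50_000
value_per_customer = 400.0
value = annual_value(absolute, customers, value_per_customer)

print(f"relative improvement: {relative:.1%}")
print(f"absolute improvement: {absolute * 100:.2f} percentage points")
print(f"customers retained:   {absolute * customers:.0f}")
print(f"estimated value:      {value:,.0f}")
```

Under these assumed inputs, a one-third relative improvement works out to about 100 retained customers and roughly 40,000 in value, which is the number the COO will weigh against the cost of the project.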
Step three: Define business impact
So, how do you get in the COO’s good graces? One important way is to identify how business impact will be measured, find the metrics that are mission-critical to the business and build your models to help improve those metrics. Driving the outcomes that matter to the stakeholders will result in a higher likelihood of adoption by these stakeholders.
This exercise will be a combination of understanding the end-user behaviors that you want to enhance or influence to improve outcomes, and then identifying how (and where) data can be surfaced in a manner to drive these behaviors. These outcomes will likely have KPIs associated with them — that’s a great place to start.
As an example, in our utility world, grid reliability metrics (SAIDI, CAIDI, SAIFI, etc.) have been particularly compelling because our stakeholders seek to optimize them regularly. So, to drive the most impact for your stakeholders, treat the metrics they already track as the needle that you need to move.
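For readers outside the utility world, here is a rough sketch of how these three indices are commonly computed from outage records. The `Outage` record and the sample numbers are hypothetical, and real reporting rules (which interruptions count, how major events are treated) vary by regulator, so treat this as an illustration rather than a compliance calculation.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    customers_interrupted: int
    duration_minutes: float


def reliability_indices(outages, customers_served):
    """Compute SAIFI, SAIDI and CAIDI from a list of outage records."""
    total_interruptions = sum(o.customers_interrupted for o in outages)
    customer_minutes = sum(
        o.customers_interrupted * o.duration_minutes for o in outages
    )
    # SAIFI: average number of interruptions per customer served.
    saifi = total_interruptions / customers_served
    # SAIDI: average interruption minutes per customer served.
    saidi = customer_minutes / customers_served
    # CAIDI: average minutes per interruption (equivalently SAIDI / SAIFI).
    caidi = customer_minutes / total_interruptions if total_interruptions else 0.0
    return {"SAIFI": saifi, "SAIDI": saidi, "CAIDI": caidi}


# Hypothetical year of outages on a system serving 10,000 customers.
sample_outages = [
    Outage(customers_interrupted=1000, duration_minutes=90.0),
    Outage(customers_interrupted=500, duration_minutes=30.0),
]
indices = reliability_indices(sample_outages, customers_served=10_000)
print(indices)
```

A predictive analytic that helps crews prevent or shorten even one of these outages shows up directly in the indices, which is what makes them a useful needle to move.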
Step four: Build it
At this point, it’s time to build the analytic. You will need to connect:
- Your data that has been staged for the analytic build;
- The problem that needs to be solved and the success metrics associated with that problem; and
- The data science/analytic tools at your disposal.
There are many flavors of tools to use for different analytics. In our day-to-day work, we often use the simpler techniques, such as anomaly detection, linear regression and forecasting, to solve our problems. But as automation and more advanced means of analytic abstraction become ubiquitous, more advanced techniques such as neural networks and other forms of deep learning are coming within reach as well.
That being said, it is critical that tools and techniques are chosen not for their sophistication, but for their fit with the problem at hand. Solving a problem simply does not make the solution any less impactful. Remember that, especially when faced with resourcing challenges or aggressive timelines.
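As an illustration of how little code a simple technique can need, here is a minimal anomaly detection sketch: it flags any reading that sits far from the mean of the readings just before it. The function name, the window size, the threshold and the sample feeder loads are all assumptions made for this example, not a description of our production tooling.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the trailing window.

    Each point is compared with the mean and standard deviation of the
    `window` points immediately before it (a trailing z-score).
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to judge deviation.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical feeder load readings with one obvious spike.
loads = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 63.0, 50.0, 49.7]
anomalies = flag_anomalies(loads)
print(anomalies)
```

One design caveat: once a spike enters the trailing window it inflates the standard deviation, so readings right after an anomaly are judged leniently. A median-based variant is a common refinement if that matters for your problem.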
Step five: Optimize!
Finally, map out the expected ROI from use of the model. This will be helpful in model tuning and optimization. The ROI exercise usually entails understanding the costs of Type I and Type II errors (false positives and false negatives), as well as the benefit of a true positive prediction. Knowing these will help you tune the model’s sensitivity and strike the right balance between precision and recall in the results. It’s also the first step toward more powerful prescriptive analytics that not only predict outcomes, but can suggest optimal paths toward resolution or reconciliation.
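One way to turn those costs and benefits into a tuning decision is to score each candidate alert threshold by its net value. The sketch below assumes the model outputs a score between 0 and 1 and that you have labeled historical outcomes; the function names, scores, labels and dollar figures are all hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1  # Type I error: alert raised, nothing happened
        elif not predicted and label:
            fn += 1  # Type II error: event missed
        else:
            tn += 1
    return tp, fp, fn, tn


def best_threshold(scores, labels, thresholds, benefit_tp, cost_fp, cost_fn):
    """Return the threshold with the highest net value, plus its metrics."""
    best = None
    for t in thresholds:
        tp, fp, fn, _ = confusion_counts(scores, labels, t)
        value = tp * benefit_tp - fp * cost_fp - fn * cost_fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if best is None or value > best["value"]:
            best = {"threshold": t, "value": value,
                    "precision": precision, "recall": recall}
    return best


# Hypothetical model scores and actual outcomes (1 = event occurred).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
candidates = [0.25, 0.50, 0.75]

# Assumed economics: a missed event costs far more than a false alarm.
choice = best_threshold(scores, labels, candidates,
                        benefit_tp=1000, cost_fp=200, cost_fn=1500)
print(choice)
```

With these assumed costs, where a miss is much more expensive than a false alarm, the lowest threshold wins and the model favors recall. Flip the economics so that false alarms are the expensive error and the same data pushes the choice toward a higher threshold and better precision.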
These ROI calculations also come in handy as a powerful marketing tool — showing precise impact will help with adoption of the analytic and can lead to opportunities to build more analytics.
And there you have it. Predicting the future is not easy, and doing so accurately requires precision, skill and quality data. But if you start with the right data, a well-defined need, a clear view of ROI and the right tool, you’ll soon be reaping the many benefits predictive analytics has to offer.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.