How to test a predictive model

Strategies for testing predictive models and analytics emphasize data quality, real-time testing and code redundancy, as well as AI and machine learning integration.

By Matt Heusser, Excelon Development

Published: 29 Sep 2023

One of the simplest, most straightforward forms of AI is the predictive model. It rests on the same basic idea that powers large language models, such as GPT-4: use patterns in past data to estimate what comes next. A predictive model might already be in use in your organization to predict demand for retail sales, detect fraud, set dynamic prices or recommend products to customers.

Writing code to recommend products could be as easy as joining a few tables in a database: find what people who bought that product also bought. Fundamentally, that's predicting. Predicting the future might seem impossible, but it can be done and done well -- or badly. A bad prediction might mean some retail products are overstocked and other shelves go empty. Organizations test to reduce that risk, yet testing a predictive model brings other challenges.
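As a rough illustration of the "join a few tables" idea, here is a minimal sketch that uses Python's built-in sqlite3 module. The purchases table, its columns and the sample rows are assumptions made up for this example, not a schema from any real system.

```python
import sqlite3

# Hypothetical schema: one row per (customer, product) purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "tent"), (1, "sleeping bag"), (1, "lantern"),
        (2, "tent"), (2, "sleeping bag"),
        (3, "tent"), (3, "camp stove"),
        (4, "lantern"),
    ],
)


def also_bought(conn, product, limit=3):
    """Return (product, buyer_count) pairs for items bought alongside `product`."""
    return conn.execute(
        """
        SELECT other.product, COUNT(DISTINCT other.customer_id) AS buyers
        FROM purchases AS seed
        JOIN purchases AS other
          ON other.customer_id = seed.customer_id
         AND other.product <> seed.product
        WHERE seed.product = ?
        GROUP BY other.product
        ORDER BY buyers DESC, other.product ASC
        LIMIT ?
        """,
        (product, limit),
    ).fetchall()


print(also_bought(conn, "tent"))
```

The self-join pairs every purchase of the seed product with the same customer's other purchases, then counts buyers per product. A production recommender would likely weigh recency, ratings and page views as well.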

Predictive model testing strategies

Here are a few strategies to test predictive models and analytics, some context, and ideas about how to use AI and machine learning (ML) to help test.

Create an obvious data set

For e-commerce recommendations, you can build a data set whose conclusions are obvious. Have a lot of test users buy the same two or three products; maybe they also buy a fourth but give it one star. Anyone looking at one of the first products should then be recommended the others, and the one-star product should not appear. A database like this turns the requirements into concrete examples. Ideally, you are able to save this database as is and then import it later to a clean system for testing -- especially if page views are part of the algorithm. The obvious data set makes obvious conclusions.
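One way to picture an obvious data set is the sketch below. The recommend function is a hypothetical stand-in for the real algorithm under test, and the product names, user count and rating threshold are assumptions chosen so the expected answer is hard to argue with.

```python
from collections import Counter


def recommend(orders, ratings, product, min_rating=3):
    """Hypothetical recommender: co-purchased products, minus poorly rated ones."""
    counts = Counter()
    for basket in orders.values():
        if product in basket:
            counts.update(p for p in basket if p != product)

    def avg_rating(p):
        scores = ratings.get(p, [])
        # Products with no ratings are treated as acceptable.
        return sum(scores) / len(scores) if scores else min_rating

    return [p for p, _ in counts.most_common() if avg_rating(p) >= min_rating]


# Obvious data set: 50 users all buy A, B, C and D; every one rates D one star.
orders = {f"user{i}": {"A", "B", "C", "D"} for i in range(50)}
ratings = {"D": [1] * 50}

result = recommend(orders, ratings, "A")
print(sorted(result))
```

With this data, anyone viewing product A should be shown B and C, and never D. If the real system says otherwise, either the algorithm or the requirement needs a conversation.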

Test the input data

Any serious review of ERP system implementation shows that predictive algorithms might fail based on design but will fail with bad data. Worse, the meaning of the data migrated from one system to another might be different, even if the columns have the same names. Don't simply cut over the data; follow the workflow to see why that data is entered and what it means.
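A few automated checks on incoming records can catch some of these problems before the model ever runs. The sketch below is only an illustration: the field names (quantity, order_date, unit) and the allowed units are assumptions, and the unit check stands in for the case where a column keeps its name but changes its meaning, such as a quantity counted in cases instead of single items.

```python
from datetime import date


def validate_rows(rows, today):
    """Return a list of (row_index, problem) for hypothetical order records."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("quantity") is None:
            problems.append((i, "missing quantity"))
        elif row["quantity"] < 0:
            problems.append((i, "negative quantity"))
        if row.get("order_date") is None:
            problems.append((i, "missing order_date"))
        elif row["order_date"] > today:
            problems.append((i, "order_date in the future"))
        if row.get("unit") not in {"each", "case"}:
            problems.append((i, f"unexpected unit: {row.get('unit')!r}"))
    return problems


rows = [
    {"quantity": 12, "order_date": date(2023, 9, 1), "unit": "each"},
    {"quantity": -3, "order_date": date(2023, 9, 2), "unit": "each"},
    {"quantity": 5, "order_date": date(2024, 1, 1), "unit": "pallet"},
    {"quantity": None, "order_date": None, "unit": "case"},
]

for index, problem in validate_rows(rows, today=date(2023, 9, 29)):
    print(index, problem)
```

Checks like these find malformed data, not misunderstood data, so they complement following the workflow rather than replace it.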

Get live data and run the algorithm

In some cases, you can run the predictive model in real time and compare it to what's happening. Some companies have a shadow tool that enables you to log in as an arbitrary user and see what they see. So, you could find the most active and unique users, look at their history and see if what they see makes sense. Note that, in many industries, personally identifiable information must be anonymized, but anonymized data is often still good enough for this kind of testing.

For example, one company I worked with was using analytics to predict when a driver would start a vehicle. It might be that the driver would enter the car somewhere between 5:25 p.m. and 5:45 p.m., if the GPS said the vehicle was parked at work on a weekday. If the conditions were right, the vehicle would check the temperature, start the engine and turn on either the heat or the air conditioner. One easy way to test this is to drive a luxury vehicle under development -- or get recorded data from a vehicle and run the algorithm against it.
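Replaying recorded data through the algorithm might look something like the sketch below. The rule is a simplified guess built from the description above (weekday, parked at work, 5:25 p.m. to 5:45 p.m.), and the recorded events are invented for illustration; the real system was certainly more involved.

```python
from datetime import datetime, time


def predicts_start(now, parked_at_work):
    """Hypothetical rule: weekday, parked at work, between 5:25 and 5:45 p.m."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = time(17, 25) <= now.time() <= time(17, 45)
    return is_weekday and parked_at_work and in_window


# Recorded events: (timestamp, parked_at_work, driver_actually_started)
recorded = [
    (datetime(2023, 9, 25, 17, 30), True, True),    # Monday, inside the window
    (datetime(2023, 9, 26, 17, 40), True, True),    # Tuesday, inside the window
    (datetime(2023, 9, 27, 18, 10), True, True),    # Wednesday, driver left late
    (datetime(2023, 9, 30, 17, 30), False, False),  # Saturday, not at work
]

# Keep every event where the prediction and reality disagree.
mismatches = [
    (ts, predicts_start(ts, at_work), started)
    for ts, at_work, started in recorded
    if predicts_start(ts, at_work) != started
]

for ts, predicted, actual in mismatches:
    print(f"{ts}: predicted={predicted}, actual={actual}")
```

Each mismatch is a conversation starter: is the model wrong, is the window too narrow, or was that day simply unusual?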

Back-test

ERP systems can use ML to predict next year's holiday season based on last year, plus year-over-year growth in the first six months. To back-test, apply the same math to older data: generate a prediction for the most recent holiday season from the one before it, and compare that prediction to what actually happened. This applies to all kinds of inventory preparation. You can do the same thing with predicting products or movies people should like. Split their activity in half, use the first half to predict what they should like, and then compare that to their actual product reviews in the second half.
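The holiday back-test can be sketched in a few lines. The formula here, last year's holiday sales scaled by first-half year-over-year growth, is one simple reading of the approach described above, and every sales figure is hypothetical.

```python
def forecast_holiday(last_holiday, h1_this_year, h1_last_year):
    """Scale last year's holiday sales by first-half year-over-year growth."""
    growth = h1_this_year / h1_last_year
    return last_holiday * growth


def percent_error(predicted, actual):
    """Absolute error as a percentage of the actual value."""
    return abs(predicted - actual) / actual * 100


# Hypothetical sales history: first-half and holiday-season totals.
sales = {
    2021: {"h1": 400_000, "holiday": 250_000},
    2022: {"h1": 440_000, "holiday": 290_000},
}

# Pretend it is mid-2022 and predict the 2022 holiday season.
backtest = forecast_holiday(
    sales[2021]["holiday"], sales[2022]["h1"], sales[2021]["h1"]
)
error = percent_error(backtest, sales[2022]["holiday"])
print(f"predicted {backtest:,.0f}, off by {error:.1f}%")
```

Whether an error of that size is acceptable is a business decision; the back-test just makes the number visible before the next season's inventory is ordered.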

Check the output

The model might be correct, but it's always possible there is an error in the output or the extract, transform and load program. That kind of error could result in a downstream program having the wrong information, leading to a bad recommendation.
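One hedged way to check the output is to reconcile what the model produced with what the downstream system actually received. In the sketch below, both dictionaries, the SKU keys and the values are assumptions standing in for a real extract on each side.

```python
def reconcile(model_output, downstream, tolerance=1e-9):
    """Compare predictions keyed by SKU with what the downstream system holds."""
    model_keys, downstream_keys = set(model_output), set(downstream)
    return {
        # Predicted by the model but never arrived downstream.
        "missing": sorted(model_keys - downstream_keys),
        # Present downstream but not produced by the model.
        "unexpected": sorted(downstream_keys - model_keys),
        # Present on both sides with different values.
        "changed": sorted(
            sku for sku in model_keys & downstream_keys
            if abs(model_output[sku] - downstream[sku]) > tolerance
        ),
    }


model_output = {"SKU-1": 120.0, "SKU-2": 45.5, "SKU-3": 300.0}
downstream = {"SKU-1": 120.0, "SKU-2": 45.0, "SKU-4": 10.0}  # 45.5 lost its decimal

report = reconcile(model_output, downstream)
print(report)
```

A dropped record, a stray record and a truncated decimal are all invisible if you only test the model itself.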

Code it twice

Although writing the whole program twice might not be cost-effective, isolating the code that generates the predictions and writing just that part twice could be surprisingly quick. With two independently written predictive algorithms, you can run both on the same data and compare the output.

The movie Minority Report is based on a similar idea: three independent predictions are compared, with a requirement that at least two agree. Comparing two different implementations can expose both errors in interpreting the specification and straight-up coding errors.
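As a small illustration of coding it twice, the sketch below implements one made-up specification ("forecast next month's demand as the average of the last three months") in two different ways and compares them over randomly generated histories. The specification, function names and data are all assumptions.

```python
import random


def forecast_a(history, window=3):
    """First implementation: slice off the last `window` values and average."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def forecast_b(history, window=3):
    """Second implementation, written independently: walk backward and count."""
    total, count = 0.0, 0
    for value in reversed(history):
        if count == window:
            break
        total += value
        count += 1
    return total / count


def disagreements(cases, tolerance=1e-9):
    """Return every input on which the two implementations differ."""
    return [h for h in cases if abs(forecast_a(h) - forecast_b(h)) > tolerance]


# Seeded random histories of 1 to 12 months, so the run is repeatable.
rng = random.Random(42)
cases = [
    [rng.randint(0, 500) for _ in range(rng.randint(1, 12))]
    for _ in range(200)
]

print(len(disagreements(cases)), "disagreements out of", len(cases))
```

Any input in the disagreement list is worth a look. At least one of the two implementations is wrong, or the specification is ambiguous, for example about what to do with fewer than three months of history.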

Check the model

As Mark Twain put it in a line he attributed to Benjamin Disraeli, there are three kinds of lies: lies, damned lies and statistics. There are, for example, entire webpages on how to get a spreadsheet to generate "hockey stick" predictive growth based on any data. Understand how the formula works and why that one was chosen.
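To see how much the choice of formula matters, consider the sketch below. It extends the same five hypothetical data points two ways, once by average absolute change and once by average percentage growth. Neither function is taken from a real forecasting tool; they are simplified assumptions for illustration.

```python
def linear_forecast(values, periods_ahead):
    """Extend the average absolute change per period."""
    step = (values[-1] - values[0]) / (len(values) - 1)
    return values[-1] + step * periods_ahead


def compound_forecast(values, periods_ahead):
    """Extend the average percentage growth per period (the 'hockey stick')."""
    rate = (values[-1] / values[0]) ** (1 / (len(values) - 1))
    return values[-1] * rate ** periods_ahead


monthly_users = [100, 120, 150, 170, 200]  # hypothetical history

print("linear, 12 months out:  ", round(linear_forecast(monthly_users, 12)))
print("compound, 12 months out:", round(compound_forecast(monthly_users, 12)))
```

Both curves fit the first and last observed points exactly, yet a year out the compound version predicts more than three times as many users. The data did not change; only the formula did.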

Using predictive models in testing

Karen Johnson's RCRCRC heuristic points us to look at changes that are recent, core, risky, configuration-sensitive, recently repaired and chronic. Historically, that was mostly done by an educated team at a whiteboard or in a mind-map tool. Today, with the help of AI and large language models, it's possible to gather those elements as discrete data points to predict what to test. That might involve log data, version control commits, analysis of test coverage and records from defect tracking tools, such as Jira. The simplest part of that could be core -- look at log data, reduce it, assign a ranking and then sort. The features that are used most often are likely the most important to customers.
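The "core" step of reducing log data, ranking and sorting can be sketched as follows. The log format (timestamp, method, path, status) and the sample lines are assumptions; real logs would need their own parsing.

```python
from collections import Counter

# Hypothetical access-log lines: timestamp, method, path, status.
log_lines = [
    "2023-09-01T10:00:01 GET /search 200",
    "2023-09-01T10:00:02 GET /checkout 200",
    "2023-09-01T10:00:03 GET /search 200",
    "2023-09-01T10:00:04 GET /profile 200",
    "2023-09-01T10:00:05 POST /checkout 200",
    "2023-09-01T10:00:06 GET /search 200",
]


def rank_features(lines):
    """Count hits per path and return (path, hits) sorted by most used first."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:  # skip malformed lines
            counts[parts[2]] += 1
    return counts.most_common()


for path, hits in rank_features(log_lines):
    print(f"{hits:>3}  {path}")
```

Usage count is only a proxy for importance. A rarely used feature, such as account deletion or a year-end report, can still be critical, which is why core is only one of the six letters.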

Another option is to tie a tool that predicts customer behavior to automated checks -- to run the predicted customer behavior through the software as a kind of test.

Most organizations don't have that wherewithal. For many, it's not even in the realm of possibility; it wouldn't be even if the data were in place, and usually it isn't. Still, small strides in that direction could generate test ideas, test data or better evaluations.
