This is the second piece in a three-part series. Read the first piece here.
I have a belief that’s unorthodox in the data science world: explainability first, predictive power second, a notion that is more important than ever for companies implementing AI.
Why? Because AI is the hottest technology on the planet, and nearly every organization has a mandate to explore its benefits by applying AI models developed internally or acquired as part of a package solution from a third-party provider. Yet a recent study of European startups by London venture capital firm MMC found that 40% of those classified as AI companies don’t actually use AI in a way that is material to their businesses. How can these startups and the customers that buy from them rely on the predictive power of their AI technology when it’s not clear if the models delivering it are truly AI?
Explainability is everything
AI that is explainable should make it easy for humans to find the answers to important questions including:
- Was the model built properly?
- What are the risks of using the model?
- When does the model degrade?
The last question illustrates the related concept of humble AI, in which data scientists determine how suitable a model’s performance is in different situations, and identify situations in which it won’t work because of a low density of neural networks and a lack of historical data.
We need to understand AI models better because when we use the scores a model produces, we assume that each score is equally valid for all customers and all scoring scenarios. Often this is not the case, which can easily lead to all manner of important decisions being made based on very imperfect information.
Balancing speed with explainability
Many companies rush to operationalize AI models that are neither understood nor auditable in the race to build predictive models as quickly as possible with open source tools that many users don’t fully understand. In my data science organization, we use two techniques — blockchain and explainable latent features — that dramatically improve the explainability of the AI models we build.
In 2018 I produced a patent application (16/128,359 USA) around using blockchain to ensure that all of the decisions made about a machine learning model, a fundamental component of many AI solutions, are recorded and auditable. My patent describes how to codify analytic and machine learning model development using blockchain technology to associate a chain of entities, work tasks and requirements with a model, including testing and validation checks.
The blockchain substantiates a trail of decision-making. It shows whether a variable is acceptable, whether it introduces bias into the model and whether the variable is used properly. We can see at a very granular level the pieces of the model, the way the model functions and the way it responds to expected data, rejects bad data or responds to a simulated changing environment.
This use of blockchain to orchestrate the agile model development process also serves parties outside the development organization, such as a governance team or regulatory units. In this way, analytic model development becomes highly explainable and decisions auditable, a critical factor in delivering AI technology that is both explainable and ethical.
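To make the idea of a tamper-evident decision trail concrete, here is a minimal sketch of a hash-linked log of model development decisions. It is an illustration only, not the method described in the patent application: it is a single-machine hash chain rather than a distributed blockchain, and the record fields (`entity`, `task`, `outcome`) and function names are assumptions chosen for the example.

```python
import hashlib
import json


def block_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_decision(chain: list, entity: str, task: str, outcome: str) -> dict:
    """Append one development decision, linked to the previous block's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "index": len(chain),
        "entity": entity,      # who made the decision (hypothetical field)
        "task": task,          # what was done (hypothetical field)
        "outcome": outcome,    # result of the check (hypothetical field)
        "prev_hash": prev,
    }
    block = dict(body, hash=block_hash(body))
    chain.append(block)
    return block


def verify(chain: list) -> bool:
    """Return True only if every block is intact and links to its predecessor."""
    prev = "0" * 64
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != prev or block_hash(body) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = []
append_decision(chain, "data scientist A", "add variable v1", "approved")
append_decision(chain, "validator B", "bias test on v1", "passed")
print(verify(chain))               # the untouched chain verifies

chain[0]["outcome"] = "rejected"   # tamper with an earlier record
print(verify(chain))               # verification now fails
```

Because each block stores the hash of the one before it, changing any earlier decision invalidates the stored hashes, which is what lets an outside reviewer trust the recorded sequence.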
An explainable multi-layered neural network can be easily understood by an analyst, a business manager and a regulator. Yet a neural network model has a complex structure: even the simplest neural net, with a single hidden layer, produces latent features that make the model hard to understand, as shown in Figure 1.
I have developed a methodology that exposes the key driving features of the specification of each hidden node. This leads to an explainable neural network. Forcing hidden nodes to only have sparse connections makes the behavior of the neural network easily understandable.
Generating this model leads to the learning of a set of interpretable latent features. These are non-linear transformations of a single input variable or interactions of two or more input variables. The interpretability threshold of the nodes is the upper limit on the number of inputs allowed in a single hidden node, as illustrated in Figure 2.
As a consequence, the hidden nodes get simplified. In this example, hidden node LF1 is a non-linear transformation of input variable v2, and LF2 is an interaction of two input variables, v1 and v5. These nodes are considered resolved because the number of inputs is below or equal to the interpretability threshold, which is two in this example. On the other hand, the node LF3 is considered unresolved because it has more inputs than the threshold allows.
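The resolved/unresolved distinction can be sketched in a few lines. The connectivity for LF1 and LF2 follows the example in the text; the inputs listed for LF3 are an assumption made up for illustration, as are the function names.

```python
INTERPRETABILITY_THRESHOLD = 2  # value used in the example above

# Hidden node -> input variables it is connected to.
# LF1 and LF2 follow the text; LF3's inputs are hypothetical.
connections = {
    "LF1": ["v2"],
    "LF2": ["v1", "v5"],
    "LF3": ["v1", "v3", "v4"],
}


def is_resolved(inputs, threshold=INTERPRETABILITY_THRESHOLD):
    """A node is resolved when it has no more inputs than the threshold."""
    return len(inputs) <= threshold


def describe(inputs):
    """Plain-language reading of a latent feature."""
    if len(inputs) == 1:
        return f"non-linear transformation of {inputs[0]}"
    return "interaction of " + ", ".join(inputs)


status = {node: is_resolved(inputs) for node, inputs in connections.items()}

for node, inputs in connections.items():
    label = "resolved" if status[node] else "unresolved"
    print(f"{node}: {describe(inputs)} ({label})")
```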
To resolve an unresolved node, thus making it explainable, we tap into its activation. A new neural network model is then trained. The input variables of that hidden node become the predictors for the new neural network, and the hidden node activation is the target. This process expresses the unresolved node in terms of another layer of latent features, some of which are resolved. Applying this approach iteratively to all the unresolved nodes leads to a sparsely connected deep neural network, with an unusual architecture, in which each node is resolved and therefore is interpretable, as shown in Figure 3.
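The data preparation behind that step can be sketched as follows. This covers only the construction of the training set for the new network (predictors are the unresolved node's inputs, the target is its activation), not the training itself. The tanh activation, the weights given for LF3 and all names are assumptions for illustration.

```python
import math
import random


def node_activation(row, weights, bias):
    """Activation of one hidden node for a single input row (tanh assumed)."""
    z = bias + sum(w * row[name] for name, w in weights.items())
    return math.tanh(z)


def surrogate_training_set(rows, weights, bias):
    """Build (predictors, target) for the network that will explain this node.

    Predictors: only the variables feeding the unresolved node.
    Target: the node's own activation on each row.
    """
    inputs = sorted(weights)
    X = [[row[name] for name in inputs] for row in rows]
    y = [node_activation(row, weights, bias) for row in rows]
    return inputs, X, y


random.seed(0)
# Toy data set with five input variables v1..v5.
rows = [{f"v{i}": random.uniform(-1, 1) for i in range(1, 6)} for _ in range(4)]

# Hypothetical weights of the unresolved node LF3 (three inputs, above the threshold of two).
lf3_weights = {"v1": 0.8, "v3": -1.2, "v4": 0.5}

inputs, X, y = surrogate_training_set(rows, lf3_weights, bias=0.1)
print(inputs)          # variables used as predictors
print(len(X), len(y))  # one predictor row and one target per data row
```

A new sparsely connected network would then be fit to `X` and `y` with whatever training framework is in use, and any of its hidden nodes that still exceed the interpretability threshold would go through the same procedure again.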
The bottom line
Together, explainable latent features and blockchain make complex AI models understandable to human analysts at companies and regulatory agencies, a crucial step in speeding ethical, highly predictive AI technology into production.
Keep an eye out for the third and final blog in my AI explainer series on the three Es of AI on the topic of efficient AI.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.