You should find out what's going on in that neural network. Y'know they're cheating now?


Neural networks – the algorithms that many people think of when they hear the words machine learning – aren't very good at explaining what they do. They are black boxes.

At one end you feed in training data, such as a set of cat pictures and a set of non-cat pictures. The neural net crunches the data to produce a statistical model, and when fresh data is added, the network uses that model to classify it as cat or not-cat.
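To make that train-then-classify loop concrete, here is a minimal sketch in Python. It is a toy built on assumptions: a single artificial neuron (logistic regression) stands in for a full neural network, and the two features, `pointy_ears` and `whisker_score`, along with the training data, are invented for illustration. Real image classifiers learn from raw pixels rather than hand-picked features.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, lr=0.1):
    """Fit weights and a bias by gradient descent on log loss, one example at a time."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - label  # gradient of log loss w.r.t. the raw score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def classify(model, features):
    """Apply the trained model to fresh data."""
    weights, bias = model
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return "cat" if p >= 0.5 else "not-cat"


# Hypothetical features: (pointy_ears, whisker_score); label 1 = cat, 0 = not-cat
training_data = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.9], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.2], 0),
]

model = train(training_data)
print(classify(model, [0.85, 0.75]))  # cat
print(classify(model, [0.15, 0.2]))   # not-cat
```

Even here the "model" is nothing more than two weights and a bias. A real network can have millions of them, and that is where the black box comes from.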

Neural networks classify lots more than photos of cats – from the likelihood of a patient suffering a severe fall to the probability of a convicted person reoffending, and everything in between and beyond.

The more complex the information, and the more consequential the network's recommendations, the more critical it is that its operators understand how it decides things.

We need to be able to trust AI. Take medicine, where you'll need to ensure the machine isn't interpreting training data the wrong way and introducing logical fallacies.

Margo Seltzer, Herchel Smith professor of computer science at Harvard University's John A. Paulson School of Engineering and Applied Sciences, warns of a scenario where a model confuses an observed result with a cause.

Seltzer cites a model that noticed asthma patients were less likely to die of pneumonia. Based on that data, the model incorrectly concluded that asthma protected pneumonia patients.

"The problem is that if you show up at an ER and have asthma and pneumonia, they take it way more seriously than if you don't have asthma, so you're more likely to be admitted," she says. These patients did better than average because doctors had treated them more aggressively, but the algorithm didn't know that.
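A few lines of Python show how that trap arises. The patient counts below are invented. By construction, every asthma patient is treated aggressively, and the risk of death depends only on treatment, never on asthma. A model that sees only asthma status and outcome still concludes that asthma looks protective.

```python
def make_cohort(n, asthma, treated, deaths):
    """Build n hypothetical patient records, the first `deaths` of whom died."""
    return [
        {"asthma": asthma, "treated": treated, "died": i < deaths}
        for i in range(n)
    ]


# Invented data: death risk is 5% if treated aggressively, 15% if not.
patients = (
    make_cohort(100, asthma=True, treated=True, deaths=5)
    + make_cohort(300, asthma=False, treated=True, deaths=15)
    + make_cohort(700, asthma=False, treated=False, deaths=105)
)


def death_rate(rows):
    return sum(r["died"] for r in rows) / len(rows)


def rate_by_asthma(rows):
    return {flag: death_rate([r for r in rows if r["asthma"] == flag])
            for flag in (True, False)}


# What a model trained only on (asthma -> died) would pick up:
print(rate_by_asthma(patients))  # {True: 0.05, False: 0.12}

# Comparing like with like: treated patients only.
treated = [r for r in patients if r["treated"]]
print(rate_by_asthma(treated))   # {True: 0.05, False: 0.05}
```

Once the comparison is restricted to treated patients, the apparent benefit of asthma disappears. The catch is that a model can only make that comparison if the confounding variable, in this case treatment, is in its data.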

Being able to see how a model reaches its decisions is called "interpretability". So how do we open the black box and interpret what it is "thinking"?

It's not as straightforward as you'd think.

Zachary Lipton, assistant professor at Carnegie Mellon University, has a blog called Approximately Correct that he uses to debunk common misperceptions about AI. Lipton argues that before working on interpretability, we must define it. And that itself is a challenge.

He says that there is no commonly agreed definition of interpretability because it depends on our goals. "If I implement software based on this, will it be biased against black people, or how robust will it be if some small changes happen in the world?" he asks. "How susceptible is it to small changes in statistics in the data? I want to know all these different things that are not captured by the evaluation criteria."

So what are our options? One way to interpret ML is to probe inside the black box, either by looking at the machine learning algorithm's internal process, or at the model that it produces.

Algorithmic transparency means using an algorithm that is inherently understandable and can explain what it is doing: our concept of interpretability is coded into it from the beginning.

"That [idea] has recently started getting a lot of traction," says Dr Sameer Singh, assistant professor of computer science at the University of California, Irvine. "The problem has been getting them to be as accurate as they would if you didn't have that constraint."

Sounds great, but there's a catch: what's believed to be a general trade-off between accuracy and interpretability in the machine-learning world. Some algorithms, such as linear regression and decision trees, tend to be more interpretable than neural networks but have less potential for accuracy. Some initiatives aim to make interpretable models more accurate, such as supersparse linear integer models (SLIM) and rule lists.
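For a flavour of what an interpretable-by-design model looks like, here is a sketch of the kind of output a SLIM-style method aims for: a points table with a handful of small integer weights that a person can add up in their head. The features, points and threshold are invented for illustration. They were not learned by SLIM or from any real data.

```python
# Hypothetical points table; features, points and threshold are invented.
POINTS = {
    "age_over_75": 2,
    "previous_fall": 3,
    "uses_walking_aid": 1,
    "takes_sedatives": 2,
}
THRESHOLD = 4  # scores at or above this are flagged as high risk


def fall_risk(patient):
    """Return a label, the total score, and the per-feature points behind it."""
    breakdown = {f: pts for f, pts in POINTS.items() if patient.get(f, False)}
    total = sum(breakdown.values())
    label = "high risk" if total >= THRESHOLD else "low risk"
    return label, total, breakdown


print(fall_risk({"age_over_75": True, "previous_fall": True}))
# ('high risk', 5, {'age_over_75': 2, 'previous_fall': 3})
print(fall_risk({"uses_walking_aid": True}))
# ('low risk', 1, {'uses_walking_aid': 1})
```

The accuracy question is whether a model this small can compete with a neural network trained on the same data. The interpretability win is that every prediction comes with its own arithmetic.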


Research into interpretable ML methods is ongoing. The US Defense Advanced Research Projects Agency (DARPA) is working with AI researchers to demonstrate such algorithms this year, and a report is due in November.

Another way to evaluate how an ML algorithm decided something is to look at the model that it produces. You could try to understand the whole machine learning model at once, but Lipton and others, such as machine learning PhD candidate Christoph Molnar in his book, argue that this isn't feasible. The data has so many dimensions that we can't keep up. Even trying to understand all the components of a large linear regression model at once can be prohibitively complex.

You can break the model down into parts to understand it. Some models, such as decision trees and linear regression, lend themselves to this concept. Molnar calls these "white box" models.
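Here is a minimal sketch of what "breaking down" a linear regression means in practice. A prediction is the intercept plus one weight-times-value term per feature, so each term can be read as that feature's contribution. The feature names, weights and intercept are hypothetical, standing in for an already-fitted model that predicts length of hospital stay in days.

```python
# Hypothetical, already-fitted linear regression (length of stay in days).
INTERCEPT = 2.0
WEIGHTS = {"age_decades": 0.5, "num_conditions": 1.2, "emergency_admission": 3.0}


def predict_with_parts(record):
    """Split the prediction into the intercept plus one additive term per feature."""
    parts = {f: w * record[f] for f, w in WEIGHTS.items()}
    return INTERCEPT + sum(parts.values()), parts


prediction, parts = predict_with_parts(
    {"age_decades": 7, "num_conditions": 2, "emergency_admission": 1})
print(f"prediction: {prediction:.1f} days")
# List the contributions, largest first.
for feature, contribution in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {feature:20s} {contribution:+.1f}")
```

With three features the breakdown is easy to read; with a couple of thousand, the same list stops being an explanation anyone can take in.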

"A tree is equivalent to a bunch of if-else statements. That means, if I learn a decision tree on data to make predictions, I could – assuming the tree is not too deep – sit down and implement the tree as a series of if-else statements," he says.

These white box models can, however, turn into black boxes if they become too complicated. "If a linear model has a lot of inputs (for example, a couple of thousand) or if the decision tree is very deep, these white boxes become impossible to interpret," Molnar says.
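For a tree that is not too deep, Molnar's if-else equivalence can be checked mechanically. The sketch below stores a small, hypothetical decision tree as nested tuples, predicts by walking it, and then prints the same tree as nested if-else statements. The features, thresholds and labels are invented. `exec` is used only to show that the generated code behaves identically.

```python
# Hypothetical depth-2 tree. Internal nodes are (feature, threshold, left, right);
# leaves are labels. Go left when record[feature] <= threshold, otherwise right.
TREE = ("age", 75,
        ("previous_falls", 0, "low risk", "medium risk"),
        ("previous_falls", 0, "medium risk", "high risk"))


def predict(node, record):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if record[feature] <= threshold else right
    return node


def to_if_else(node, indent=1):
    """Render the same tree as nested if/else source lines."""
    pad = "    " * indent
    if not isinstance(node, tuple):
        return [f"{pad}return {node!r}"]
    feature, threshold, left, right = node
    return ([f"{pad}if record[{feature!r}] <= {threshold!r}:"]
            + to_if_else(left, indent + 1)
            + [f"{pad}else:"]
            + to_if_else(right, indent + 1))


source = "def tree_as_code(record):\n" + "\n".join(to_if_else(TREE))
print(source)

# Demonstration only: compile the generated text to show it matches the tree.
namespace = {}
exec(source, namespace)
tree_as_code = namespace["tree_as_code"]

for record in ({"age": 60, "previous_falls": 0}, {"age": 80, "previous_falls": 2}):
    assert predict(TREE, record) == tree_as_code(record)
```

Add a few more levels and the printed if-else ladder grows exponentially, which is the point at which Molnar says the white box turns black.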

Another approach is post-hoc analysis, which examines a model after it has been trained. In computer vision, for example, a saliency map is a tool that lets researchers visualise which parts of an image a neural network has concentrated on.
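One common kind of saliency map (the "vanilla gradient" variant) measures how strongly the network's output responds to each input pixel, i.e. the size of the gradient with respect to that pixel. Deep learning frameworks compute that gradient by backpropagation. This sketch approximates it with finite differences so it runs on the standard library alone. The 3x3 "network" and its weights are invented and deliberately look mostly at the top row of the image.

```python
import math

# Invented weights for a toy 3x3 "network": it mostly looks at the top row.
WEIGHTS = [[2.0, 2.0, 2.0],
           [0.1, 0.1, 0.1],
           [0.0, 0.0, 0.0]]


def model(image):
    """Probability-like output: sigmoid of a weighted pixel sum minus a bias of 3."""
    z = sum(w * p for w_row, p_row in zip(WEIGHTS, image)
            for w, p in zip(w_row, p_row)) - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def saliency_map(f, image, h=1e-4):
    """Central finite-difference estimate of |df/dpixel| for every pixel."""
    sal = [[0.0] * len(image[0]) for _ in image]
    for r in range(len(image)):
        for c in range(len(image[0])):
            up = [row[:] for row in image]
            down = [row[:] for row in image]
            up[r][c] += h
            down[r][c] -= h
            sal[r][c] = abs(f(up) - f(down)) / (2 * h)
    return sal


image = [[0.5] * 3 for _ in range(3)]  # a flat grey test image
sal = saliency_map(model, image)
for row in sal:
    print(" ".join(f"{v:.3f}" for v in row))
```

The printout shows large values along the top row, small ones in the middle and zeros at the bottom, mirroring where this toy model "looks".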

Singh takes yet another approach. He throws lots of variations of a single record at the model and watches how it classifies each one, so that researchers can deduce which factors affect a neural network's decision.
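One way to turn that stream of variations into an explanation, broadly the approach the LIME tool takes, is to fit a simple weighted linear model to the black box's answers near the record in question. You switch input features on and off, weight each variation by how close it is to the original, and read the surrogate's coefficients as local importances. The sketch below is a from-scratch illustration of that idea, not the LIME library's API. The four image "regions", the black box and the kernel width are assumptions. LIME samples variations at random, but four features are few enough to enumerate all 16.

```python
import itertools
import math

REGIONS = ["region_0", "region_1", "region_2", "region_3"]  # hypothetical regions


def black_box(mask):
    """Stand-in classifier: P('wolf') given which regions are shown (1) or hidden (0).
    By construction it leans heavily on region_2."""
    shown = dict(zip(REGIONS, mask))
    return 0.1 + 0.8 * shown["region_2"] + 0.05 * shown["region_0"]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explain(f, names, kernel_width=0.75):
    """Fit a proximity-weighted linear surrogate to f around the all-shown record."""
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=len(names)):
        distance = 1 - sum(mask) / len(names)  # share of regions hidden
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + list(mask))        # leading 1.0 = intercept term
        targets.append(f(mask))
    k = len(names) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, rows, targets))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    return dict(zip(names, beta[1:]))  # drop the intercept


importances = explain(black_box, REGIONS)
for name, weight in sorted(importances.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {weight:+.3f}")
```

Because this toy black box happens to be exactly linear, the surrogate recovers its weights precisely. With a real network the surrogate is only an approximation that holds near the record being explained.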

Singh co-authored LIME (Local Interpretable Model-agnostic Explanations), an open-source tool that uses this technique to analyse an AI system's conclusions rather than its internal processes. Why?

Singh was using a neural network to discern wolves from huskies – a task humans can do reasonably well, but nuanced enough that a neural network should find it difficult. Nevertheless, the network produced better-than-expected results. He found that it was effectively cheating: it was gaming a flaw to produce the intended results without actually learning the task.

"Analysis showed that the network wasn't training itself on the nuances of husky face shapes or curly tails. It had capitalized on a flaw in the training data. One type of animal usually cropped up against a snowy background, so the model had classified pictures based on which pictures had snow and which didn't," Singh says.

Researchers created a program to repeatedly alter an image – it would hide parts of the picture, for example. "When we hid the wolf in the image and sent it across, the network would still predict that it was a wolf, but when we hid the snow, it would not be able to predict that it was a wolf anymore," Singh explains.
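Singh's hide-and-resend test is easy to mimic. In this toy, the 6x6 "photo" is a grid of characters, and the classifier is a hypothetical stand-in that has learned the same shortcut, counting snow instead of looking at the animal. Both the image and the 0.3 threshold are invented.

```python
# Toy 6x6 "photo": W = wolf pixels, S = snow pixels, G = grey/other background.
PHOTO = [
    "SSSSSS",
    "SSWWSS",
    "SWWWWS",
    "SSWWSS",
    "SSSSSS",
    "GGGGGG",
]


def cheating_classifier(photo):
    """Hypothetical model that learned the dataset's shortcut:
    lots of snow means 'wolf', whatever the animal looks like."""
    pixels = "".join(photo)
    snow_fraction = pixels.count("S") / len(pixels)
    return "wolf" if snow_fraction > 0.3 else "husky"


def occlude(photo, target, filler="G"):
    """Hide every pixel of one kind by painting over it with a neutral filler."""
    return [row.replace(target, filler) for row in photo]


print(cheating_classifier(PHOTO))                # wolf
print(cheating_classifier(occlude(PHOTO, "W")))  # wolf: hiding the animal changes nothing
print(cheating_classifier(occlude(PHOTO, "S")))  # husky: hiding the snow flips it
```

Real occlusion tests slide grey patches over actual pixels, but the logic is the same: whatever you can hide without changing the answer was not what the model relied on.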

He likens systems that look at what neural networks are "thinking" to neuroscience, and this deductive method, based on what the network ends up classifying, to psychology.

Singh believes that you could apply this technique forensically. Suppose a customer complained that they had been refused a bank loan because of their race. You might present all the information about them to the network, from age through to transaction history and credit usage, many times over, each time changing a small element. In that way, you might show that the network was, in fact, taking race into account, or perhaps that it was penalising them for a late card payment a year ago.
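Here is a rough sketch of that forensic test: change one attribute at a time, leave everything else untouched, and record which changes flip the decision. The credit model, its rules and the applicant are all hypothetical. A caveat applies: a model can discriminate through proxies that correlate with race, such as postcode, and flipping the race field alone will not expose that.

```python
def loan_model(applicant):
    """Hypothetical black-box credit model; rules and numbers are invented."""
    score = 600
    score += 2 * min(applicant["age"] - 18, 30)
    score += 40 if applicant["credit_utilisation"] < 0.3 else 0
    score -= 120 * applicant["late_payments_last_year"]
    return "approve" if score >= 640 else "decline"


def counterfactuals(model, applicant, alternatives):
    """Re-run the model changing one attribute at a time; report decision flips."""
    baseline = model(applicant)
    flips = []
    for attribute, values in alternatives.items():
        for value in values:
            variant = {**applicant, attribute: value}  # everything else unchanged
            outcome = model(variant)
            if outcome != baseline:
                flips.append((attribute, value, outcome))
    return baseline, flips


applicant = {
    "age": 35,
    "race": "group_a",  # placeholder category labels
    "credit_utilisation": 0.25,
    "late_payments_last_year": 1,
}
alternatives = {
    "race": ["group_b", "group_c"],
    "age": [25, 45],
    "late_payments_last_year": [0],
}

baseline, flips = counterfactuals(loan_model, applicant, alternatives)
print(baseline)  # decline
print(flips)     # [('late_payments_last_year', 0, 'approve')]
```

In this invented case, changing race never flips the outcome, while removing the late payment does.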

We are chipping away at the concept of interpretable machine learning. Here's the rub, though: that capability could also make it easier to game machine learning systems, creating patterns that they misinterpret. Imagine being able to fool an autonomous vehicle into thinking that a stop sign was a tree. It's like the husky recognition program thrown into reverse.
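The reverse trick can be sketched on a linear classifier. For a linear score, the gradient with respect to the input is simply the weight vector. Nudging every pixel a small step against the sign of its weight therefore gives the largest drop in score for a given maximum change per pixel: this is the fast gradient sign method (FGSM) specialised to a linear model. The 1,000-pixel "image", the weights and the step size are invented. No pixel moves by more than 0.06 on a 0-to-1 scale, yet the many small nudges add up and flip the decision.

```python
def score(weights, image, bias=0.0):
    """Hypothetical linear 'stop sign' detector: positive score means stop sign."""
    return sum(w * x for w, x in zip(weights, image)) + bias


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(weights, image, epsilon):
    """Move each pixel by epsilon in the direction that lowers the score,
    clipped to [0, 1]. For a linear model the input gradient is `weights`."""
    return [min(1.0, max(0.0, x - epsilon * sign(w)))
            for w, x in zip(weights, image)]


n = 1000
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]  # invented weights
image = [0.55 if i % 2 == 0 else 0.45 for i in range(n)]     # invented image

adversarial = fgsm_linear(weights, image, epsilon=0.06)
print(score(weights, image) > 0)        # True: classified as a stop sign
print(score(weights, adversarial) > 0)  # False: no longer a stop sign
print(max(abs(a - b) for a, b in zip(adversarial, image)))  # about 0.06
```

Attacks on real vision models compute the gradient through the whole network, but the arithmetic of many tiny, coordinated nudges is the same.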

Further irony: the better the explanations we find for AI, the easier it becomes to identify its attack points. According to Molnar, adding that transparency could eventually create a game of cat and mouse between AI developers and hackers trying to infiltrate their systems.

"An argument in favour of open source is that everyone can read the code and exploits are quickly spotted and fixed," he says. "Explainable machine learning has similarities to open-source code. The openness should make it easier to spot and fix possible exploits, but also (at least at first) easier to exploit the machine learning algorithm."

That's a controversial take. While the many-eyeballs argument has generally helped produce solid open-source software, there have been cases where the community model has not worked and bugs have remained hidden beyond the reach of all those eyeballs.

If AI and ML are in their infancy, interpretable and explainable systems are several stages before that – possibly embryonic. There's no single right or wrong way to tackle this challenge, but there is a nagging concern: the moment AI opens up is the moment it becomes readable to all.

The one silver lining? Hackers are likely already working to understand AI and turn it against us. It is, therefore, in our best interests to make ML understandable, so that we can act on that knowledge and protect our systems – and us. ®
