In Partial Information Attacks on Real-world AI, a group of MIT computer science researchers report on their continuing work fooling Google's image classifier, this time without any knowledge of how the classifier works.
In the new study, the researchers used a "black box" approach to discover adversarial examples: they fired a batch of images into Google's image classifier, noted where it made mistakes, and systematically explored how to force it to mistake one object for another.
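To make the "black box" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `make_black_box` builds a toy two-class scorer with hidden weights (it is not Google's classifier or API), and `random_search_attack` is a plain random search swapped in for simplicity, not the method from the paper. The point is only that the attacker never reads the weights; it just sends queries and watches how the returned scores move.

```python
import math
import random


def make_black_box(seed=0, n_pixels=16):
    """Toy stand-in for a remote classifier (hypothetical, for illustration)."""
    rng = random.Random(seed)
    # Hidden weights: the attacker code below never touches these.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]

    def classify(image):
        logit = sum(w * (p - 0.5) for w, p in zip(weights, image))
        p_rifle = 1.0 / (1.0 + math.exp(-logit))
        return {"rifle": p_rifle, "helicopter": 1.0 - p_rifle}

    return classify


def top_label(probs):
    """Return the label with the highest score."""
    return max(probs, key=probs.get)


def random_search_attack(classify, image, target, step=0.05,
                         max_queries=5000, seed=1):
    """Nudge one pixel at a time, keeping only changes that raise the
    target label's score. Uses query access only."""
    rng = random.Random(seed)
    current = list(image)
    probs = classify(current)
    queries = 1
    while top_label(probs) != target and queries < max_queries:
        candidate = list(current)
        i = rng.randrange(len(candidate))
        nudged = candidate[i] + rng.choice((-step, step))
        candidate[i] = min(1.0, max(0.0, nudged))  # stay a valid pixel value
        cand_probs = classify(candidate)
        queries += 1
        if cand_probs[target] > probs[target]:
            current, probs = candidate, cand_probs
    return current, queries


classify = make_black_box()
image_rng = random.Random(42)
original = [image_rng.random() for _ in range(16)]
start = top_label(classify(original))
# Aim for whichever label the classifier did not pick.
target = "helicopter" if start == "rifle" else "rifle"
adversarial, queries = random_search_attack(classify, original, target)
print(start, "->", top_label(classify(adversarial)), "after", queries, "queries")
```

A real attack also has to keep the changes small enough that a person would not notice them, and has to cope with a service that returns far less information per query; this sketch ignores both constraints.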
This is possible because machine-learning image classifiers use blind statistical modeling to identify the salient points of any given image: their creators don't say "turtles have shells and stubby legs and tails" or "rifles have long stocks, barrels and triggers." Instead, the classifier is fed a bunch of pictures of turtles, rifles, or you, and asked to figure out what they have in common.
This works great in non-adversarial contexts: that is, when no one is trying to fool the classifier. But it turns out that making a classifier that works when someone is trying to fool it is very different from making a classifier that works in a peaceful situation. This is often the case with algorithms: think of how Google's PageRank algorithm made near-miraculous progress in identifying relevant web-pages by counting the links people had made in the past, but had to be radically revised once people started making new links for the purpose of confusing it.
In the new paper from MIT's Lab Six, the team demonstrates impressive progress in using the "black-box" method for fooling the algorithm: they modeled deficits in the classifier that let them undertake arbitrary tricks, like making a row of guns (with slightly tweaked shadows and lighting) look like a helicopter.
Adversarial perturbations and examples are a very exciting domain of security research. I hear that there's going to be a lot of this at next summer's security conferences.
These algorithms are being entrusted with tasks like filtering out hateful content on social platforms, steering driverless cars, and maybe one day scanning luggage for weapons and explosives. That’s a tremendous responsibility, given that researchers don’t yet fully understand why adversarial examples cause deep learning algorithms to go haywire.
There are some hypotheses, but nothing conclusive, Athalye told me. Researchers have essentially created artificially intelligent systems that “think” in different ways than humans do, and no one is quite sure how they work. “I can show you two images that look exactly the same to you,” Athalye says. “And yet the classifier thinks one is a cat and one is a guacamole with 99.99 percent probability.”
Partial Information Attacks on Real-world AI
Researchers Fooled a Google AI Into Thinking a Rifle Was a Helicopter [Louise Matsakis/Wired]