Protecting computer vision from adversarial attacks

Advances in computer vision and machine learning have made it possible for a wide variety of technologies to perform sophisticated tasks with little or no human supervision. From autonomous drones and self-driving cars to medical imaging and product manufacturing, many computer applications and robots use visual information to make critical decisions. Cities are increasingly relying on these automated technologies for public safety and infrastructure maintenance.

Compared to humans, however, computers see with a kind of tunnel vision that leaves them vulnerable to attacks with potentially catastrophic consequences. For example, a human driver who sees graffiti covering a stop sign will still recognize it and stop the car at the intersection. But the graffiti can cause a self-driving car to miss the stop sign entirely and plow through the intersection. And while the human mind can filter out all kinds of unusual or extraneous visual information when making a decision, computers can be tripped up by small deviations from the data they expect.

This is because the brain is infinitely complex and can simultaneously process masses of data and past experiences to arrive at near-instantaneous decisions appropriate to the situation. Computers rely on mathematical algorithms trained on data sets. Their creativity and cognition are limited by the boundaries of technology, mathematics and human foresight.

An attacker could exploit this vulnerability by altering the way a computer sees an object, either by modifying the object itself or by modifying some aspect of the software involved in the vision technology. Other attacks can manipulate the decisions the computer makes about what it sees. Both approaches can spell disaster for individuals, cities or businesses.

A team of researchers from UC Riverside’s Bourns College of Engineering is working on ways to fend off attacks on computer vision systems. To do that, Salman Asif, Srikanth Krishnamurthy, Amit Roy-Chowdhury, and Chengyu Song are first figuring out which attacks work.

“People would want to do these attacks because there are many places where machines interpret data to make decisions,” said Roy-Chowdhury, the principal investigator of a recently concluded DARPA AI Explorations program called Techniques for Machine Vision Disruption. “It can be in an adversary’s interest to manipulate the data the machine makes a decision about. How does an opponent attack a data stream so that the decisions are wrong?”

Illustration showing how an attacker could cause a computer vision system to miscategorize objects it sees through the camera. Mislabeling one object may not be enough to make a bad decision, but mislabeling several related objects is. (Cai et al. 2022)

For example, an adversary might inject malware into the software of a self-driving vehicle so that the data coming in from the camera is slightly perturbed. As a result, the installed models could fail to recognize a pedestrian, hallucinate an object that is not there, or miss an object that does exist. By understanding how to generate effective attacks, researchers can design better defenses.

“We’re looking at how to distort an image so that when it’s analyzed by a machine learning system, it’s miscategorized,” Roy-Chowdhury said. “There are two ways to do this: deepfakes, where someone’s face or facial expressions are changed in a video to fool a human, and adversarial attacks, where an attacker manipulates how the machine makes a decision but a human usually isn’t fooled. The idea is that you make a very small change to an image that a human cannot perceive, but that an automated system will pick up on and make a mistake.”
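
As a concrete illustration of that idea, the short PyTorch sketch below applies a single gradient-sign step that nudges an image by a tiny, human-imperceptible amount and checks whether a classifier's prediction flips. It is a generic fast-gradient-sign example built on placeholder inputs (a stock pretrained model, a random stand-in image, an assumed label and budget), not the specific attack developed by the UCR team.

```python
# A minimal fast-gradient-sign sketch of an imperceptible adversarial perturbation.
# The model, random "camera frame," assumed label, and epsilon budget are all
# placeholders for illustration, not the UCR team's actual attack.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a camera frame
label = torch.tensor([3])                                # assumed ground-truth class

# One gradient step in the direction that increases the classification loss,
# limited to a tiny per-pixel budget so a human would not notice the change.
loss = F.cross_entropy(model(image), label)
loss.backward()
epsilon = 2.0 / 255.0
adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()

with torch.no_grad():
    print("clean prediction:      ", model(image).argmax(dim=1).item())
    print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
```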

Roy-Chowdhury, his collaborators and their students have found that most existing attack mechanisms are aimed at misclassifying specific objects and activities. However, most scenes contain multiple objects, and there is usually a relationship between the objects in the scene, meaning that certain objects appear together more often than others.

People who study computer vision call this co-occurrence “context.” Members of the group showed how to design context-aware attacks that alter the relationships between objects in the scene.

“For example, a table and a chair are often seen together. But a tiger and a chair are rarely seen together. We want to manipulate all of these together,” Roy-Chowdhury said. “You could turn the stop sign into a speed limit sign and remove the crosswalk. If you replaced the stop sign with a speed limit sign but left the crosswalk, the computer in a self-driving car might still recognize it as a situation where it needs to stop.”
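
One way to make this notion of context concrete is to count how often object labels appear together in annotated scenes. The sketch below does exactly that with a few made-up label lists; the data and the simple pair-counting model are illustrative assumptions, not the context model learned in the group's papers.

```python
# A toy sketch of learning object co-occurrence ("context") statistics from
# scene annotations. The label lists are made-up examples for illustration.
from collections import Counter
from itertools import combinations

# Each entry lists the object labels annotated in one image.
scenes = [
    ["table", "chair", "laptop"],
    ["table", "chair", "cup"],
    ["stop sign", "crosswalk", "car"],
    ["stop sign", "crosswalk", "pedestrian"],
    ["tiger", "grass"],
]

pair_counts = Counter()
for labels in scenes:
    for a, b in combinations(sorted(set(labels)), 2):
        pair_counts[(a, b)] += 1

# Pairs that co-occur often ("table"/"chair", "stop sign"/"crosswalk") define the
# scene context an attacker must keep consistent to avoid easy detection.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```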

Earlier this year, at the Association for the Advancement of Artificial Intelligence (AAAI) conference, the researchers showed that manipulating just one object is often not enough to make a machine reach a wrong decision. The group developed a strategy for adversarial attacks that change multiple objects simultaneously in a consistent manner.

“Our key insight was that successful transfer attacks require holistic scene manipulation. We learn a context graph to guide our algorithm on which objects to target in order to fool the victim model, while preserving the overall scene context,” said Salman Asif.
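
The snippet below sketches what such context-guided target selection could look like: given a chosen mislabeling for one object, it consults a toy co-occurrence table to find other objects in the scene that would contradict the new label and therefore also need to be attacked. The function, thresholds, and probabilities are hypothetical illustrations, not the algorithm from the paper.

```python
# Hypothetical sketch of "holistic scene manipulation": after choosing a target
# mislabel for one object, also perturb co-occurring objects that would contradict it.
# The co-occurrence table and threshold are illustrative assumptions.

def objects_to_also_attack(scene_labels, victim_label, target_label, cooccur, threshold=0.5):
    """Return scene objects that co-occur strongly with the original label but
    weakly with the target label; leaving them intact would break context."""
    extra = []
    for obj in scene_labels:
        if obj == victim_label:
            continue
        with_old = cooccur.get(frozenset({obj, victim_label}), 0.0)
        with_new = cooccur.get(frozenset({obj, target_label}), 0.0)
        if with_old >= threshold and with_new < threshold:
            extra.append(obj)
    return extra

# Toy co-occurrence probabilities (assumed numbers for illustration).
cooccur = {
    frozenset({"crosswalk", "stop sign"}): 0.9,
    frozenset({"crosswalk", "speed limit sign"}): 0.1,
    frozenset({"car", "stop sign"}): 0.8,
    frozenset({"car", "speed limit sign"}): 0.8,
}

scene = ["stop sign", "crosswalk", "car"]
print(objects_to_also_attack(scene, "stop sign", "speed limit sign", cooccur))
# -> ['crosswalk']: the crosswalk must also be hidden for the scene to stay consistent.
```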

In a paper presented this week at the Conference on Computer Vision and Pattern Recognition (CVPR), the researchers, along with their collaborators at PARC, a research division of Xerox, build on this concept and propose a method in which the attacker needs no access to the victim’s computer system. This is important because each time an attacker probes a system, they risk being discovered by the victim, who can then mount a defense against the attack. The most successful attacks are therefore likely to be those that do not probe the victim’s system at all, which makes it critical to anticipate and design defenses against these “zero-query” attacks.
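
The sketch below illustrates the zero-query idea in its simplest form: the perturbation is computed entirely on a surrogate model the attacker controls, and the victim model only ever sees the finished adversarial input. The two stand-in models and single-step attack are assumptions for illustration, not the method proposed in the CVPR paper.

```python
# A rough "zero-query" transfer-attack sketch: the perturbation is crafted entirely
# on a surrogate model the attacker controls; the victim model is never probed while
# the attack is being built. Both models and the data are placeholder assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as models

surrogate = models.resnet18(weights=None).eval()  # attacker's own stand-in model
victim = models.resnet34(weights=None).eval()     # victim system, never queried below

image = torch.rand(1, 3, 224, 224)
label = torch.tensor([3])
epsilon = 4.0 / 255.0

# All gradient information comes from the surrogate only.
x = image.clone().requires_grad_(True)
F.cross_entropy(surrogate(x), label).backward()
adversarial = (image + epsilon * x.grad.sign()).clamp(0.0, 1.0)

# Only at "deployment" does the perturbed input ever reach the victim model.
with torch.no_grad():
    print("victim on clean image:       ", victim(image).argmax(dim=1).item())
    print("victim on transferred attack:", victim(adversarial).argmax(dim=1).item())
```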

Last year, the same group of researchers used contextual relationships over time to conduct attacks on video sequences. They used geometric transformations to design highly efficient attacks on video classification systems. The algorithm produces successful attacks in surprisingly few attempts. For example, adversarial examples generated with this technique have better success rates with 73% fewer attempts than state-of-the-art video attack methods. This allows for faster attacks with far fewer probes into the victim system. The paper was presented at the premier machine learning conference, Neural Information Processing Systems, in 2021.
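
The rough sketch below conveys the general intuition behind transformation-based video attacks: one shared perturbation is reused across frames via cheap geometric transforms instead of being re-optimized frame by frame, which keeps the number of probes low. The specific transforms and parameters are assumptions, not the NeurIPS 2021 algorithm itself.

```python
# A hedged sketch of reusing one perturbation across video frames via geometric
# transforms rather than re-optimizing it per frame. Parameters are assumptions.
import torch
import torchvision.transforms.functional as TF

frames = torch.rand(16, 3, 224, 224)          # a short video clip (T, C, H, W)
base_delta = 0.03 * torch.randn(3, 224, 224)  # one shared perturbation

perturbed = []
for t in range(frames.shape[0]):
    # Slightly rotate and shift the shared perturbation for each frame.
    delta_t = TF.affine(base_delta, angle=2.0 * t, translate=[t % 4, 0],
                        scale=1.0, shear=0.0)
    perturbed.append((frames[t] + delta_t).clamp(0.0, 1.0))

adv_clip = torch.stack(perturbed)
print(adv_clip.shape)  # torch.Size([16, 3, 224, 224])
```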

The fact that context-aware adversarial attacks are much more powerful on natural images with multiple objects than existing attacks, which primarily target images with a single dominant object, opens the way to more effective defenses. These defenses can take into account the contextual relationships between objects in an image, or even between objects across a scene captured by multiple cameras. This offers the opportunity to develop significantly more secure systems in the future.
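
A defense built on the same insight might flag detections whose objects rarely co-occur in clean data. The toy consistency check below illustrates that idea; the threshold and co-occurrence numbers are assumed for illustration and are not drawn from the researchers' work.

```python
# Illustrative sketch of a context-based defense: flag a detection result whose
# objects are (nearly) never seen together in clean training data.

def is_context_consistent(detected_labels, cooccur, min_prob=0.2):
    """Return False if any pair of detected objects rarely co-occurs."""
    labels = list(set(detected_labels))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if cooccur.get(frozenset({labels[i], labels[j]}), 0.0) < min_prob:
                return False
    return True

# Assumed co-occurrence probabilities for illustration.
cooccur = {
    frozenset({"crosswalk", "stop sign"}): 0.9,
    frozenset({"crosswalk", "speed limit sign"}): 0.05,
}

print(is_context_consistent(["stop sign", "crosswalk"], cooccur))         # True
print(is_context_consistent(["speed limit sign", "crosswalk"], cooccur))  # False -> suspicious
```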


headline photo: vchal/Getty Images
