Description
Inspired by the observation that a backdoor trigger acts as a shortcut that samples can take across a deep neural network's decision boundary, we build on the rich literature connecting a model's adversarial robustness to its internal structure and show that the same robustness properties can be used to identify whether or not a model contains a backdoor. Specifically, we demonstrate that backdooring a deep neural network thins and tilts its decision boundary, yielding a more sensitive and less robust classifier.
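To make the central quantity concrete: boundary thickness is standardly measured (in the sense of Yang et al., 2020) as the length of the portion of the segment between a clean input and an adversarial counterpart on which the prediction margin stays within a fixed band. The sketch below is a minimal PyTorch illustration of that definition, not the thesis's actual implementation; `model` (a classifier returning logits), `x_clean`, `x_adv`, and the class indices are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def boundary_thickness(model, x_clean, x_adv, cls_i, cls_j,
                       alpha=0.0, beta=0.75, num_points=128):
    """Estimate boundary thickness along the segment from x_clean to x_adv.

    Thickness is the length of the part of the segment on which the
    prediction margin g_ij = p_i - p_j lies inside (alpha, beta).
    """
    # Interpolation coefficients broadcast over the input dimensions.
    ts = torch.linspace(0.0, 1.0, num_points).view(-1, *([1] * x_clean.dim()))
    segment = (1.0 - ts) * x_clean + ts * x_adv        # interpolated inputs
    with torch.no_grad():
        probs = F.softmax(model(segment), dim=-1)      # (num_points, num_classes)
    margin = probs[:, cls_i] - probs[:, cls_j]
    # Fraction of the segment inside the margin band, scaled by its length.
    inside = ((margin > alpha) & (margin < beta)).float().mean()
    return (x_clean - x_adv).norm().item() * inside.item()
```

A thinner boundary (smaller value of this measure) indicates that small perturbations more easily flip the prediction, which is the sensitivity signal the thesis associates with backdoored models.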
In addition to a simpler proof-of-concept demonstration for computer vision models on the MNIST dataset, we build an end-to-end pipeline that distinguishes clean from backdoored models based on their boundary thickness and boundary tilting, and we evaluate it on the TrojAI competition benchmark for NLP models. We hope that this thesis will advance our understanding of the links between adversarial robustness and defenses against backdoor attacks, and also inspire future research exploring the relationship between adversarial perturbations and backdoor triggers.
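As a rough illustration of how such a detection pipeline could be wired together downstream of feature extraction, per-model boundary statistics can be fed to an off-the-shelf binary classifier that labels each model as clean or backdoored. The feature values below are toy numbers for illustration only, and the use of logistic regression is an assumption, not necessarily the detector used in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds per-model robustness features, e.g.
# [mean boundary thickness, boundary tilting statistic]; the numbers are
# toy values for illustration, not measurements from the thesis.
features = np.array([[0.42, 0.10], [0.38, 0.12], [0.15, 0.55], [0.12, 0.60]])
labels = np.array([0, 0, 1, 1])   # 0 = clean model, 1 = backdoored model

detector = LogisticRegression().fit(features, labels)

# A new model whose boundary is thin and strongly tilted gets flagged.
print(detector.predict([[0.14, 0.58]]))   # -> [1]
```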