Adversarial Examples Are Not Bugs, They Are Features

by Andrew Ilyas, Shibani Santurkar et al.

Main Idea

The paper casts adversarial vulnerability as a fundamental consequence of the dominant supervised learning paradigm: classifiers are trained to exploit any predictive signal in the data, whether or not that signal is robust to perturbation.

Central Premise

Standard datasets contain both robust and non-robust features, and both constitute useful signal for standard classification: robust features remain predictive under adversarial perturbation, while non-robust features are predictive yet brittle patterns that are typically imperceptible to humans.
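Concretely, for binary classification with labels $y \in \{-1, +1\}$ and a feature $f: \mathcal{X} \to \mathbb{R}$, the paper calls $f$ $\rho$-useful if it correlates with the label in expectation, and $\gamma$-robustly useful if that correlation survives any allowed perturbation $\delta \in \Delta(x)$:

$$
\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, y \cdot f(x) \,\big] \;\ge\; \rho
\qquad \text{($\rho$-useful)}
$$

$$
\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\, \inf_{\delta \in \Delta(x)} y \cdot f(x+\delta) \,\Big] \;\ge\; \gamma
\qquad \text{($\gamma$-robustly useful)}
$$

A useful, non-robust feature is $\rho$-useful for some $\rho$ bounded away from zero but not $\gamma$-robust for any $\gamma \ge 0$; these are the features that adversarial perturbations exploit.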

It is possible to disentangle robust from non-robust features in standard image classification datasets.

Experimental Setup for Demonstration

Disentangling robust and non-robust features
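A minimal sketch of the "robustified" dataset construction, assuming robust_model is an adversarially trained classifier and that its penultimate-layer activations are exposed as robust_model.features(x) (that attribute name is an assumption for illustration, not the paper's actual code). Each new image is optimized so that its robust-feature representation matches that of a source image, and it keeps the source image's label.

```python
# Sketch: build a "robust" version of an image by matching the penultimate-layer
# representation of an adversarially trained model. `robust_model.features` is
# an assumed attribute name used for illustration.
import torch

def robustify(x, x_init, robust_model, steps=1000, lr=0.1):
    """Return an image whose robust features match those of x; it keeps x's label."""
    with torch.no_grad():
        target = robust_model.features(x)        # g(x): robust representation of the source image
    x_r = x_init.clone().requires_grad_(True)    # start from an unrelated image (or noise)
    opt = torch.optim.Adam([x_r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (robust_model.features(x_r) - target).pow(2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_r.clamp_(0.0, 1.0)                 # stay in the valid pixel range
    return x_r.detach()
```

Training a standard (non-robust) classifier on the resulting dataset yields non-trivial adversarial robustness, which is the evidence that the robust features were successfully isolated.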

Non-robust features suffice for standard classification

The phenomenon that non-robust features contribute to classification is not incidental, nor is it an artifact of finite-sample overfitting; these features reflect patterns of the underlying data distribution that generalize to the test set.
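A sketch of how this is demonstrated: each image is perturbed toward a deterministic target class with a small L2 attack against a standard model and then relabeled as that target, so only non-robust features correlate with the new labels. The helper below is illustrative; std_model and the eps/step values are assumptions, not the paper's exact hyperparameters.

```python
# Sketch of one example of the "non-robust" dataset: perturb x toward a target
# class t under a small L2 budget, then *relabel* it as t.
import torch
import torch.nn.functional as F

def make_nonrobust_example(x, y, std_model, num_classes, eps=0.5, steps=100, step_size=0.1):
    """x: (1, C, H, W) image in [0, 1]; y: (1,) long tensor with the true label."""
    t = (y + 1) % num_classes                              # deterministic target label
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(std_model(x + delta), t)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= step_size * grad / (grad.norm() + 1e-12)             # step toward class t
            delta *= torch.clamp(eps / (delta.norm() + 1e-12), max=1.0)   # project to the L2 ball
    x_adv = (x + delta).detach().clamp(0.0, 1.0)
    return x_adv, t                                        # the example is relabeled as t
```

A standard classifier trained only on such relabeled pairs still achieves non-trivial accuracy on the original, unperturbed test set, which rules out finite-sample overfitting as the explanation.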

Transferability can arise from non-robust features

Because different architectures tend to learn similar non-robust features, adversarial examples crafted against one model transfer to models with different architectures.
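A sketch of how such transfer can be measured, assuming a list of independently trained target models with different architectures and an attack helper analogous to the perturbation loop above (all names here are illustrative).

```python
# Sketch: craft adversarial examples against one model and measure how often
# they also fool other architectures trained on the same data.
import torch

def transfer_accuracy(source_model, target_models, loader, attack):
    """attack(model, x, y) -> x_adv is an assumed untargeted attack (e.g. PGD)."""
    correct = [0] * len(target_models)
    total = 0
    for x, y in loader:
        x_adv = attack(source_model, x, y)        # the attack sees the source model only
        total += y.numel()
        with torch.no_grad():
            for i, m in enumerate(target_models):
                correct[i] += (m(x_adv).argmax(dim=1) == y).sum().item()
    return [c / total for c in correct]           # lower accuracy = stronger transfer
```

The paper further observes that architectures which learn more readily from the non-robust dataset are also the ones to which adversarial examples transfer best, linking transferability directly to shared non-robust features.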
