Adversarial Examples Are Not Bugs, They Are Features
by Andrew Ilyas, Shibani Santurkar, et al.
Main Idea
The paper casts adversarial vulnerability as a fundamental consequence of the dominant supervised learning paradigm: models latch onto any predictive signal in the training data, whether or not that signal is robust to perturbation or meaningful to humans.
Central Premise
Standard datasets contain both robust and non-robust features, and both constitute useful signals for standard classification; only robust features remain predictive under adversarial perturbation (formalized below).
It is possible to disentangle robust from non-robust features in standard image classification datasets.
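To make "useful" and "robust" precise, the paper defines a feature as any function f : X → R for a binary task with labels y ∈ {±1}, graded by how well it correlates with the label. A sketch of the definitions in LaTeX, following my reading of the paper's Section 2 (Δ(x) denotes the allowed perturbation set):

```latex
% A feature is any function f : \mathcal{X} \to \mathbb{R}; labels y \in \{\pm 1\}.

% rho-useful: the feature correlates with the label in expectation.
\mathbb{E}_{(x,y) \sim \mathcal{D}}\left[ y \cdot f(x) \right] \ge \rho

% gamma-robustly useful: the correlation survives a worst-case
% perturbation \delta from the allowed set \Delta(x) (e.g. a small l2 ball).
\mathbb{E}_{(x,y) \sim \mathcal{D}}\left[ \inf_{\delta \in \Delta(x)} y \cdot f(x + \delta) \right] \ge \gamma

% useful, non-robust: rho-useful for some rho > 0, but not
% gamma-robustly useful for any gamma >= 0.
```

Non-robust features are exactly the last category: genuinely predictive, yet flippable by imperceptible perturbations.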
Experimental Setup for Demonstration
Disentangling robust and non-robust features
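A minimal sketch of how the robust dataset can be constructed, assuming `robust_encoder` is the penultimate layer of an adversarially trained model (function names and hyperparameters here are illustrative, not the paper's released code): each training image is replaced by an image, optimized from an unrelated seed, whose robust features match the original.

```python
import torch

def robustify(x, x0, robust_encoder, steps=1000, lr=0.1):
    # Invert the penultimate-layer features of an adversarially trained
    # model: starting from an unrelated seed image x0, gradient-descend
    # until its robust features match those of x. Only xr is updated;
    # the encoder's weights stay fixed.
    with torch.no_grad():
        target = robust_encoder(x)              # robust features to match
    xr = x0.clone().requires_grad_(True)
    opt = torch.optim.SGD([xr], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.norm(robust_encoder(xr) - target)
        loss.backward()
        opt.step()
        with torch.no_grad():
            xr.clamp_(0, 1)                     # stay a valid image
    return xr.detach()

# Robust dataset: D_R = {(robustify(x, x0, g), y) for (x, y) in D}.
```

Per the paper, standard (non-adversarial) training on this dataset yields a model with nontrivial adversarial robustness, evidence that the construction retained chiefly the robust features.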
Non-robust features alone suffice for standard classification (construction sketched below).
The fact that non-robust features support classification is not an artifact of finite-sample overfitting: models trained on the relabeled dataset generalize to the original, unmodified test set, so they must be exploiting genuinely predictive (if brittle) signals.
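A sketch of the non-robust dataset construction under the same caveats (illustrative names and hyperparameters): each image x with true label y is nudged toward a target class t by a small l2-bounded PGD perturbation against a standard pretrained classifier, then relabeled as t. To a human the image still looks like class y; the only consistent signal linking it to t is non-robust.

```python
import torch
import torch.nn.functional as F

def perturb_toward(model, x, t, eps=0.5, steps=7, step_size=0.1):
    # l2-bounded PGD that nudges a batch x toward target labels t.
    # `model` is any standard pretrained classifier.
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), t)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            g = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            delta = delta - step_size * g       # descend: move toward t
            norms = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            delta = delta * (eps / norms).clamp(max=1.0)  # project to the eps-ball
    return (x + delta).clamp(0, 1).detach()

# Non-robust dataset: relabel each perturbed image with its TARGET class,
# D_NR = {(perturb_toward(model, x, t), t)}. A model trained on D_NR
# generalizes to the original test set even though every training image
# "looks" mislabeled to a human.
```

The small perturbation budget eps is what rules out overfitting as the explanation: the pixels barely change, so the generalization must come from the injected non-robust features.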
Transferability can arise from non-robust features
Adversarial examples crafted for one model transfer to independently trained models with different architectures, consistent with different models learning similar non-robust features from the same data.
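A sketch of the transfer experiment, assuming `attack` is any untargeted perturbation routine (for instance a PGD loop like the one above); again, all names are illustrative:

```python
import torch

@torch.no_grad()
def accuracy(model, x, y):
    # Top-1 accuracy of a classifier on a batch.
    return (model(x).argmax(dim=1) == y).float().mean().item()

def transfer_test(source, target, x, y, attack):
    # Craft adversarial examples against `source`, then check whether
    # they also fool an independently trained `target` model.
    x_adv = attack(source, x, y)
    print("target accuracy, clean inputs:      ", accuracy(target, x, y))
    print("target accuracy, transferred inputs:", accuracy(target, x_adv))
```

The paper further reports that architectures which learn more readily from the non-robust dataset above also tend to be more susceptible to transferred examples, supporting the shared-non-robust-features explanation.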