Susceptibility to adversarial attacks is an issue that plagues many deep neural networks. One method of protecting against these attacks is adversarial training (AT), which injects adversarially modified examples into the training data in order to achieve adversarial robustness. By exposing the model to adversarially perturbed data during training, AT teaches it not to be fooled by such examples at inference time. Although AT is accepted as the de facto defense against adversarial attacks, questions remain about its use in practical applications. In this work, we address some of these questions: What ratio of original to adversarial examples in the training set is needed to make AT effective? Does robustness gained from one type of AT generalize to other attacks? And do the required data ratio and the generalization of robustness vary with model complexity? We attempt to answer these questions through carefully designed experiments on the CIFAR10 dataset with ResNet models of varying complexity.
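To make the original-to-adversarial ratio concrete, below is a minimal sketch of adversarial training with a configurable mixing ratio. It is not the authors' code: it assumes PyTorch with torchvision's CIFAR10 and ResNet-18, uses a single-step FGSM attack for illustration, and treats `adv_ratio` and `eps` as hypothetical hyperparameters.

```python
# Minimal sketch (assumed setup, not the paper's implementation) of adversarial
# training where a fraction `adv_ratio` of each batch is replaced by
# adversarially perturbed copies; FGSM is used as an illustrative attack.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

def fgsm_attack(model, x, y, eps=8 / 255):
    """Single-step L-infinity FGSM perturbation of inputs x with labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_epoch(model, loader, optimizer, adv_ratio=0.5, eps=8 / 255, device="cpu"):
    """One epoch of mixed clean/adversarial training.

    `adv_ratio` controls the original-to-adversarial mix studied in the paper
    (here a hypothetical hyperparameter): the first `adv_ratio` fraction of each
    batch is replaced by FGSM examples, the rest stays clean.
    """
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        n_adv = int(adv_ratio * x.size(0))
        if n_adv > 0:
            x_adv = fgsm_attack(model, x[:n_adv], y[:n_adv], eps)
            x = torch.cat([x_adv, x[n_adv:]], dim=0)
        optimizer.zero_grad()  # clear grads accumulated while crafting attacks
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data = datasets.CIFAR10("data", train=True, download=True,
                            transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=128, shuffle=True)
    model = models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    train_epoch(model, loader, optimizer, adv_ratio=0.5, device=device)
```

Sweeping `adv_ratio` from 0 to 1 and swapping FGSM for a stronger attack (e.g., PGD) would correspond to the ratio and cross-attack generalization questions posed above.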