Posted October 24, 2019
Active Learning of Adversarial Attack Boundaries
Summary

This research program introduces a novel framework for generating adversarial examples for machine learning (ML) algorithms. Generating adversarial examples is a critical step toward evaluating the robustness of ML algorithms against malicious adversaries, and it is the subject of intense current study. Existing approaches to generating such examples often require complete access to the system model and/or the training data, which is unrealistic in many domains. To circumvent these issues, recent studies have designed decision-based attacks, which dispense with the assumption of knowing the system and instead select examples and query their corresponding outputs to identify the adversarial regions. Often these examples are chosen at random, which requires a prohibitively large number of examples to estimate the attack boundaries accurately. In this research program, we propose an active-learning-based approach to decision-based attacks, in which each example is chosen sequentially and adaptively based on the past samples. Our preliminary studies of such active sensing/learning mechanisms indicate that they often lead to significant savings in the number of examples/queries required to form reliable estimates of the attack regions, hence improving on existing decision-based algorithms for generating adversarial examples. The proposed framework consists of (i) an information-theoretic analysis of the fundamental limits on the number of examples/queries required, (ii) the design of active-learning algorithms whose sample complexities meet (or come close to) these information-theoretic limits, and (iii) an evaluation of the robustness of the algorithms through implementations on image and text datasets. The theoretical questions in this project lie at the interface of information theory, sequential statistics, and statistical learning theory.
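To give a rough sense of why adaptive querying can be far cheaper than random sampling, the sketch below locates the decision boundary along the segment between a clean input and a known adversarial point using only hard-label queries, so the uncertainty shrinks geometrically with each query. This is a minimal illustration only: the toy linear model standing in for the black box and the helper names (query_label, boundary_bisection) are assumptions made for the example, not the project's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the unknown black-box system: only hard-label
# decisions are observable, never the model parameters or gradients.
w, b = rng.normal(size=5), 0.3

def query_label(x):
    """One decision-based query: returns only the predicted class (0 or 1)."""
    return int(np.dot(w, x) + b > 0)

def boundary_bisection(x_clean, x_adv, n_queries=20):
    """Adaptively locate the attack boundary on the segment [x_clean, x_adv].

    Each query halves the remaining interval, so about log2(1/eps) queries
    give precision eps -- the kind of saving over random sampling that
    motivates an active-learning approach.
    """
    y_clean = query_label(x_clean)
    lo, hi = 0.0, 1.0  # interpolation coefficients toward x_adv
    for _ in range(n_queries):
        mid = 0.5 * (lo + hi)
        x_mid = (1 - mid) * x_clean + mid * x_adv
        if query_label(x_mid) == y_clean:
            lo = mid  # still on the clean side: move outward
        else:
            hi = mid  # label has flipped: move inward
    return (1 - hi) * x_clean + hi * x_adv  # closest adversarial point found

x_clean = rng.normal(size=5)
# Start from a point known to flip the label (pushed across the boundary).
direction = -w if query_label(x_clean) == 1 else w
x_adv = x_clean + 5.0 * direction / np.linalg.norm(direction)
x_boundary = boundary_bisection(x_clean, x_adv)
print("label flips at distance", np.linalg.norm(x_boundary - x_clean))
```

With 20 queries the boundary point is pinned down to within roughly one millionth of the segment length, whereas randomly sampled queries would need orders of magnitude more labels to reach comparable precision along that single direction.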

Key Findings

Algorithms for generating black-box adversarial examples with controlled query complexity

Project Start Date
Project End Date