Mathematical Computer Science Seminar

Steve Hanneke
TTI-C
Principles of Active Learning
Abstract: In many machine learning applications, the effort required to manually label the massive data sets necessary to train machine learning systems to a high accuracy presents a major hurdle. One promising approach to reducing the required training sample size is active learning, a technique in which the learning algorithm participates in interactively selecting examples to be labeled for training, in order to focus the human expert's efforts on labeling only informative and non-redundant examples. Active learning holds great potential for dramatically reducing the number of labeled training examples needed for learning. However, despite decades of research on the subject, the most popular active learning algorithms in the applications literature are known to be unreliable and sensitive to violations of modeling assumptions, which has held back the widespread applicability of active learning in practice. At the root of this problem, it seems we have lacked a complete understanding of the basic principles that should underlie the design of good active learning algorithms. Such a situation calls for a careful theoretical approach to the problem.
In this talk, I will articulate essential principles for the design of effective active learning algorithms, distilled from over a decade of research on the theory of active learning. Moreover, I will describe a general active learning strategy based on these principles, which is provably near-optimal, in the sense that the number of labeled training examples sufficient to achieve a given accuracy guarantee cannot be significantly reduced by any other active learning algorithm. In the process, I will discuss the fundamental trade-offs and general complexity measures intrinsic to the active learning setting, and present formulas expressing the minimum number of labeled examples sufficient and necessary for an optimal active learning algorithm to achieve a given accuracy guarantee.
Monday, November 5, 2018 at 3:00 PM in 427 SEO