Georgia Institute of Technology
The Complexity of Learning Neural Networks
Abstract: The empirical successes of ``deep learning'' currently lack rigorous theoretical explanation. As a first step, we ask whether data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We give a surprisingly general polynomial-time analysis of gradient descent when the hidden layer uses unbiased sigmoid gates, exploiting new connections we make with tools from spherical harmonics. However, when the generating network uses arbitrary biases, the problem appears intractable. We construct a family of simple neural networks that is hard to learn in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent with any loss function, for any model architecture) needs an exponential number of queries on data labeled by such a network even using tolerance inversely proportional to the input dimensionality. Joint work with Le Song, Santosh Vempala, and Bo Xie.
Friday January 19, 2018 at 3:00 PM in SEO 636