The parameters of a deep neural network are learned from training data by minimizing a nonconvex empirical risk function. A central issue in deep learning is the lack of analytical guarantees on generalizability, that is, whether the learned parameters perform well on testing data. Moreover, learning performance degrades significantly when the training set is small, and no theoretical characterization exists of the training set size required for a given network architecture. This project establishes theoretical and algorithmic underpinnings for deep learning: it develops computationally efficient learning algorithms to estimate model parameters and proves that the learned models generalize to testing data. The theoretical and algorithmic results apply to a broad class of neural networks, including graph convolutional neural networks and convolutional/recurrent neural networks.
1. We proved the generalizability of learning one-hidden-layer convolutional neural networks (CNNs) with rectified linear unit (ReLU) activations and characterized the number of training samples required.
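To make the model class concrete, the following is a minimal numpy sketch of a one-hidden-layer CNN with ReLU activations: a single shared filter is applied to sliding patches of a 1-D input, and the hidden outputs are combined linearly. All names, dimensions, and the 1-D setting are illustrative assumptions, not the exact architecture analyzed in the paper.

```python
import numpy as np

def relu(z):
    # Elementwise rectified linear unit: max(z, 0)
    return np.maximum(z, 0.0)

def one_hidden_layer_cnn(x, w, a, stride=1):
    """Forward pass of a one-hidden-layer CNN sketch: the shared filter w
    is slid over patches of the 1-D input x, each patch response passes
    through ReLU, and the second-layer weights a combine the hidden units."""
    k = w.size
    patches = np.array([x[i:i + k] for i in range(0, x.size - k + 1, stride)])
    hidden = relu(patches @ w)   # one hidden unit per patch, weights shared
    return a @ hidden            # linear second layer

# Illustrative usage with random data (sizes are arbitrary choices)
rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # 1-D input signal
w = rng.standard_normal(3)   # shared convolutional filter
a = rng.standard_normal(6)   # second-layer weights: 8 - 3 + 1 = 6 patches
y = one_hidden_layer_cnn(x, w, a)
```

Weight sharing across patches is what distinguishes this CNN from a fully connected one-hidden-layer network and is central to the reduced sample complexity such analyses target.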
2. We proved that accelerated gradient descent achieves a faster convergence rate than unaccelerated gradient descent in the nonconvex problem of learning one-hidden-layer neural networks, and we characterized the improved rate.
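The acceleration mechanism can be sketched as follows: the gradient is evaluated at an extrapolated point, and the resulting momentum term is what yields the faster rate. The objective below, squared loss for a single ReLU unit on realizable data, is an illustrative stand-in for the nonconvex one-hidden-layer setting; the step size, momentum parameter, and data model are assumptions, not the paper's.

```python
import numpy as np

def loss(theta, X, y):
    # Squared loss of a single ReLU unit relu(X @ theta); nonconvex in theta
    return 0.5 * np.mean((np.maximum(X @ theta, 0.0) - y) ** 2)

def grad(theta, X, y):
    # Gradient of the loss above (using the a.e. derivative of ReLU)
    pre = X @ theta
    return X.T @ ((np.maximum(pre, 0.0) - y) * (pre > 0)) / len(y)

def accelerated_gd(theta0, X, y, lr=0.05, beta=0.9, steps=300):
    """Nesterov-style accelerated gradient descent: extrapolate to
    theta + beta * (theta - theta_prev), then take a gradient step there."""
    theta, theta_prev = theta0.copy(), theta0.copy()
    for _ in range(steps):
        look = theta + beta * (theta - theta_prev)   # momentum extrapolation
        theta, theta_prev = look - lr * grad(look, X, y), theta
    return theta

# Illustrative run on realizable random data (an assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = np.maximum(X @ rng.standard_normal(4), 0.0)   # labels from a planted unit
theta0 = 0.1 * rng.standard_normal(4)             # small nonzero initialization
theta_hat = accelerated_gd(theta0, X, y)
```

Setting `beta=0` recovers plain gradient descent, which makes the comparison underlying the convergence-rate result easy to reproduce empirically.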