Abstract

Neural networks have been a topic of research since the 1970s, and Convolutional Neural Networks (CNNs) were first shown to work well for handwritten digit recognition in 1998. These early networks, however, were still shallow, containing only a few layers, and were mostly trained on small amounts of data, in contrast to modern CNNs, which contain hundreds of convolution layers and are trained on millions of images. This recent shift in machine learning comes at a cost: modern neural networks have an extremely large number of parameters and require an enormous amount of computation for training and inference. A 2018 study by OpenAI estimated that the compute required to train large models has been doubling every 3-4 months, whereas Moore's law has a two-year doubling period. Computational requirements have thus been outpacing hardware capabilities very quickly. To address this issue, we develop methods not only for reducing the computational cost of convolutional neural networks, but also for enabling them to train and continually learn within this reduced-cost framework.

Specifically, we develop a new approach for compression based on spectral decomposition of filters, which replaces each convolution layer in the model with a compact low-rank approximation. We are motivated by the observation that the original filters in a convolution layer can be represented as weighted linear combinations of a set of 3D basis filters with one-dimensional weight kernels. While compression is achieved by using fewer basis filters, we show that these basis filters can be jointly fine-tuned along with the weight kernels to compensate for any loss in performance due to truncation, thereby achieving state-of-the-art results on both classification and regression problems.

We then propose a minimum L1-norm regularizer to simultaneously train and compress convolutional neural networks from scratch. Two popular compression methods, pruning and low-rank filter factorization, both depend on filter truncation, so their success requires that the original filters be compact in the first place and that the discarded filters contain little information. Conventional training does not explicitly prepare filters for this purpose, so deleting filters also discards useful information learned by the model. To address this problem, we propose to train the model specifically for compression from scratch. We show that, unlike conventionally trained networks, models trained with our approach can learn the same information in a much more compact fashion. Moreover, the minimum L1-norm regularizer enables us to train the model for subsequent compression by either filter factorization or pruning, thereby unifying these two compression strategies.

We also show that our compression framework naturally extends to continual learning, where the model must learn continuously from new data as it becomes available. This introduces the problem of catastrophic forgetting, in which the model fails to preserve previously learned information as it is trained on new data. We propose a solution that eliminates this problem altogether while also requiring significantly fewer FLOPs than other continual learning techniques.
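To make the two core ideas above concrete, the following is a minimal sketch under stated assumptions: each original filter W_n of a layer is approximated as a weighted combination of a small shared set of basis filters, roughly W_n ≈ Σ_m a_{n,m} B_m, and an L1 penalty on the combination weights encourages a representation that tolerates later truncation or pruning. The PyTorch module name LowRankConv2d, the argument num_basis, and the helper l1_penalty are hypothetical illustrations, not the dissertation's actual implementation.

    import torch.nn as nn

    class LowRankConv2d(nn.Module):
        # Hypothetical sketch: approximate a k x k convolution with a small
        # bank of shared basis filters followed by 1x1 combination weights.
        def __init__(self, in_channels, out_channels, kernel_size, num_basis):
            super().__init__()
            # Basis stage: num_basis 3-D filters spanning all input channels.
            self.basis = nn.Conv2d(in_channels, num_basis, kernel_size,
                                   padding=kernel_size // 2, bias=False)
            # Combination stage: 1x1 weights mix the basis responses into the
            # desired output channels; fewer basis filters means fewer
            # parameters and FLOPs.
            self.combine = nn.Conv2d(num_basis, out_channels, 1, bias=True)

        def forward(self, x):
            return self.combine(self.basis(x))

    def l1_penalty(model, strength=1e-4):
        # Hypothetical L1 regularizer on the combination weights, added to the
        # task loss so the learned representation stays compact and tolerates
        # later factorization or pruning.
        return strength * sum(m.combine.weight.abs().sum()
                              for m in model.modules()
                              if isinstance(m, LowRankConv2d))

In such a setup, training would minimize the task loss plus l1_penalty(model), and the number of basis filters per layer would control the trade-off between compression and accuracy.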

Graduation Date

2023

Semester

Summer

Advisor

Mahalanobis, Abhijit

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Identifier

CFE0009806; DP0027914

URL

https://purls.library.ucf.edu/go/DP0027914

Language

English

Release Date

August 2023

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)
