Contemporary machine learning systems have demonstrated tremendous success at a variety of tasks, including image classification, object detection and tracking, and recommendation. This success has been driven by enormous advances in computational capability that allow us to use large training datasets, with large numbers of classes, and to train models with vast numbers of parameters. In fact, these systems use models with enough capacity to be trained to zero training error on noisy or even completely random labels. Nevertheless, these models often generalize well in practice and avoid harmful "overfitting". The key to good generalization lies in the implicit bias of the model architecture and training algorithm, which steers us towards solutions that generalize well. This thesis works towards a better theoretical understanding of this phenomenon by analyzing overparameterized linear models and proving necessary and sufficient conditions for good generalization. Additionally, we empirically investigate whether we can engineer the correct implicit bias when training models to solve practical problems in the field of control, by making use of our knowledge of the problem domain.

We start by analyzing the simpler setting of overparameterized linear regression: fitting a linear model to noisy data when the number of features exceeds the number of training points. By taking a Fourier-theoretic perspective, we map the key challenge posed by overparameterization to the well-known phenomenon of aliasing of the true signal due to under-sampling. Borrowing the signal-processing concepts of "signal bleed" and "signal contamination", we derive conditions for good generalization in the Fourier-feature setting.
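
As a toy numerical illustration of this aliasing (a minimal sketch in Python, not code from the thesis; all names and values are ours), note that on a grid of n sample points a Fourier feature of frequency k is indistinguishable from one of frequency k + n, so an under-sampled interpolator cannot tell the true signal apart from its aliases:

    import numpy as np

    n = 8                          # number of training points on [0, 1)
    t = np.arange(n) / n           # regular sample grid
    k = 3                          # a low "true" frequency
    alias = k + n                  # a high frequency that aliases onto k

    low = np.cos(2 * np.pi * k * t)
    high = np.cos(2 * np.pi * alias * t)

    # On the n-point grid the two features coincide exactly, so the sampled
    # data cannot reveal which frequency carries the signal energy.
    print(np.allclose(low, high))  # True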

Next, we analyze the generalization error of the minimum-ℓ2-norm interpolator for regression and binary classification in a Gaussian-feature setting. For regression we interpolate real-valued labels, and for binary classification we interpolate binary labels. (It turns out that under sufficient overparameterization, minimum-norm interpolation of binary labels is equivalent to other binary-classification training methods, such as support-vector machines or gradient descent on the logistic loss.) We study an asymptotic setting in which the number of features d scales with the number of training points n and both n, d → ∞. Under a bi-level spiked covariance model for the features, we prove the existence of an intermediate regime in which we perform well on the classification task but not on the corresponding regression task.
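
To make the estimator concrete, here is a minimal Python sketch of the minimum-ℓ2-norm interpolator under a bi-level spiked covariance model; the dimensions, the spike size, and the variable names are illustrative choices of ours, not the thesis's asymptotic setup. When d > n, the pseudoinverse solution interpolates the training labels exactly, and the same formula applies to real-valued regression labels and to binary classification labels:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, s = 100, 2000, 5        # training points, features, spiked directions
    spike = 10.0                  # variance of the s high-energy features

    # Bi-level covariance: a few high-variance directions, the rest flat.
    scales = np.ones(d)
    scales[:s] = np.sqrt(spike)
    X = rng.standard_normal((n, d)) * scales   # Gaussian features
    w_true = np.zeros(d)
    w_true[0] = 1.0                            # signal in a spiked direction

    y_reg = X @ w_true + 0.5 * rng.standard_normal(n)  # noisy real labels
    y_cls = np.sign(X @ w_true)                        # binary labels

    # Minimum-l2-norm interpolator: the pseudoinverse solution
    # w = X^T (X X^T)^{-1} y, which fits the labels exactly when d > n.
    w_reg = np.linalg.pinv(X) @ y_reg
    w_cls = np.linalg.pinv(X) @ y_cls
    print(np.allclose(X @ w_reg, y_reg))       # True: exact interpolation

    # Test classification error of the interpolator of binary labels.
    X_test = rng.standard_normal((5000, d)) * scales
    err = np.mean(np.sign(X_test @ w_cls) != np.sign(X_test @ w_true))
    print(f"test classification error: {err:.3f}")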

We then extend the analysis to the multiclass classification setting, where the number of classes also scales with the number of training points, by deriving asymptotic bounds on the classification error incurred by the minimum-norm interpolator of one-hot-encoded labels.
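
As a minimal sketch of that estimator (dimensions are illustrative, and random labels are used purely to show the interpolation mechanics, not the regimes analyzed in the thesis): the minimum-norm interpolator of one-hot labels amounts to one least-norm regression per class, with predictions made by an arg-max over the per-class scores:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 120, 3000, 4            # training points, features, classes

    X = rng.standard_normal((n, d))
    labels = rng.integers(0, k, size=n)
    Y = np.eye(k)[labels]             # one-hot label matrix, shape (n, k)

    # One least-norm regression per class, solved jointly: W has shape (d, k).
    W = np.linalg.pinv(X) @ Y

    # Predict by taking the arg-max over the k per-class scores.
    pred = np.argmax(X @ W, axis=1)
    print(np.mean(pred == labels))    # 1.0: the one-hot labels are interpolated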

Finally, to understand how we can learn models that generalize well in practice, we empirically study the use of neural networks to learn non-linear control strategies for hard control problems where the optimal solutions are unknown and linear solutions are provably sub-optimal. By designing neural-network architectures and training methods that leverage our knowledge of the dynamics of the control system, we are able to learn well-performing control strategies more easily and robustly.
