Description
We study phenomena that arise in classification with linear and lifted models in overparameterized settings, presenting new perspectives on the work of Muthukumar et al. [19, 18]. To simulate real-world setups in which only some features are actually useful, we consider the simplified 1-sparse model, where a single feature carries the signal. We review a sharp characterization of the generalization of min-ℓ2-norm interpolation on Gaussian data [18].
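To make the setup concrete, the following is a minimal sketch (not code from [18]; the dimensions and noise level are illustrative choices) of min-ℓ2-norm interpolation of labels generated by a 1-sparse model with i.i.d. Gaussian features:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2000                       # overparameterized: d >> n
X = rng.standard_normal((n, d))       # i.i.d. Gaussian features
w_star = np.zeros(d)
w_star[0] = 1.0                       # 1-sparse ground truth: one useful feature
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Min-l2-norm interpolator: w = X^T (X X^T)^{-1} y, via the pseudo-inverse.
w_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ w_hat, y)      # it interpolates the training labels exactly

print("weight on the true feature:", w_hat[0])
print("weight bled into the useless features:", np.linalg.norm(w_hat[1:]))
```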
In the hard-margin support vector machine (HM-SVM) problem, we show for the stylized ridge featurization that, given a sufficient degree of “effective overparameterization”, all training points become support vectors. We remark that this simple featurization captures the essential behavior of other featurizations as well. A consequence of this theorem is that the solution to the HM-SVM problem is indistinguishable from the min-ℓ2 binary interpolator. Combined with prior work showing that gradient descent initialized at 0 on the squared loss converges to the min-ℓ2 binary interpolator, and that gradient descent on the logistic loss converges to the HM-SVM solution, our work conveys that the choice of loss function has little effect on the learned parameters in overparameterized settings.
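The indistinguishability can be checked numerically; below is our own illustrative sketch (not the paper's experiments), in which scikit-learn's SVC with a very large C stands in for the hard-margin SVM and the problem sizes are arbitrary:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, d = 30, 3000                        # heavily overparameterized
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                   # +/-1 labels from a 1-sparse model

svm = SVC(kernel="linear", C=1e10)     # very large C approximates the hard margin
svm.fit(X, y)
print("support vectors:", len(svm.support_), "out of", n)  # expect all n

w_svm = svm.coef_.ravel()
w_interp = np.linalg.pinv(X) @ y       # min-l2 binary interpolator
cos = w_svm @ w_interp / (np.linalg.norm(w_svm) * np.linalg.norm(w_interp))
print("cosine similarity of the two solutions:", cos)      # expect close to 1
```

When every training point is a support vector, every margin constraint is active, i.e. x_i^T w = y_i for all i, so the HM-SVM solution is exactly the minimum-norm interpolator of the ±1 labels.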
In the regimes where the above theorem holds, we examine whether margin-based explanations for generalization can account for the behavior we observe in our problem. For linear models, we observe that a) the resulting bounds on the probability of misclassifying a test point exceed 1 and are hence tautological, and b) a model with a larger margin often does not generalize better. For kernel-inspired models, we investigate what the right normalization for margins is, and design two new notions of margin appropriate for this setting.
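A back-of-the-envelope sketch of point a), under the assumption that the bound takes the classical form sqrt(R²‖w‖²/n), with R bounding the feature norms and 1/‖w‖ the geometric margin (the exact bound analyzed in our work may differ, and the constants and log factors we drop only make it larger):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 3000
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])

w = np.linalg.pinv(X) @ y                  # min-l2 interpolator (= HM-SVM solution here)
R = np.linalg.norm(X, axis=1).max()        # radius enclosing the training data
gamma = 1.0 / np.linalg.norm(w)            # geometric margin of the interpolator
bound = np.sqrt(R**2 / (gamma**2 * n))     # classical margin-bound quantity

print("geometric margin:", gamma)
print("margin-bound value:", bound)        # order 1: vacuous as a probability bound
```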
As a preliminary exposition of ongoing work, we examine the ramifications of our results for adversarial performance under the Fourier featurization. Through visualizations, we discover that the learned function exhibits Gibbs-like behavior around jump discontinuities, which causes adversarial examples to proliferate in the vicinity of training points.
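The qualitative picture can be reproduced with the following sketch (our illustration; the number of training points and frequencies are arbitrary choices), which fits a min-ℓ2 interpolator over Fourier features to labels from a sign function:

```python
import numpy as np
import matplotlib.pyplot as plt

def fourier_features(x, k):
    """Map x in [0, 1) to [1, cos(2*pi*j*x), sin(2*pi*j*x)] for j = 1..k."""
    j = np.arange(1, k + 1)
    return np.hstack([np.ones((len(x), 1)),
                      np.cos(2 * np.pi * np.outer(x, j)),
                      np.sin(2 * np.pi * np.outer(x, j))])

n, k = 16, 100                           # 2k+1 = 201 features >> n = 16 points
x_train = (np.arange(n) + 0.5) / n       # evenly spaced training inputs
y_train = np.sign(x_train - 0.5)         # labels with a jump discontinuity at 0.5

Phi = fourier_features(x_train, k)
w = np.linalg.pinv(Phi) @ y_train        # min-l2 interpolator in feature space

x_grid = np.linspace(0, 1, 2000, endpoint=False)
f_grid = fourier_features(x_grid, k) @ w

plt.plot(x_grid, f_grid, label="learned function")
plt.scatter(x_train, y_train, color="k", zorder=3, label="training points")
plt.axhline(0, lw=0.5, color="gray")
plt.legend()
plt.show()   # the learned function oscillates and crosses 0 near training points
```

In the plot, the learned function rings around the jump and between training points, so a small perturbation of an input near a training point can already flip the predicted sign.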