Abstract: The implicit bias of gradient descent, and specifically its margin maximization properties, has emerged as a promising explanation for the good generalization of deep networks. The purpose of this talk is to demonstrate the effectiveness of a dual problem to smoothed margin maximization. Concretely, the talk will develop this dual, along with a variety of its consequences in both linear and nonlinear settings. In the linear case, the dual perspective will first yield fast 1/t rates for margin maximization and implicit bias. This is faster than any prior first-order hard-margin SVM solver, which achieves at best a 1/sqrt(t) rate. Second, the dual analysis also allows a characterization of the implicit bias even outside the standard setting of exponentially-tailed losses; in this sense, it is gradient descent, and not a particular loss structure, that leads to the implicit bias. In the nonlinear case, duality will enable the proof of a gradient alignment property: asymptotically, the parameters and their gradients become collinear. Although abstract, this property in turn implies various existing and new margin maximization results.
Joint work with Matus Telgarsky.
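
For orientation, a minimal LaTeX sketch of the objects referred to above, under assumed standard definitions (a linear predictor w, data (x_i, y_i), and the exponential loss); the exact objectives, normalizations, and dual formulation used in the talk may differ.

% Illustrative setup only; notation is assumed, not necessarily the talk's.
\[
  \mathcal{L}(w) = \sum_{i=1}^{n} \exp\bigl(-y_i \langle w, x_i \rangle\bigr),
  \qquad
  \tilde{\gamma}(w) = \frac{-\ln \mathcal{L}(w)}{\lVert w \rVert}
  \quad \text{(smoothed margin, linear case)}.
\]
% Hard margin and a standard dual over the probability simplex \Delta_n:
\[
  \max_{\lVert u \rVert \le 1} \; \min_{i} \; y_i \langle u, x_i \rangle
  \;=\;
  \min_{q \in \Delta_n} \Bigl\lVert \sum_{i=1}^{n} q_i y_i x_i \Bigr\rVert .
\]
% One way to write the gradient alignment property along the iterates w_t:
\[
  \frac{\bigl\langle w_t, \, -\nabla \mathcal{L}(w_t) \bigr\rangle}
       {\lVert w_t \rVert \, \lVert \nabla \mathcal{L}(w_t) \rVert}
  \;\longrightarrow\; 1
  \quad \text{as } t \to \infty .
\]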