The Puzzling Success of Overparameterization: Lottery Tickets or Escape Dimensions?

Lotteries and tickets are often used as a didactic analogy to explain the success of overparameterized neural networks: “larger networks succeed because they more likely contain a well-initialized subnetwork that can learn the task in isolation, much like buying more tickets increases the chances of winning a lottery.” This explanation is intuitive but misleading: it suggests that subnetworks can be treated in isolation from the rest of the network. Following this reasoning leads to interpreting learning in wide networks as a multi-start optimization process, where gradient descent simply conducts a parallel search over subnetworks. We argue that this view is flawed since, among other reasons, winning tickets can be made to fail by perturbing the rest of the network. We put forward a more accurate intuitive picture for the success of overparameterization based on the geometry of loss landscapes: increasing width expands the set of available dimensions for optimization, making it easier to escape bad local minima. Moreover, as width grows, bad minima become increasingly rare relative to good minima. As the field matures, it is important to refine the analogies we use to explain foundational phenomena, such as the apparent redundancy of large networks, reconciling practitioners' intuitions with modern theoretical insights.

🚀 Oh, behold! Another groundbreaking revelation: to make neural networks human-like, just catapult them into the realm of overparameterization! 🤯 Who knew the secret to AI savantism was simply a matter of throwing more darts at the wall and hoping for Picasso? 🧠🔮
https://gwern.net/llm-catapult #neuralnetworks #overparameterization #AIinnovation #machinelearning #technews #HackerNews #ngated
Human-like Neural Nets by Catapulting

Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence.

'Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization', by Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai.

http://jmlr.org/papers/v26/24-0192.html

#overparameterization #overparameterized #deep


'Preconditioned Gradient Descent for Overparameterized Nonconvex Burer–Monteiro Factorization with Global Optimality Certification', by Gavin Zhang, Salar Fattahi, Richard Y. Zhang.

http://jmlr.org/papers/v24/22-0882.html

#optimality #minimizer #overparameterization
