Mastodawn

Apologies if I missed this in the article (or in the first article in the series) - what happens if you add two copies of the layer set? Does performance improve over adding one copy of the layer set?

Show thread

yodon Mar 24

If you look at convolutional neural nets used in image processing, it's super common for the first layer or so to learn a family of wavelet basis functions. Later layers then do recognition in wavelet space, without that space ever being explained or communicated to the training algorithm.

This work here is obviously more complex than that, but suggests something similar is going on with early layers transforming to some sort of generalized basis functions defining a universal language representation.

Official	https://
Support this service	https://www.patreon.com/birddotmakeup