Ari Benjamin

318 Followers
129 Following
24 Posts

my corners of computational neuroscience: neuroAI, transcriptomics, learning theory, vision.

Postdoc @ CSHL with Tony Zador

Twitter: https://twitter.com/arisbenjamin
Website: https://ari-benjamin.com
Working memory helps us make dumplings! From Yoo & Collins (2022), "How Working Memory and Reinforcement Learning Are Intertwined"
If we measure how W filters frequencies over the course of training, and apply a sensitivity threshold to define a "network acuity" for the output WX, we observe a linear increase with training, just as in humans. Learning naturally reflects environmental statistics at all stages. As this demo shows, this happens in extremely simple systems you can code and picture in a few lines.
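A minimal sketch of such an acuity measurement, under illustrative assumptions (1-D "images" with a 1/f amplitude spectrum standing in for natural images, and a fixed gain threshold defining acuity; all constants are my own choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # length of each 1-D "image"

# Amplitude spectrum falling off as 1/f: low frequencies carry more
# variance, as in natural images.
k = np.arange(d // 2 + 1)
amp = np.zeros(d // 2 + 1)
amp[1:] = 1.0 / k[1:]

def sample_images(n):
    """Random-phase signals with the fixed 1/f amplitude spectrum."""
    phases = rng.uniform(0, 2 * np.pi, size=(n, d // 2 + 1))
    return np.fft.irfft(amp * np.exp(1j * phases), n=d, axis=1)

def acuity(W, threshold=0.5):
    """Highest grating frequency that W transmits with gain > threshold."""
    finest = 0
    for f in range(1, d // 2):
        g = np.cos(2 * np.pi * f * np.arange(d) / d)  # grating at freq f
        if np.linalg.norm(W @ g) / np.linalg.norm(g) > threshold:
            finest = f
    return finest

X = sample_images(5000)
W = np.zeros((d, d))
lr = 40.0
acuities = []
for step in range(300):
    E = X - X @ W.T               # reconstruction errors x - Wx
    W += lr * (E.T @ X) / len(X)  # gradient step on ||x - Wx||^2
    acuities.append(acuity(W))

# The network's acuity rises steadily as W trains.
assert acuities[-1] > acuities[10]
```

Early in training W passes only the coarsest gratings; with more steps, the gain at progressively finer frequencies crosses threshold, so the measured acuity climbs.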
Luckily, there's lots of work on gradient descent in learning theory. It turns out that in this system, gradient descent causes W to learn each principal component of the inputs at a different rate. In fact, it learns the PCs in order of their variance. (Specifically, we're measuring the projection vᵢᵀWvᵢ for PC vᵢ.) And since the first PCs of natural images contain lower spatial frequencies, the result is that W acts like a low-pass filter that grows less strict as W learns.
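A minimal numerical sketch of this ordering (the spectrum, learning rate, and dimensions here are illustrative assumptions, with synthetic Gaussian data standing in for natural image patches):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a steep variance spectrum: PC i has variance 1/i^2,
# standing in for natural images, whose leading PCs carry most variance.
d = 20
variances = 1.0 / np.arange(1, d + 1) ** 2
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal PCs
cov = Q @ np.diag(variances) @ Q.T
X = rng.multivariate_normal(np.zeros(d), cov, size=2000)

# Train W by gradient descent on the reconstruction loss ||x - Wx||^2.
W = np.zeros((d, d))
lr = 0.5
proj = []  # v_i^T W v_i for each PC v_i, recorded every step
for step in range(200):
    E = X - X @ W.T               # reconstruction errors
    W += lr * (E.T @ X) / len(X)  # gradient step
    proj.append([Q[:, i] @ W @ Q[:, i] for i in range(d)])

proj = np.array(proj)  # each column rises toward 1 as its PC is learned
steps_to_learn = (proj > 0.4).argmax(axis=0)
# Higher-variance PCs cross the threshold first, in order:
assert all(a < b for a, b in zip(steps_to_learn[:5], steps_to_learn[1:5]))
```

In expectation the projection follows vᵢᵀWvᵢ = 1 − (1 − lr·λᵢ)ᵗ, so the time to learn PC i scales with 1/λᵢ: the learning algorithm, not the data alone, imposes the ordering.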

One way to interpret this result is with efficient coding, the idea that neurons optimally represent the world given unresolvable constraints (like noise). To explain an increase in acuity on this view, one has to say that 1) perception is optimal at every age, and 2) neural noise decreases over time (e.g., Kiorpes & Movshon (1998)).

We wanted to examine a different possibility: what if the brain's learning algorithm simply prefers to learn low-frequency information first? To demonstrate this, we created a very, very simple model system (matrix multiplication) and asked what gradient descent would learn first when trained to reconstruct natural images.

Note that the "optimal solution" is trivial (let W be the identity matrix) and does not preferentially represent any aspect of the data.
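That triviality is easy to check numerically: for any full-rank data, gradient descent on the reconstruction loss drives W all the way to the identity. A sketch with arbitrary Gaussian data (sizes and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
X = rng.standard_normal((4000, d))  # any full-rank data will do

# Gradient descent on ||x - Wx||^2; the unique fixed point is W = I.
W = np.zeros((d, d))
for _ in range(2000):
    W += 0.1 * ((X - X @ W.T).T @ X) / len(X)

print(np.allclose(W, np.eye(d), atol=1e-2))  # True
```

The endpoint carries no information about the data; only the trajectory toward it does, which is why the order of learning is the interesting quantity.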

Time for a #tootprint to celebrate a publication! I want to share a simple demo of the paper's main concept here: five-lines-of-Python simple.

Back in the '80s, researchers Luisa Mayer and Velma Dobson (among others) found that babies and young children slowly get better at seeing fine details as they age. This improvement is *linear* in time. (This is visual acuity: the finest spacing of a grating that can be resolved before it appears pure gray.) What explains this steady improvement?