https://lispy-gopher-show.itch.io/dl-roc-lisp/devlog/1463131/game-emeddable-small-deep-learning-assets-are-a-receiver-operating-characteristic-part-0

My grandiosity crept in while I was trying to convey that I am pretty sure I am onto something.

Anyway, this is part 0 of my #itchio series on my #deepLearning by receiver operating characteristic #statistics #dl #roc #DLROC implemented in pure ansi #commonLisp based on the condition system.

Time 0:
> Sharks.
Time 1:
> Feed sharks humans.
Time 2:
> Feed sharks.

It is very enlightening for me personally.

It is a bit early, but I hope *you* participate with it.

Game-Embeddable small Deep Learning (-assets) are a Receiver Operating Characteristic Part 0 - DL-ROC.LISP by screwtape

To give the ending first, look at this tiny deep learning inference, where I have embedded symbols in literally the same gold standard ("training data") as my early article https://screwlisp.small-...


@screwlisp I've been trying to make sense of the code, but I'm not managing it. Could you explain what code you ran to get the "Time 0-1-2" output you're showing?

Am I correct in my understanding that for SUM and CONC branches to be meaningful you need to define the corresponding handlers in DL-ROC? As it stands they don't seem to do anything.

@thuna yes, it is a framework using the condition system as a 'protocol'. SUM, CONC, COLLECT (and MSCP) are all just local restarts that are potentially available to a COMPARISON signal, and they were chosen to correspond to loop's :sum, :nconc and :collect (and just :do for MSCP, which probably should also have had a restart).

I can explain the time-0-1-2 code from the yet-unpublished article. The data and starting context are the ones used there. FEED, SHARKS and HUMANS are feature neurons read by a function.

@thuna So in order to get something collected by cl:loop, you would

(defun collect-this (c &aux (r (find-restart 'collect c)))
  ;; invoke the COLLECT restart, if one is available, with the value
  ;; to hand to cl:loop's collect clause
  (when r (invoke-restart r 'thing-to-collect)))

(handler-bind ((comparison #'collect-this))
  ...)
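To make the protocol concrete, here is a minimal self-contained sketch. The COMPARISON condition, its ITEM slot, and RUN-PROTOCOL are hypothetical stand-ins for the actual DL-ROC definitions; only the restart/handler shape is the point:

```lisp
;; Stand-in COMPARISON condition carrying the item being compared.
(define-condition comparison ()
  ((item :initarg :item :reader comparison-item)))

(defun collect-this (c &aux (r (find-restart 'collect c)))
  ;; Handler: pick the COLLECT restart when one is available.
  (when r (invoke-restart r (list :saw (comparison-item c)))))

(defun run-protocol (items)
  ;; Each iteration signals COMPARISON; whatever value the COLLECT
  ;; restart is invoked with is what cl:loop collects.
  (loop for x in items
        collect (restart-case (signal 'comparison :item x)
                  (collect (value) value))))

(handler-bind ((comparison #'collect-this))
  (run-protocol '(feed sharks)))
;; => ((:SAW FEED) (:SAW SHARKS))
```

With no handler bound, SIGNAL simply returns NIL and the loop collects NILs; the SUM and CONC restarts would follow the same pattern around :sum and :nconc clauses.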

@thuna 3/1
The article is scheduled as part 2, which will be a more formal writeup.

My ROC update rule, adapted from Krotov and Hopfield 2016, makes each neuron at time n+1 a receiver operating characteristic of the neurons at timestep n. I initially wrote it to show that a grid of sensors whose output is an instruction to feed sharks or feed humans (at my recurring theme of a shark aquarium restaurant) can trivially have training data added that produces the hallucination Feed sharks humans at time step 1.

@thuna I could do with some scrutiny. Try writing
Xᵩᵢ = ∑ⱼ( TPᵩⱼ + TNᵩⱼ − FPᵩⱼ − FNᵩⱼ ), 𝑗≠𝑖
Yᵩᵢ = P( Xᵩᵢ + ξᵩᵢ ) − P( Xᵩᵢ − ξᵩᵢ )
I am calling Y a receiver operating characteristic because X is X(TP, TN, FP, FN).
Vᵢ⁽ⁿ⁺¹⁾ = −1 if (minusp ∑ᵩYᵩᵢ) else +1.
adapted from:
Krotov and Hopfield 2016
Vᵢ⁽ⁿ⁺¹⁾=sign[∑ᵩ(F(+ξᵩᵢ+∑ⱼξᵩⱼVⱼ⁽ⁿ⁾)-F(-ξᵩᵢ+∑ⱼξᵩⱼVⱼ⁽ⁿ⁾))], 𝑗≠𝑖, 𝜑 indexing into the memories, neurons V at timesteps 𝑛 and 𝑛+1. F, a rectified polynomial.
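Transcribed into LaTeX for easier scrutiny (this is my own transcription of the two rules above; P plays the role of F, and the substitution of the confusion-matrix score X for the inner sum is how I read the adaptation):

```latex
% Krotov & Hopfield (2016), eq. (4): phi indexes memories xi,
% F is a rectified polynomial, j != i throughout.
V_i^{(n+1)} = \mathrm{sign}\Big[\sum_{\varphi}\Big(
    F\big({+}\xi_{\varphi i} + \textstyle\sum_{j\neq i}\xi_{\varphi j}V_j^{(n)}\big)
  - F\big({-}\xi_{\varphi i} + \textstyle\sum_{j\neq i}\xi_{\varphi j}V_j^{(n)}\big)
\Big)\Big]

% ROC adaptation: the inner sum is replaced by the score
% X_{\varphi i} built from (TP, TN, FP, FN) at timestep n.
V_i^{(n+1)} = \mathrm{sign}\Big[\sum_{\varphi}\Big(
    P\big(X_{\varphi i} + \xi_{\varphi i}\big)
  - P\big(X_{\varphi i} - \xi_{\varphi i}\big)
\Big)\Big]
```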
@screwlisp I am currently trying to rewrite your code in the CLOS framework, to see if I can more easily follow that, but afterwards I do want to see what the Krotov and Hopfield 2016 paper says to try and square that up against your writeup.
@thuna idk why it is not loading for me but it should've been the link https://proceedings.neurips.cc/paper_files/paper/2016/file/eaae339c4d89fc102edd9dbdb6a28915-Paper.pdf
which I think is a 2016 conference paper by Krotov and Hopfield titled Modern Hopfield Networks, which has a couple of formulations of modern Hopfield networks and a statement/demonstration of duality to deep learning's FFNNs with a single hidden layer.
Dense Associative Memory for Pattern Recognition

A model of associative memory is studied, which stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to feedforward neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes logistics, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions - the higher rectified polynomials which until now have not been used in deep learning. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set.

@thuna looks like it; I think I named a different paper with the same authors. I will update my blog when I finally do a little maintenance.
@thuna Equation (4) is what I summarised above.
@thuna and thanks for taking a look!
@thuna @etenil, who also previously rewrote an early version of exactly this code to be less crazy.

@screwlisp So, one thing I'm confused about; why does the first row of CONTEXT look like that? At what point are the second items in e.g. (0 FEED) used?

Also, I don't understand what the point of splitting CONTEXT and GOLD-STANDARD into rows is, since you are just summing/collecting/concatenating over every item.

@thuna on the first topic, because my neurons are conses with a predicate on a cons.
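A hypothetical illustration of that representation, shaped like the (0 FEED) rows in CONTEXT (the names and the exact layout here are my assumptions, not the actual DL-ROC internals):

```lisp
;; Hypothetical cons neuron: first element a position index, second a
;; feature symbol; the predicate compares the feature halves of two
;; such conses.
(defun make-neuron (index feature)
  (list index feature))

(defun same-feature-p (a b)
  (eq (second a) (second b)))

(same-feature-p (make-neuron 0 'feed) (make-neuron 1 'feed))   ; => T
(same-feature-p (make-neuron 0 'feed) (make-neuron 0 'sharks)) ; => NIL
```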

Most of your questions are covered in an article I wrote first but did not publish first, since I worried it leaned too heavily on implementation details and unqualified equations, so I moved it to part 2. I will release it later today after some appointments.

I think emphasising multispectrality is important, and that jaggedness should be possible. Bands of an audio chirp? A spatial 2D mask?