Evolution of genes, languages and self-modifying code. Current obsession: teaching magpies at my outdoor gym to say 'hello'. #PlanetOfTheMagpies

Bing Chat's model was confused by a random 'fictional' maths page that invented a new term, 'macro-zeroid', for sqrt(0):

https://fictional-googology.fandom.com/wiki/Sqrt(0)

That page is the only hit for 'macro-zeroid' on Google Search. There are zero hits for 'macro-zeroid' on both arxiv.org and Google Scholar.

As a result, Bing Chat believes that sqrt(0) is not a real number and refuses to simplify sqrt(0) to 0.
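For the record, sqrt(0) = 0 is completely standard: 0 is a perfectly ordinary real number and the unique real solution of x^2 = 0. A one-liner sanity check (my own illustration, not from the wiki page):

```python
import math

# 0 is the unique real solution of x**2 == 0,
# since for any real x != 0, x * x > 0.
root = math.sqrt(0)
print(root)  # 0.0
assert root ** 2 == 0  # it squares back to 0, as any square root must
```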

https://twitter.com/aeseia/status/1625261049397014530

Sqrt(0)

Sqrt(0) is a hypothetical macro-zeroid. It is a non-zero solution to x^2 = 0, whose existence is guaranteed by some formulations of the Existential Axiom. Before we dive into the non-standard interpretation used here, let's first establish the standard interpretation. The sqrt(a) = x iff x^2 = a. The sqrt function is simply an inverse function of f(x)=x^2. Therefore, anything whose square is 0, can be called a square root of 0. In the real number system 0 * 0 = 0, since x * 0 = 0 for any real nu […]

Sooo ... are LLMs sentient? I'm not convinced either way. But one thing I'm certain of is that you can't simply argue from your intuitive understanding of small transformer models because, well, MORE IS DIFFERENT. Very different. Some really weird things happen even in the simplest toy physics models once the number of components is big enough, and it's really *really* hard to calculate what's going on from first principles.

Are there phase transitions in LLMs? Well some people think so:

https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html

'Collectively these results suggest that some important transition is happening during the 2.5e9 to 5e9 token window early in training [...] We call this transition “the phase change”, in that it’s an abrupt change that alters the model’s behavior and has both macroscopic (loss and in-context learning curves) and microscopic (induction heads) manifestations, perhaps analogous to e.g. ice melting.'

The problem is so hard that mathematicians regularly get Fields Medals for making a bit of progress, eg Hugo Duminil-Copin in 2022:

https://en.wikipedia.org/wiki/Hugo_Duminil-Copin

So when I see hundreds of billions of parameters (which I think qualifies as 'big enough'), I immediately think 'uh-oh, danger, big enough for phase transitions, all bets are off'.


And as you can see if you click on the Wikipedia link:

https://en.wikipedia.org/wiki/Universality_class

... for most universality classes that are currently known (which usually include some *incredibly simple toy models* that exhibit them such as the Ising model above for interacting spins), we can't even calculate the critical exponents that characterise the phase transition from first principles! The best we can do is run a simulation of the toy model on a computer and measure them.
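And 'run a simulation and measure' really is as pedestrian as it sounds. Here's a minimal sketch of what such a simulation looks like: a pure-Python Metropolis Monte Carlo run of the 2D Ising model, comparing magnetisation below and above the critical temperature (lattice size, sweep count and temperatures are my own illustrative choices):

```python
import math
import random

def neighbours_sum(spins, i, j, L):
    # Sum of the four nearest neighbours, with periodic boundaries.
    return (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
            spins[i][(j + 1) % L] + spins[i][(j - 1) % L])

def metropolis_sweep(spins, L, T, rng):
    # One sweep = L*L attempted single-spin flips at temperature T.
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Energy cost of flipping spin (i, j).
        dE = 2 * spins[i][j] * neighbours_sum(spins, i, j, L)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -spins[i][j]

def mean_magnetisation(spins, L):
    return abs(sum(sum(row) for row in spins)) / (L * L)

def run(T, L=16, sweeps=400, seed=0):
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]  # start fully ordered
    for _ in range(sweeps):
        metropolis_sweep(spins, L, T, rng)
    return mean_magnetisation(spins, L)

# Onsager's exact result puts the critical point at Tc = 2/ln(1+sqrt(2)) ~ 2.27.
print("T = 1.0 (below Tc):", run(1.0))  # stays close to 1: ordered phase
print("T = 5.0 (above Tc):", run(5.0))  # close to 0: disordered phase
```

Measuring a critical exponent is then just more of the same: sweep T towards Tc, average over many runs, and fit a power law to the measurements.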


The key is 'big enough': this only really works for very large systems (ideally infinite ones!), as measured by the number of components (eg spins in the Ising example above).

The fascinating thing is that what happens during a phase transition is largely independent of the details of the system and is instead governed by something called its universality class:

https://en.wikipedia.org/wiki/Universality_class


You can start from the simplest possible toy models that a 13 year-old would understand: say, a spin that is either up or down, with the energy of a neighbouring pair of spins being -1 if they point in the same direction and +1 if they point in opposite directions (summed over all neighbouring pairs of spins), aka the Ising model -

https://en.wikipedia.org/wiki/Ising_model

... and yet if you have a system that's big enough, you get phase transitions during which, well, some very weird stuff happens!
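That energy rule really does fit in a couple of lines. A minimal sketch for a 1D chain of spins (my own illustration; the 2D lattice version just sums over more neighbouring pairs):

```python
def ising_energy(spins):
    # spins: a list of +1 / -1 values on a 1D chain with free ends.
    # Each neighbouring pair contributes -1 if aligned, +1 if anti-aligned,
    # i.e. the pair (s_i, s_{i+1}) contributes -s_i * s_{i+1}.
    return sum(-a * b for a, b in zip(spins, spins[1:]))

print(ising_energy([+1, +1, +1, +1]))  # all aligned: energy -3
print(ising_energy([+1, -1, +1, -1]))  # fully anti-aligned: energy +3
```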


Perhaps the most fascinating course I took during my physics master's was a 4th year one called 'Phase Transitions and Collective Phenomena'. It was about... well as Phil Anderson put it, 'More Is Different':

https://www.tkm.kit.edu/downloads/TKM1_2011_more_is_different_PWA.pdf

With general relativity, quantum field theory and so on, any smart teenager with an interest in physics already knows the punchline, so the courses were just a bit of algebra you had to plough through. With phase transitions it was like 'whoah, what *is* this stuff?!?'

Now I didn't pay much attention at the time because I thought it was arrant nonsense, LLMs obviously can't be sentient, we all know how that stuff works, end of debate.

But... let me play devil's advocate based on a fascinating bit of physics that many people probably aren't aware of.

Looking at the Bing Chat examples in this tweet:

https://twitter.com/vladquant/status/1624996869654056960

and similar examples currently on r/bing reminded me of poor @[email protected] and the whole debate last summer over whether language models can be sentient.

Incidentally I would encourage you to read the LaMDA transcript if you haven't already:

https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917

and intro:

https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489

Vlad on Twitter

“Bing subreddit has quite a few examples of new Bing chat going out of control. Open ended chat in search might prove to be a bad idea at this time! Captured here as a reminder that there was a time when a major search engine showed this in its results.”
