@Jyoti Yeah whenever I see some evangelist prattling on about AGI I think about how bigtime LLMs are currently using about the maximum amount of resources any computing project plausibly *could*.

And none of those resources are getting cheaper; in fact some (like water) are only going to get *more* expensive.

If AI companies want to build something that's several generations beyond bigtime LLMs in scale and reasoning ability, like AGI, they may as well be aspiring to build a 20 km skyscraper.

@Jyoti tech entrepreneur: do you know how much we will save in sandwiches

@Jyoti

Oh, the power plant problem is one I love to point out too. A 100-watt brain vs. a gigawatt computer cluster.

It's what I said when IBM's Watson did that game-show nonsense. Require Watson to be in the room, running on its own mobile power source, and have it beat people at quizzes; then we'll talk.

@doctormo @Jyoti also, if you are running a brute-force operation, all the power in the world will eventually hit a categorical limit. There is no path towards intelligence there. See https://pxi.social/@jakob/110283974473306733
jakob.pxi (@[email protected])

#LLM are brute-forcing their way through absurd amounts of data to generate, for any given input, an autocomplete output that approximates the outputs a human might give instead. They lack a few distinct properties of human cognition, including language, that more brute force alone cannot compensate for, because they can only ever internalize and compute *intra*textual context. Incidentally, humans need much less input(!) to learn language. Probably because they can contextualize across domains. 🧵


@jakob @doctormo @Jyoti

Your points are well made. Yet:
1 -- Flagship LLMs are rather good at metaphor
2 -- Making a transformer bidirectional, closer to a CSP solver (and the brain) would likely address most of your criticisms. Seems like a path (I hope!)

@m8ta @doctormo @Jyoti

My experience with LLMs and my understanding of their architecture (granted, I haven't engaged with them since last summer) say otherwise: these models can only reproduce metaphorical patterns that are part of their training data.

Also they don't attempt to mimic an algorithmic model of the human lexicon at all. If you want to look into knowledge representation bound to a syntax interface, something like Barsalou Frames seems much more closely aligned with human cognition.

@jakob @doctormo @Jyoti

True. There are a lot of metaphors in the training data, so it's likely just emulating us. Can you suggest one that LLMs fail at?

I've noticed they suck at CSPs they haven't seen. (Probably due to the feedforward structure.)

I don't know about Barsalou Frames, will read up on it!

@m8ta @doctormo @Jyoti

It's not "likely", regurgitating training patterns through a fuzzy weighted autocomplete is the whole architecture. There is no emulation of human cognition.

The output quite reliably fails for ad-hoc metaphors. Worse still: when you need a shared origo to decode the metaphor, even for discourse deixis, that's simply inaccessible to a model.

Models that come across as convincing are usually designed to detect prompts that lead to user frustration and fall back on repair questions, btw.

@Jyoti no serious person believes that AGI is an extension of an LLM right? 😅

@Jyoti

The calculations were already done, over billions of years. It's called evolution.

All the AI "geniuses" miss the obvious.

@Jyoti this can be read both ways

@Jyoti FWIW, I'm working on something like that: models with increased sample efficiency that learn actively, the way humans do. Also, I'm hiring!

https://www.linkedin.com/jobs/view/3801462876

@epiceneVivant @macindahaus @doctormo All this is (obviously) true. Yet: bitcoin miners are within roughly the same factor (~100x) of the Landauer limit as the brain. So it's possible in silicon; you just need the right algorithms, computational structure & connectivity. (E.g. wiring in the brain is much more efficient because it's 3D.)

@m8ta When you say "bitcoin miners are within the same factor of ~100 of the Landauer limit as the brain", what does that mean and how is it a relevant response to what I said?

I gave it a google but it seems like there's some dense thermodynamic math at work there. Physics never was my strong suit.

@epiceneVivant There is a theoretical minimum energy cost to making or destroying bits -- i.e. to doing computation -- set by thermodynamics. I've seen estimates (very hand-wavy ones: we don't really know how the brain works) that the brain is within 2-3 orders of magnitude (>100x) of this limit.

Bitcoin miners, which are doing a much simpler computation, seem to be approaching this energy efficiency.

So there's no reason Si AI can't be efficient (in the future). https://www.lesswrong.com/posts/mW7pzgthMgFu9BiFX/the-brain-is-not-close-to-thermodynamic-limits-on

@m8ta Sorry, this just seems like a non sequitur to me. Also I'm not convinced you understand the physics involved any better than I do.

@epiceneVivant Hah, probably not.

Let's do some calcs: the Landauer limit is ~0.02 eV per bit. ATP hydrolysis yields ~0.3 eV. Typical cells consume ~1e7 ATP/sec [1].

[2] pegs neuron bandwidth at around 100 bits/sec; assume processing is 10x that (subthreshold activity etc.), and you get (1e7 x 0.3 eV) / 1,000 bits/sec = ~3,000 eV/bit -- about 1e5 x the limit!

[3] estimates the H100 at 17e3 eV/bit, which is ~1e6 x the limit.
(they note energy cost is 1/10th the purchase price)
¯\_(ツ)_/¯

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230611/
[2] https://mitpress.mit.edu/9780262181747/spikes/
[3] https://arxiv.org/abs/2312.08595
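
(An aside for anyone who wants to check the figures: the same back-of-the-envelope calculation as a short Python sketch. The only inputs are the numbers quoted above plus the textbook kT at room temperature; everything else follows from them.)

```python
import math

# Landauer limit: minimum energy to erase one bit, E = kT * ln(2).
kT_room_eV = 0.0259                     # thermal energy at ~300 K, in eV
landauer_eV = kT_room_eV * math.log(2)  # ~0.018 eV, the "0.02 eV" above

# Brain-side figures as quoted in the post:
atp_eV = 0.3             # free energy per ATP hydrolysis, in eV
atp_per_sec = 1e7        # typical cellular ATP consumption [1]
bits_per_sec = 100 * 10  # ~100 bits/s neuron bandwidth [2], x10 for subthreshold processing

brain_eV_per_bit = (atp_per_sec * atp_eV) / bits_per_sec  # ~3,000 eV/bit
h100_eV_per_bit = 17e3                                    # H100 estimate [3]

print(f"brain: {brain_eV_per_bit / landauer_eV:.1e} x Landauer limit")  # ~1.7e+05
print(f"H100:  {h100_eV_per_bit / landauer_eV:.1e} x Landauer limit")   # ~9.5e+05
```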


@m8ta This means nothing to me.

You still haven't answered my second question, "and how is [that] a relevant response to what I said?"

I wrote a metaphor about how unattainable the computational resources needed for AGI likely are.

You've responded with a thermodynamic principle that can't possibly be relevant and, now, some equations and figuring on the same topic.

This is *still* a non sequitur. A 3rd post about physics minutiae will continue to be one, fyi. Try something on-topic maybe.

@epiceneVivant

> Yeah whenever I see some evangelist prattling on about AGI I think about how bigtime LLMs are currently using about the maximum amount of resources any computing project plausibly *could*.

[...]

I'm saying that, in terms of energy, the brain and 2023 CMOS are (surprisingly) rather similar. Algorithms & communication & topology are blocking, not efficiency. See Albert Cardona's (@albertcardona) post.

Albert Cardona (@[email protected])

@[email protected] Biological brains learn in a fundamentally different way to present-day artificial neural networks: "A critique of pure learning and what artificial neural networks can learn from animal brains" by Zador 2019 https://www.nature.com/articles/s41467-019-11786-6 And like @[email protected] et al. (2009) put it: https://web.archive.org/web/20190301225154/http://pdfs.semanticscholar.org/99ee/ff110781fbc3767ff1994c3426151b8912df.pdf "The design of biological neural computation is very different from that of modern computers. Neuronal networks process information using energy-efficient, asynchronous, event-based methods. Biology uses self-construction, self-repair, and self-programming, and it has learned how to flexibly compose complex behaviors from simpler elements." #neuroscience #neuromorphic

Mathstodon

@m8ta Thank you.

@albertcardona

I think where you two and I differ is in the belief that theoretical considerations about what AI technology *might be like someday* are a meaningful counterpoint to a pragmatic observation about what AI technology *is like right now*.

In our sci-fi imaginings a human-sentience-equivalent AI is compact enough to fit within the computer system of a ship (Star Trek, Mass Effect) or even within a handheld device (_Her_).

@m8ta @albertcardona

So far the first big leap in AI capability, bigtime LLMs, can barely fit inside the biggest data centers we have.

I'm aware that much smaller implementations of LLMs (which I'll be studying later this year) have far more limited resource needs. So the observation I made pertains to ChatGPT, Bard, & co., but not to LLMs as a whole. I don't know enough about LLMs as a whole to grok the resource consumption curve. Get back to me in like April.

@m8ta @albertcardona

LLMs with the performance characteristics of ChatGPT or Bard are new as of last year. This is 1st-generation stuff. One can postulate that the training requirements might be brought down over time, to something a supercomputer could get done over a weekend. And I don't know much about how computationally expensive *a single execution of ChatGPT* is. (Again, get back to me in April.) Or what the potential is to bring that down to something a phone could handle.

@m8ta @albertcardona

Notice that all my reasoning comes from CS & the practicalities of algorithm implementation. It's my bailiwick, but it's also nuts-and-bolts engineering. Look at this implementation: it's 1st-gen; what might 10th-gen look like? Straightforward and grounded.

When you can debate the future of AGI implementation based on *current real-life AI engineering issues*, get back to me. This is a programming problem. Beyond data center heat dissipation, thermodynamics isn't relevant.

@epiceneVivant

> Algorithms & communication & topology are blocking, not efficiency.

> This is a programming problem.

Seems we're in agreement.
Keep us posted ~

@m8ta

Give me some credit hon. My memory isn't that short.

Your argument that efficiency isn't an issue came packaged in 1034 characters of technobabble about entropy and theoretical limits.

Now that I've dismissed arguments from thermodynamics out of hand, your point that "Algorithms & communication & topology are blocking, not efficiency" no longer lands.

Get back to me when you have a programming- or engineering-based argument.

@Jyoti

AND it can be produced using unskilled labour!

@Jyoti Yeah, but you need to treat human brains ethically

@Jyoti

Biological brains learn in a fundamentally different way to present-day artificial neural networks:

"A critique of pure learning and what artificial neural networks can learn from animal brains" by Zador 2019
https://www.nature.com/articles/s41467-019-11786-6

And as @giacomoi et al. (2009) put it:
https://web.archive.org/web/20190301225154/http://pdfs.semanticscholar.org/99ee/ff110781fbc3767ff1994c3426151b8912df.pdf

"The design of biological neural computation is very different from that of modern computers. Neuronal networks process information using energy-efficient, asynchronous, event-based methods. Biology uses self-construction, self-repair, and self-programming, and it has learned how to flexibly compose complex behaviors from simpler elements."

#neuroscience #neuromorphic


@Jyoti

... but the AI never forgets...

@Jyoti Yeah, it's because the human brain had a billion-year head start.

Also, the human brain:

1. Runs on horribly sloooooow hardware. Neurons' latency is on the order of TENS OF MILLISECONDS; that's slower than a '50s computer.

2. But all those neurons are firing simultaneously. The human brain is massively parallelized. And how many threads does a GPU have? A few thousand? That's nothing compared to tens of billions of neurons all running in parallel.

But that's fixable.
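
(A quick sanity check on that latency-vs-parallelism tradeoff, plugging in the post's own rough figures; every number below is a loose assumption from the text, not a measurement.)

```python
# All figures are loose assumptions taken from the post above.
neuron_latency_s = 20e-3  # "tens of milliseconds" per neuron event
neurons_parallel = 2e10   # "tens of billions" of neurons firing in parallel
gpu_threads = 1e3         # "like a thousand" GPU threads (the post's guess)
gpu_op_rate_hz = 1e9      # ~1 GHz clock, one op per thread per cycle

brain_events_per_s = neurons_parallel / neuron_latency_s  # ~1e12 events/sec
gpu_ops_per_s = gpu_threads * gpu_op_rate_hz              # ~1e12 ops/sec

# Slow units x massive parallelism lands in the same ballpark as
# fast units x modest parallelism.
print(f"brain: {brain_events_per_s:.0e}/s  gpu: {gpu_ops_per_s:.0e}/s")  # 1e+12 each
```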

@Jyoti

3. Also, the human brain's architecture is SPECIFICALLY designed (evolved) to be a brain. The Von Neumann architecture is designed as a device that sequentially reads memory in 64-bit chunks: read a chunk, read another chunk, add them, write the result back to memory, read another chunk, and so on. And neural networks are emulated by that process using FLOATING POINT NUMBERS, oh god, how ugly and inefficient that is.

But that's also fixable.
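
(To make the "emulated chunk by chunk" point concrete, here's a toy sketch; a hypothetical illustration only, not how any real framework is implemented.)

```python
# One artificial "neuron" the Von Neumann way: every step is a sequential
# memory read plus a 64-bit floating-point multiply-accumulate.
def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    total = bias
    for x, w in zip(inputs, weights):  # fetch a value, fetch a weight...
        total += x * w                 # ...multiply, accumulate, repeat
    return max(0.0, total)             # ReLU activation (an arbitrary choice here)

print(neuron([1.0, 0.5], [0.2, -0.4], 0.1))  # ~0.1, plus float rounding noise
```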

@Jyoti 4. The human brain is really good at the things we evolved for: running, gathering food, outsmarting fellow humans in social games. It can also learn new things like reading, writing, and even a little bit of art, science, math, and stuff like that. Progress is very slow, though. The time it takes an average human to actually invent something new is, what, ten years? Mostly we're "standing on the shoulders of giants", repeating what other humans invented for us.
@Jyoti 5. But the human brain can't learn, for example, how to meaningfully operate in 643-dimensional space. We can write it down mathematically, but we can't live there; our brains are just not wired for it. Neural networks don't care: they learn EVERYTHING from scratch, with no head start. Recognizing projections of 3D objects onto 2D images is as foreign to them as 643-dimensional space is to us. And they do it pretty well.

@Jyoti Can you look at trillions of pictures and then draw a new one, just like that? In MILLISECONDS?

Can you read petabytes of text written by aliens and IMMEDIATELY, INTUITIVELY, WITHOUT EVEN ANALYZING, know their language and some basic facts about their world, such that you can write a long poem in their language in a couple of seconds and a real, actual alien won't be able to tell?

@Jyoti You know what? We're so fucking stupid compared to computers. We're so stupid that we invented them and still, with all our civilization, can't figure out how to make them do what we want without making them learn from scratch on shit tons of data. Without making them learn 643-dimensional universes instead of just fucking programming in what we want. We're so hopelessly stupid.
@Jyoti Yes, the human brain doesn't need infrastructure and runs on sandwiches. Oh, wait a second. The human brain needs a human body that converts sandwiches into ATP, and we need the whole fucking Earth biosphere to produce sandwiches, don't we? Isn't that a MUCH more complicated process than the production of processors and power plants? We just take it for granted and have zero idea how to reproduce it from scratch.

@Jyoti But once weights & biases are determined, cloning and even tuning the model is essentially free compared to the cost, time, and carbon footprint of an entire new human.

A few gigs of disk space << One human's worth of sandwiches

@Jyoti scrolling through my bookmarks and I realize I quote this post... a lot.