I think a thing most people do not understand about chatbots is HOW SIMPLE THE UNDERLYING MODELS are.

It's really just statistics of how close words are to each other in a huge ass amount of written text, with a sprinkle of classification and labeling (done by humans).

And then it autocompletes your prompt. That's it, that's really all there is to it.
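The "word statistics plus autocomplete" framing can be sketched in a few lines. This is a toy bigram model, not how modern LLMs actually work (they use transformers over subword tokens), but it illustrates the bare idea of "predict the next word from counts":

```python
from collections import Counter, defaultdict

# Toy illustration of the "statistics of nearby words" idea:
# count which word follows which in a corpus, then "autocomplete"
# by always picking the most frequent follower.
corpus = "the cat sat on the mat the cat ate the fish".split()

follower_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follower_counts[prev][nxt] += 1

def autocomplete(word, steps=3):
    out = [word]
    for _ in range(steps):
        if word not in follower_counts:
            break
        # greedily take the most frequent next word
        word = follower_counts[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(autocomplete("the"))  # "the cat sat on"
```

A real model replaces the count table with billions of learned parameters and a much longer context, which is where the disagreement in this thread starts.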

@thomasfuchs to the extent that computers are just large collections of on/off switches, yes. But in both cases those descriptions hide the tremendous tapestry of ideas and power that comes from the mastery of how things connect with each other.
@Migueldeicaza I mean it’s essentially a very powerful search engine, with limitations due to the statistical nature of it.

@thomasfuchs yes it has limitations, but even the proximity is governed these days by very interesting and sophisticated systems - it is now very far from “what’s the most likely word given the words behind me”

Like video games, which are just piles of hacks combined but still manage to deliver an immersive experience: these piles of hacks amount to very useful tools.

@Migueldeicaza I guess what I’m saying is that just because it creates complex results doesn’t mean it is at all complex itself; it’s very easy to trick yourself into attributing more to it (e.g. sentience) than is actually there.

Which makes chatbots into perhaps useful but also highly dangerous tools.

@thomasfuchs I see what you mean - absolutely, some folks are really falling for it.

@Migueldeicaza @thomasfuchs

Yeah, there are tons of things to complain about the AI industry for, but it’s been a long time since LLMs operated as simply as that. There is a lot going on there.

And regardless of what’s happening behind the scenes, the capabilities and usefulness for specific work has gone up exponentially.

I get having serious issues with LLMs, but this specific type of criticism always strikes me as a reflection of unfamiliarity with the current state of the harnesses and models.

@scottwillsey @Migueldeicaza @thomasfuchs An unfamiliarity that is hard to remedy because, unless you are working in this field, you see very few in-depth discussions of what the current state is all about. Lots of noise, very little signal.

@sandorspruit @scottwillsey @Migueldeicaza How exactly is it not inferring the next token based on the previous tokens by using a statistical model?

I’d like to add that I have not made any statement about the usefulness, veracity or applicability of LLMs in my OP.

@sandorspruit @Migueldeicaza @thomasfuchs

Just using the tools will tell you people are speaking of that which they know not. 😄🤷

It’s not unknowable at all.

@thomasfuchs @Migueldeicaza yes, but the search index is lossy-compressed. It pulled the factual pixels into more layers of search. It encoded ‘the world’ with 2000-era MP3s stolen from LimeWire.

@onyxraven @Migueldeicaza For a lot of applications it doesn't matter if it's lossy or wrong sometimes.

For example you could use it as a web index, but before showing results to the user you cross check with a relational database of links to filter out any made-up links.

I really want someone to do this because it would be immensely powerful.
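The cross-checking idea above can be sketched with a small relational table of known-good links. The table name, schema, and URLs here are made up for illustration; a real system would populate the table from a crawler:

```python
import sqlite3

# Hypothetical sketch: keep a relational table of links known to
# exist, and filter any model-suggested URLs against it before
# showing results, so made-up links never reach the user.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE known_links (url TEXT PRIMARY KEY)")
db.executemany(
    "INSERT INTO known_links VALUES (?)",
    [("https://example.com/a",), ("https://example.com/b",)],
)

def filter_made_up(candidate_urls):
    """Keep only URLs that actually exist in the known-links table."""
    return [
        u for u in candidate_urls
        if db.execute("SELECT 1 FROM known_links WHERE url = ?", (u,)).fetchone()
    ]

suggested = ["https://example.com/a", "https://example.com/made-up"]
print(filter_made_up(suggested))  # ['https://example.com/a']
```

The point is that the generative step proposes and the relational step verifies, so the lossiness of the model stops mattering for link accuracy.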

@thomasfuchs @Migueldeicaza agree - these are some of the amazing uses of vector encodings and transformers - expanding actual relevancy in the search index, related terms, etc - a better paradigm than just tf-idf alone. The problem is that the attention is on using JUST the compressed data :/
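A tiny sketch of why vector encodings beat exact-term matching for relevancy: the hand-made "embeddings" below are assumptions purely for illustration (real ones come from a trained model), but they show how related terms can match without sharing any words, which tf-idf alone cannot do:

```python
import math

# Made-up toy embeddings; only their relative geometry matters here.
emb = {
    "car":    [0.90, 0.10, 0.00],
    "auto":   [0.85, 0.15, 0.00],
    "banana": [0.00, 0.10, 0.90],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "car" and "auto" share no terms a tf-idf match could hit on,
# yet their vectors are nearly parallel; "banana" is far from both.
print(round(cosine(emb["car"], emb["auto"]), 3))    # close to 1.0
print(round(cosine(emb["car"], emb["banana"]), 3))  # close to 0.0
```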
@onyxraven @Migueldeicaza I’m just begging for anything better than current search engines 😭

@thomasfuchs @Migueldeicaza oof. Yeah. It felt like they were getting so good.

I’ve dealt with two kinda difficult search domains in my career. I kinda want to try again.

First was helping at Photobucket - we were trying to blaze a new path with a distributed compute index. Had Solr atop HBase and did some pretty cool stuff. There was a lot more we could have done there as those domains really matured.

Now we have a tough domain in consumer products at Ibotta - we are absolutely leaning in on the vector similarity stuff. Inference/imputing on sparse input data is really interesting.