Mastodawn

I think a thing most people do not understand about chatbots is HOW SIMPLE THE UNDERLYING MODELS are.

It's really just statistics of how close words are to each other in a huge ass amount of written text, with a sprinkle of classification and labeling (done by humans).

And then it autocompletes your prompt. That's it, that's really all there is to it.
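A deliberately crude sketch of the "statistics of which words follow which, then autocomplete" description is a word-bigram model. It is a caricature: real chatbots use transformer networks over subword tokens rather than raw word-pair counts. The corpus and the function names `train_bigrams` and `autocomplete` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def autocomplete(follows, prompt, n_words=5):
    """Greedily append the most frequent next word, one word at a time."""
    out = prompt.lower().split()
    if not out:
        return ""
    for _ in range(n_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no statistics for this word, so stop
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical toy corpus; real models train on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(autocomplete(model, "the dog", n_words=4))  # the dog sat on the cat
```

The output is fluent-looking but wrong ("the dog sat on the cat"). That is the flavor of failure the thread goes on to discuss.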

Miguel de Icaza ᯅ🍉20h ago

@thomasfuchs to the extent that computers are just large collections of on/off switches, yes. But in both cases those descriptions hide the tremendous tapestry of ideas and power that comes from the mastery of how things connect with each other.

Human Brain Enthusiast 20h ago

@Migueldeicaza I mean it’s essentially a very powerful search engine, with limitations due to the statistical nature of it.

Justin 😸13h ago

@thomasfuchs @Migueldeicaza yes, but the search index is lossy-compressed. It pulled the factual pixels into more layers of search. It encoded ‘the world’ with 2000-era MP3s stolen from LimeWire.

Human Brain Enthusiast 13h ago

@onyxraven @Migueldeicaza For a lot of applications it doesn't matter if it's lossy or sometimes wrong.

For example you could use it as a web index, but before showing results to the user you cross check with a relational database of links to filter out any made-up links.

I really want someone to do this because it would be immensely powerful.
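The cross-check idea can be sketched as: take the URLs a model suggests, and keep only those that actually exist in a relational table of known links. This is a minimal sketch assuming an SQLite table named `links` with a `url` column; the table, the function `filter_known_links`, and the example URLs are all hypothetical.

```python
import sqlite3

def filter_known_links(candidate_urls, conn):
    """Keep only URLs present in the `links` table, preserving the model's ranking order."""
    known = []
    for url in candidate_urls:
        row = conn.execute("SELECT 1 FROM links WHERE url = ?", (url,)).fetchone()
        if row is not None:  # the link really exists in our index
            known.append(url)
    return known

# Hypothetical link database, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO links (url) VALUES (?)",
    [("https://example.com/a",), ("https://example.com/b",)],
)

# Pretend these came from a language model; the middle one is made up.
suggested = ["https://example.com/b", "https://example.com/made-up", "https://example.com/a"]
print(filter_known_links(suggested, conn))
```

The model still decides the ordering; the database only vetoes links it has never seen, so a hallucinated URL never reaches the user.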

Justin 😸13h ago

@thomasfuchs @Migueldeicaza agree, these are some of the amazing uses of vector encodings and transformers: expanding actual relevancy in the search index, related terms, etc. A better paradigm than tf-idf alone. The problem is that the attention is on using JUST the compressed data :/
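For contrast, here is roughly what "tf-idf alone" looks like: each document term is weighted by how often it appears in that document times the log of how rare it is across documents, and a query is scored by summing the weights of its terms. This is a minimal sketch with made-up documents and hypothetical function names. It shows the limitation vector encodings address: a query like "sneakers" shares no words with "running shoes", so pure tf-idf finds nothing.

```python
import math
from collections import Counter

def build_index(docs):
    """Return, per document, a dict of term -> tf-idf weight."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in tokenized.values() for term in set(words))
    index = {}
    for name, words in tokenized.items():
        tf = Counter(words)
        index[name] = {
            term: (count / len(words)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return index

def search(index, query):
    """Rank documents by the summed tf-idf weight of the query terms they contain."""
    terms = query.lower().split()
    scores = {name: sum(w.get(t, 0.0) for t in terms) for name, w in index.items()}
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: -scores[n])

# Hypothetical product catalog.
docs = {
    "d1": "running shoes for men",
    "d2": "trail running shoes",
    "d3": "leather dress shoes",
}
index = build_index(docs)
print(search(index, "trail shoes"))  # ['d2']
print(search(index, "sneakers"))     # []
```

Note that "shoes" gets zero weight because it appears in every document. An embedding-based index would instead place "sneakers" near "running shoes" in vector space, which is the "related terms" expansion the post describes.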

Human Brain Enthusiast

@onyxraven @Migueldeicaza I’m just begging for anything better than current search engines 😭

Justin 😸12h ago

@thomasfuchs @Migueldeicaza oof. Yeah. It felt like they were getting so good.

I’ve dealt with two kinda difficult search domains in my career. I kinda want to try again.

First was helping at Photobucket: we were trying to blaze a new path with a distributed compute index. We had Solr atop HBase and did some pretty cool stuff. There was a lot more we could have done there as those domains really matured.

Now we have a tough domain in consumer products at Ibotta, and we are absolutely leaning in on the vector relativity stuff. Inference/imputing in sparse input data is really interesting.