I think a thing most people do not understand about chatbots is HOW SIMPLE THE UNDERLYING MODELS are.

It's really just statistics of how close words are to each other in a huge ass amount of written text, with a sprinkle of classification and labeling (done by humans).

And then it autocompletes your prompt. That's it, that's really all there is to it.
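For anyone curious what "word statistics plus autocomplete" looks like in its crudest form, here's a toy sketch. It's a bigram counter, which is a big assumption and a much simpler model than what real chatbots use (they run neural networks over tokens, not raw word-pair counts). It just makes the "predict the next word from what came before" loop concrete.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], prompt: str, n_words: int = 5) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    out = prompt.lower().split()
    if not out:
        return ""
    for _ in range(n_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no statistics for this word, so stop "completing"
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(autocomplete(model, "the", 3))  # "the cat sat on"
```

The gap between this and a real model is the whole point of the replies below: swap counted word pairs for learned representations and the "just autocomplete" loop gets a lot more capable.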

@thomasfuchs to the extent that computers are just large collections of on/off switches, yes. But in both cases those descriptions hide the tremendous tapestry of ideas and power that comes from mastering how things connect with each other.
@Migueldeicaza I mean it’s essentially a very powerful search engine, with limitations due to the statistical nature of it.
@thomasfuchs @Migueldeicaza yes, but the search index is lossy-compressed. It pulled the factual pixels into more layers of search. It encoded ‘the world’ with 2000-era MP3s stolen from LimeWire

@onyxraven @Migueldeicaza For a lot of applications it doesn't matter if it's lossy or wrong sometimes.

For example you could use it as a web index, but before showing results to the user you cross check with a relational database of links to filter out any made-up links.

I really want someone to do this because it would be immensely powerful.
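The cross-check idea is simple enough to sketch. Assumptions here: the model has already produced a list of candidate URLs, and there's a relational table of links known to exist (say, from a crawler). The table name `links` and both helper functions are hypothetical, and matching is exact-string only, with no URL normalization.

```python
import sqlite3


def build_link_index(urls: list[str]) -> sqlite3.Connection:
    """Hypothetical table of links known to exist, e.g. filled by a crawler."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO links (url) VALUES (?)", [(u,) for u in urls])
    return conn


def filter_real_links(conn: sqlite3.Connection, candidate_urls: list[str]) -> list[str]:
    """Keep only model-suggested URLs present in the links table, preserving order."""
    real = []
    for url in candidate_urls:
        row = conn.execute("SELECT 1 FROM links WHERE url = ?", (url,)).fetchone()
        if row is not None:
            real.append(url)
    return real


conn = build_link_index(["https://example.com/a", "https://example.com/b"])
suggested = ["https://example.com/b", "https://example.com/made-up", "https://example.com/a"]
print(filter_real_links(conn, suggested))
```

The model decides ranking and relevance; the database only gets a veto, so a hallucinated link never reaches the user.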

@thomasfuchs @Migueldeicaza agree - these are some of the amazing uses of vector encodings and transformers - expanding actual relevancy in the search index, related terms, etc - a better paradigm than tf-idf alone. The problem is that the attention is on using JUST the compressed data :/
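For reference, a minimal sketch of the plain tf-idf baseline being compared against (one common weighting variant, `log(N/df) + 1`, assumed here). It shows the limitation vector encodings address: a query with no exact term overlap, like "trainers" vs "sneakers", scores zero everywhere. An embedding-based index would replace or augment these sparse vectors with learned dense ones, which this sketch doesn't include.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]):
    """Sparse tf-idf vectors: raw term count times smoothed inverse document frequency."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps terms found in every doc
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Rank docs by cosine similarity; query terms unseen in the corpus are dropped."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf[t] for t, c in Counter(query.lower().split()).items() if t in idf}
    return sorted(((cosine(q, v), d) for v, d in zip(vecs, docs)), reverse=True)


docs = ["red running shoes", "blue sneakers", "coffee mug"]
print(search("sneakers", docs)[0])  # "blue sneakers" ranks first
print(search("trainers", docs))     # every score is 0.0: no shared terms
```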
@onyxraven @Migueldeicaza I’m just begging for anything that's better than current search engines 😭

@thomasfuchs @Migueldeicaza oof. Yeah. It felt like they were getting so good.

I’ve dealt with two kinda difficult search domains in my career. I kinda want to try again.

First was helping at Photobucket - we were trying to blaze a new path with a distributed compute index. Had Solr atop HBase and did some pretty cool stuff. There was a lot more we could have done there as those domains really matured.

Now we have a tough domain in consumer products at Ibotta - we are absolutely leaning in on the vector relativity stuff. Inference/imputing in sparse input data is really interesting.