People trying to train AIs are now complaining that all of the AI-generated content on the internet is making it hard for them to get quality training sets of natural language and images.

*bitter snickering*

@futurebird One thing that's pretty clear is that LLMs don't learn very efficiently. None of us inhaled that much data to learn to speak one (or more) languages. None of us inhaled that much data to learn to recognize dog breeds, or plants, or ants, etc. The thing that the LLMs seem to have learned better than (most of) us is multi-subject "man on the Internet" confidence.

OTOH, perhaps our human ability to "learn efficiently" makes us vulnerable to learning conspiracy theories from bullshit.

@dr2chase @futurebird Uhhhh, how many years did it take your "efficient" monkey brain to learn language? LLMs may need tons of data, but they make sense of it in days of training, not years. Also, you had far more data feeding your learning than any LLM that has ever existed. That data just didn't seem like data to you. It seemed like "listening to your mom" and "watching TV".
@blterrible @futurebird The larger LLMs are fed more text than thousands of humans could read in their combined lifetimes. Your claim fails simple arithmetic.
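A rough sketch of that arithmetic. Every number here is an assumption for illustration (corpus size is roughly the publicly reported scale of recent frontier models; reading speed and habits are generic estimates), not a measurement:

```python
# Back-of-envelope check of the "thousands of lifetimes" claim.
# All figures below are rough assumptions, not measurements.

LLM_TRAINING_TOKENS = 15e12   # ~15 trillion tokens, roughly the reported scale of recent large models
WORDS_PER_TOKEN = 0.75        # common rule of thumb for English text
READING_WPM = 250             # typical adult reading speed, words per minute
HOURS_PER_DAY = 4             # a dedicated daily reader
READING_YEARS = 75            # reading years in one lifetime

words_in_corpus = LLM_TRAINING_TOKENS * WORDS_PER_TOKEN
words_per_lifetime = READING_WPM * 60 * HOURS_PER_DAY * 365 * READING_YEARS

lifetimes = words_in_corpus / words_per_lifetime
print(f"~{lifetimes:,.0f} human reading lifetimes")
```

Under these assumptions the corpus works out to several thousand reading lifetimes, so "thousands of humans' lifetimes of text" is the right order of magnitude even if any single estimate above is off by a factor of two or three.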
@dr2chase @futurebird Text is not the only way humans learn language. ChatGPT never had a mother lean over its crib and coo at it, yet babies start learning language that way very early on. ChatGPT was not trained on all the episodes of Gilligan's Island, and has little mapping between the use of hats to represent roles and characters, yet all of that maps to language as well. An image captioned "You're dead!" conveys no meaning to an LLM, and little to you without the image itself.