@StarkRG @Some_Emo_Chick my guess is the last thing people who launder money, evade taxes and scam other people care about is the climate.
But I'm glad that grift is over. Can't wait to see what the next obvious one is.
@Aradiel @erikcats @Some_Emo_Chick LLMs are a solution looking for a problem. You can usually tell by the way it's marketed as being useful for anything and everything while not actually being better than anything that already exists.
Other types of generative AI aren't as bad, though that isn't saying much since LLMs are the literal worst. There are, at least, a handful of cases where they have advantages over existing solutions, but they still need a lot of handholding.
@erikcats
An LLM is trained by "looking" at text and finding patterns and rules. The original text itself is not stored in the trained model, only the patterns that have been found. An LLM creates text word by word, each time calculating the most probable next word based on all the words preceding it.
Summary: the text created by an LLM is a patchwork of guesses, not a copy of information.
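To make the word-by-word idea concrete, here is a toy sketch. It simplifies heavily: instead of a neural network over tokens, it only counts which word follows which (a bigram model). That means it looks at just the previous word rather than all preceding words, and it always picks the most frequent follower (greedy decoding), whereas real LLMs usually sample from the probabilities. `train_bigrams` and `generate` are hypothetical names for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which: the only 'patterns' this toy model keeps."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=10):
    """Greedy decoding: repeatedly append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no pattern was learned for this word, so stop
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

model = train_bigrams("the cat sat on the mat and the cat ate the fish")
print(generate(model, "the", 6))
```

Here the output is "the cat sat on the cat": every word pair comes from the training sentence, but the sentence as a whole does not.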
Why would someone train an LLM on only one news article? And the question would be whether that is enough training data for the LLM to create meaningful sentences afterward.
Nice thought. 😀 But relationships between variables are often not linear. Your example could lead to overfitting (point proved) or underfitting (point missed).
I added a screenshot explaining overfitting and underfitting.
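A small numeric sketch of underfitting vs. overfitting, using assumed toy data: the true relation is y = x² (not linear), sampled at 11 evenly spaced points in [-1, 1], with a fixed zig-zag of ±0.1 standing in for measurement noise so the output is reproducible. A straight line underfits, a parabola fits, and a polynomial forced through every training point (Lagrange interpolation) overfits: zero training error, but it swings wildly between points, especially near the edges. All function names are illustrative, not from any library.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations (A^T A) c = A^T y.

    Adequate for the tiny, low-degree fits below; a QR-based solver is sturdier in general.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A[i][j] = xs[i] ** j
    m = [[sum(x ** (r + c) for x in xs) for c in range(n)]
         + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - rest) / m[r][r]
    return coeffs  # coeffs[j] is the coefficient of x ** j

def evaluate(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

def lagrange(xs, ys, x):
    """The unique polynomial through every (xs[j], ys[j]): it memorizes the training data."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                basis *= (x - xk) / (xj - xk)
        total += yj * basis
    return total

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [-1 + 2 * i / 10 for i in range(11)]
# True relation x**2 plus a fixed +/-0.1 zig-zag standing in for noise
train_y = [x * x + 0.1 * (-1) ** i for i, x in enumerate(train_x)]
# Test against the noise-free relation on a finer grid
test_x = [-1 + 2 * i / 200 for i in range(201)]
test_y = [x * x for x in test_x]

line = fit_polynomial(train_x, train_y, 1)      # too simple for a curved relation
parabola = fit_polynomial(train_x, train_y, 2)  # matches the true shape
models = {
    "underfit": lambda x: evaluate(line, x),
    "good fit": lambda x: evaluate(parabola, x),
    "overfit": lambda x: lagrange(train_x, train_y, x),
}
results = {name: (mse(f, train_x, train_y), mse(f, test_x, test_y))
           for name, f in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name:9s} train MSE {train_err:.4f}  test MSE {test_err:.4f}")
```

The overfit model has the lowest training error of the three, yet it does worse than the parabola on new points, because it has learned the noise as if it were the relation.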
@seismographix @StarkRG @erikcats @Some_Emo_Chick getting into a grey area here, but in my view, copied data that is corrupted in copying is still copied (in this case it's the transformation corrupting it).
E.g. download two files, which are just 1s and 0s, and shuffle them together.
You can't get either file back out, but you still copied them in the first place.
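A sketch of the shuffle analogy with made-up file contents: pool the bits of two files and permute them randomly. The only thing guaranteed to survive is how many 1s and 0s there were, so two completely different pairs of files can produce exactly the same shuffled result. `to_bits` and `shuffle_together` are hypothetical helpers.

```python
import random

def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def shuffle_together(file_a, file_b, seed=0):
    """Pool the bits of both files and randomly permute them."""
    bits = to_bits(file_a) + to_bits(file_b)
    random.Random(seed).shuffle(bits)
    return bits

mixed = shuffle_together(b"hello", b"world")
# 80 bits in total, with the same number of 1s as the two inputs combined
print(len(mixed), sum(mixed))

# b"A" (0b01000001) and b"B" (0b01000010) each contain two 1 bits, so the pair
# (b"A", b"A") and the pair (b"B", b"B") pool into the same multiset of bits:
# any shuffled output of one pair could just as well have come from the other.
same_pool = sorted(to_bits(b"A") + to_bits(b"A")) == sorted(to_bits(b"B") + to_bits(b"B"))
print(same_pool)
```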
Of course, the training input must come from free sources. And it would be right to let people decide whether they want to contribute to the training data.
It isn't text-only, but here is the image-text dataset LAION-5B: https://laion.ai/blog/laion-5b/