Mastodawn

StarkRG Sep 25, 2023

@Some_Emo_Chick My initial reaction to them was something along the lines of "Ok, sure, so, what's the point?" And, as it turned out, the point was money laundering, tax evasion, and scamming people. I don't think making climate change worse was entirely intentional, people making them just didn't give a shit.

Show thread

Erik (OLD ACCOUNT)Sep 25, 2023

@StarkRG @Some_Emo_Chick my guess is the last thing people who launder money, evade taxes and scam other people care about is the climate.

But glad that grift is over, can't wait to see what the next obvious one is

Show thread

Joe Dowland Sep 26, 2023

@erikcats @StarkRG @Some_Emo_Chick the next obvious one is "AI" (put in quotes because it's not intelligent, it's just a marketing scam)

Show thread

StarkRG Sep 26, 2023

@Aradiel @erikcats @Some_Emo_Chick LLMs are a solution looking for a problem. You can usually tell by the way it's marketed as being useful for anything and everything while not actually being better than anything that already exists.

Other types of generative AI aren't as bad, though that isn't saying much since LLMs are the literal worst. There are, at least, a handful of cases where they have advantages over existing solutions, but they still need a lot of handholding.

Show thread

Erik (OLD ACCOUNT)Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick explain to someone who's not a techie what LLMs are, without resorting to LMGTFY or similar things

Show thread

seismographix Sep 30, 2023

@erikcats
LLM was trained by "looking" at text and finding patterns and rules. The original text itself is not stored in the trained model. Only the patterns which has been found. LLM is creating text word for word. Always calculating the most probable word based on all the words preceding it.

Summary: The created text by LLM is a patchwork of guessing and not a copy of information.

@StarkRG @Aradiel @Some_Emo_Chick

Show thread

Joe Dowland Sep 30, 2023

@seismographix @erikcats @StarkRG @Some_Emo_Chick what is the training data of not a collection of patterns of words?

Show thread

StarkRG Sep 30, 2023

@Aradiel @seismographix @erikcats @Some_Emo_Chick Among other things, you're unlikely to get the original back as an output, just said that's vaguely similar to the original. It's still close enough to plagiarism that I think it counts.

Show thread

seismographix Sep 30, 2023

@StarkRG
We should start to differentiate. Create an example please. Take an news article and recreate it with chatGPT. One rule so: You are not allowed to instruct chatGPT how to fix the output afterwards. In the last case you as human being would be the driver for plagiarism.
@Aradiel @erikcats @Some_Emo_Chick

Show thread

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick for such an example I would want the training data to be restricted to only that article

Show thread

seismographix Sep 30, 2023

@Aradiel

Why would someone train a LLM only on one news article? And the question would be, is this enough training data for the LLM to create meaningful sentences afterward.

@StarkRG @erikcats @Some_Emo_Chick

Show thread

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick because it would prove my point that it is copying the data. It's transforming it first, but it is storing a copy of it

Show thread

seismographix Sep 30, 2023

@Aradiel

Nice thought. 😀 But often relations are not linear dependent on each other. Your example could lead to overfitting (point proved) or underfitting (point missed).

I added a screenshot for the explanation of overfitting and underfitting.

@StarkRG @erikcats @Some_Emo_Chick

Show thread

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick getting into a grey area here, but in my view, copied data that is corrupted in copying is still copied (in this case it's the transformation corrupting it)

Eg. Download two files, which are 1s and 0s. Shuffle them together
You can't get either file back out, but you still copied them in the first place

Show thread

seismographix Sep 30, 2023

@Aradiel

Of course, the training input must be from free sources. And it would be correct to let people decide if they want to contribute to the training data.

@StarkRG @erikcats @Some_Emo_Chick

Show thread

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick if only that were actually what's happening

Show thread

seismographix Sep 30, 2023

@Aradiel

You can check it out. At least for the open source LLMs. And one important thing, someone has to ensure that the training data has the right quality. Misspelled YouTube comments are not the appropriate training data. The quality verification is a tedious work.
You can experience it yourself, when contributing to this open-source LLM:
https://open-assistant.io/de

@StarkRG @erikcats @Some_Emo_Chick

Open Assistant

Konversations-KI für alle. Ein Open-Source-Projekt zur Erstellung eines Chat-fähigen GPT LLM, das von LAION und Mitwirkenden auf der ganzen Welt betrieben wird.

Show thread

Erik (OLD ACCOUNT)Sep 30, 2023

@seismographix @Aradiel @StarkRG @Some_Emo_Chick just to be clear, I do not have any it training.

What I do have is big fat question marks with the idea you seem to be trying to push that there is a thing such as a standard for ethically trained AI. What you're saying sounds both extremely rare and extremely against the grain of an Economic model where taking value and returning as little as possible is the industry standard