Mastodawn

Frankie ✅ Sep 25, 2023

Very easy

@Some_Emo_Chick My initial reaction to them was something along the lines of "Ok, sure, so, what's the point?" And, as it turned out, the point was money laundering, tax evasion, and scamming people. I don't think making climate change worse was entirely intentional; the people making them just didn't give a shit.

Erik (OLD ACCOUNT) Sep 25, 2023

@StarkRG @Some_Emo_Chick my guess is the last thing people who launder money, evade taxes and scam other people care about is the climate.

But glad that grift is over, can't wait to see what the next obvious one is

Joe Dowland Sep 26, 2023

@erikcats @StarkRG @Some_Emo_Chick the next obvious one is "AI" (put in quotes because it's not intelligent, it's just a marketing scam)

StarkRG Sep 26, 2023

@Aradiel @erikcats @Some_Emo_Chick LLMs are a solution looking for a problem. You can usually tell by the way they're marketed as being useful for anything and everything while not actually being better than anything that already exists.

Other types of generative AI aren't as bad, though that isn't saying much since LLMs are the literal worst. There are, at least, a handful of cases where they have advantages over existing solutions, but they still need a lot of handholding.

Joe Dowland Sep 26, 2023

@StarkRG @erikcats @Some_Emo_Chick I don't like the fact that it's going to be forced on us in the next Windows update (I wonder if forced market saturation will trick investors), and I also wonder if LLMs are why Google has suddenly become much worse for search results

StarkRG Sep 26, 2023

@Aradiel @erikcats @Some_Emo_Chick Google search results have been terrible for at least a decade. I switched to DuckDuckGo a few months ago. I have a few complaints about functionality (removing a search term using minus doesn't work) and the search results aren't as good as Google's were in the early 2000s, but they're much better than Google's are now.

I've used Windows as my daily driver for two years out of the last two decades, I'm always highly annoyed when I have to use it for anything.

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick I have used DDG for years and it does have some functionality issues indeed but boy it's better. Do you also happen to know if ecosia runs on its own browser or uses Google, and if it's a greenwashing op or genuine? Because I've been on the fence for that one

StarkRG Sep 26, 2023

@erikcats @Aradiel @Some_Emo_Chick I generally assume everything like that is greenwashing. I always wonder where the money comes from and where it goes. If it isn't obvious, it probably is greenwashing.

Also "carbon neutral" is a marketing lie. You can't pay someone to plant trees in Indonesia while polluting in Canada and expect it all to work out. Not to mention that almost all carbon absorbed by plants is released when they die and decompose. Carbon sequestration is the only real solution.

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick you mean together with extreme reduction in output

StarkRG Sep 26, 2023

@erikcats @Aradiel @Some_Emo_Chick While that goes without saying, if you're absorbing it all and putting it back in the ground where it belongs, I don't see it as strictly necessary, it's just significantly less expensive if you do.

Zimmie Oct 3, 2023

@Aradiel @StarkRG @erikcats @Some_Emo_Chick Google search has gotten worse largely due to content farms trying to rank highly for everything so people will click on them so they can sell ads. LLMs are just mechanical content farms. They’re making things worse because they are much faster than paying humans to type nonsense, but it’s the same problem at higher volume.

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick explain to someone who's not a techie what LLMs are, without resorting to LMGTFY or similar things

Joe Dowland Sep 26, 2023

@erikcats @StarkRG @Some_Emo_Chick ok.
LLMs are computer programs that have stored reams of (often stolen) text. When you supply one with a question or prompt, it runs calculations to return a sentence that is, mathematically, possibly what you want

Erik (OLD ACCOUNT) Sep 26, 2023

@Aradiel @StarkRG @Some_Emo_Chick oh my fucking god that's the generative text stuff ChatGPT runs on

I know everyone is using it but it hurts my linguist heart to see one of the core activities of human functioning be outsourced to garbage protocols

Joe Dowland Sep 26, 2023

@erikcats @StarkRG @Some_Emo_Chick I'm genuinely curious, did you not know what LLMs are before that toot?
I thought you were being smug or sarcastic

Erik (OLD ACCOUNT) Sep 26, 2023

@Aradiel @StarkRG @Some_Emo_Chick no I genuinely did not know

Joe Dowland Sep 26, 2023

@erikcats @StarkRG @Some_Emo_Chick ok. My apologies if I came across as condescending or anything there, then

Erik (OLD ACCOUNT) Sep 26, 2023

@Aradiel @StarkRG @Some_Emo_Chick no worries, I know the OG mastopeeps are 99% programmers and such. I studied historical linguistics and my interests revolve around that, painting plastic dolls with punch daggers and humans in general. Plus I'll ask if I need to know.

StarkRG Sep 26, 2023

@Aradiel @erikcats @Some_Emo_Chick An example is ChatGPT.

To expand on that, it stands for large language model. It creates an internal model of a language using the aforementioned stolen text and uses it to predict what word comes next. It's basically autocorrect with extra hallucinations.

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick Ahhh Large Language Model

Madafakas gonna use AI to give names to things now, I see

StarkRG Sep 26, 2023

@erikcats @Aradiel @Some_Emo_Chick Yeah, not to be confused with MLMs which are a *whole* other problem. I dread the day that someone creates an MLM selling LLMs. FML.

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick NGL, LOL

Lydia T. Pott Oct 3, 2023

@erikcats @StarkRG @Some_Emo_Chick @Aradiel
Thank you. I’m a non-tech-savvy pensioner, & I didn’t know it either.

Steve Hill 🏴󠁧󠁢󠁷󠁬󠁳󠁿🇪🇺Sep 30, 2023

@Aradiel @erikcats @StarkRG @Some_Emo_Chick correction: they return the sentence that is statistically the most plausible looking. It's not necessarily what you want, nor is it necessarily truthful, it's just the one that looks most like a human response.

Erik (OLD ACCOUNT) Sep 30, 2023

@steve @Aradiel @StarkRG @Some_Emo_Chick this

StarkRG Sep 30, 2023

@steve @Aradiel @erikcats @Some_Emo_Chick It's more that it's the word its statistical model suggests comes next. It's pretty much just one of those "enter this phrase then let autocorrect complete the sentence" games, but with a bigger statistical model. LLMs are truly useless things.

seismographix Sep 30, 2023

@erikcats
An LLM is trained by "looking" at text and finding patterns and rules. The original text itself is not stored in the trained model, only the patterns that have been found. An LLM creates text word by word, always calculating the most probable word based on all the words preceding it.

Summary: The text created by an LLM is a patchwork of guesses, not a copy of information.

@StarkRG @Aradiel @Some_Emo_Chick
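
A rough way to picture the "most probable next word" idea from this post is a tiny word-pair counter. This is only a toy sketch under loud assumptions: the corpus is made up, the names `train_bigram` and `generate` are hypothetical, and the model looks at just the single previous word, whereas a real LLM uses a neural network conditioned on the whole preceding text.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=8):
    """Build text word by word, always taking the most frequent follower."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Made-up training text, purely for illustration.
corpus = "i like red wine . i like pizza . i like red wine and pizza ."
model = train_bigram(corpus)
print(generate(model, "i", max_words=4))  # i like red wine
```

The table holds follower counts rather than the sentences themselves, and the output is whatever continuation was most frequent, whether or not it is true or wanted.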

Joe Dowland Sep 30, 2023

@seismographix @erikcats @StarkRG @Some_Emo_Chick what is the training data if not a collection of patterns of words?

StarkRG Sep 30, 2023

@Aradiel @seismographix @erikcats @Some_Emo_Chick Among other things, you're unlikely to get the original back as an output, just something that's vaguely similar to the original. It's still close enough to plagiarism that I think it counts.

seismographix Sep 30, 2023

@StarkRG
We should start to differentiate. Please create an example: take a news article and recreate it with ChatGPT. One rule, though: you are not allowed to instruct ChatGPT how to fix the output afterwards. In that case you, as a human being, would be the driver of the plagiarism.
@Aradiel @erikcats @Some_Emo_Chick

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick for such an example I would want the training data to be restricted to only that article

seismographix Sep 30, 2023

Why would someone train an LLM on only one news article? And the question would be whether that is enough training data for the LLM to create meaningful sentences afterward.

@StarkRG @erikcats @Some_Emo_Chick

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick because it would prove my point that it is copying the data. It's transforming it first, but it is storing a copy of it

seismographix Sep 30, 2023

Nice thought. 😀 But often relations are not linearly dependent on each other. Your example could lead to overfitting (point proved) or underfitting (point missed).

I added a screenshot for the explanation of overfitting and underfitting.

@StarkRG @erikcats @Some_Emo_Chick
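
The "train on only one article" thought experiment can be tried on a toy scale. This is a sketch under stated assumptions, not a claim about any real LLM: the "model" is a plain table of word contexts rather than a neural network, the one-sentence "article" is made up, and `train` and `regenerate` are hypothetical names. With enough context per prediction the table memorizes its only training text (the overfitting case); with too little context it does not get the text back (loosely, the underfitting case).

```python
from collections import Counter, defaultdict

def train(text, order):
    """Map each run of `order` words to counts of the word that follows it."""
    table = defaultdict(Counter)
    words = text.split()
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table

def regenerate(table, seed, order, limit=20):
    """Extend `seed` with the most frequent next word until stuck or at `limit` words."""
    out = list(seed)
    while len(out) < limit:
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break  # this context never appeared in training
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Stand-in for "a single news article" (made up for illustration).
article = "the quick brown fox jumps over the lazy dog near the quiet river bank"

# Two words of context: every context in this text is unique, so the
# table reproduces its training text exactly.
memorized = regenerate(train(article, 2), article.split()[:2], 2)
print(memorized == article)  # True

# One word of context: "the" has several possible followers, so the
# output drifts into a loop instead of returning the original.
blurry = regenerate(train(article, 1), article.split()[:1], 1)
print(blurry == article)  # False
```

Which of the two outcomes you get depends on how much the model can hold relative to how much text it was given, which is why a single-article experiment says little about a model trained on a very large corpus.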

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick getting into a grey area here, but in my view, copied data that is corrupted in copying is still copied (in this case it's the transformation corrupting it)

E.g., download two files, which are just 1s and 0s, and shuffle them together.
You can't get either file back out, but you still copied them in the first place

seismographix Sep 30, 2023

A better analogy is that you copy two text files and the AI then analyzes them. As a result, you get a joint statistics report that does not distinguish between the two files. There are no individual statistics for each file. When the original files are deleted, you cannot recreate them from the statistics. But you can mimic written text in general.

@StarkRG @erikcats @Some_Emo_Chick
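
The "joint statistics report" analogy can be made concrete with plain word counts. This is a deliberately crude sketch: the four "files" are made-up strings, `word_stats` is a hypothetical helper, and word counts are far simpler than what a neural network learns, so it illustrates the analogy only and settles nothing about whether a real model retains passages from its training data.

```python
from collections import Counter

def word_stats(text):
    """Return word frequencies for one text, ignoring case."""
    return Counter(text.lower().split())

# Two made-up "files".
file_a = "red wine goes well with pizza"
file_b = "pizza goes well with friends"

# The joint report just adds the counts; it no longer records which
# file a word came from or in what order the words appeared.
joint = word_stats(file_a) + word_stats(file_b)
print(joint["pizza"])  # 2

# A different pair of files yields exactly the same joint report,
# so the report alone cannot tell you which originals produced it.
file_c = "red wine goes well with friends"
file_d = "pizza goes well with pizza"
print(joint == word_stats(file_c) + word_stats(file_d))  # True
```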

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick I feel you are trying to obfuscate the point while saying it's "better", and also not getting around the copying (which is still the main point)

seismographix Sep 30, 2023

Of course, the training input must come from free sources. And it would be right to let people decide whether they want to contribute to the training data.

@StarkRG @erikcats @Some_Emo_Chick

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick if only that were actually what's happening

seismographix Sep 30, 2023

You can check it out, at least for the open-source LLMs. And one important thing: someone has to ensure that the training data is of the right quality. Misspelled YouTube comments are not appropriate training data. Quality verification is tedious work.
You can experience it yourself by contributing to this open-source LLM:
https://open-assistant.io/de

@StarkRG @erikcats @Some_Emo_Chick

Open Assistant

Conversational AI for everyone. An open-source project to create a chat-capable GPT LLM, run by LAION and contributors around the world.

Joe Dowland Sep 30, 2023

@seismographix @StarkRG @erikcats @Some_Emo_Chick I'd rather stab myself in the eyes than take part in that, thanks

Erik (OLD ACCOUNT) Sep 30, 2023

@seismographix @Aradiel @StarkRG @Some_Emo_Chick just to be clear, I do not have any IT training.

What I do have is big fat question marks about the idea you seem to be trying to push, that there is such a thing as a standard for ethically trained AI. What you're describing sounds both extremely rare and extremely against the grain of an economic model where taking value and returning as little as possible is the industry standard

Erik (OLD ACCOUNT) Sep 30, 2023

@seismographix @Aradiel @StarkRG @Some_Emo_Chick not going to lie, to my trained teacher's eye you look like a tech fanboy

seismographix Sep 30, 2023

I do not disagree. You definitely have to check the business plan of the organization you want to contribute to.

But the LAION-5B training data set, for example, is managed by “LAION gemeinnütziger e.V.”, which is a German non-profit association. Such an association must register with the German administration, and there are obligations it has to fulfill. When the association dissolves, it has to hand over its assets to the public.

@Aradiel
@StarkRG @Some_Emo_Chick

Erik (OLD ACCOUNT) Sep 30, 2023

@seismographix @Aradiel @StarkRG @Some_Emo_Chick point is almost all AI/LLM farmers just grab everything and don't ask people whether they want to contribute. Your single exception and a handful of similar ones aren't going to change the industry

seismographix Sep 30, 2023

It is not text only, but here is the image and text database LAION-5B. https://laion.ai/blog/laion-5b/

@StarkRG @erikcats @Some_Emo_Chick

LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS | LAION

We present a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text datas...

seismographix Sep 30, 2023

@StarkRG
Personally, I use ChatGPT to learn and explore, transform text formats, etc.
But I also dislike people publishing AI-written novels on Amazon to make quick money without any effort of their own.

@Aradiel @erikcats @Some_Emo_Chick

seismographix Sep 30, 2023

@Aradiel
The result of looking at text is stored per word. The word Pizza, for example, would look like this: [132,235,793,526,...,888]. Every number is the value of the word with respect to a rule detected by the AI. Example: mean distance to the next adverb.

It is like a person who has read a lot of books. When they write, the output will be based on the knowledge they gained from reading those books.

@erikcats @StarkRG @Some_Emo_Chick
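
To make the "one list of numbers per word" idea a bit more tangible, here is a minimal sketch with hand-written vectors. Everything in it is an assumption for illustration: the three-number vectors are invented, real models learn vectors with hundreds or thousands of dimensions, and those dimensions usually do not map onto a readable rule such as "distance to the next adverb". The sketch only shows how such lists can be compared.

```python
import math

# Invented 3-number vectors; real ones are learned, not written by hand.
embeddings = {
    "pizza":  [0.9, 0.8, 0.1],
    "pasta":  [0.8, 0.9, 0.2],
    "berlin": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

food_pair = cosine(embeddings["pizza"], embeddings["pasta"])
mixed_pair = cosine(embeddings["pizza"], embeddings["berlin"])

# With these made-up numbers the two foods score as more alike
# than the food and the city.
print(food_pair > mixed_pair)  # True
```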

seismographix Sep 30, 2023

@Aradiel
In one IT article there was a good example. Given the text "I live in Berlin. In the evening I like to drink red wine and eat a", the AI would probably choose Pizza as the next word.
@erikcats @StarkRG @Some_Emo_Chick
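
The "probably choose Pizza" step can be sketched as turning per-word scores into probabilities and picking the top one. The scores below are invented purely for illustration (a real model computes them from the preceding text), and the candidate list is hypothetical. Softmax itself is a standard way to convert scores into probabilities that sum to 1.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Invented scores for words that might follow
# "... I like to drink red wine and eat a".
scores = {"pizza": 3.0, "bratwurst": 2.0, "book": -1.0}

probs = softmax(scores)
best = max(probs, key=probs.get)
print(best)  # pizza
```

Here "bratwurst" still gets a share of the probability, so a system that samples from the distribution instead of always taking the top word would sometimes pick it.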

Erik (OLD ACCOUNT) Sep 30, 2023

@seismographix @Aradiel @StarkRG @Some_Emo_Chick Bratwurst

seismographix Sep 30, 2023

Bratwurst with wine. Never tried this combination. 😉

@Aradiel @StarkRG @Some_Emo_Chick

Erik (OLD ACCOUNT) Sep 26, 2023

@StarkRG @Aradiel @Some_Emo_Chick people generally need a lot of handholding. Outsourcing human decisions to a machine with an unclear decision making process will be the death of us 🤷🏼‍♂️

Erik (OLD ACCOUNT) Sep 26, 2023

@Aradiel @StarkRG @Some_Emo_Chick yeah my toot last night got me thinking of all the AI startups.

But this here is very much my feeling, I'm neither afraid nor impressed by AI

https://major-grooves.medium.com/blood-bowl-the-ultimate-challenge-for-artificial-intelligence-5bfa8cad259b