🚨 noyb has filed a complaint against the ChatGPT creator OpenAI

OpenAI openly admits that it is unable to correct false information about people on ChatGPT. The company cannot even say where the data comes from.

Read all about it here 👇

https://noyb.eu/en/chatgpt-provides-false-information-about-people-and-openai-cant-correct-it

ChatGPT provides false information about people, and OpenAI can’t correct it

noyb today filed a complaint against the ChatGPT maker OpenAI with the Austrian DPA

noyb.eu
@noybeu
<rubs hands, gets popcorn>
Oh, this is gonna be good.
@noybeu Here, the use of "hallucinating" is particularly obfuscating, as it implies an autonomous subjectivity, whereas what actually happens is just a statistical mash-up.

@noybeu the concept of "correctness" is antithetical to how LLMs function, so what the person is asking for is fundamentally impossible. LLMs do not associate a specific response with a particular question; they have a pool of "values that are responses to requests in the form of X" and pick one from that pool. It might be the right one, it might not.

However, this does not mean the person is wrong for asking; it means that OpenAI and peers are falsely advertising what LLMs do.
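The "pick one from a pool" behaviour described above can be sketched in a few lines of Python. This is a toy illustration, not OpenAI's implementation; the candidate answers and their weights are entirely invented:

```python
import random

# Toy model: there is no single stored "correct" answer per question,
# only a weighted pool of plausible continuations. The weights here
# are made up for illustration.
candidate_answers = {
    "born in 1962": 0.45,
    "born in 1958": 0.30,
    "born in 1974": 0.25,
}

def answer(prompt, seed=None):
    """Sample one continuation from the weighted pool.

    The prompt is ignored in this toy: it only selects which pool
    we sample from, and here there is just one pool.
    """
    rng = random.Random(seed)
    answers = list(candidate_answers)
    weights = list(candidate_answers.values())
    return rng.choices(answers, weights=weights, k=1)[0]

# Different runs (different seeds) can yield different "facts"
# about the same person from the same question.
print(answer("When was X born?", seed=1))
print(answer("When was X born?", seed=2))
```

Each call returns *a* plausible-looking answer, and nothing in the model marks one of them as the true one, which is the point the post is making.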

@noybeu

Writing a Cloud / AI strategy atm. Thank you for supporting me with arguments.

@noybeu This is an interesting one!

When you use your phone's next word prediction and use it to write about someone, do you expect the text to reflect reality?

And can you then go to Google and tell them the text from next word prediction needs to be changed according to GDPR?

How is ChatGPT different?

Their marketing doesn't change that fundamentally it's just doing next word prediction.

@djh @noybeu Their marketing changes their legal obligations, though. If they have created an expectation that this contains information, then they have an obligation to ensure the information is correct.

@po8crg
Worth noting a case in Canada where an air traveler asked the airline's customer-service chatbot whether they'd get a reduced price for going to a funeral or something like that. The bot said yes. Then the airline later said something like "no, the bot lied… we will not honor the terms the bot gave". The court sided with the consumer. There was no marketing here, and that bot was expected/obligated to be truthful.

So some people must be looking forward to a fun hobby of coercing chat bots to make too-good-to-be-true promises.
@djh @noybeu

@djh @noybeu Did my phone's next word prediction inhale a huge quantity of personal data to do it? If not, no problem; it's not processing personal data. "AI" is.

@noybeu interesting. I suspect that the defence would be around them not holding records that "relate to an individual" and this is evidenced by the fact they can't correct it. You can't look an individual's "record" up in the backend.

They'll argue it's no worse than having a news article in which someone's date of birth is inaccurate.

@dtwx as far as chatgpt knows i am the same age as homer simpson
@dtwx @noybeu doesn't mean they have no obligation to filter it out when the tool spits out answers.

@Natanael_L @noybeu they might have a social obligation, but I don't think they have a legal obligation, as I explained above.

The terms of service are pretty solid https://openai.com/policies/terms-of-use


@dtwx @noybeu no jurisdiction allows ToS to override law. If the interface isn't displaying sufficient disclaimers that the content is likely fictional then they aren't protected.
@Natanael_L @noybeu Like this? I mean, we can go all day, but let's see how the legal challenge goes, eh?
@dtwx @noybeu unlikely to be sufficient. Marketing also matters and has sometimes "overridden" disclaimers because customers couldn't be expected to know the disclaimers would contradict not obviously unreasonable marketing claims.

@dtwx
There’s no legal challenge. @noybeu complained to the Austrian DPA rather than opening a court case. From there the case could just get moth-balled by the DPA. There is nothing to force a DPA to take any action and they generally do nothing. Though I say that without knowing if the Austrian DPA pays any extra attention to NOYB.

@Natanael_L

@bojkotiMalbona thanks for the clarification 👍

@dtwx
#openAI is using Cloudflare & I can’t be bothered to circumvent to read the ToS.

But surely the ToS only makes guarantees to the /consumers/, not the suppliers who involuntarily feed the machine. OpenAI can probably write a ToS that avoids obligation of accuracy to consumers of openAI, but suppliers of the input data are not parties to that ToS. The GDPR guarantees data subjects a right to have their personal info corrected.

@Natanael_L @noybeu

@noybeu

“Lying” about people and making money from it. They should be sued out of existence.

@noybeu Not sure who said it first, but I heard it from Phil Koopman (http://www.koopman.us/): if your premise is, "everything an LLM says is a hallucination", you won't be disappointed by what it puts out.
It's still a problem when the parent company violates people's rights, but again, why are we surprised?
Philip Koopman: autonomous vehicle safety & dependable embedded systems


@noybeu It's a language model. It's not a source of data.

It's like trying to sue the English language because it's possible to use it to say something defamatory.

ChatGPT's ONLY job is to simulate the *structure* of, e.g., a conversation. Managing the *content* of that conversation is beyond its scope.

Substitute "conversation" for letter, essay, lecture, etc.

@waggers5 @noybeu

You're talking about what ChatGPT actually is, but that is NOT how ChatGPT is being marketed and sold to other companies.

ChatGPT is being marketed as a way to replace human beings, not as a way to simulate language structures.

ChatGPT is being used by search engines, and people go to search engines to find useful information, not to explore the structure of language.

@FediThing @noybeu @wizzwizz4 Sadly a very familiar tale of technology marketers not understanding (or wilfully misunderstanding) the capabilities of the product they are selling

@waggers5 @noybeu @wizzwizz4

...but those kinds of applications are where almost all of the money funding ChatGPT comes from.

Most money for AI is, ultimately, from company owners that want to sack human staff to reduce costs, or build rival companies without human staff to outcompete those with human staff.

It is disingenuous/dishonest of ChatGPT's creators to take this money and then feign ignorance about all of this.

The fact it's being done by spreading uncorrectable lies about people makes it even worse.

@waggers5 @noybeu hence you sue the developers of said LLM. Because just like "English" someone is telling it how to behave and respond.

The developers of any given AI are responsible for what it does, just like they are responsible for, say, copyright violations while training it.

@WhyNotZoidberg
From a GDPR standpoint, developers have no obligations AFAICT. Obligations fall on "data controllers" and "data processors": whoever *runs* the software that processes personal data has obligations under the GDPR (which could incidentally be the developers, if the devs and the software's users are one and the same).

@waggers5 @noybeu

@noybeu

ChatGPT says my website Fedi.tips is some kind of cryptocurrency website. I have no way of correcting it, and it is still spreading these lies.

No idea where this comes from, no human would make this mistake. The only mention fedi.tips actually makes of blockchain and cryptocurrency is to tell people not to use it as it's a big scam:

https://fedi.tips/does-mastodon-or-the-fediverse-use-ads-or-trackers-or-algorithms-or-blockchain-or-cryptocurrency-or-anything-annoying-like-that/

ChatGPT is garbage, no one should be relying on it for accurate information.

Does Mastodon or the Fediverse use ads or trackers or algorithms or blockchain or cryptocurrency or anything annoying like that? | Fedi.Tips – An Unofficial Guide to Mastodon and the Fediverse


@FediThing @noybeu probably a connotation it got from the TLD chosen. And because it has no understanding of facts it can't understand it needs to check every individual domain separately before answering, it just LLMansplains

@Natanael_L @noybeu

Probably, but it shows how non-existent its "intelligence" is that it has no way of checking for multiple common meanings of words.

@noybeu reading the article I couldn't help but wonder if some folk deliberately misunderstand what AI is, in particular the I part. What part of an encyclopaedia is intelligent? It's a basic store of information. It can be read, edited, corrected where wrong, and so on.

But an AI is so named because it is loosely modelled on the imagined workings of a brain: a neural network... The term is used in this arena for similar reasons.

And the goal is a similar outcome. That is, learning, which captures information in a form that permits recollection only by reconstruction (much like the brain), and, in responding to a prompt or request, assembling the reply judged most likely to please, an idea governed by a notion of familiarity, of being the kind of reply we're confident we've heard to similar questions, etc...

But you get my drift... OpenAI is not being evasive or inconsiderate. They are working on AI; they can't easily predict its responses or modify them, any more than you can your colleague's brain. The very thing enabling AI is the ability to handle an abundance of abstract data in real time... stored as weights abstracted from learning inputs.

It is no more likely to spit out accurate facts than, wait for it, the brains it is being modelled on. The whole point of the I in AI is being able to reassemble learnings in novel ways and to respond to prompts with diverse goals, etc.

In a nutshell, who is surprised that it can and does say untrue things... That would seem to be an inherent property of the endeavour.

@thumbone @noybeu The point is that that inherent property may well be illegal under EU law, not that it is surprising.

@denisbloodnok @noybeu It may be, though that itself would be surprising, given humans do it all the time. Perhaps all these "I have a right to my opinion" people uttering untruths across the internet need to lobby for AI to have that same right 🤣. It is after all emulating us... That is the goal.

That said, to me the article read like people were indeed surprised at the fallibility of AI. And this is an issue: the quality of language emulation achieved in LLMs can and does lull people into thinking the model is smart.

Ironically, well-spoken humans have been all over that since the dawn of language, and it's an accepted explanation for the complexity of modern languages (evolving to maintain grading bars so that elocution can serve as a proxy for measuring credibility).

@thumbone @noybeu Humans do it all the time, but there isn't a law against what data humans can have and process in their heads, for obvious reasons. There is law about what organisations and computers can do with data.

"AI" isn't emulating us. Humans - sometimes badly - reason about whether statements are true or false.

@denisbloodnok @noybeu I beg to differ. AI simply scans data much as your eyes do and need not retain it; in fact the whole point is not to retain it but to extract patterns from it. And to argue it is not emulating humans strikes me as disregarding the very evolution of neural networks, LLMs and AI in general. It is at bare minimum inspired by humans and targeted at interfacing with (interacting and communicating with) humans, and I don't know what part of "emulating" you think is missing... Perhaps the physical? Even there, robotics is doing an awful lot of human emulation... and animal emulation... etc.
@noybeu
ChatGPT is creating a new paradigm for expensive endeavors that never worked.
@noybeu I would imagine the problem is that it does not claim it is accurate information, or that it is about the actual person you ask it about. It is fiction, and if it claims to be fiction, it can't be "inaccurate personal information" or even "personal information", can it? If I say Granny Weatherwax is dead I am not claiming to state accurate information about someone using that name, and can't be expected to "correct" it.
Surely.
@revk @noybeu GDPR aside, this approach would not (I think) get them very far if a defamation action was brought. "Oh, I didn't mean you" or "the small print says it's fictional" tend not to work well as defences to libel or slander.

@noybeu It will be very interesting to see how successful @noybeu is here.

Incidentally, this is also an interesting question for some ministries of education that are bringing ChatGPT & Co into schools, where they process "hallucinated", or simply false, personal data that the people affected cannot correct.

P.S. In (digital) photos we pixelate (even automatically) features that make people identifiable. Perhaps AIs should be taught to do the same - before they are let off the leash?

@noybeu
Don't make tools if you can't make them operate legally. It's a pretty simple ask.
@emilygorcenski
@noybeu Sounds like with ChatGPT we've reached the point of Habsburgian AI and Potemkin AI "learning" from each other, causing this dangerous shit.

@noybeu This is exactly why I maintain a list of utterly nonsensical fictional bios because if you are going to run a bot that uses my content, your output is going to be quirky, to say the least.

I think it is quite funny to read but I would do, I wrote it: https://matrixdreams.com/un-realistic-biographies/

Un-Realistic-Bios

Descriptions of people that might not exist


@noybeu Well, of course. This is pertinent to machine learning based models - nothing to do with OpenAI or ChatGPT in particular. All LLM-based chatbots have this problem.

Once you train an ML model and it starts generating output, you generally cannot (a) tell why it has generated this particular output, (b) tell which particular part of the training data resulted in that output, or (c) substantially change its behavior without fully examining the training data and re-training the model.
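Point (b) above can be made concrete with a toy example. Here "training" is just counting word bigrams over an invented three-document corpus; once the counts are pooled, the model itself no longer records which document contributed what:

```python
from collections import Counter

# Invented corpus for illustration only.
docs = [
    "alice is a doctor",
    "alice is a pilot",
    "bob is a doctor",
]

# "Training": aggregate word-bigram counts over the whole corpus.
model = Counter()
for doc in docs:
    words = doc.split()
    for a, b in zip(words, words[1:]):
        model[(a, b)] += 1

# The model holds only merged counts. You can see that ("alice", "is")
# occurred twice and ("is", "a") three times, but not which documents
# those occurrences came from - provenance is gone.
print(model[("alice", "is")])  # 2
print(model[("is", "a")])      # 3
```

Real LLMs blend contributions far more thoroughly (into continuous weights rather than counts), so attributing an output to a source, or surgically "correcting" one fact, is correspondingly harder.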

@noybeu @campuscodi Sounds like a wonderful system we should be building into all technology going forward. What could possibly go wrong with huge misinformation systems we don’t understand?

@noybeu

"Can't"? Clearly it comes from the training data, and if they don't have logs of where they scraped the training data from, then that's clearly a decision to avoid liability.