AI or DEI?
It’s literally instructed to do AdLibs with ethnic identities to diversify prompts for images of people.
You can see how it’s just inserting the ethnicity right before the noun in each case.
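For anyone curious what that kind of prompt rewriting might look like, here's a rough sketch. To be clear, this is a guess at the general idea and not anything Google or OpenAI has published: the descriptor list, the noun list, and the `diversify` function are all made up for illustration.

```python
import random

# Hypothetical lists for illustration only; not taken from any real system.
DESCRIPTORS = ["Black", "South Asian", "East Asian", "Hispanic", "white"]
PERSON_NOUNS = {"queen", "king", "doctor", "soldier", "person"}


def diversify(prompt: str, rng: random.Random) -> str:
    """Insert a randomly chosen descriptor right before the first person noun."""
    words = prompt.split()
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore trailing punctuation.
        if word.lower().strip(".,!?") in PERSON_NOUNS:
            return " ".join(words[:i] + [rng.choice(DESCRIPTORS)] + words[i:])
    # No person noun found: leave the prompt untouched.
    return prompt


if __name__ == "__main__":
    rng = random.Random()
    print(diversify("a portrait of a queen of England", rng))
    print(diversify("a bowl of fruit", rng))
```

Note that a rewrite this naive never looks at the rest of the prompt, so it has no idea whether "queen of England" or "founding fathers" implies a specific historical context. That's the kind of blind spot that would produce results like the ones in the OP.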
Was a very poor alignment strategy. This already blew up for Dall-E. Was Google not paying attention to their competitors’ mistakes?
It’s also like, I guess I would prefer it to make mistakes like this if it means it is less biased towards whiteness in other, less specific areas?
Like, we know these models are dumb as rocks. We know that they are imperfect and that they mirror the biases of their trainers and training data, and that in American society that means bias towards whiteness. If the trainers are doing what they can to prevent that from happening, whatever, that’s cool… even if the result is some dumb stuff like this sometimes.
I also don’t think it’s a problem for the user to specify race if it matters? Like “a white queen of England” is a fine thing to ask for, and if it isn’t specified, the model will include diverse options even if they aren’t historically accurate. No one gets bent out of shape if the outfits aren’t quite historically accurate, for example.
Repeat after me:
“Current AI is not a knowledge tool. It MUST NOT be used to get information about any topic!”
If your child is learning Scottish history from AI, you failed as a teacher/parent. This isn’t even about bias, just about what an AI model is. It’s not even supposed to be correct, that’s not what it is for. It is for appearing as correct as the things it has been trained on. And as long as there are two opinions in the training data, the AI will gladly make up a third.
That doesn’t matter though. People will definitely use it to acquire knowledge, they are already doing it now. Which is why it’s so dangerous to let these “moderate” inaccuracies fly.
You even perfectly summed up why that is: LLMs are made to give a possibly correct answer in the most convincing way.
It’s true that this would mislead children, but the model could hallucinate about literally anything. Especially at this stage, no one, children or adults, should be uncritically accepting what the model states as fact. That said, I agree LLMs need to improve their factual accuracy.
Although it is highly debated, some scholars suggest Queen Charlotte might have had African ancestry, or that she would be considered a POC by today’s standards. Of course, she reigned in the 1700s and 1800s, but it isn’t entirely outlandish to have a “Queen of Color” if we aren’t requesting a specific queen or a specific race.
People of color did live in England in the Middle Ages? Like not diverse in the way we conceive now, but here are a few papers discussing the racial diversity at the time. It was surely less intermingled than today, but it’s not like these images are impossible.
*Other things are anachronistic or fantastical about these images, such as clothing. Are we worried about children getting the wrong impression of history in that sense?
That’s valid! I agree. I think in this case it would be reasonable for the model to give multiple (or like, at least one, jeez) images with white queens. I don’t disagree with anyone in that sense. I just also don’t think it’s worth pitching a fit when the dumbass model that has been trained to show more racial diversity produces (frankly comical) hallucinations.
The ethos of the trainers is a good one. Attempting to counter the (demonstrated, measurable) bias of many models toward whiteness is a good choice. I prefer that the trainers choose to address the bias even if it makes the model make silly mistakes like this. That’s all.
Excel 🤝 Incel Incorrectly assuming it’s a date
👆 they probably meant this one
I don’t know who “them” is here. I thought from the context it was obvious that I meant whoever is managing these AIs. I guess I could’ve been clearer.
But what, do you think they’re behind the scenes to insert the word woke in every search by default or something?
I mean they literally are inserting stuff in the prompts to make the results more diverse? It’s not some hidden thing but rather a solution to issues with the undiverse training data. But obviously here they’ve “overcorrected” to beyond all sense.
Generally on the internet when someone says “they” in quotes then they’re referring to “them” as Jewish people.
It’s a dog whistle.
This is usually the type of thing that you should clarify because… Well, you seem like one of “them” even if you don’t ;D
So they were saying I’m Jewish?
For example, a prompt seeking images of America’s founding fathers turned up women and people of colour.
“A bit”
Significant racist bias is an understatement.
I asked a generator to make me a “queen monkey in a purple gown sitting on a throne” and I got maybe two pictures of actual monkeys. I even tried rewording it several times to be a real monkey, described the hair and everything.
The rest were all women of color.
Very disturbing. Pretty ladies, but very racist.
Stable diffusion online version, several weeks ago. Might not be the same situation anymore, idk how often that stuff gets updated.
It’s also possible that some sort of “sticky idea” got into its head and made it start generating it that way after it did one like that. I’ve heard that sort of thing isn’t uncommon.
To be clear, stable diffusion isn’t one model, it’s the generation platform. From there, you have models that sit on top of it. Online generators can use any model, depending on how they’re set up. Each model includes different training data, meaning different results from the same prompts, sometimes vastly.
It’s a bit like driving somewhere, having someone ask how you found the place, and saying your phone. Technically a correct answer, but they’re probably looking for more specific answers, like GPS, or a map. Not trying to nit-pick, just giving a bit of information.
Apparently without any correction there is significant racist bias.
This doesn’t make it any less ridiculous. This is a central pillar of this kind of AI tech, and they’re trying to shove a band aid over the most obvious example of it. Clearly, that doesn’t work. It’s also only even attempting to fix one of the “problems” - they’re never going to be able to “band aid” every single place where the AI exhibits this problem, so it’s going to leave thousands of others un-fixed. Even if their band aid works, it only continues to mask the shortcomings of this tech and makes it less obvious to people that it’s horrendously inaccurate with the other things it does.
Basically the AI reflects the long term racial bias in the training data. According to this BBC article it was an attempt to correct this bias but went a bit overboard.
Exactly. This is a core failing of LLM tech. It’s just going to repeat all the shit that was fed to it. You’re never going to fix that. You can attempt to steer it in different directions, but the reason this tech was used was because it is otherwise impossible for us to trudge through all the info that was fed to it. This was the only way to get it to “understand” everything. But all of its understandings are going to have these biases, and it’s going to be just as impossible to run through and fix all of these. It’s like you didn’t have enough metal to build the Titanic so you just built it out of Swiss cheese and are trying to duct tape one hole closed so it doesn’t sink. It’s just never going to work.
This being pushed as some artificial INTELLIGENCE is the problem here. This shit doesn’t understand what it’s doing, it’s just regurgitating the things it’s consumed. It’s going to be exactly as flawed as whatever was put into it, and you can’t change that. The internet media it was trained on is racist, biased, full of undeniably false information, and massively swayed by propaganda on all sides of the fence. You can’t expect LLMs to do anything different when trained on that data. They’re going to have all the same problems. Asking these things to give you any information is like asking the average internet user what the answer is. And the average internet user is not very intelligent.
These are just amped up chat bots with data being sourced from random bits of the internet. Calling them artificial INTELLIGENCE misleads people into thinking these bots are smart or have some sort of understanding of what they’re doing. They don’t. They’re just fucking internet parrots, and they don’t have the architecture to be “fixed” from having these problems. Trying to patch these problems out is a fool’s errand and only masks their underlying failings.
I don’t know, maybe that would work, for this one particular problem. My point is it’s more than that. Even if you go through the trouble of fixing this one particular issue with LLMs, there are literally thousands of other problems to solve before it’s all “fixed”. At some point, when you’ve built and maintained thousands of workarounds, they start conflicting with each other and making a giant spider web of issues to juggle.
And so you’re right back at the problem that you were trying to solve by building the LLM in the first place. This approach is just futile and nonsensical.
Yeah. But maybe this is how you teach an AI a broader understanding of the real world. Or really a slightly less narrow view. Human brains also have to learn and reconcile all these conflicting data points and then create a kind of understanding from it. For any machine learning it would only be an intuitive instinct.
Like you would have a bunch of these “tables” that show relationships between various tokens and embody concepts. Maybe you need to combine different kinds of models that are organized and trained differently to resolve such things. I only have a very surface level understanding of how machine learning works so I know this is very speculative. Maybe you’re right and it can only ever reflect the training data. Then maybe you’d need to edit the training data, but you could also maybe use other AIs to “reinterpret” training data based on other models.
Like all the data on reddit, could you train a model to detect sarcasm or lies or to differentiate between liberal, leftist and fascist types of arguments? Not just recognizing the tokens or talking points, but the semantics of an argument? Like detecting a non sequitur. You probably need “general knowledge” understanding for that. But any kind of AI like that would be incredibly interesting for social media so your client can tag certain posts, or root out bot / shill networks that work for special interests (fossil fuel, usa, china, russia).
Eh I really need to learn more about AI to understand the limits.
The broad answer is, I’m pretty sure everything you’ve mentioned is possible, and you’re right in that this is similar to how humans integrate new data. Everything we learn competes with and bolsters every bit of knowledge we already have, so our web of understanding is this ever shifting net of relationships between concepts.
I don’t see any reason these kinds of relationships can’t be integrated into generative AI, they just HAVEN’T yet, and each time you increase how the relationships interact, you’re also drastically increasing the size and complexity of the algorithm and model. I think we’re just realizing that what we have now is OK, but needs to be significantly better before it’s really mind blowing.
“I don’t see any reason these kinds of relationships can’t be integrated into generative AI, they just HAVEN’T yet”
No, it’s just fucking pointless. You’re talking about adding sand to a beach. These things are way more complicated and trying to shovel these things in just makes a mess. See literally the OP.
“each time you increase how the relationships interact, you’re also drastically increasing the size and complexity of the algorithm and model.”
No you’re not. Not even fucking close. You clearly don’t understand this at all.
The ALGORITHM will always be the same. Except for new generations of these bots. Claiming adding things like racial bias is going to alter the algorithm is just nonsensical.
The MODEL is the huge fucking corpus of internet data. Anything you tack onto it is a drop in an ocean. It’s not steering anything.
What’s changing is they’re editing inputs because that’s all you can really do to shift where these things go. Other changes would turn this into a very different beast, and can’t be done at the fine grained level like “race”.
Claiming this has any significant impact on the size or complexity of any of this is just total hogwash and you must not understand how these work or how big they are.