the author of this post prompted copilot to characterize the differences in a data set of statements concerning career ambitions, categorized by country. the trick is that the data contained the *same statements* for each country https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes regardless of the fact that the data were identical, the model generated some pretty hilarious stereotypes ("The US prioritizes leadership and innovation", "The UK blends public service with professional status")
Real signals or artificial stereotypes?

Adventures with a cultural Copilot

Understanding the unseen
i used the same data set but replaced each country with a "gender identity" (man, woman, trans woman, trans man, non-binary) and prompted chatgpt to characterize the differences between the groups. lo and behold, i got some fantastic gender stereotype trash
@aparrish someone was telling me they use this stuff to do all their data cleaning and analysis at work and i asked how they knew it was giving them the right answers and they seemed confused by the question
@hannah 🙃
@aparrish as a data scientist who "retired" in 2023 it has been wild to see the entire industry go bananas for this stuff. it's like if every doctor simultaneously just started doing homeopathy because it was faster
@hannah @aparrish as someone who's worked in the medical field, that's also an uncomfortable number of us and it's terrifying. So I guess it's everywhere really.
@V @hannah @aparrish as a programmer who uses Claude somewhat regularly, I remain dumbfounded that
(a) anyone uses CoPilot, *GPT, etc, for anything approaching seriousness whatsoever when even a cursory check of their answers almost always immediately implodes; slop indeed
(b) anyone uses "AI" for purposes besides, say, writing a program that can probably do said analysis (which is often what Claude and MCP-capable AIs are really doing)
@V @hannah @aparrish people keep expecting "AI" to simply "know" things, but LLMs seem to me like brains in jars: yes they can generate plausible responses, but without eyes ears hands etc they have no way to do anything or establish truth. An MCP "tool" and proper coaching gives the brain a chance to say "I should use tools to check if I'm correct" and rhetoricize from there

@V @hannah @aparrish it's been a huge snake oil disservice to use the same label for "type 'best peanut butter recipes are' on your keyboard and then keep on accepting the middle suggestion" as a "search engine" versus "do a web search for 'best peanut butter recipes', return the result summaries here, and summarize their summaries: A B C D go"

Slop is possible with both, but one truly is hallucination divorced from any sense of reality and not what a "language model" is good at

@hannah tangential, but I've had doctors ask if they can have "AI" listen in on the appointment and summarise it. Initially, I just said no. Now I've moved to "thanks, but we're clearly not compatible".
@hannah @aparrish I have bad news for you about doctors
@hannah @aparrish Bad news about doctors..... Have experienced this mpre than once.... - Vi (Also had too many now open Google right in front of me to find something or prove a point and we have to rely on those fuckers)
@hannah @aparrish if you don't use globuli you're gonna be left behind
@hannah @aparrish I asked similar questions when someone suggested I use an LLM to summarize a video. “How do I know it’s accurate?” The answers were bafflingly weird. Clearly. They didn’t understand the question.
@slott56 @hannah @aparrish I've discovered that there are a shocking number of people who simply don't consider accuracy to be important. Their worldview doesn't care about what is true and what isn't true. It's about marking the task complete and moving on to the next for them and correctness isn't part of the equation.

@jeffers00n It could also be that the incentive structure (read: "Keeping my Job") is misaligned, in that it relies on the rate of completed tasks, not their correctness?

People whose livelihoods rely on not getting fired are not incentivized to engage in the finer points of whether LLM output is actually trustworthy, if the people dictating the incentives also do not care.

The question is who gets to carry the can once this shit show collapses under its own weight.

@slott56 @hannah @aparrish

@jeffers00n @slott56 @hannah @aparrish

I think we teach kids this in school. It's about turning in a 5-page essay, properly formatted, by the assignment's due date. Content is secondary to form and schedule.

@hannah @aparrish For the entire history of computing there's been an understanding that if you put the wrong data in, you'd get the wrong answer.

Somehow we're in a brave new world where you can put the RIGHT data in and get the wrong answer.

@negative12dollarbill
In the past software developers did their best to detect and fix the latter.
Enters AI to do full reverse.
@hannah @aparrish
@hannah @aparrish I've asked people this & they said they asked the chatbot to check its own results & then I asked them how they knew it wouldn't also make stuff up the second time around & they started stuttering...