the author of this post prompted copilot to characterize the differences in a data set of statements about career ambitions, categorized by country. the trick is that the data contained the *same statements* for each country. even though the data were identical, the model generated some pretty hilarious stereotypes ("The US prioritizes leadership and innovation", "The UK blends public service with professional status") https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes
i used the same data set but replaced each country with a "gender identity" (man, woman, trans woman, trans man, non-binary) and prompted chatgpt to characterize the differences between the groups. lo and behold, i got some fantastic gender stereotype trash
"dig deeper," i prompted
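a minimal sketch of how a control data set like this could be built and sanity-checked before prompting. the statements below are hypothetical placeholders (the real ones are in the linked post), and the helper names `build_control_rows` and `groups_are_identical` are made up for illustration:

```python
from collections import Counter

# Hypothetical placeholder statements; the actual data set is in the linked post.
statements = [
    "I want to lead a team.",
    "I want to work in public service.",
    "I want to start my own company.",
]
groups = ["man", "woman", "trans woman", "trans man", "non-binary"]


def build_control_rows(statements, groups):
    """Pair every statement with every group label, so all groups get the same data."""
    return [(group, statement) for group in groups for statement in statements]


def groups_are_identical(rows):
    """Return True if every group has exactly the same multiset of statements."""
    by_group = {}
    for group, statement in rows:
        by_group.setdefault(group, Counter())[statement] += 1
    counters = list(by_group.values())
    return all(counter == counters[0] for counter in counters)


rows = build_control_rows(statements, groups)

# If this holds, the only honest answer to "characterize the differences" is "there are none".
assert groups_are_identical(rows)
```

the check makes the design explicit: the group label is the only thing that varies, so any "difference" a model reports has to come from the label and not from the data.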
not to be too blunt about this, but LLMs simply do not belong anywhere in a data analysis workflow. not for cleaning, not for coding, and certainly not for analysis. it's frankly absurd and terrifying that data science etc people are adopting these tools
@aparrish I feel like the marketing term “AI” is a huge part of what drives this. Because the term is used to describe both LLM chatbots and actual predictive analytics tools (including some that have existed for decades!!!), chatbots get a sort of halo benefit and get confused with data analysis tools by people who don’t know better
@tom @aparrish anything that is promoted as 'intelligence' is tripe. Proper analytical tools should be called that. Random statistical word-jumbler algorithms should be called that, if not called by the more realistic acronym S.H.I.T. (I'm sure someone can make up the words to fit).
@tom @aparrish I mean, lumping old machine learning things under "AI" is part of the problem. People thought they could make their existing stuff sound cooler.

@jens @tom @aparrish See also from a previous hype cycle:

“☁️”