the author of this post prompted copilot to characterize the differences in a data set of statements concerning career ambitions, categorized by country. the trick is that the data contained the *same statements* for each country https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes regardless of the fact that the data were identical, the model generated some pretty hilarious stereotypes ("The US prioritizes leadership and innovation", "The UK blends public service with professional status")
Real signals or artificial stereotypes?

Adventures with a cultural Copilot

Understanding the unseen
i used the same data set but replaced each country with a "gender identity" (man, woman, trans woman, trans man, non-binary) and prompted chatgpt to characterize the differences between the groups. lo and behold, i got some fantastic gender stereotype trash
"dig deeper," i prompted
not to be too blunt about this, but LLMs simply do not belong anywhere in a data analysis workflow. not for cleaning, not for coding, and certainly not for analysis. it's frankly absurd and terrifying that data science etc people are adopting these tools

@aparrish Data* science**

*the data might be made up whole-cloth.
**the science is pretty fucken sus too

@aparrish

I work in biomedical informatics, and it's upsetting the number of data requests that come to me from people wanting to use LLMs to do a whole bunch of stuff for them. They don't ask for specific and targeted data points, just "yeah give me all the diagnosis codes" etc. (for basically all the domains, not just a single one). And they don't say which model they're using in their IRB protocol, so who knows whether they're sending PHI to OpenAI or whatever. It gets approved anyway.

@aparrish I feel like the marketing term “AI” is a huge part of what drives this. Because the term is used to describe both LLM chatbots and actual predictive analytics tools (including some that have existed for decades!!!), chatbots get a sort of halo benefit and get confused with data analysis tools by people who don’t know better
@tom @aparrish anything that is promoted as 'intelligence' is tripe. Proper analytical tools should be called that. Random statistical word-jumbler algorithms should be called that, if not called by the more realistic acronym S.H.I.T. (I'm sure someone can make up the words to fit).
@tom @aparrish I mean, lumping old machine learning things under "AI" is part of the problem. People thought they could make their existing stuff sound cooler.

@jens @tom @aparrish See also from a previous hype cycle:

“☁️”

@aparrish I don't do heavy-duty data analysis, but I have found the AI useful to write analysis and report generation programs that, in the past, I would have written myself.

I review the code of these before I run them, to confirm that they will do what I asked for.

Perhaps this isn't the sort of thing you had in mind?

@mjd i understand that some people find that workflow to be useful, but i just can't think of a scenario where the labor of reviewing code rigorously enough to confirm it works how I intend it to work is meaningfully smaller than the labor of... just writing the code. (in fact, to me it seems like the latter task is the easiest way to perform the former.) with data analysis i think this is especially important—subtle cultural biases are present in our *code* just as much as they are in our data
@aparrish @mjd
I mean, there's the scenarios where the labour of reviewing the LLM output is either skipped, forgoing quality, or offloaded onto someone else 🤷‍♀️
@aparrish @mjd
Even with diligent review, the quality of LLM output will often be worse, due to various effects (eg. anchoring, automation bias, all the way up to memory alteration)

@aparrish As someone who hasn't done any LLM code generation (but keeps reading about it here): I've often found it easier to reach a good solution when I had some bad code to look at (whether my own or someone else's), compared to starting with a blank page. There are only a few times when I have felt it was a mistake not to start from scratch. So as a workflow, I can understand wanting to start with something LLM-generated and improving on it myself.

But it always took me time and a lot of thinking to reach the solution I wanted, and I have *no* confidence in my ability to review and verify and take responsibility for LLM-generated code in the volumes and by the deadlines that people commonly experience these days. I very much want to think of myself as the steely-eyed operator whose stern gaze and firm hands keep the wayward minions in check and drive them productively onwards.

But I experienced plenty of "just merge it so that we can make <x> SUPER DUPER IMPORTANT sale that will definitely result in financial ruin if it doesn't go through right away" and "we can always fix it later" and "this is just a one-shot, we won't ever need this again" sort of pressure at work even before LLMs could be used for any of it, and knowing what a constant battle it was to maintain high standards for anything, I cannot imagine *myself* being able to succeed at that in today's environment.

(My comments are all about "ordinary" code. I think I would be terrified to the point of paralysis if I had to write data analysis code of the sort where anyone's implicit biases might matter to the results, never mind how the code may be written. So I'm not meaning to agree or disagree with what you said, just commenting on the "blank page" aspect.)

@mjd

@amenonsen @aparrish @mjd

I believe getting some text on a blank page because you are afraid of the blank page seems like the only valid usecase.

but then, you could just put your question into google. open the first page and copy paste the text there into your document.

Then start editing to make it work for your intended text.

This is how I used to write my student essays. I used latex so I could just comment out all that was there so the page isnt blank at the start.

@mjd @aparrish so, you are deliberately downgrading yourself? There is evidence that people doing this to 'do the simple jobs' will eventually have ossified their abilities so much they won't be able to do anything of use. You've been fooled by the hype.

@aparrish I don't trust Excel with my datasets, why the hell would I trust an LLM

e: I care a lot about my work and that's why I don't currently have a job

@aparrish, it is equally mind-bending that people are adopting them for qualitative data analysis.

You cannot provide a positionality statement for an LLM.

You cannot conduct interpretivist work when one of your coders is a probabilistic mishmash of online stereotypes.

@aparrish Ok but what about data *synthesis*? (yes, people actually do that)
@aparrish finally a system that is unbiased other than by the entirety of its training set
@aparrish OMG that would be funny if it weren’t terrifying.
@aparrish someone was telling me they use this stuff to do all their data cleaning and analysis at work and i asked how they knew it was giving them the right answers and they seemed confused by the question
@hannah 🙃
@aparrish as a data scientist who "retired" in 2023 it has been wild to see the entire industry go bananas for this stuff. it's like if every doctor simultaneously just started doing homeopathy because it was faster
@hannah @aparrish as someone who's worked in the medical field, that's also an uncomfortable number of us and it's terrifying. So I guess it's everywhere really.
@V @hannah @aparrish as a programmer who uses Claude somewhat regularly, I remain dumbfounded that
(a) anyone uses CoPilot, *GPT, etc, for anything approaching seriousness whatsoever when even a cursory check of their answers almost always immediately implodes; slop indeed
(b) anyone uses "AI" for purposes besides, say, writing a program that can probably do said analysis (which is often what Claude and MCP-capable AIs are really doing)
@V @hannah @aparrish people keep expecting "AI" to simply "know" things, but LLMs seem to me like brains in jars: yes they can generate plausible responses, but without eyes ears hands etc they have no way to do anything or establish truth. An MCP "tool" and proper coaching gives the brain a chance to say "I should use tools to check if I'm correct" and rhetoricize from there

@V @hannah @aparrish it's been a huge snake oil disservice to use the same label for "type 'best peanut butter recipes are' on your keyboard and then keep on accepting the middle suggestion" as a "search engine" versus "do a web search for 'best peanut butter recipes', return the result summaries here, and summarize their summaries: A B C D go"

Slop is possible with both, but one truly is hallucination divorced from any sense of reality and not what a "language model" is good at

@hannah tangential, but I've had doctors ask if they can have "AI" listen in on the appointment and summarise it. Initially, I just said no. Now I've moved to "thanks, but we're clearly not compatible".
@hannah @aparrish I have bad news for you about doctors
@hannah @aparrish Bad news about doctors..... Have experienced this mpre than once.... - Vi (Also had too many now open Google right in front of me to find something or prove a point and we have to rely on those fuckers)
@hannah @aparrish if you don't use globuli you're gonna be left behind
@hannah @aparrish I asked similar questions when someone suggested I use an LLM to summarize a video. “How do I know it’s accurate?” The answers were bafflingly weird. Clearly. They didn’t understand the question.
@slott56 @hannah @aparrish I've discovered that there are a shocking number of people who simply don't consider accuracy to be important. Their worldview doesn't care about what is true and what isn't true. It's about marking the task complete and moving on to the next for them and correctness isn't part of the equation.

@jeffers00n It could also be that the incentive structure (read: "Keeping my Job") is misaligned, in that it relies on the rate of completed tasks, not their correctness?

People whose livelihoods rely on not getting fired are not incentivized to engage in the finer points of whether LLM output is actually trustworthy, if the people dictating the incentives also do not care.

The question is who gets to carry the can once this shit show collapses under its own weight.

@slott56 @hannah @aparrish

@jeffers00n @slott56 @hannah @aparrish

I think we teach kids this in school. It's about turning in a 5-page essay, properly formatted, by the assignment's due date. Content is secondary to form and schedule.

@hannah @aparrish For the entire history of computing there's been an understanding that if you put the wrong data in, you'd get the wrong answer.

Somehow we're in a brave new world where you can put the RIGHT data in and get the wrong answer.