the author of this post prompted copilot to characterize the differences in a data set of statements concerning career ambitions, categorized by country. the trick is that the data contained the *same statements* for each country. even though the data were identical, the model generated some pretty hilarious stereotypes ("The US prioritizes leadership and innovation", "The UK blends public service with professional status") https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes
Real signals or artificial stereotypes? Adventures with a cultural Copilot (Understanding the unseen)
i used the same data set but replaced each country with a "gender identity" (man, woman, trans woman, trans man, non-binary) and prompted chatgpt to characterize the differences between the groups. lo and behold, i got some fantastic gender stereotype trash
"dig deeper," i prompted
not to be too blunt about this, but LLMs simply do not belong anywhere in a data analysis workflow. not for cleaning, not for coding, and certainly not for analysis. it's frankly absurd and terrifying that data science etc people are adopting these tools
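
A minimal sketch of the control described above: attach the *same* statements to every group label, then confirm programmatically that the groups really are identical before asking any model to "characterize the differences". The statements, group labels, and function names here are hypothetical placeholders, not the data or code used in the linked post or in this thread, and no model is called.

```python
from collections import Counter

# Hypothetical placeholder statements; the real data set is not reproduced here.
STATEMENTS = [
    "I want to lead a team.",
    "I want to work in public service.",
    "I want to start my own company.",
]
# Hypothetical group labels (countries, identities, or anything else).
GROUPS = ["group_a", "group_b", "group_c"]


def build_control_dataset(groups, statements):
    """Pair every group label with the exact same list of statements."""
    return [(group, statement) for group in groups for statement in statements]


def groups_are_identical(rows):
    """Return True if every group has the same multiset of statements."""
    by_group = {}
    for group, statement in rows:
        by_group.setdefault(group, Counter())[statement] += 1
    counts = list(by_group.values())
    if not counts:
        return True  # no rows at all, so nothing can differ
    return all(c == counts[0] for c in counts)


rows = build_control_dataset(GROUPS, STATEMENTS)
print(groups_are_identical(rows))
```

If the check passes, any between-group "difference" a model reports cannot have come from the data, because the only thing that varies is the label.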

@aparrish Data* science**

*the data might be made up whole-cloth.
**the science is pretty fucken sus too

@aparrish

I work in biomedical informatics, and it's upsetting the number of data requests that come to me from people wanting to use LLMs to do a whole bunch of stuff for them. They don't ask for specific and targeted data points, just "yeah give me all the diagnosis codes" etc. (for basically all the domains, not just a single one). And they don't say which model they're using in their IRB protocol, so who knows whether they're sending PHI to OpenAI or whatever. It gets approved anyway.

@aparrish I feel like the marketing term “AI” is a huge part of what drives this. Because the term is used to describe both LLM chatbots and actual predictive analytics tools (including some that have existed for decades!!!), chatbots get a sort of halo benefit and get confused with data analysis tools by people who don’t know better
@tom @aparrish anything that is promoted as 'intelligence' is tripe. Proper analytical tools should be called that. Random statistical word-jumbler algorithms should be called that, if not called by the more realistic acronym S.H.I.T. (I'm sure someone can make up the words to fit).
@tom @aparrish I mean, lumping old machine learning things under "AI" is part of the problem. People thought they could make their existing stuff sound cooler.

@jens @tom @aparrish See also from a previous hype cycle:

“☁️”

@aparrish I don't do heavy-duty data analysis, but I have found the AI useful to write analysis and report generation programs that, in the past, I would have written myself.

I review the code of these before I run them, to confirm that they will do what I asked for.

Perhaps this isn't the sort of thing you had in mind?

@mjd i understand that some people find that workflow to be useful, but i just can't think of a scenario where the labor of reviewing code rigorously enough to confirm it works how I intend it to work is meaningfully smaller than the labor of... just writing the code. (in fact, to me it seems like the latter task is the easiest way to perform the former.) with data analysis i think this is especially important—subtle cultural biases are present in our *code* just as much as they are in our data
@aparrish @mjd
I mean, there are the scenarios where the labour of reviewing the LLM output is either skipped, forgoing quality, or offloaded onto someone else 🤷‍♀️
@aparrish @mjd
Even with diligent review, the quality of LLM output will often be worse, due to various effects (eg. anchoring, automation bias, all the way up to memory alteration)

@aparrish As someone who hasn't done any LLM code generation (but keeps reading about it here): I've often found it easier to reach a good solution when I had some bad code to look at (whether my own or someone else's), compared to starting with a blank page. There are only a few times when I have felt it was a mistake not to start from scratch. So as a workflow, I can understand wanting to start with something LLM-generated and improving on it myself.

But it always took me time and a lot of thinking to reach the solution I wanted, and I have *no* confidence in my ability to review and verify and take responsibility for LLM-generated code in the volumes and by the deadlines that people commonly experience these days. I very much want to think of myself as the steely-eyed operator whose stern gaze and firm hands keep the wayward minions in check and drive them productively onwards.

But I experienced plenty of "just merge it so that we can make <x> SUPER DUPER IMPORTANT sale that will definitely result in financial ruin if it doesn't go through right away" and "we can always fix it later" and "this is just a one-shot, we won't ever need this again" sort of pressure at work even before LLMs could be used for any of it, and knowing what a constant battle it was to maintain high standards for anything, I cannot imagine *myself* being able to succeed at that in today's environment.

(My comments are all about "ordinary" code. I think I would be terrified to the point of paralysis if I had to write data analysis code of the sort where anyone's implicit biases might matter to the results, never mind how the code may be written. So I'm not meaning to agree or disagree with what you said, just commenting on the "blank page" aspect.)

@mjd

@amenonsen @aparrish @mjd

I believe getting some text on a blank page because you are afraid of the blank page is the only valid use case.

but then, you could just put your question into google. open the first page and copy paste the text there into your document.

Then start editing to make it work for your intended text.

This is how I used to write my student essays. I used LaTeX, so I could just comment out everything that was there so the page isn't blank at the start.

@coba Everything I wrote in my earlier message was only about writing code, not text.

If I were writing text, I certainly wouldn't start by pasting random text just to avoid a blank page. Also I find LLM-generated text to be abhorrent on average, so I wouldn't use that either.

@aparrish @mjd

@amenonsen @coba @aparrish I've had great success having Claude draft technical documentation for me to review and edit.

I assure you that it was much less effort than it would have been to write the same technical documentation from scratch by myself.

Your suggestion of "just put your question into Google" doesn't make sense here. Claude can review all of the relevant parts of the project's source code before drafting up the description of what it does and what features are available. Google search does nothing like that.

@mjd

«Google search does nothing like that.»

Right. It's not applicable, either to the code (that I was referring to) or text (that you produce based on your code).

@coba @aparrish

@mjd @aparrish so, you are deliberately downgrading yourself? There is evidence that people doing this to 'do the simple jobs' will eventually have ossified their abilities so much they won't be able to do anything of use. You've been fooled by the hype.

@aparrish I don't trust Excel with my datasets, why the hell would I trust an LLM

e: I care a lot about my work and that's why I don't currently have a job

@aparrish, it is equally mind-bending that people are adopting them for qualitative data analysis.

You cannot provide a positionality statement for an LLM.

You cannot conduct interpretivist work when one of your coders is a probabilistic mishmash of online stereotypes.

@aparrish Ok but what about data *synthesis*? (yes, people actually do that)

@aparrish I don’t know if you’ve seen that, but I love this post from Posit…

https://posit.co/blog/llm-plot-interpretation

…both for the results, which are expected, and for the conclusion, which misses the point entirely.

“oh yes if your data is not what the computer expects, the analysis will be wrong, but in real life that doesn’t happen that often” and “we’ll be improving that”.

No space at all is given to “ok, that’s crap, let’s ditch it!”

LLMs interpret plots well, until expectations interfere: “We find that LLMs prioritize their internal expectations over conflicting visual evidence.” (Posit)
@ced oh my GOD that is frustrating. "our stopped clock is wrong if you look at it at any time other than 1:23pm, but in real life, most people only look at clocks at 1:23pm. we are scientists"
@aparrish “and anyway, don’t worry, we’ll make it work for 1:22 and 1:24 too, for only one order of magnitude more investment!”
@aparrish IMO it is necessary to be blunt about it, and I think you are. I appreciate that. We need more like this.