Mastodawn

David Chisnall (*Now with 50% more sarcasm!*)2d ago

@EUCommission

I don’t know if this account is actually monitored, or just a publishing place, but you may have noticed that this post has received almost overwhelmingly negative responses.

You could disregard this as Mastodon bias, but keep in mind that the biggest bias on Mastodon is that people who understand and built core parts of the information technology that you use every day are massively over represented. This is probably the only place you will get a lot of replies from people who both understand technology and do not have a financial incentive to hype things to get large amounts of government funding.

EDIT: I should add, I used machine learning during my PhD and there are a lot of problems for which it is a really good fit. But, in the current climate, it’s generally safe to interpret ‘AI’ as meaning ‘machine learning applied to a problem where machine learning is the wrong solution’. It isn’t a technology, it’s a branding term, and it’s a branding term used almost exclusively for things that have no social benefit.

Davidson 1d ago

@david_chisnall @EUCommission The EU is tasked with the difficult challenge of balancing democratic values with maintaining economic parity with undemocratic superpowers. Initiatives like these are usually aimed at ensuring that the EU doesn't fall behind. What are you proposing? No AI infrastructure with data sovereignty for the EU while other superpowers use AI to optimize every facet of digital infrastructure? What is the incentive for the EU to risk sitting out a technological leap?

David Chisnall (*Now with 50% more sarcasm!*)23h ago

@davidsonsr @EUCommission

The EU has to prioritise investment. It needs to pick things that are likely to give a good return, both financially and in building the kind of society that EU members wish to belong to. To date, AI has not materially contributed in either. There has been no measured impact on economic productivity from AI adoption, in any industry. The systems are built on top of large-scale plagiarism that undermine the creative industry.

If the USA and China wish to sabotage their economise by throwing vast amounts of money at things that deliver negligible benefits (and often the reverse), then the EU should encourage them to do so, while investing in things that actually deliver a return.

@david_chisnall @EUCommission

Davidson 21h ago

AI being hard to isolate in aggregate statistics isn't the same as it having no measured impact. While AI has displaced some labor, the clearest evidence of productivity gains appears in field studies and task-level performance measurements, which there's an abundance of.

I'd rather see a hopefully more ethical, more productive and more energy efficient EU AI infrastructure with EU data sovereignty than the EU relying on other superpowers' AI implementations.

@davidsonsr @david_chisnall @EUCommission

20h ago

the clearest evidence of productivity gains appears in field studies and task-level performance measurements, which there's an abundance of.

Where?

Davidson 19h ago

There are examples like TikTok, Meta and other social platforms using AI for content moderation, Duolingo using AI to significantly increase their content output and HubSpot using AI to enhance customer CRM data. There are also papers like "Generative AI and labour productivity: a field experiment on coding" and "Generative AI at Work" which indicate productivity gains for junior workers. There are many instances of applied AI working as intended.

10h ago

@davidsonsr @david_chisnall @EUCommission "Using AI for content moderation" doesn't mean anything to me.

To "increase content output" and "enhance CRM data" sounds like a deluge of slop, not increased performance. (As a personal anecdote, I was considering using Duolingo myself when I heard they were adding LLM slop to their app, so I lost all interest. I want to learn languages, not consume "content output".)

I'm not qualified to judge the experimental setup of "Generative AI and labour productivity: a field experiment on coding", but some things stood out to me:

They looked at ~1200 programmers from one company (Ant Group) over a period of 6 weeks.
335 of them had access to a specific (internal) LLM.
The junior programmers with LLM access produced 50% more verbose code, the senior programmers didn't.

That's it. The only thing they measured was the number of lines of code produced, not quality or correctness or anything. And this was only the short-term effects (less than two months); there's nothing there about the mid- or long-term consequences of mandating LLM use to a company's whole workforce.

"Generative AI at Work" is about US customer support (from a call center in the Philippines). The paper is creepy ("AI drives convergence in communication patterns: low-skill agents begin
communicating more like high-skill agents", "customers are less likely to question the competence of agents"). Results are mixed: "AI assistance increases worker productivity, resulting in a 14% increase in the number of chats that an agent successfully resolves per hour", but only for less-skilled and inexperienced agents: "we find evidence that AI assistance may decrease the quality of conversations by the most skilled agents". The metrics used are questionable: Issue resolutions per hour and "net promoter score" (as a proxy for customer satisfaction) are used to determine both productivity and agent "skill".

(Why are these papers all written by economists?)

David Chisnall (*Now with 50% more sarcasm!*)10h ago

@barubary @davidsonsr @EUCommission

You'll find this in pretty much all papers that show an improvement in productivity from 'AI'.

Most of them use an invalid metric: self-reported feelings of productivity (a thing that's been shown previously to have a weak inverse correlation with actual productivity), lines of code (known since the '60s to be a terrible metric), or tickets resolved (who marks them as resolved? I can get 100% on this by just claiming everything is resolved, but if the outcome is that the customer gives up and goes to a competitor, that isn't actually a win).

Content moderation is similar. Using 'AI' is not there to improve efficiency, it's there to shift blame. TikTok and Meta moved to having an automated system moderate content so that they could claim compliance with rules about harm, without actually bothering to do the work. It does not increase the quality of the moderation decisions. Note specifically for the @EUCommission : this is a technology that is being used to attempt to bypass regulations that you have passed for the benefit of your citizens. Is that really what you want to be funding.

Davidson 9h ago

Developers aren't being evaluated by or paid for KLOCs anymore, so it's not invalid to view an increase in code throughput as an indicator of increased productivity during experimental evaluations, especially in delivery-focused teams. In the same vein, the paper regarding support agents showing an increased usage of unmodified AI response suggestions in combination with increased delivery velocity is also a valid indicator.

Davidson 9h ago

Reports and papers on generative content in knowledge work-related contexts seem to indicate that somewhere around a third to a half of proposed AI suggestions that are reviewed by humans are deemed acceptable, and that this in turn frees up personnel hours.

Davidson 9h ago

But more importantly, even if you wish to disregard that then there's still more than enough examples of applied AI being used by companies in both internal and customer-facing contexts to show that it's able to replace human tasks. You can easily confirm first hand what many of these software products are capable of doing in terms of using AI to reduce time spent on tasks. AI is ubiquitous now, and it has been rolled out for a long time.

David Chisnall (*Now with 50% more sarcasm!*)9h ago

You seem to have created an account entirely for this thread, to jump in and make vague and unverifiable claims, and to cite papers that have poor methodology.

You are demonstrating one use for LLMs: they can easily replace your kind of engagement. And this is why they're so popular with troll farms and scammers.

Davidson

The point being discussed is whether AI is capable of replacing human work, and I think I've shown pretty clearly that it is. You can easily verify this yourself.

Register a HubSpot account, pretend that you're a salesperson, add a hundred empty contacts to the CRM, then ask yourself if you'd prefer to refine your contacts manually or whether you'd like an AI to do it for you.

That's AI eating work.

David Chisnall (*Now with 50% more sarcasm!*)9h ago

Okay, that's two logical fallacies in one post, so I can only assume you either are an LLM or you've been using one for so long that the well-documented cognitive impairment that this leads to has hit you.

First, let's look at the metric you're using. Does using an LLM require less work than not using one? If your objective function does not include any notion of quality, sure. There are a load of ways of doing work faster if you don't care about quality.

And that's the other issue: you're assuming that the two choices that exist in the world are 'a human does a task 100% manually' and 'an LLM does it'. And yet, until a few years ago, none of the automation that people were deploying was using LLMs. And a lot of companies are selling exactly the same kind of automation now but branding it 'AI' because that's what the current hype wave is for (fewer now, because there's such a consumer backlash against 'AI' that it's increasingly a toxic term and you get more sales if you don't use it).

To give a very concrete example of this: LinkedIn now has an 'AI' filtering thing for submitted CVs. It's on by default, so I didn't realise I was using it when I posted a job for a compiler engineer there. 70% of developers with prior LLVM experience, for a job working on LLVM were filtered out by it. I had more good CVs in the filtered-out pile than in the left-in pile.

Not only was this bad, but some traditional keyword filtering would have given me a much better first pass (at least for prioritising: simply searching all CVs for 'LLVM' would have given me a better high-priority-to-read list than the LLM did), but that option wasn't available because LinkedIn is all-in on AI.

Oh, and it's got even worse since then. LLMs are being used to automatically craft applications tailored to job ads. Hiring has become much harder. Yes, by one measure of productivity, LLMs have made things better: it's now much easier to apply for a job. You can apply for a hundred jobs in a day easily! But then the hiring manager has a thousand CVs for a job where only ten are qualified. And LLMs are really bad at filtering them (they're full of biases, but also don't understand the job requirements).

Yes, AI can require significantly less work than not using it and can deliver results faster and in parity with the level of quality that a human being would deliver. AI is now ubiquitous in digital products and in cases where the AI delivers high value at low risk and with a low error impact, it's performing well.

Using the aforementioned HubSpot as an example, as a salesperson you can integrate it with prospecting tools that crawl company websites, extract key data using AI, sends it back, and then HubSpot refines it for you using AI and presents you with a list of companies to call. This would've been an expensive multi-person effort that can now be done in an hour with the help of AI. Errors are minimal and when errors do occur, the impact is negligible.

And going back to the support agent use case, letting an AI scan incoming mails, search the knowledge base and then draft up a prepared response for you to review and either edit or send is again a high value feature with a low error margin (though arguably a higher error impact, which is why you have human reviewers).

So I think that we can establish that AI has the capacity to replace human work with an acceptable level of quality. Now whether that justifies the way that AI is being marketed by tech companies is a different discussion.

David Chisnall (*Now with 50% more sarcasm!*)8h ago

And going back to the support agent use case, letting an AI scan incoming mails,

Cannot be done safely, because there is no way to prevent prompt injection. Computing learned from telecoms that in-band signalling is a bad idea. Separation of control and data is essential for security. LLMs have no mechanism for doing this.

search the knowledge base and then draft up a prepared response for you to review and either edit or send is again a high value feature with a low error margin (though arguably a higher error impact, which is why you have human reviewers).

LLMs are really good at accidentally inverting the meaning of text when they summarise it, so this flow requires you to carefully read the message that it's a reply to, and the reply. If type very slowly, I can imagine that might be a time saving. But I can definitely imagine it is perceived as a time saving because people tend to report time reading as shorter than time writing.

I know people do this, because I've exchanged emails with some people who do. And it's frustrating because now I need multiple round trips with them to get them to actually give the required response instead of a statistically plausible reply. Eventually it's often easier to have a call with them.

If you're a good salesperson (I've worked with some, they do exist), then you know that your biggest value is in building relationships with customers and establishing trust. LLMs undermine this.

The AI would only have access to reading knowledge base data and drafting messages in plaintext, which are low-risk operations, and would need to pass through monitoring points that treat the draft as untrusted content both before reviewing and after sending. The biggest risk is a prompt injection generating sketchy content, the support agent somehow accidentally pressing the send and confirm buttons, and detection tools missing all of this again.

At this point you're talking about a margin of error that is comparable to just about anything else that is customer-facing in the organization.

As for salespeople relying on enriched data, it goes a long way when cold calling prospective customers. Not knowing anything about the company that you're calling vs. being presented with a fully-enriched dashboard containing everything from decision makers to company history does a lot for the success rate.

David Chisnall (*Now with 50% more sarcasm!*)8h ago

Is this why my work inbox is full of approaches from salespeople at companies who have no understanding of what my company does or what it needs, but feel the need to send me personalised emails?

Because that's not a net win for the economy. It's wasting my time.

And it's not a net win for the companies in question, because they get added to a list of companies I will never buy from.

That can happen when salespeople rely on integrations with public databases containing vague industry codes and third party websites with unreliable data sourcing. It's a problem that AI-based data enrichment specifically solves.

8h ago

@davidsonsr @david_chisnall @EUCommission You sound like an ad. Are you all marketing? 😃

No, I'm just trying to explain that AI is already successfully doing what people think it's incapable of doing, so that people can base their opinions on a correct understanding of what's happening.

8h ago

@davidsonsr @david_chisnall @EUCommission "AI" isn't a thing. It's a marketing term.

It's a colloquial term for products based on LLM/reasoning models.

8h ago

@davidsonsr @david_chisnall @EUCommission That's not how it's used, and "reasoning models" are not a thing, either.

It's an established term: https://en.wikipedia.org/wiki/Reasoning_model

Reasoning model - Wikipedia

@davidsonsr @david_chisnall @EUCommission https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia/Alan_MacMasters

7h ago

Wikipedia:List of hoaxes on Wikipedia/Alan MacMasters - Wikipedia

Davidson 7h ago