Specific line of thought to illustrate my general point:
Consider an LLM that helps manage email correspondence. It writes emails! It summarizes emails! Less reading! Less typing! More messages faster! Productivity boost!! Except:
- You have to babysit the LLM, guide it and check it to make sure it’s accurately preserving human intent (which is, after all, the whole point of communication…right??). That’s new work, and likely cancels out the slim time savings of reduced reading and typing.
2/
- But it's an LLM, so it’s still often wildly, convincingly incorrect. Miscommunication increases. Miscommunication has costs. Miscommunication generates new work. Which now gets done faster! And generates yet more work!
- IT staff has to administer the LLM, support the LLM, evaluate vendors, yada yada.
- People have to maintain the LLM itself, and the infra that supports it. Those costs are •large•.
3/
And if by some magic all of this actually spins up and gets working, then (1) the barrier to communication decreases (why not just send another email if it’s automated?), (2) individual communication load increases (because you can answer emails at a faster rate), and (3) the net efficiency of communication decreases (because of everything in the previous two posts).
Sound and fury, signifying nothing.
4/
I severely doubt many real orgs measure actual desired large-scale outcomes well enough to spot that net efficiency decrease. All this is going to look like increased productivity. Will •be• increased productivity in the ways that most folks actually measure it.
But here, with the bird’s-eye view of a hypothetical, it’s clear: the total amount of work happening to achieve the same ends has •increased•.
5/
I said “reduce total workload.” What are some things that accomplish •that•?
“Do we really have the problem we think we have?”
“There’s a simpler way.”
“Work from home!”
“Hmm, I’m going to think about my reader, and edit for clarity and emotional impact before sending this email.”
“We’re willing to pay for experience / expertise.”
“Things are going well. Head home for the day!”
“Maybe we don’t need to do this thing anymore. We can just choose not to have this problem.”
6/
A lot of things that get billed as a productivity boost sound suspiciously to me like recipes for reducing operational slack and thus “going solid:” https://en.wikipedia.org/wiki/Richard_Cook_(safety_researcher)#%22Going_solid%22
7/
As both a software developer and a teacher, I’m increasingly interested in figuring out which costly things are avoidable, or can be simplified, or •just don’t matter•…and then doing less of them.
Breathing room can be a form of efficiency too. And it’s a more humane one.
Less about tools that boost productivity, more about tools that reduce total workload.
/end
@inthehands So much this! 💯
These LLM tools just lack so much context!
- What is actually important for the person who receives the email?
- What is actually important in this wall of text in the current context?
I've actually just done an experiment that shows this (with Chat LMSys):
Some details in the next posts...
1/3 (wow, a thread within a thread🤯)
@inthehands I've given it the following prompt:
"Please summarize the following text in max 4 sentences:"
and then I've given it the pure text of the following blog post:
https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-linux.html
There is a summary at the end of the actual blog post (that's what makes this experiment so interesting!), _which is not part of the prompt_.
Please see the image below:
2/3
@inthehands ...and now have a look at the #AI summary below by GPT-4o and Gemini 1.5.
While it got the facts right (this time!), the most crucial bit, how to disable this new linker, is missing from the summary (see image below).
This is why context and details matter, which #LLMs will always miss!
Writing requires #empathy - an #LLM lacks it.
3/3
@inthehands to throw my 2 cents in: having "slack" or "breathing room" can also lead to situations where people start looking for, or just noticing, outliers, and solving interesting but not urgent problems.
These problems would have become urgent if not spotted earlier, but slack lets them become non-threatening.
I have personally lost count of the number of times I prevented a serious problem by having the luxury of reading logs leisurely for a little while longer.
@jjcelery @inthehands @fasterthanlime exactly! Today... for the first time in... ever, I sat in front of a tail -f, opened a bag of chips – like for watching a TV show – and learned things about the container I'm running. I stopped and started it multiple times in various stages to get a feeling for how it behaves in its future production environment.
I learned that it doesn't like to have its persistent data on an NFS. Well... too bad, that's not gonna change. I found a few coupling issues and de-coupled them.
Today was a good day.
@pejacoby @inthehands A large problem is really three medium problems in a trench coat. Each of those is really three small problems in a trench coat. Each small problem is three raccoons in a trench coat.
To summarise: each large problem is actually 27 raccoons and 13 trench coats (1 + 3 + 9). That's why you need a lot of time and snacks to deal with it.
Accurate. I wrote about this once: https://hachyderm.io/@inthehands/111388010488223175
The number of problems is rising fast, and all the problems have problems. All you can see is problems. You have forgotten what your original problem was. You have forgotten what your business does. You have forgotten your own name. Now you are a software company. Time is not a flat line; it is a hall of mirrors, and all you can see is the endless infinity of your own hubris, your own human fallibility, now reified, grown mighty, and turned upon you, devouring all. Time to build more software.
@inthehands @pejacoby let's combine our efforts.
I'll feel bad about it every Tuesday, Thursday, and Saturday, and you take the rest, deal?
@inthehands So much this! At the high school I teach at, if you complain about workload / suffer stress / miss deadlines etc, the general administrative response is to allocate you to an online course on "efficient time management".
Thus adding yet another burden on your time. It's the budgeting in poverty problem - my problem is not how I budget my time, it's the fact that I need more time!
(Not in my department, though. My HOD understands this.)
@inthehands Yeah, during a particularly bad CA budget year I was introduced to the tech concept of:
Do less with less.
Sometimes it's what layer 8 requires.
@inthehands Fair points, but you seem to be comparing an LLM-based system against one with perfect efficiency, instead of the existing human-based system (which I’m certain has its own set of failings).
While it’s useful to know how an LLM system deviates from the ideal, I’d be far more interested in how it compares against the existing system. Personally, I don’t need a system to be perfect - I just need it to be better.
@inthehands Hmm. Having been on the receiving end of thousands of emails throughout my career, I’m not convinced of this. I’m not ready to believe that the “failure rate” of LLM-generated emails is higher than that of human-generated emails.
This actually sounds like it’d be a really cool study to run.