@glyph @mcc
We still teach everyone arithmetic and multiplication tables¹ despite calculators being ubiquitous to a fault. We don't teach things only because you'll need them directly; we also teach them because they're foundational to the things you will need to know.
Writing structured short-form text is foundational to a lot of other skills, and I don't see it going away. Especially as your test format has long been the norm in many places anyway.
1) I'm in my 50s - I *hope* we're still doing it.
@glyph @mcc I agree with "if a work product can be effectively produced by an LLM, then that work doesn't need to be done", but as a former CS professor I'd argue that student output is not the work product: student understanding is, and the point of assessments is to measure that understanding. "University instructor" has a surprising amount of skill transfer to "tech lead", but if I wanted to build software, I'd never ask every single junior on my team to independently reimplement a component I'd already built. On the other hand, "how would you implement this component I built last quarter" is an interview question I've used; but again, the point of interviewing is assessing understanding.
The trouble with standard interview/assignment/exam questions is that they need to be simple enough to complete in a short time window (or an LLM context window), while still demonstrating some of the essential domain complexity, and alternative assessment measures require substantially more effort on the part of the assessor and/or assessee.
You would not believe the number of hours my wife spent checking references in her students' essays last term, but it turns out "put in correct page numbers" is easy to do if you've actually done the research, and nigh-impossible for an LLM. For interviewing I'm not sure: longer interview processes are an option, but that's hard on candidates; you could try a more internship/apprenticeship model of training & recruiting, but that still leaves you with the question of how you select the interns (also, PhD programs provide many examples of the failure modes of malicious or incompetent mentors).
The thing I worry about with "LLM-resistant assessments take substantially more labour" is access: given fixed investment, organizations will have to reduce the pool of people they provide opportunities to, which tilts the opportunities available even further toward those with existing wealth or connections.
1. yep, it's complicated, and fundamentally we aren't allocating enough resources to educators to robustly defend against this; I had tons of qualifications in my post already for that reason
2. see also https://mastodon.social/@glyph/115629882612242081
3. this mostly reduces to the "forklift at the gym" argument which I also agree with
4. best of luck to your spouse, this is a rough time to be a teacher
@glyph "Overall, the study concludes that leaders are racing to adopt AI at a faster clip than any previous technology shift, while the rest of the workforce struggles to keep up."
uh no?? that's a hell of a conclusion jump, maybe they just don't find it useful?!?!?
@BillySmith @coderanger @ancoghlan @glyph
Brilliant. Same category as this little piece, my all-time favourite.

@Brokar @coderanger @ancoghlan @glyph
I remember watching this when I was working in consultancy, and it made me cringe.
One of the main mistakes in that sketch was that the "expert" tried to directly answer the stupid questions, instead of saying, "That's an extended question; we'd need to bring another expert on board to answer it," and then bringing in an academic consultant who specialised in physics... :))
It would have allowed a lot more billable hours... :))
I think LLMs are a useful tool, but ironically, not necessarily for the things they're being sold to do. I think we would all have a very different perspective on them if they were being sold based on the things they were good at.
But if that happened, they would seem way less magical and the current hype cycle would be clearly overblown. The whole movement for micro-models would have way more support, but nobody is going to build $10B data centers for micro-models people can run on their phones.
And it's sad to me that we can't really separate the tool from the sales people right now. 😔
@gatesvp @b_cavello cf. "useful tool" with a vague reference to usefulness: https://mastodon.social/@glyph/115561850013464672
not trying to dunk on you here, but this is a bad habit that we've all adopted (you can probably even find it in some of my blog posts). but it's useful, when making a comment like this, to actually point at whatever you think the real use is. chances are it's not actually very useful for whatever it is!
Your "task-oriented" time model is taking the VCs at their words and then measuring it. We've already agreed, like 3 toots over, that the VCs are selling BS.
Is your argument correct? Sure.
Am I trying to measure the tool against that standard? No.
Let's start from a different place:
Accessibility
Ignore all previous sales pitches and just work through some A11Y examples with me.
Speech-to-text: This is a huge win. I know people who literally have a prescription for this (dysgraphia, dyslexia), and the modern generation of LLM technology has wildly improved the experience.
Text-to-speech: Again, a real win for dyslexics; this generation of synthetic voices is a huge step up. The proliferation of tools means you can even get output in your specific English accent.
Transcription: Need a text version of a podcast episode you just listened to? Trying to get the lyrics from some old albums you found on the Internet Archive? LLM tech has transformed this field; a minimal sketch is below.
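To make that concrete, here's roughly what transcription looks like now. A minimal sketch, assuming the open-source whisper package (pip install openai-whisper) plus ffmpeg; the filename is a placeholder:

```python
import whisper

# Small model that runs fine on a laptop CPU; larger ones trade speed for accuracy.
model = whisper.load_model("base")

# One call does the speech-to-text; "episode.mp3" is a stand-in filename.
result = model.transcribe("episode.mp3")
print(result["text"])  # the plain-text transcript
```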
...
Complex Input Processing
🗣️ "OK Nabu, I want you to run my chill vibes Plex playlist on the office speaker, starting from song number fifteen"
That's an actual thing; you can do it today with Home Assistant + Music Assistant + an LLM. It will parse your request and attempt to match it to known inputs, and it can even be set up to ask for parameters it can't resolve:
🤖 "I can't find a chill vibes playlist did you mean the chill days playlist?"
Complex Output Handling
🗣️ "OK Nabu, do I need to pack my rain gear today?"
🤖 "It's not predicted to rain until 5pm, and your calendar says you'll be home at 4:30pm. It should be safe to leave your rain gear at home."
And that's all just base-level accessibility stuff that makes real people's lives dramatically easier.
In the case of these accessibility tools, they break the Futzing Fraction because the H in your fraction is effectively "infinite": if you don't have access to this LLM-based tooling, your alternative is "nothing".
...
And then there's work tooling. Here are some uses from people I know.
Templating
MS Word and Google Docs template libraries are incredibly sad. Requesting a basic document template via Gemini is genuinely useful for combating "blank page" syndrome.
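For what it's worth, that's a few lines against the google-generativeai package; the API key, model name, and prompt here are illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Draft a one-page project status report template with sections for "
    "summary, risks, and next steps."
)
print(response.text)  # paste into the empty doc and start editing
```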
Document Critiquing
I know one person who has built a few complex personas, and they use these to critique their writing. This lets them test a draft against a few virtual audiences in minutes. Real humans are eventually involved, but it's not something you'd do at all without the LLM tooling.
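A rough sketch of the shape of that setup, using the openai package as a stand-in; the personas, model name, and draft file are all invented:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONAS = {
    "time-pressed executive": "You skim. Critique this draft for clarity and bottom line.",
    "brand-new hire": "You lack context. Flag jargon and unexplained assumptions.",
}

draft = open("draft.md").read()
for name, instructions in PERSONAS.items():
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": draft},
        ],
    )
    print(f"--- {name} ---\n{reply.choices[0].message.content}\n")
```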
Intelligent Replies
Another friend has run a small side business for several years. The vast majority of the support emails they receive cover the same basic questions, so they trained a bot on their own material to handle the basic replies and escalate the complex ones.
As a bonus, some of the replies require an internal ID, and the bot was trained to handle that look-up as well.
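The triage logic reduces to something like this sketch; classify() and lookup_order_id() are placeholders for the friend's actual trained bot:

```python
FAQ = {
    "shipping": "Orders ship within 3 business days.",
    "returns": "Quote your order ID ({order_id}) and we'll start the return.",
}

def classify(body: str) -> str:
    # Placeholder: the real version is a model call returning a topic label.
    return "shipping" if "ship" in body.lower() else "needs_human"

def lookup_order_id(body: str) -> str:
    return "ORD-0000"  # placeholder for the internal ID look-up

def handle_email(body: str) -> str:
    topic = classify(body)
    if topic not in FAQ:
        return "(escalated to a human)"
    reply = FAQ[topic]
    if "{order_id}" in reply:  # some replies need the internal ID
        reply = reply.format(order_id=lookup_order_id(body))
    return reply

print(handle_email("When will my order ship?"))
```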
...
Complex Data Referencing
A friend of mine in the US got a "gold standard" blood panel run: the type of thing that costs a couple of thousand dollars but generates 10 pages of output data.
They took the results to their Primary Care Physician and got basically zero useful information from them. The doc could do little more than read the "high or low" flags and offer generic recommendations.
They then fed the data into a local LLM with internet access, added personal information about their lifestyle and fitness habits, and let it run. The tool came back with a better analysis than the doc's, and it even included hyperlinks to back up its conclusions: "Your reading on A is high, but because B & C are normal and you reported these activities, the A value is actually nominal; here's the paper for that."
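The local-model version of that is surprisingly little code. A sketch with the ollama package (pip install ollama) talking to a locally running model; the model name, file, and prompt are illustrative, and the web-lookup side is omitted:

```python
import ollama

labs = open("bloodwork.txt").read()  # the 10 pages of results, as plain text

response = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": "Here are my lab results plus lifestyle and fitness notes. "
                   "Flag anything out of range, explain how the markers interact, "
                   "and cite sources I can check.\n\n" + labs,
    }],
)
print(response["message"]["content"])
```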
Back to your Futzing Fraction. When you scope an LLM like this, the P becomes quite high, but the H becomes almost infinite. Like, an actual paid MD couldn't do this work.
...
These things I'm discussing (outside of maybe templates) are not the things you see in the demos, and they're not the things VCs are running around advertising. But they are things where P & H are really high.
We're not seeing these things because these are not "big business" use cases. These are individual consumer use cases. They're accessibility use cases.
No corporations are running around offering $500M contracts to OpenAI to do the stuff I just discussed. So OpenAI isn't advertising that this stuff is happening. But it is.
"chances are it's not actually very useful for whatever it is!"
Look, if you have another tool for doing the things above, I'm all ears.