Dan Conway (@magisterconway.bsky.social)

So, um... this is bad. Really bad. I looked at the letters that were translated by the AI, and the very first one I found was almost entirely hallucination. Thread:

Blacksky

"AI" users are like, "I know this is imprecise but as a convenience these transcriptions are better than nothing"

then 70 years from now we'll still be struggling to debunk these entirely hallucinated transcriptions of thousands of manuscripts that were pissed into the pool of human knowledge.

some things are worse than nothing. "signal-shaped noise" is worse than nothing.

@elilla transcription / translation is one of the areas where I see a good use for LLMs at the moment. But only as a first pass.

I use Speech Note to do a first pass at transcribing audio from talks and such that I will write about. But I also go back and watch the talk and clean up the transcript -- I'm not blindly trusting the output, I'm just trying to speed up the act of typing it out and saving some wear and tear on my hands.

An LLM-generated translation or transcription that is not verified is, IMO, generally a dangerous thing. It might be fine for local use to try to get the gist of something, but no organization should be publishing those types of things without verification.

@elilla I experimented with using ChatGPT to do OCR on old scanned assembly code listings.

Columnar text has always been a huge challenge for OCR, and I had already tried Tesseract and given up on it.

At first I thought the results from ChatGPT were a revolutionary leap in the state of the art.

Then I looked closer - it had reworded the comments and headers. It even changed the code in places, swapping out entire mnemonics and parameters.

Like any good sloperator I tried to prompt my way around this, which was met with effusive apologies and assurances that it would, going forward, be sure to never do that again.

Which of course, it immediately did.

I suspect there's only the most tenuous thread of context between a "multi-modal" LLM's text and image capabilities - they're basically just two models duct-taped together.

I find this particularly disturbing because someone doing an editorial pass, looking only for spelling or grammar errors, may not notice that content which appears fundamentally correct has actually been altered.

I would rather wade through a sea of Tesseract's obvious typos than have to take on the much higher cognitive burden of making sure grammatically correct sentences weren't invented wholesale.
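One cheap way to lower that cognitive burden is to run two independent passes (say, Tesseract and an LLM) over the same scan and mechanically flag every span where they disagree, so a reviewer's attention goes straight to the suspicious spots. A minimal sketch using Python's standard-library difflib; the sample lines are hypothetical, not from the actual listings discussed above:

```python
import difflib

def flag_disagreements(pass_a: str, pass_b: str):
    """Return word-level spans where two transcription passes diverge.

    Agreement between two flawed passes is no guarantee of correctness,
    but disagreement is a cheap, automatic flag for human review.
    """
    a, b = pass_a.split(), pass_b.split()
    sm = difflib.SequenceMatcher(None, a, b)
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes()
            if op != "equal"]

# Hypothetical output for the same line of a scanned assembly listing:
# Tesseract makes obvious character-level typos; the LLM silently rewords.
tesseract_line = "LDA #$0l ; load acc w1th immed1ate value"
llm_line = "LDA #$01 ; load the accumulator with an immediate value"

for op, left, right in flag_disagreements(tesseract_line, llm_line):
    print(op, left, "->", right)
```

Note that this only localizes the disagreements; a human still has to decide which pass (if either) matches the page. But it turns "proofread every grammatically plausible sentence" back into "check these flagged spans", which is the tractable problem.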