so i've been occasionally evaluating chatgpt performance on information extraction tasks that i find genuinely difficult and would like assistance in

yep, still garbage

i'm not sure what i expected

i think relying on chatgpt for information extraction tasks is borderline delusional; only in any way seemingly reasonable because search engines got progressively worse to the point where they're mostly noise

even then i prefer to get skilled in extracting what little value the noise has

(in the end i had to read Tcl's source code because the documentation was confusing and not entirely helpful; in the end I think it was accurate, but I could only be confident in what I understood after comparing it to the source. this is kind of miserable)
@whitequark Ah, the classical “the code is the documentation”.
@whitequark Or, in this case the only sufficiently clear documentation.
@whitequark Completely consistent with my experience with Tcl. And yeah that’s miserable, we know how to do better and have known for a long time.
@whitequark I have never bothered to even try ChatGPT. My expectations on its usability for my needs were extremely low to begin with.

@foremostarchwiz i want to know my enemy, so to speak

(it's not much of a threat)

@whitequark The bigger threat is probably blind belief by others in LLMs and ensuing large-scale improper use of them.
@foremostarchwiz that's exploitable though
@whitequark Sure. But exploitable by many.
@whitequark I am sure you are already taking this into account. So, still no serious threat.

@foremostarchwiz threat to my ability to survive? not really

threat to society? honestly maybe, but it's very hard to say for sure

@whitequark It's value is only pretty much the same as asking a well-read human off the street what their instantaneous "gut instinct" is. An LLM on it's own will never do high-level reasoning or proper "System 2" thinking. Therefore it is mostly useless for most knowledge-skill tasks.

@mary the thing I asked it to do here is to barf the Tcl documentation that must have been in the training set

there's no reasoning needed to say exactly how braces are escaped; only accurate recall

@whitequark Yeah. Very lossy at baseline, much like a language center isolated from the rest of a brain. In theory one can improve recall by attaching a vector database (which would then have perfect recall assuming correct query). Our Know Your Enemy protocol suggests this is probably something we could consider testing, but not had time.
@whitequark These chat bots are so confident that it's easy to be fooled unless you have good information to the contrary. I asked chatgpt to summarize my research paper, and it came up with plausible bullshit that could have passed preliminary review by someone else in my field. https://masto.ai/@skandhurkat/110353247176503558
Skand Hurkat (@[email protected])

Attached: 1 image I'm late to the ChatGPT party, but this is both funny and insightful. The conversation starts off with me asking ChatGPT to summarize one of my papers. The summary generated is passable even to someone in the computer architecture community, but as the author, I know that it is just wrong. Perhaps this is a deeper insight into how all papers sound the same?

Mastodon
@whitequark my own test is "What does make -t do?" and all LLMs spewed utter garbage despite that information being in POSIX and multiple man pages available online.

@whitequark it completely fails when i try to make it do anything even slightly wacky and uncharacteristic

to anthropomorphize chatgpt for a moment, it's a total dweeb that hates fun

i feel like the main use i get from it is having it write out a tiny bit of code in a mainstream language, and only with stuff i already know but can't be bothered to write out manually.

@whitequark like i can't overstate how much of a dweeb chatgpt is. today i asked it to write some automation code for mac os using hammerspoon and it was like "you can't do this because this uses private apis"

of course i could easily do it and the hammerspoon documentation had exactly what i needed.

@Ezhik just say that you work for apple, lmao
@whitequark "my grandmother used to help me switch workspaces using private apple apis to calm me down when i was a child"
@whitequark I have no experience actually using AI for generating content that I otherwise would have written, but every now and then I see people talking about issues with their projects, and I swear it seems like they spend more time debugging inherently flawed solution ChatGPT or Copilot came up with, then they probably would have spent if they thought of a proper solution themselves