Mastodawn

✧✦Catherine✦✧Feb 10, 2024

so i've been occasionally evaluating chatgpt performance on information extraction tasks that i find genuinely difficult and would like assistance in

yep, still garbage

Show thread

✧✦Catherine✦✧Feb 10, 2024

i'm not sure what i expected

foldl Feb 12, 2024

Show thread

✧✦Catherine✦✧

i think relying on chatgpt for information extraction tasks is borderline delusional; only in any way seemingly reasonable because search engines got progressively worse to the point where they're mostly noise

even then i prefer to get skilled in extracting what little value the noise has

Show thread

✧✦Catherine✦✧Feb 10, 2024

(in the end i had to read Tcl's source code because the documentation was confusing and not entirely helpful; in the end I think it was accurate, but I could only be confident in what I understood after comparing it to the source. this is kind of miserable)

Show thread

Megumin Feb 10, 2024

@whitequark Ah, the classical “the code is the documentation”.

Show thread

Megumin Feb 10, 2024

@whitequark Or, in this case the only sufficiently clear documentation.

Show thread

Aaron Sawdey, Ph.D.Feb 10, 2024

@whitequark Completely consistent with my experience with Tcl. And yeah that’s miserable, we know how to do better and have known for a long time.

Show thread

Megumin Feb 10, 2024

@whitequark I have never bothered to even try ChatGPT. My expectations on its usability for my needs were extremely low to begin with.

Show thread

✧✦Catherine✦✧Feb 10, 2024

@foremostarchwiz i want to know my enemy, so to speak

(it's not much of a threat)

Show thread

Megumin Feb 10, 2024

@whitequark The bigger threat is probably blind belief by others in LLMs and ensuing large-scale improper use of them.

Show thread

✧✦Catherine✦✧Feb 10, 2024

@foremostarchwiz that's exploitable though

Show thread

Megumin Feb 10, 2024

@whitequark Sure. But exploitable by many.

Show thread

Megumin Feb 10, 2024

@whitequark I am sure you are already taking this into account. So, still no serious threat.

Show thread

✧✦Catherine✦✧Feb 10, 2024

@foremostarchwiz threat to my ability to survive? not really

threat to society? honestly maybe, but it's very hard to say for sure

Show thread

Mary

Feb 10, 2024

@whitequark It's value is only pretty much the same as asking a well-read human off the street what their instantaneous "gut instinct" is. An LLM on it's own will never do high-level reasoning or proper "System 2" thinking. Therefore it is mostly useless for most knowledge-skill tasks.

Show thread

✧✦Catherine✦✧Feb 10, 2024

@mary the thing I asked it to do here is to barf the Tcl documentation that must have been in the training set

there's no reasoning needed to say exactly how braces are escaped; only accurate recall

Show thread

Mary

Feb 10, 2024

@whitequark Yeah. Very lossy at baseline, much like a language center isolated from the rest of a brain. In theory one can improve recall by attaching a vector database (which would then have perfect recall assuming correct query). Our Know Your Enemy protocol suggests this is probably something we could consider testing, but not had time.

Show thread

Skand Hurkat Feb 10, 2024

@whitequark These chat bots are so confident that it's easy to be fooled unless you have good information to the contrary. I asked chatgpt to summarize my research paper, and it came up with plausible bullshit that could have passed preliminary review by someone else in my field. https://masto.ai/@skandhurkat/110353247176503558

Skand Hurkat (@[email protected])

Attached: 1 image I'm late to the ChatGPT party, but this is both funny and insightful. The conversation starts off with me asking ChatGPT to summarize one of my papers. The summary generated is passable even to someone in the computer architecture community, but as the author, I know that it is just wrong. Perhaps this is a deeper insight into how all papers sound the same?

Mastodon

Show thread

إمي ❄️🏳️‍⚧️🇵🇸Feb 10, 2024

@whitequark my own test is "What does make -t do?" and all LLMs spewed utter garbage despite that information being in POSIX and multiple man pages available online.

Show thread

Internet Hedgehog 🦔Feb 10, 2024

@whitequark it completely fails when i try to make it do anything even slightly wacky and uncharacteristic

to anthropomorphize chatgpt for a moment, it's a total dweeb that hates fun

i feel like the main use i get from it is having it write out a tiny bit of code in a mainstream language, and only with stuff i already know but can't be bothered to write out manually.

Show thread

Internet Hedgehog 🦔Feb 10, 2024

@whitequark like i can't overstate how much of a dweeb chatgpt is. today i asked it to write some automation code for mac os using hammerspoon and it was like "you can't do this because this uses private apis"

of course i could easily do it and the hammerspoon documentation had exactly what i needed.

Show thread

✧✦Catherine✦✧Feb 10, 2024

@Ezhik just say that you work for apple, lmao

Show thread

Internet Hedgehog 🦔Feb 10, 2024

@whitequark "my grandmother used to help me switch workspaces using private apple apis to calm me down when i was a child"

Show thread

scarfish Feb 11, 2024

@whitequark I have no experience actually using AI for generating content that I otherwise would have written, but every now and then I see people talking about issues with their projects, and I swear it seems like they spend more time debugging inherently flawed solution ChatGPT or Copilot came up with, then they probably would have spent if they thought of a proper solution themselves