This comes as a bit of a surprise to Christopher Kyba himself, who somehow has a lot of memories of being underground at the SNO site 🤔

#AISlop

My daughter just came up with a great exercise: challenge your students to find the title of your PhD using ONLY LLMs (no Google allowed). If any of them manage, they get gummy bears 😃

I asked five different models, and got five different answers, all five of which were completely wrong 😂

#AI #ChatGPT #AISlop #LLM #LLMFail #Education #HigherEducation #AcademicChatter

@skyglowberlin "(no Google allowed)" That makes this a pretty foolish exercise to suggest IMO. The bot's guess will be no different than anyone else who's never heard your bio.
@Jay42 And what do you suppose I hope the students might learn from such an exercise?
@skyglowberlin They'll learn their teacher gives them arbitrary tasks that have no bearing in reality.
@Jay42 Over the years I have taught, there has always been a subset of students who have difficulty understanding that the overt task you set is merely a vehicle to carry a much more valuable growth/learning lesson. When you lift weights in the gym, is the point of that exercise just to arbitrarily get those weights higher off the floor? Or are they just a simple, representative example to help you get stronger, so that you can lift something for real when you have to?
@autovectis We see today that some people can use trial and error and still not learn (example: https://www.theguardian.com/technology/2025/jun/18/whatsapp-ai-helper-mistakenly-shares-users-number ), which further reinforces that it's a foolish exercise. With your analogy, would you teach them the incorrect way to lift weights? It's the equivalent of letting them hurt themselves lifting incorrectly to "teach them a lesson." Are you saying it's a smart exercise? It's a fool's errand.
@Jay42 That's not a good example because it carries real health risks. Getting people to look something up online using a deliberately bad strategy carries no risk. But I suspect you're making a couple of assumptions here: first, that this is the only strategy being used (rather than one strand out of a bunch that are being used), and, secondly, that the reason behind the task has not been explained to the students. Is this approach something that frustrates you as a student?
@autovectis "That's not a good example because it carries real health risks." I was correcting your bad analogy. Seems your ego couldn't take it.
@autovectis I like the weight lifting analogy. We're watching @skyglowberlin telling his students to lift with their backs so that some might learn that it's incorrect form. If any of his students are naive enough to believe him, he's also potentially taught them nothing.
@Jay42 I wish you good luck with your studies and look forward to buying burgers off you in the future.
@autovectis Hilarious ad hominem, you really won the argument with that one. Yet I'm the one being called rude and thick.

@Jay42 @skyglowberlin

Welcome, reply guy.

Muting, as I don't have patience (too old), but with best wishes for your personal journey.

@glc@mastodon.online @skyglowberlin Then why say anything? You didn't have to be a weirdo.
@Jay42 @skyglowberlin "it's an excellent FPS game as long as you don't try to walk through the walls" (because it has no wall collision. you can just walk through.)
@skyglowberlin I think it is a useful exercise! Some useful takeaways (without the search tool turned on, and sometimes even with the search tool available) would be: 1) even the best LLMs are usually willing to guess answers to extremely obscure questions like yours, and 2) they are very likely to get them wrong.
@Jay42 It should give out "No result" instead of fantasizing. Wrong is worse than none. @skyglowberlin
@Dodo_sipping @skyglowberlin Many do when you stop roleplaying fantasies with the bot.

@Dodo_sipping What I hope they would learn is that LLMs don't actually know anything, so they can't know when not to give an answer. All of the text is made-up fantasy; it's just that for some topics the fantasy happens to be close to reality, or even true.

But you can never tell unless you do your own research.

@skyglowberlin

The chatbot is better. But you used the API or the model directly, right?

@tinoeberl U of A is still wrong. I did a bachelor's degree there. Try telling it I didn't get a degree in Canada - when I tried that, it said I got my degree in Heidelberg; then when I told it my degree was in the USA, it said my PhD was from Berkeley; and when I said no, it was in the eastern US, it said my PhD was from Brown.

I was using the 4.1 model.

I have seen "better" results in the past, meaning the probabilistically generated text was closer to the truth, but it's never actually been correct. And every time I have tried the models have always gotten wrong who my collaborators from that time were, despite about a dozen papers where we're listed together. If anything, they seem to be doing worse than they manged 6-9 months ago.

@skyglowberlin ChatGPT answered absolutely correctly about me and my dissertation. It found it on the web, of course.

@Arta Interesting. What does it get for you for the prompt I used ("Where did Christopher Kyba get his PhD, and what was the title?")?

Both of the answers it returned to me just now are wrong.

@skyglowberlin I asked (in Latvian): do you know where Arta Snipe got her PhD and what her thesis was about? At first it said it didn't have personal info, but when I answered that this is publicly known stuff, it came back with an accurate answer.
@Arta Sorry, I wasn't clear - I was curious what would happen if you asked for my name - whether maybe the model you are using is doing a better job of finding additional data or something.
@skyglowberlin finally remembered to ask 😅
How far from the truth is it? 😃

@Arta Thanks for sharing. That is accurate - you would get gummy bears 🙂

Did you have to tell it to look online, or did it do that automatically?

@skyglowberlin I just told it to do the same as it did for me, but for your name. So 40:60 it was the previous prompts, saying that it is public information (I did not specifically ask it to do an online search).

@skyglowberlin

Wait until you’re famous. Then all LLMs will know you. ;)

@skyglowberlin

This exercise won't work for my students - at least not with Brave AI:

The Brave AI was pretty good with mine. It took only one extra specification for it to find the correct title.

@tewe I'm not quite sure what you mean - was the first answer wrong, and then you gave it a hint of some kind?
@skyglowberlin
In the first answer it said that it didn't have enough information to answer my question correctly. It gave me several people with the same name, but with PhDs in different research topics.
With a hint about my university it came up with the correct title.

@tewe Even when I've provided the university, the LLMs haven't gotten it right. The other hilarious thing has been asking them whether I was colleagues with other PhD students from my group (with whom I've published), because (so far) the LLMs always insist that we were not colleagues.

You can also try "what has [author 1] published with [author 2]". That is usually good for generating entirely plausible sounding titles that are completely made up.

@skyglowberlin temperature = 0 avoids hallucination (pixtral-12b)?

@fusion I'm not quite sure what you mean by "avoids hallucination"? I mean, reduced variability compared to the default sampling, sure, but unless they directly reproduce training data, all text output from LLMs is made up.

But it's a great example, because GFZ isn't a degree granting institution. That's a nice bonus demonstration of how LLMs don't actually "know" anything.

@skyglowberlin “Hallucination” means deviation from the training data in the statistical processing of the answers.
BTW: An LLM is not designed as a “reference book” and is imho therefore the wrong tool for the job.

@fusion Any text that is not directly reproduced from the training set is, according to that definition, a "hallucination", which means that nearly everything they produce is a "hallucination". That's why I don't think it's a useful term. In general parlance, people use the term "hallucination" when an LLM says something that is not truthful. But (except when reproducing training data directly), every sentence is literally made up. It's just that in a lot of cases, the made-up text happens to be true.

In the text you posted, even with temperature set to zero, it produced an incorrect answer, which surely does not appear in any training set (because it's not true). That's why I didn't understand what you meant by "avoids hallucination".

I completely agree with you that an LLM is the wrong tool for this job. That is the point of the exercise.
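
As a side note on the temperature question, here is a minimal sketch of why temperature = 0 doesn't prevent made-up answers. It assumes the OpenAI Python client and the "gpt-4.1" model name mentioned above; any chat-completion API with a temperature parameter behaves the same way: zero temperature only makes the sampling (nearly) deterministic, so you can get the same confident guess on every run, true or not.

```python
# Minimal sketch, assuming the OpenAI Python client is installed and
# OPENAI_API_KEY is set in the environment; "gpt-4.1" is an assumed model name.
from openai import OpenAI

client = OpenAI()

question = "Where did Christopher Kyba get his PhD, and what was the title?"

response = client.chat.completions.create(
    model="gpt-4.1",  # assumed model name
    messages=[{"role": "user", "content": question}],
    temperature=0,  # (near-)deterministic token choices, not a truthfulness switch
)

# temperature=0 removes most run-to-run variability, but the answer is still
# generated token by token from learned statistics, so it can be the same
# confidently wrong answer every single time.
print(response.choices[0].message.content)
```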

@skyglowberlin I just asked Claude and it got place and title of my really obscure thesis exactly right. What is wrong with LLMs is unfortunately not only that they are sometimes wrong.

@thomas There's a lot of things wrong with them 😃

I tried you on GPT and it returned Stuttgart. Out of curiosity, what do you get for me on Claude?

@skyglowberlin

Claude (which has Internet access):

"University: Christopher Kyba received his PhD in Experimental Particle Physics from the University of Pennsylvania in 2006

Thesis Title: "Measurement of the Atmospheric Neutrino Induced Muon Flux at the Sudbury Neutrino Observatory"

Additional Details:

- His full name is Christopher Conrad Maximillian Kyba
- He received his BSc (honors) in Physics from the University of Alberta in 2000
- His PhD work was conducted as part of the famous Sudbury Neutrino Observatory (SNO) experiment, which was instrumental in solving the solar neutrino problem and demonstrating neutrino oscillations
- His involvement with SNO began with a serendipitous encounter at a particle physics conference while he was an undergraduate, where he met the head of SNO at Queen's University, leading to a summer internship and eventually his PhD work

Christopher Kyba is now a prominent researcher in light pollution studies at the GFZ German Research Centre for Geosciences and Ruhr-Universität Bochum, having transitioned from particle physics to environmental remote sensing research."

Correct?

@thomas Thanks! There are two minor errors, but if you were my student, you would definitely get some gummy bears 😃

It's funny that I can tell where it found the information. This EOS article, for example: https://eos.org/features/christopher-kyba-luck-in-light-pollution

And I think very likely my CV page at GFZ.

Anyway, the problem remains that unless you actually look everything up, you don't know whether it is true or not.

@skyglowberlin Yes, we still cannot trust stuff on the internet :-)