Justin D. Norman

@justin_time
Computer Vision & ML PhD @BerkeleyiSchool
web: justintime.ai

New Paper! We found that there is little scientific work that attempts to measure the prevalence of language model hallucination in a comprehensive way. We argue that language models should be evaluated using repeatable, open, and domain-contextualized hallucination benchmarking.

https://arxiv.org/abs/2505.17345

#ai #llm #machinelearning

Language models should be subject to repeatable, open, domain-contextualized hallucination benchmarking

Plausible, but inaccurate, tokens in model-generated text are widely believed to be pervasive and problematic for the responsible adoption of language models. Despite this concern, there is little scientific work that attempts to measure the prevalence of language model hallucination in a comprehensive way. In this paper, we argue that language models should be evaluated using repeatable, open, and domain-contextualized hallucination benchmarking. We present a taxonomy of hallucinations alongside a case study that demonstrates that when experts are absent from the early stages of data creation, the resulting hallucination metrics lack validity and practical utility.

Back for my yearly academic progress report. Passed quals yesterday. I guess we’re actually doing this.
It’s January. Time to get serious about my PhD preliminary exam prep, just over a month to go! #machinelearning #phdIRL #academia
December: Finalized my dissertation. Received verbal approval. Filed.
Now working on more applications for academic positions, a couple reviews, figuring out plans for the spring, and spending more time with my 11-week-old.

Very interesting paper finding that immersion in gaming culture, rather than in games per se, can lead to toxic outcomes.

https://gnet-research.org/2022/11/11/extremist-action-in-digital-gaming-spaces-the-role-of-identity-fusion/

Extremist Action in Digital Gaming Spaces: The Role of Identity Fusion - GNET

#GodOfWarRagnarok is like playing a Jason Statham movie: I’m entertained at the moment, but there’s probably something else I should be doing.
Just a bunch of video games, guitar, Raspberry Pis, and PhD stuff.
Gotta say it’s refreshing to start an account with absolutely no corporate purpose or agenda. Just me.
Hi.