I thoroughly agree with business ethicist Dr. Dorothea Baur’s post about #AI and #hallucinated #references in #research articles. Headline: “Hallucinated References: Five Excuses for Academic Misconduct.” https://dorotheabaur.ch/en/texts-and-media/hallucinated-references-five-excuses-for-academic-misconduct/
Hallucinated References: Five Excuses for Academic Misconduct

When I proposed rejecting papers with hallucinated references, the support was overwhelming. But the critical voices revealed five argument patterns: from AI hype through TINA (“there is no alternative”) rhetoric to nihilism.

Dr. Dorothea Baur

The punchline of this research is in the paper's title. You might think authors would be alive to this?! 😬

"Our analysis reveals that nearly 300 papers contain at least one HalluCitation [...] Notably, half of these papers were identified at EMNLP 2025, the most recent conference, indicating that this issue is rapidly increasing..."

HalluCitation Matters: Revealing the Impact of #Hallucinated #References with 300 Hallucinated Papers in ACL Conferences https://doi.org/10.48550/arXiv.2601.18724 #GenAI #scholcomm

HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences

Recently, we have often observed hallucinated citations or references that do not correspond to any existing work in papers under review, preprints, or published papers. Such hallucinated citations pose a serious concern to scientific reliability. When they appear in accepted papers, they may also negatively affect the credibility of conferences. In this study, we refer to hallucinated citations as "HalluCitation" and systematically investigate their prevalence and impact. We analyze all papers published at ACL, NAACL, and EMNLP in 2024 and 2025, including main conference, Findings, and workshop papers. Our analysis reveals that nearly 300 papers contain at least one HalluCitation, most of which were published in 2025. Notably, half of these papers were identified at EMNLP 2025, the most recent conference, indicating that this issue is rapidly increasing. Moreover, more than 100 such papers were accepted as main conference and Findings papers at EMNLP 2025, affecting the credibility of these venues.

arXiv.org
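The kind of screening the paper describes can be approximated with a simple existence check: many (though not all) references carry a DOI, and a DOI that fails to resolve against the public Crossref API is a strong signal of a hallucinated citation. A minimal sketch, assuming such a DOI-based check; the helper names and User-Agent string are my own, and the paper does not publish its detection pipeline:

```python
import re
import urllib.request
import urllib.error

# DOIs start with a "10." prefix followed by a registrant code and a suffix.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

def extract_doi(citation):
    """Pull the first DOI out of a free-text citation string, if any."""
    m = DOI_RE.search(citation)
    return m.group(0).rstrip('.') if m else None

def doi_resolves(doi):
    """Ask the public Crossref REST API whether this DOI names a real work.

    Crossref returns HTTP 404 for DOIs it has never registered, which is
    the hallucination signal here. (Network call; absence from Crossref
    is not proof of hallucination, since some publishers use other
    registration agencies.)
    """
    req = urllib.request.Request(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "hallucitation-check (illustrative)"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

References without a DOI would still need fuzzy title/author matching against a bibliographic database, which is where the hard cases live.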
@mguhlin Just had a chat with a colleague who was vibecoding on Canva to create a website. It looked good and was almost finished when it #hallucinated. Since the code is stored elsewhere, she had to start again. At that point it was her 66th iteration, with several prompts each. I guess it’s doable as long as one has the patience, or is ready to accept its shortcomings.

Another week -- which means another research paper questioning whether #LLMs should go anywhere near the scholarly research process. And the answer is, unsurprisingly, 'no'. #GPT-4 and #Bard #hallucinated #references in roughly 29% and 91% of cases, respectively. But there are many other worrying observations in this study.

Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews
https://doi.org/10.2196/53164 #LLM #scholcomm #AI #search #discovery #hallucinations

Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis

Background: Large language models (LLMs) have raised both interest and concern in the academic community. They offer the potential for automating literature search and synthesis for systematic reviews but raise concerns regarding their reliability, as the tendency to generate unsupported (hallucinated) content persists. Objective: The aim of the study is to assess the performance of LLMs such as ChatGPT and Bard (subsequently rebranded Gemini) to produce references in the context of scientific writing. Methods: The performance of ChatGPT and Bard in replicating the results of human-conducted systematic reviews was assessed. Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards. The study used 3 key performance metrics: recall, precision, and F1-score, alongside the hallucination rate. Papers were considered “hallucinated” if any 2 of the following details were wrong: title, first author, or year of publication. Results: In total, 11 systematic reviews across 4 fields yielded 33 prompts to LLMs (3 LLMs×11 reviews), with 471 references analyzed. Precision rates for GPT-3.5, GPT-4, and Bard were 9.4% (13/139), 13.4% (16/119), and 0% (0/104) respectively (P<.001). Recall rates were 11.9% (13/109) for GPT-3.5 and 13.7% (15/109) for GPT-4, with Bard failing to retrieve any relevant papers (P<.001). Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.4% (95/104) for Bard (P<.001). Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted geographical and open-access biases in the papers retrieved by the LLMs.
Conclusions: Given their current performance, it is not recommended for LLMs to be deployed as the primary or exclusive tool for conducting systematic reviews. Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the necessity for refining their training and functionality before confidently using them for rigorous academic purposes.

Journal of Medical Internet Research
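The study's hallucination criterion, i.e., a generated reference counts as hallucinated if at least 2 of title, first author, or publication year are wrong, is easy to restate as code. A minimal sketch; the function and field names are my own illustration, as the paper describes the rule but not an implementation:

```python
# Fields checked by the study's criterion.
FIELDS = ("title", "first_author", "year")

def is_hallucinated(generated, gold):
    """True if at least 2 of the 3 checked fields disagree with the
    real bibliographic record (the study's definition)."""
    mismatches = sum(generated[f] != gold[f] for f in FIELDS)
    return mismatches >= 2

def hallucination_rate(pairs):
    """Share of (generated, gold) reference pairs judged hallucinated."""
    flagged = sum(is_hallucinated(g, real) for g, real in pairs)
    return flagged / len(pairs)
```

Note what the rule implies: a reference with only the year wrong, but the correct title and first author, counts as merely inaccurate, not hallucinated, so the reported hallucination rates are a conservative floor on how many references were defective.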

Oh my god, I just tried out GitHub #copilot Chat for the first time. Yesterday I used the default model from #vscode; it #hallucinated a lot and did not produce any useful result.

Then I read more about https://docs.github.com/en/copilot/using-github-copilot/ai-models/choosing-the-right-ai-model-for-your-task and tried out Claude 3.7 Sonnet.

People are spending money on this?! It couldn't follow a basic instruction. 😬

#claudesonnet

This man was killed four years ago. His AI clone just spoke in court.

AI continues to trickle into courtrooms, from 'hallucinated' court cases to deepfaked videos.

Popular Science

An #AI Customer Service #Chatbot Made Up a Company Policy—and Created a Mess

When an AI model for code-editing company #Cursor #hallucinated a new rule, users revolted.

https://www.wired.com/story/cursor-ai-hallucination-policy-customer-service/

An AI Customer Service Chatbot Made Up a Company Policy—and Created a Mess

When an AI model for code-editing company Cursor hallucinated a new rule, users revolted.

WIRED

Addendum: I was wrong!

Original Post:

@dalias @servo already the #documentation seems to be #AI #hallucinated garbage.

  • I.e., not usable at all...

And yes, undocumented, buggy, and inefficient code will be the result, even if we don't think IP would be an issue...

Kevin Karhan :verified: (@[email protected])

@[email protected] @[email protected] apologies on my part as I seem to have conflated @[email protected] with some other project...

Infosec.Space

Yesterday I used Google Translate to check whether my email written in #dutch is actually correct.

It #hallucinated the time of the meeting. It literally changed the fucking time of the meeting that I had created XDD

It's not usable anymore.

Double-check the #news you read, because #ArtificialIntelligence may have #hallucinated it. Apple Intelligence did so about the BBC. https://www.bbc.com/news/articles/cd0elzk24dno H/T to @grammargirl for pointing out the issue via AI Sidequest, at https://ai-sidequest.beehiiv.com/p/merriam-webster-expanding-thesaurus-with-ai.
BBC complains to Apple over misleading shooting headline

Apple's new artificial intelligence features falsely made it seem the BBC reported Luigi Mangione had shot himself.