fly51fly (@fly51fly)

The paper "Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality" argues that recall is the bottleneck for parametric factuality. Published on arXiv (2026) by researchers at Google Research and the Technion, it stresses the need to improve models' ability to retrieve knowledge stored in their own parameters.

https://x.com/fly51fly/status/2023871638702616579

#factuality #llm #research #recall

[CL] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality N Calderon, E Ben-David, Z Gekhman, E Ofek... [Google Research & Technion] (2026) https://t.co/OtvTyfLVKv

Artificial Analysis (@ArtificialAnlys)

AA-Omniscience is a benchmark built by Artificial Analysis that measures factual recall and hallucination using an evaluation set of 6,000 questions covering 42 economically relevant topics. The topics span six domains: Business, Health, Law, Software Engineering, Humanities & Social Sciences, and Science/Engineering/Mathematics.

https://x.com/ArtificialAnlys/status/2008570655047118914

#aaomniscience #benchmark #factuality #hallucination

According to Timothy Snyder's "On Freedom" (2024), there are five forms of freedom:

- Sovereignty
- Unpredictability
- Mobility
- Factuality
- Solidarity

We need all of them. #freedom #sovereignty #unpredictability #mobility #factuality #solidarity

Google’s new FACTS benchmark reveals a 70% factuality ceiling across four rigorous tests, from grounding to multimodal and search scenarios. Even Gemini 3 Pro struggles to break the barrier, highlighting limits for large language models on Kaggle‑style tasks. Dive into the data and see what this means for open‑source AI research. #FACTSbenchmark #Gemini3Pro #GroundingBenchmark #Factuality

🔗 https://aidailypost.com/news/googles-facts-benchmark-shows-70-factuality-ceiling-across-four-tests

🚨 We're hiring!
As part of the ARMADA MSCA Doctoral Network, my group at #TUWien is offering a fully funded PhD position on:
➡️ Knowledge-Graph driven Factuality and Explainability

📍 Based in Vienna, Austria
🌍 Includes secondments, training schools, and a strong European research network
📅 Apply by: June 30, 2025 (mobility rules apply)

Details and how to apply:
👉 https://dmki-tuwien.github.io/jobs.html#armada
🌐 Full ARMADA call with all 15 PhD topics: https://armada-dn.eu/call

#LLM #KnowledgeGraphs #Factuality

GitHub - Libr-AI/OpenFactVerification

#Loki is our open-source solution designed to automate the process of verifying #factuality. It provides a comprehensive pipeline for dissecting long texts into individual claims, assessing their worthiness for #verification, generating queries for evidence search, crawling for evidence, and ultimately verifying the claims.

> interesting, does anyone have any experience with this tool?
#ai

https://github.com/Libr-AI/OpenFactVerification
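The pipeline described above can be sketched in miniature as follows. This is a toy illustration of the general claim-verification recipe, not Loki's actual API: every function name and the keyword-matching logic are assumptions, with trivial stand-ins for the LM-based splitting, query generation, and web crawling a real system would use.

```python
# Hypothetical sketch of a claim-verification pipeline in the spirit of the
# README's description: split -> filter -> query -> retrieve -> verify.
# All names and logic here are illustrative, not Loki's real interface.
import re

def split_claims(text):
    # Naive claim splitter: one claim per sentence (real systems use an LM).
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]

def checkworthy(claim):
    # Toy check-worthiness filter: skip obvious opinions.
    return not claim.lower().startswith(("i think", "in my opinion"))

def generate_query(claim):
    # Real systems rephrase the claim for a search engine; here it's identity.
    return claim

def retrieve_evidence(query, corpus):
    # Stand-in for web crawling: keyword overlap against a local corpus.
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def verify(claim, evidence):
    # Toy verdict: supported iff some evidence mentions the claim's last word.
    key = claim.lower().split()[-1]
    return any(key in doc.lower() for doc in evidence)

corpus = ["The Eiffel Tower is in Paris.", "Water boils at 100 C."]
text = "The Eiffel Tower is in Paris. I think soup is tasty."
verdicts = {c: verify(c, retrieve_evidence(generate_query(c), corpus))
            for c in split_claims(text) if checkworthy(c)}
print(verdicts)
```

Each stage is independently swappable, which is presumably why such pipelines are packaged as a sequence of components rather than a single model call.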

How can we improve LM factuality and editability with nothing but the LM itself?

Introducing Deductive Closure Training (DCT):

1. generate statements and their implications
2. identify a logically consistent subset
3. distill this subset back to LM

https://lingo-mit.github.io/deductive-closure/
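The three steps above can be sketched as a toy selection problem. In the actual method the LM itself generates statements, scores them, and is fine-tuned on the result; here the probabilities, implications, and consistency rules are hard-coded stand-ins, so only step 2 (picking the most probable logically consistent subset) is shown concretely.

```python
# Toy sketch of the consistent-subset selection step in Deductive Closure
# Training. The statements, probabilities, and constraints below are
# illustrative assumptions, not the paper's actual data.
from itertools import combinations

# Seed statement, its negation, and an implication, with mock LM probabilities.
statements = {
    "A": 0.9,       # e.g. "The capital of France is Paris"
    "not A": 0.2,   # its negation
    "B": 0.7,       # a statement implied by A
}

def consistent(subset):
    # A statement and its negation cannot both be kept; B requires A.
    if "A" in subset and "not A" in subset:
        return False
    if "B" in subset and "A" not in subset:
        return False
    return True

def subset_score(subset):
    # Joint probability under independence: p(s) if kept, 1 - p(s) if dropped.
    score = 1.0
    for s, p in statements.items():
        score *= p if s in subset else (1.0 - p)
    return score

# Enumerate all subsets and keep the most probable consistent one; this is
# the set that would be distilled back into the LM (step 3).
candidates = [set(c) for r in range(len(statements) + 1)
              for c in combinations(statements, r)]
best = max((c for c in candidates if consistent(c)), key=subset_score)
print(sorted(best))
```

Brute-force enumeration works only for tiny statement sets; the point is that coherence is enforced on the generated data before it is distilled back, rather than on the model directly.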

#NLP #modelEditing #LLM #LLMs #data #bias #factuality

Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

See the sheer joy of my collaborators at #ACL2023 when 🤩
DisentQA
won the Best Paper AC award

This is a happy outcome of a fruitful collaboration with a group of wonderfully friendly people.

https://arxiv.org/abs/2211.05655
#nlproc #machinelearning #Qa #factuality

DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

Question answering models commonly have access to two sources of "knowledge" during inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer. Having these two sources of knowledge entangled together is a core issue for generative QA models as it is unclear whether the answer stems from the given non-parametric knowledge or not. This unclarity has implications on issues of trust, interpretability and factuality. In this work, we propose a new paradigm in which QA models are trained to disentangle the two sources of knowledge. Using counterfactual data augmentation, we introduce a model that predicts two answers for a given question: one based on given contextual knowledge and one based on parametric knowledge. Our experiments on the Natural Questions dataset show that this approach improves the performance of QA models by making them more robust to knowledge conflicts between the two knowledge sources, while generating useful disentangled answers.
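The counterfactual augmentation the abstract describes can be sketched as follows. The field names and the string-substitution trick are illustrative assumptions, not the paper's actual data format: each question yields a factual example and a counterfactual one whose context contradicts the model's memorized answer, with both target answers kept separate.

```python
# Hedged sketch of counterfactual data augmentation in the spirit of DisentQA.
# Field names and construction are assumptions for illustration only.
def make_examples(question, context, parametric_answer, counterfactual_answer):
    factual = {
        "input": f"{context} Q: {question}",
        "contextual_answer": parametric_answer,  # context agrees with weights
        "parametric_answer": parametric_answer,
    }
    # Swap the answer inside the passage so it conflicts with model memory.
    cf_context = context.replace(parametric_answer, counterfactual_answer)
    counterfactual = {
        "input": f"{cf_context} Q: {question}",
        "contextual_answer": counterfactual_answer,  # follow the passage
        "parametric_answer": parametric_answer,      # keep the memorized fact
    }
    return [factual, counterfactual]

examples = make_examples(
    question="Where is the Eiffel Tower?",
    context="The Eiffel Tower is in Paris.",
    parametric_answer="Paris",
    counterfactual_answer="Rome",
)
print(examples[1]["contextual_answer"])  # prints "Rome"
```

Training on both examples forces the model to emit two distinct answers, which is what makes knowledge conflicts between context and weights visible instead of silently resolved.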

False Promise of Imitating Proprietary LLMs: a new paper argues that open-source models imitating ChatGPT are less successful than human assessors perceive, because they imitate the LLM's style more faithfully than its content. This implies a need for more pre-training data and for improved evaluation data and methods.
https://arxiv.org/pdf/2305.15717.pdf
#LLM #OpenSource #factuality #evaluation #assessment #chatgpt #AI #imitation #training #data #MachineLearning #safety #toxicity @machinelearning