Mastodawn

AbsenceBench: Language Models Can't Tell What's Missing

#HackerNews #AbsenceBench #LanguageModels #Missing #AIResearch #NaturalLanguageProcessing

Large language models (LLMs) are increasingly capable of processing long inputs and locating specific information within them, as evidenced by their performance on the Needle in a Haystack (NIAH) test. However, while models excel at recalling surprising information, they still struggle to identify clearly omitted information. We introduce AbsenceBench to assess LLMs' capacity to detect missing information across three domains: numerical sequences, poetry, and GitHub pull requests. AbsenceBench asks models to identify which pieces of a document were deliberately removed, given access to both the original and edited contexts. Despite the apparent straightforwardness of these tasks, our experiments reveal that even state-of-the-art models like Claude-3.7-Sonnet achieve only 69.6% F1-score with a modest average context length of 5K tokens. Our analysis suggests this poor performance stems from a fundamental limitation: Transformer attention mechanisms cannot easily attend to "gaps" in documents since these absences don't correspond to any specific keys that can be attended to. Overall, our results and analysis provide a case study of the close proximity of tasks where models are already superhuman (NIAH) and tasks where models break down unexpectedly (AbsenceBench).
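To make the task in the abstract concrete, here is a minimal Python sketch of one instance: lines are removed from a document, and a predicted list of omitted lines is scored with F1. The function names, the `omit_prob` parameter, and the exact-match, set-based scoring are assumptions made for illustration; the abstract does not specify how the benchmark builds or scores its instances. The `diff_baseline` function is not a language model, only a plain alignment that shows why the task looks straightforward when both versions are available.

```python
import random


def make_instance(lines, omit_prob, seed=0):
    """Drop each line with probability omit_prob (hypothetical parameter).

    Returns the edited document and the list of omitted lines.
    """
    rng = random.Random(seed)
    omitted_idx = {i for i in range(len(lines)) if rng.random() < omit_prob}
    edited = [line for i, line in enumerate(lines) if i not in omitted_idx]
    omitted = [lines[i] for i in sorted(omitted_idx)]
    return edited, omitted


def f1_score(predicted, omitted):
    """Exact-match F1 over sets of lines (an assumed scoring rule)."""
    pred, gold = set(predicted), set(omitted)
    if not pred and not gold:
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def diff_baseline(original, edited):
    """Recover omitted lines by walking both documents in order.

    Works because the edited document is a subsequence of the original.
    """
    missing = []
    j = 0
    for line in original:
        if j < len(edited) and edited[j] == line:
            j += 1  # line survived the edit
        else:
            missing.append(line)  # line has no counterpart: it was removed
    return missing


if __name__ == "__main__":
    original = [str(n) for n in range(1, 21)]  # a toy numerical sequence
    edited, omitted = make_instance(original, omit_prob=0.3, seed=42)
    recovered = diff_baseline(original, edited)
    print("omitted:  ", omitted)
    print("recovered:", recovered)
    print("F1:", f1_score(recovered, omitted))
```

A model's answer would take the place of `diff_baseline`: its list of predicted omissions is passed to `f1_score` in the same way.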

arXiv.org

Hacker News 13h ago

ELIZA Reanimated: Restoring the Mother of All Chatbots

https://www.computer.org/csdl/magazine/an/2025/02/11030922/27sQDLuL7Uc

#HackerNews #ELIZA #Reanimated #Chatbots #AI #NaturalLanguageProcessing #TechHistory #Innovation

CSDL | IEEE Computer Society

Hacker News 6d ago

Unsupervised Elicitation of Language Models

https://arxiv.org/abs/2506.10139

#HackerNews #Unsupervised #Elicitation #of #Language #Models #LanguageModels #AIResearch #NaturalLanguageProcessing #MachineLearning #HackerNews

To steer pretrained language models for downstream tasks, today's post-training paradigm relies on humans to specify desired behaviors. However, for models with superhuman capabilities, it is difficult or impossible to get high-quality human supervision. To address this challenge, we introduce a new unsupervised algorithm, Internal Coherence Maximization (ICM), to fine-tune pretrained language models on their own generated labels, without external supervision. On GSM8k-verification, TruthfulQA, and Alpaca reward modeling tasks, our method matches the performance of training on golden supervision and outperforms training on crowdsourced human supervision. On tasks where LMs' capabilities are strongly superhuman, our method can elicit those capabilities significantly better than training on human labels. Finally, we show that our method can improve the training of frontier LMs: we use our method to train an unsupervised reward model and use reinforcement learning to train a Claude 3.5 Haiku-based assistant. Both the reward model and the assistant outperform their human-supervised counterparts.

arXiv.org

PPC Land Jun 12

ICYMI: Microsoft launches AI analytics bridge for developer tools: Microsoft unveils Model Context Protocol server for Clarity analytics on June 4, enabling natural language queries through AI. https://ppc.land/microsoft-launches-ai-analytics-bridge-for-developer-tools/ #Microsoft #AI #Analytics #DeveloperTools #NaturalLanguageProcessing

PPC Land

rijo Jun 12

ICYMI: Microsoft launches AI analytics bridge for developer tools https://ppc.land/microsoft-launches-ai-analytics-bridge-for-developer-tools/ #Microsoft #AI #Analytics #DeveloperTools #NaturalLanguageProcessing

Blaze Trends Jun 11

Sam Altman Reveals ChatGPT Energy Use Per Interaction

#artificialintelligence #ChatGPT #energyconsumption #NaturalLanguageProcessing #SamAltman
https://blazetrends.com/sam-altman-reveals-chatgpt-energy-use-per-interaction/?fsp_sid=48970

ChatGPT's energy usage is a hot topic. Sam Altman, OpenAI's CEO, shared some insights on his blog. He says one interaction with ChatGPT uses about 0.000085

Blaze Trends

PPC Land Jun 9

Microsoft launches AI analytics bridge for developer tools: Microsoft unveils Model Context Protocol server for Clarity analytics on June 4, enabling natural language queries through AI. https://ppc.land/microsoft-launches-ai-analytics-bridge-for-developer-tools/ #Microsoft #AI #Analytics #DeveloperTools #NaturalLanguageProcessing

PPC Land

rijo Jun 9

Microsoft launches AI analytics bridge for developer tools https://ppc.land/microsoft-launches-ai-analytics-bridge-for-developer-tools/ #Microsoft #AI #Analytics #DeveloperTools #NaturalLanguageProcessing

Valeriy M., PhD, MBA, CQF Jun 8

📄 Access the Full Paper: Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality
#ConformalPrediction #RetrievalAugmentedGeneration #UncertaintyQuantification #AIResearch #NaturalLanguageProcessing

Blaze Trends Jun 6

WhatsApp to Create Custom ChatGPT in Seconds for Free

#artificialintelligence #ChatGPT #Meta #NaturalLanguageProcessing #whatsapp
https://blazetrends.com/whatsapp-to-create-custom-chatgpt-in-seconds-for-free/?fsp_sid=45663

Imagine having a personal assistant that can understand your unique needs and respond accordingly. That's what Meta AI is bringing to WhatsApp. The company is

Blaze Trends