
AbsenceBench: Language Models Can't Tell What's Missing
Large language models (LLMs) are increasingly capable of processing long inputs and locating specific information within them, as evidenced by their performance on the Needle in a Haystack (NIAH) test. However, while models excel at recalling surprising information, they still struggle to identify clearly omitted information. We introduce AbsenceBench to assess LLMs' capacity to detect missing information across three domains: numerical sequences, poetry, and GitHub pull requests. AbsenceBench asks models to identify which pieces of a document were deliberately removed, given access to both the original and edited contexts. Despite the apparent straightforwardness of these tasks, our experiments reveal that even state-of-the-art models like Claude-3.7-Sonnet achieve only 69.6% F1-score with a modest average context length of 5K tokens. Our analysis suggests this poor performance stems from a fundamental limitation: Transformer attention mechanisms cannot easily attend to "gaps" in documents, since these absences don't correspond to any specific keys that can be attended to. Overall, our results and analysis provide a case study of the close proximity of tasks where models are already superhuman (NIAH) and tasks where models break down unexpectedly (AbsenceBench).
arXiv.org
CSDL | IEEE Computer Society
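To make the task format concrete, here is a minimal Python sketch, assuming line-level omissions scored by exact-string matching. The helpers `make_absence_instance`, `omission_f1`, and `diff_baseline`, the 10% omission rate, and the scoring details are hypothetical illustrations, not AbsenceBench's released code. The diff baseline shows why the task looks straightforward: with both versions in hand, a plain multiset difference recovers every omission.

```python
import random
from collections import Counter
from typing import List, Set, Tuple


def make_absence_instance(
    lines: List[str], omit_rate: float = 0.1, seed: int = 0
) -> Tuple[List[str], Set[str]]:
    """Hypothetical task builder: drop a random subset of lines.

    Returns the edited document (original order preserved) and the set of
    removed lines. Duplicate lines collapse in the set; fine for a sketch.
    """
    rng = random.Random(seed)
    k = max(1, round(len(lines) * omit_rate))
    removed_idx = set(rng.sample(range(len(lines)), k))
    edited = [line for i, line in enumerate(lines) if i not in removed_idx]
    gold = {lines[i] for i in removed_idx}
    return edited, gold


def omission_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between the lines reported as missing and the truly removed lines."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def diff_baseline(original: List[str], edited: List[str]) -> Set[str]:
    """Non-neural reference: a multiset difference recovers the omissions."""
    missing = Counter(original) - Counter(edited)  # keeps only positive counts
    return set(missing)


if __name__ == "__main__":
    # Numerical-sequence domain (illustrative data, not the benchmark's).
    original = [str(n) for n in range(0, 100, 3)]
    edited, gold = make_absence_instance(original, omit_rate=0.1, seed=0)
    print(len(gold), omission_f1(diff_baseline(original, edited), gold))  # 3 1.0
```

An LLM evaluated this way would replace `diff_baseline` with its own list of missing lines; the abstract's point is that models score far below this trivial program.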

Unsupervised Elicitation of Language Models
To steer pretrained language models for downstream tasks, today's post-training paradigm relies on humans to specify desired behaviors. However, for models with superhuman capabilities, it is difficult or impossible to get high-quality human supervision. To address this challenge, we introduce a new unsupervised algorithm, Internal Coherence Maximization (ICM), to fine-tune pretrained language models on their own generated labels, without external supervision. On GSM8k-verification, TruthfulQA, and Alpaca reward modeling tasks, our method matches the performance of training on golden supervision and outperforms training on crowdsourced human supervision. On tasks where LMs' capabilities are strongly superhuman, our method can elicit those capabilities significantly better than training on human labels. Finally, we show that our method can improve the training of frontier LMs: we use our method to train an unsupervised reward model and use reinforcement learning to train a Claude 3.5 Haiku-based assistant. Both the reward model and the assistant outperform their human-supervised counterparts.
arXiv.org
ICYMI: Microsoft launches AI analytics bridge for developer tools: Microsoft unveils Model Context Protocol server for Clarity analytics on June 4, enabling natural language queries through AI.
https://ppc.land/microsoft-launches-ai-analytics-bridge-for-developer-tools/ #Microsoft #AI #Analytics #DeveloperTools #NaturalLanguageProcessing
Microsoft launches AI analytics bridge for developer tools
Microsoft unveils Model Context Protocol server for Clarity analytics on June 4, enabling natural language queries through AI.
PPC Land
Sam Altman Reveals ChatGPT Energy Use Per Interaction
ChatGPT's energy usage is a hot topic. Sam Altman, OpenAI's CEO, shared some insights on his blog. He says one interaction with ChatGPT uses about 0.000085
Blaze Trends
📄 Access the Full Paper: Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality
#ConformalPrediction #RetrievalAugmentedGeneration #UncertaintyQuantification #AIResearch #NaturalLanguageProcessing
WhatsApp to Create Custom ChatGPT in Seconds for Free
Imagine having a personal assistant that can understand your unique needs and respond accordingly. That's what Meta AI is bringing to WhatsApp. The company is
Blaze Trends