Mastodawn

Marcel Salathé Feb 17, 2025

EPFL goes live with its own Mastodon server!

https://actu.epfl.ch/news/the-epfl-community-gets-a-mastodon-server/

The EPFL community gets a Mastodon server

EPFL has set up a Mastodon server for the School community, allowing members to post and share content in a way that’s aligned with the values of open science. We opted for Mastodon because independence and effective communication tools are critical.

Marcel Salathé Dec 21, 2024

I’m looking forward to the time when, thanks to #AI, it will be impossible to drive a car into a crowd of humans. 😞

Marcel Salathé Dec 19, 2024

This may be the most remarkable paragraph I’ve read about AI this year. It shows a level of self-awareness (in the technical sense) that’s just mind-boggling.

Marcel Salathé Dec 19, 2024

Academic writing is getting harder to read - the humanities most of all

https://www.economist.com/science-and-technology/2024/12/18/academic-writing-is-getting-harder-to-read-the-humanities-most-of-all

Academic writing is getting harder to read—the humanities most of all

We analyse two centuries of scholarly work

The Economist

Marcel Salathé Dec 18, 2024

Context windows in #AI models are increasing massively, but this study suggests anything beyond 10,000 tokens and you're asking for trouble.

https://arxiv.org/abs/2406.10149

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact chaining, simple induction, deduction, counting, and handling lists/sets. These tasks are challenging on their own, and even more demanding when the required facts are scattered across long natural text. Our evaluations show that popular LLMs effectively utilize only 10-20% of the context and their performance declines sharply with increased reasoning complexity. Among alternatives to in-context reasoning, Retrieval-Augmented Generation methods achieve a modest 60% accuracy on single-fact question answering, independent of context length. Among context extension methods, the highest performance is demonstrated by recurrent memory transformers after fine-tuning, enabling the processing of lengths up to 50 million tokens. The BABILong benchmark is extendable to any length to support the evaluation of new upcoming models with increased capabilities, and we provide splits up to 10 million token lengths.

arXiv.org

Marcel Salathé Dec 15, 2024

Something with OpenAI’s o1 pro is off, and I'm hearing similar stories from others. Makes you wonder: what happens when such AI confidence meets high-stakes situations like insurance claims or medical diagnoses?

Marcel Salathé Dec 15, 2024

Yesterday, I experienced something unsettling: an AI that refused to admit it was wrong about a simple piano piece. It kept insisting I was the one mistaken - about music I've played for 15 years.

Here's what happened:

https://engineeringprompts.substack.com/p/sorry-human-youre-wrong

Sorry Human, You're Wrong

AI models pushing back: A cautionary tale of the artificial confidence of o1 pro.

Prompt Engineering

Marcel Salathé Dec 14, 2024

@mszll What the piece was

Marcel Salathé Dec 14, 2024

Confidently wrong: No model so far was able to answer this correctly. Not o1 pro, not Gemini Advanced, not Claude Opus. The "better" the model, the more confident it was in its wrong answer.

At least Mistral and Claude Sonnet were able to say they didn't know.

This is a real issue. Most of us expect the better models to be more "aware" of possible mistakes. But that does not yet seem to be the case.

Marcel Salathé Dec 13, 2024

What do people use AI models for? These are the top 10 use cases on Claude.ai

Source: https://www.anthropic.com/research/clio

Clio: Privacy-preserving insights into real-world AI use

A blog post describing Anthropic’s new system, Clio, for analyzing how people use AI while maintaining their privacy

Substack: https://digitalepi.substack.com
LinkedIn: https://www.linkedin.com/in/salathe/
About me: https://www.digitalepidemiologylab.org/team/marcel-salathe
Book: https://www.digitalepibook.com/