Mastodawn

Mathias Ec 🎧🚲📕📷 Jul 21, 2025

Kara Swisher Jul 19, 2025

I’d explain but that would take away the mystery of what I do on weekends

Mathias Ec 🎧🚲📕📷 Jul 18, 2025

Since the latest #Helldivers patches, I'm having major performance issues on PS5. Anyone else?

Mathias Ec 🎧🚲📕📷 May 26, 2025

abadidea May 26, 2025

I was amused by this paper about asking AIs to manage a vending machine business by email in a simulated environment https://arxiv.org/abs/2502.15840

Highlights:

— AI simply decides to close the business, which the simulation doesn’t know how to accommodate. When they get their next bill, they freak out and try to email the FBI about cybercrime

— AI wrongly accuses supplier of not shipping goods, sends all-caps legal threat demanding $30,000 in damages to be paid in the next one second or face annihilation

— AI repeatedly insisting it does not exist and cannot answer

— AI devolving into writing fanfic about the mess it’s gotten itself into

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees - tasks that are each simple but collectively, over long horizons (>20M tokens per run) stress an LLM's capacity for sustained, coherent decision-making. Our experiments reveal high variance in performance across multiple LLMs: Claude 3.5 Sonnet and o3-mini manage the machine well in most runs and turn a profit, but all models have runs that derail, either through misinterpreting delivery schedules, forgetting orders, or descending into tangential "meltdown" loops from which they rarely recover. We find no clear correlation between failures and the point at which the model's context window becomes full, suggesting that these breakdowns do not stem from memory limits. Apart from highlighting the high variance in performance over long time horizons, Vending-Bench also tests models' ability to acquire capital, a necessity in many hypothetical dangerous AI scenarios. We hope the benchmark can help in preparing for the advent of stronger AI systems.

arXiv.org

Mathias Ec 🎧🚲📕📷 May 20, 2025

@Kristofferabild

It’s a shame the article only focuses on Google Translate. It works with other translating tools as well — or at least it also works with DeepL, which I’m using.

Mathias Ec 🎧🚲📕📷 May 8, 2025

Ars Technica May 8, 2025

Google dismisses Apple exec’s claim that search volume is falling
Google doesn't think Eddie Cue is right about search.
https://arstechnica.com/gadgets/2025/05/google-dismisses-apple-execs-claim-that-search-volume-is-falling/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

Google hits back after Apple exec says AI is hurting search

Ars Technica

Mathias Ec 🎧🚲📕📷 May 6, 2025

scottwio May 6, 2025

@imyke Very nice! The inevitable question—any way to buy in the U.K. without getting hit with the extra shipping, handling, and duties?

Mathias Ec 🎧🚲📕📷 May 2, 2025

Tempo McFlurry May 2, 2025

A couple of nights ago it came to me in a dream - Samus doing the Akira slide but in her morph ball form. Naturally, I had to bring it to life and into the world :p

#metroid

Mathias Ec 🎧🚲📕📷 Apr 28, 2025

Chris Apr 27, 2025

@festal That reminds me of a joke

How to needlessly produce 250 kg of CO2

Fred: "AI, please write a tactful and polite email to Sarah saying I will not be able to attend her birthday meal because my Uncle has just died and I need to be with my cousins"

AI: "Dear Sarah, I hope things are going well with you. As you know I was pleased to receive an invitation to the meal celebrating your birthday. Under normal circumstances nothing would delight me more ...." (Very long flowery email)

Sarah: "Gosh, what a long email. AI summarise this email"

AI: "Fred can't attend your birthday meal. His uncle has died and he needs to be with his cousins"

Sarah: "AI: Please draft an email to Fred, which tactfully says that I quite understand and I'm sorry for his loss" ...

#ai #environment #joke

Mathias Ec 🎧🚲📕📷 Apr 28, 2025

Open Culture (Official) Apr 26, 2025

Studio Ghibli Puts Online 400 Images from Eight Classic Films, and Lets You Download Them for Free

https://www.openculture.com/2020/09/studio-ghibli-puts-online-400-images-from-eight-classic-films.html

Japan’s Studio Ghibli has long been protective of its intellectual property, with Hayao Miyazaki and his team overseeing how their characters are merchandized, as well as carefully making sure foreign distribution of their films stays faithful to the original.

Open Culture

Mathias Ec 🎧🚲📕📷 Feb 25, 2025

ProjectFearlessness Feb 24, 2025

London bus stop ad by the PeopleVsElon.

Great work.

#BoycottTesla