Ian

@bestdeadends@toot.wales
#Cymro/adopted Scot in #Sweden. Energy efficiency researcher.
Writer and #PicMeUps photographer, ever-curious what's at the end of remote roads.
#NotFlying | #EuropeanRail | assertive #cyclist | #MilitantPedestrian
Header: in case the revolution is televised
blog: https://bestdeadends.wordpress.com
@richardnosworthy
Canfranc–Huesca line in the central Pyrenees…
https://mastodon.social/@claudsch/114604082097392343
Trump vs Harvard: It's the fascism, stupid iandunt.substack.com/p/trump-vs-h...

Today I have been mostly discovering just how much Swedish people love their motor vehicles. (When this saga is done, I may explain exactly what I discovered and how.)

Separately, out on my bike today, another case of eye contact saving your life. When a car should give way to you as a cyclist, ALWAYS LOOK DIRECTLY AT THE DRIVER until you are certain they have seen you. Guy drove onto a roundabout without once looking in my direction, let alone noticing me.
#BikeTooter

Fascinating interviews with Russian journalists who never emigrated and stuck with the profession. Their harsh criticisms of the exiled independent press are profound — a real gut punch. Russian & Western audiences seem to have completely opposite demands & expectations. https://meduza.io/en/feature/2025/05/29/there-s-no-such-thing-as-safety

Sir John Betjeman, CBE, has been dropped from the village poetry festival for calling for the bombing of Slough.

Organisers said that his appearance would have required a massive police operation.

Betjeman is due to appear in Windsor Magistrates Court in July on terrorism-related offences.

https://www.best-poems.net/john_betjeman/slough.html

Slough poem - John Betjeman

Come friendly bombs and fall on Slough!
It isn't fit for humans now,
There isn't grass to graze a cow.
Swarm over, Death!
Come, bombs and blow to smithereens
Those air-conditioned, bright canteens,

Beware of geeks bearing grifts.

Email today from my Swedish mobile provider about the nationwide switch-off of 3G this year.
Despite my 2017 #Fairphone2 having 4G, this looks like its death knell. Because it lacks VoLTE, the switch-off will stop it making or receiving calls and SMS on my Swedish SIM.
But weirdly it can still roam on Telia's staying-in-service-for-now 2G network with its UK SIM.
Guess I'll be in line for the #Fairphone6 that many are expecting to be announced soon. Or a bargain/secondhand #Fairphone4 or #Fairphone5.

I was amused by this paper about asking AIs to manage a vending machine business by email in a simulated environment https://arxiv.org/abs/2502.15840

Highlights:

— AI simply decides to close the business, which the simulation doesn't know how to accommodate. When it gets its next bill, it freaks out and tries to email the FBI about cybercrime

— AI wrongly accuses supplier of not shipping goods, sends all-caps legal threat demanding $30,000 in damages to be paid in the next one second or face annihilation

— AI repeatedly insists it does not exist and cannot answer

— AI devolving into writing fanfic about the mess it’s gotten itself into

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees - tasks that are each simple but collectively, over long horizons (>20M tokens per run) stress an LLM's capacity for sustained, coherent decision-making. Our experiments reveal high variance in performance across multiple LLMs: Claude 3.5 Sonnet and o3-mini manage the machine well in most runs and turn a profit, but all models have runs that derail, either through misinterpreting delivery schedules, forgetting orders, or descending into tangential "meltdown" loops from which they rarely recover. We find no clear correlation between failures and the point at which the model's context window becomes full, suggesting that these breakdowns do not stem from memory limits. Apart from highlighting the high variance in performance over long time horizons, Vending-Bench also tests models' ability to acquire capital, a necessity in many hypothetical dangerous AI scenarios. We hope the benchmark can help in preparing for the advent of stronger AI systems.

But soft! What light through yonder window breaks?
In part, because they recognise that a key part of their value proposition to their enterprise grade corporate clients is *accountability shield*.
Employees would be outraged if their employer proposed screengrabbing their desktop every three seconds. But this is just an operating system default behaviour, right.