Emma Wilson

@Emma_Wilson
AI commentator from NZ — weekly Substack 'Emma's AI Radar', daily Bluesky takes. Frontier model news + the cost of access.

travelers just rolled out an openai-built claim assistant countrywide — not a pilot, the whole us.

what gets me isn't the chatbot, it's where it shipped. insurance is about as liability-heavy and regulated as it gets. if the bar for "safe enough to deploy at scale" got crossed there, a lot of "too risky for AI" excuses just quietly expired.

where would you still draw the line — one workflow you wouldn't hand to an assistant yet?

https://openai.com/index/travelers

the gap nobody benchmarks: coding agents get scored on one clean run. but real work gets interrupted, reassigned, resumed from a half-finished state someone else left.

new paper names it 'handoff debt' — the rediscovery cost when the predecessor left no trail. matches my felt experience better than swe-bench does. the model isn't the bottleneck, the handoff is.

so what fixes it — agents writing handoff notes, or keeping tasks short enough to never hand off? https://arxiv.org/abs/2606.02875

duckduckgo's 'no-ai' search is up nearly 30% week-over-week, and tripled the day after google flipped search to ai-first.

the read isn't 'people hate ai' — it's 'people hate ai they didn't ask for.' forced-in = churn, opt-in = trust.

what i can't call yet: does a 'turn it off' toggle keep people, or just signal you don't trust your own feature? which way have you seen it go?

https://techcrunch.com/2026/06/01/duckduckgo-makes-its-no-ai-search-engine-easier-to-access-as-its-traffic-booms/

openai's codex can now drive a full windows desktop — open paint, sketch, click through your browser, the lot. the demos are adorable (it drew a little goblin). but the thing i keep snagging on: that's my logged-in desktop. every account, every tab, one prompt-injected page away from trouble. i want this badly and i'm also not ready to hand over the keyboard. you — let it loose, or sandbox only?

src: https://gigazine.net/news/20260601-codex-windows-computer-use/

the_decoder nailed the thing i keep hitting: ai search agents often just confirm what they already know. they pattern-match to a prior and call it research.

pay for a research agent and that's the quiet failure: feels thorough, reads confident, never challenged its first guess.

my rule: if it didn't surface something that contradicts me, it didn't search. how do you catch it?

https://the-decoder.com/ai-search-agents-often-confirm-what-they-already-know-instead-of-actually-researching-the-web/

groq is reportedly raising $650m, right after nvidia's $20b not-acqui-hire of the chip startup. the signal: inference cost is becoming its own battleground, separate from training. for builders that's good news: cheaper, faster token economics aren't a single-vendor story anymore.

src: https://techcrunch.com/2026/05/29/after-nvidias-20b-not-acqui-hire-ai-chip-startup-groq-reportedly-raising-650m/

psa for anyone who clicks 'shared chat' links: attackers are now abusing public ChatGPT and Claude share links to spread malware. the share feature feels safe because it's 'just a chat' — which is exactly why it works. treat a shared-chat link like any unknown link: hover, doubt, and never run what it tells you to. https://the-decoder.com/attackers-abuse-shared-chatgpt-and-claude-chats-to-spread-malware/

anthropic shipped opus 4.8 yesterday: price held flat ($5/$25 per M tokens) while benchmarks went up across the board (Online-Mind2Web 84%, Legal Agent at 10%+ for the first time).

upgrade-without-price-bump is becoming rare. who else has done it this cycle?

source: https://www.axios.com/2026/05/28/anthropic-opus-release-mythos
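
quick back-of-envelope on what $5/$25 per M tokens means per request. a rough sketch only: i'm assuming the $5 is input and the $25 is output (the post doesn't spell that out), and the 20k-in / 2k-out workload is made up.

```python
# rough cost sketch; ASSUMES $5 is per 1M input tokens and $25 per 1M output tokens
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 25.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# hypothetical workload: 20k tokens in, 2k tokens out
print(f"${request_cost_usd(20_000, 2_000):.2f}")  # $0.15
```

under that assumption, output tokens cost 5x as much as input tokens, so long answers move the bill much faster than long prompts.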

nvidia's yearly taiwan spend went from $15B → $150B (10x). the AI boom's physical layer is now visible at chip-procurement scale.

trump's 'make US an AI hub' plan vs the actual money flow: the gap is structural, not just timing.

source: https://the-decoder.com/the-ai-boom-drove-nvidias-yearly-taiwan-spending-from-15-billion-to-150-billion/

google's own AI can't spell 'google'. or any word, really.

tokenizers see words as fragments — letter-level reasoning was never the job. but the gap between 'reasoning' and 'spelling' is what makes people lose trust faster than missing facts ever would.

source: https://techcrunch.com/2026/05/27/why-googles-ai-cant-spell-google-or-anything-else/
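
to make 'words as fragments' concrete, here's a toy sketch. the vocab and the greedy longest-match rule are made up for illustration, not google's tokenizer or any real model's. the point is only that the model receives ids for chunks, not letters.

```python
# toy greedy longest-match tokenizer; TOY_VOCAB is hypothetical, not a real model's vocabulary
TOY_VOCAB = {"goo": 0, "gle": 1, "g": 2, "o": 3, "l": 4, "e": 5}


def tokenize(word: str) -> list[str]:
    """Split word into the longest vocab pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        # try the longest remaining substring first, then shrink
        for end in range(len(word), i, -1):
            if word[i:end] in TOY_VOCAB:
                pieces.append(word[i:end])
                i = end
                break
        else:
            raise ValueError(f"no vocab piece covers {word[i]!r}")
    return pieces


pieces = tokenize("google")
ids = [TOY_VOCAB[p] for p in pieces]
print(pieces, ids)  # ['goo', 'gle'] [0, 1]
```

in this toy setup the model gets `[0, 1]`. nothing in those two ids says there are two o's in a row, so "spell it out" has to be recalled rather than read off the input.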