Anthropic's widely circulated claim that AI could perform 80% of job tasks rests on a 2023 study with significant limitations. The "theoretical capability" metric comes from research by OpenAI, OpenResearch, and the University of Pennsylvania, which used annotators unfamiliar with specific occupations to estimate where LLMs might reduce task time by at least 50%. Critically, the study makes no timeline predictions and relies on speculative assumptions about "anticipated LLM-powered software" that doesn't yet exist. Current observed AI exposure remains "a fraction of what's feasible," Anthropic acknowledges. https://arstechnica.com/ai/2026/03/how-did-anthropic-measure-ais-theoretical-capabilities-in-the-job-market/ #AIagent #AI #GenAI #AIResearch #Workforce
How did Anthropic measure AI's "theoretical capabilities" in the job market?

2023 study made a lot of assumptions about future "anticipated LLM-powered software."

Ars Technica

After 2 years researching AI-generated misinformation (4 papers at WWW '26), I'm expanding into agentic AI.

Same core question, harder version: how do we maintain trust when AI acts autonomously? When an agent sends emails or books meetings on your behalf, how do you verify it did what you intended?

Open questions: verification at action time, trust calibration, safe agent-tool interface design.

https://alexloth.com/from-misinformation-to-agentic-ai-research-direction-2/

#AgenticAI #AIResearch #Trust

From Misinformation to Agentic AI: Where My Research Is Heading

After two years researching how AI generates misinformation, I am expanding into agentic AI systems. The trust questions are similar but harder: when agents act autonomously, how do we verify intent, calibrate trust, and maintain oversight?

alexloth.com
Liquid AI has released LFM2.5-350M, a compact 350M parameter model trained on 28 trillion tokens that outperforms models more than twice its size. The model uses a hybrid LIV architecture supporting a 32k context window while maintaining a lean memory footprint. https://www.marktechpost.com/2026/03/31/liquid-ai-released-lfm2-5-350m-a-compact-350m-parameter-model-trained-on-28t-tokens-with-scaled-reinforcement-learning/ #AIagent #AI #GenAI #AIResearch #LiquidAI
Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning

MarkTechPost
Meta's semi-formal reasoning technique boosts LLM code review accuracy to 93%. The structured prompting method requires AI agents to document premises, trace execution paths, and derive formal conclusions before answering, cutting hallucinations; on complex examples, accuracy rose from 78% to 88%. https://venturebeat.com/orchestration/metas-new-structured-prompting-technique-makes-llms-significantly-better-at #AIagent #AI #GenAI #AIResearch #Meta
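A minimal sketch of what such a premise/trace/conclusion scaffold might look like in practice. The template and function name here are hypothetical illustrations based on the article's description, not Meta's published prompt:

```python
def build_structured_prompt(code_snippet: str, question: str) -> str:
    """Wrap a code-review question in a semi-formal reasoning scaffold.

    Hypothetical template: asks the model to state premises, trace
    execution, and derive a conclusion before answering, mirroring the
    workflow described in the article.
    """
    return (
        "You are reviewing the following code:\n"
        f"```\n{code_snippet}\n```\n\n"
        f"Question: {question}\n\n"
        "Before answering, work semi-formally:\n"
        "1. PREMISES: list the facts the code establishes "
        "(types, invariants, preconditions).\n"
        "2. EXECUTION TRACE: step through the relevant code paths, "
        "noting program state after each step.\n"
        "3. CONCLUSION: derive your answer strictly from the premises "
        "and trace; flag any step you could not verify.\n"
    )

# Example: build a scaffolded prompt for a division helper.
prompt = build_structured_prompt(
    "def div(a, b):\n    return a / b",
    "Can this function raise an exception?",
)
```

The point of the scaffold is that the model must commit to premises and a trace before the verdict, so a reviewer (or a second model) can check each intermediate step.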

Python Trending (@pythontrending)

Matrix-Game 3.0 has been released. It is a real-time, streaming interactive world model, and its key feature is support for long-horizon memory. It is a noteworthy new technology for AI simulation and world-model research.

https://x.com/pythontrending/status/2038938236764946537

#worldmodel #matrixgame #airesearch #longtermmemory #interactiveai

Python Trending ๐Ÿ‡บ๐Ÿ‡ฆ (@pythontrending) on X

Matrix-Game - Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory https://t.co/urKh6DKnSX

X (formerly Twitter)
๐Ÿ›ด So, Boris spends 3 years commuting on Lime bikes in London and now discovers Lime is actually a data company. ๐Ÿšดโ€โ™‚๏ธ๐Ÿ” The real shocker? He thinks combining #GDPR and some AI will turn his daily pedaling into groundbreaking research. ๐Ÿคฆโ€โ™‚๏ธ๐Ÿ“Š
https://ktoya.me/lime-data-company/ #LimeBikes #LondonData #AIResearch #CommutingInsights #HackerNews #ngated
Lime is a Data Company - Boris Starkov

Using Claude to analyse 3 years of my daily lime bike commute in London

Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles. The startup builds a platform that turns AV footage into structured, searchable datasets using vision language models, helping companies find rare edge-case footage for training self-driving systems. Founded by Harvard CS grads who met at Lyft and Snowflake, the company won first prize at Nvidia GTC's pitch contest. https://techcrunch.com/2026/03/31/nomadic-raises-8-4-million-to-wrangle-the-data-pouring-off-avs/ #AIagent #AI #GenAI #AIResearch #Nomadic
Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles | TechCrunch

The company turns footage from robots into structured, searchable datasets with a deep learning model.

TechCrunch

khazzz1c (@Imkhazzz1c)

Transformer Circuits has published its attribution graphs methodology, introducing a new analysis technique for interpreting model internals and tracing causes of model behavior. It is a useful technical resource for researchers and developers who want to understand AI models more precisely.

https://x.com/Imkhazzz1c/status/2038881239923564763

#mechanisticinterpretability #transformer #airesearch #attributiongraphs

khazzz1c (@Imkhazzz1c) on X

https://t.co/MUnBYPnMHA soooolid work

X (formerly Twitter)
Mantis Biotech is building digital twins of the human body using AI to solve medicine's data problem. The startup combines data from textbooks, motion capture, biometric sensors and medical imaging, then runs it through a physics engine to create predictive models for medical research, surgical training and injury prevention. https://techcrunch.com/2026/03/30/mantis-biotech-is-making-digital-twins-of-humans-to-help-solve-medicines-data-availability-problem/ #AIagent #AI #GenAI #AIResearch #Mantis
Mantis Biotech is making 'digital twins' of humans to help solve medicine's data availability problem | TechCrunch

Mantis takes disparate sources of data to make synthetic datasets that can be used to build so-called "digital twins" of the human body, representing anatomy, physiology and behavior.

TechCrunch