Noam Brown of OpenAI is using GPT-5 to find errors on every Wikipedia page. Some of the errors are serious, and even the Wikipedia page about Wikipedia contains mistakes! 🤖📚
#AI #OpenAI #GPT5 #Wikipedia #SaiSót #CôngNghệ #TríTuệNhânTạo #BảoMậtThôngTin
GPT-5 Model Helps Crack Quantum Computing Open Problem - Scott Aaronson of the University of Texas at Austin and Freek Witteveen of CWI Amsterdam in the Netherlands used OpenAI’s GPT-5 model to help solve a major open problem in quantum computing.
https://interestingengineering.com/science/gpt5-helps-prove-quantum-boundaries
The shape of things to come?
TL;DR (by Mistral): Advanced LLMs now implement complex coding tasks in minutes, leaving me—once overflowing with fun private project ideas—struggling to keep up with imagining what to build next, not just building it. The bottleneck shifts from coding to creativity.
Okay, I'm running out of "private development for fun backlog". This has never happened before.
I met my first computer, a CBM 3032, when I was 12, and I have loved developing software ever since. As I'm not doing this so much in my job anymore, I do it in my private time. Much of my spare time is spent writing lots of little apps for myself (and sometimes for others).
And my situation always was "20 ideas in my mind, working on 3, slowly making progress". Until now.
It started with Claude Sonnet 4. All of a sudden, if you specify a major change you want for a private project (mine range from 1,000 to 10,000 lines of code, JS or Python), the LLM, integrated into Cursor, will just do it. In 5 minutes. Flawlessly, at least in terms of functionality: I only glance briefly through the code, which looks okay enough.
This got even better with GPT-5, and now with Claude Sonnet 4.5. In the last week GPT-5 and Sonnet 4.5 built major new features for my apps within minutes (stuff that would have occupied me for many weeks), each one from a single specification prompt, without rework.
And so, last night, I had trouble falling asleep because I couldn't think of new features I want the AIs to implement.
At the same time I observe that the scale of a piece of software that an LLM can confidently deal with grows with every quarter. Not long ago you needed to guide it method by method, then class by class. Now, if your software is not 100k lines or larger, the LLM will be competent in performing major changes to it.
I was always sure that AIs would not take our jobs, because in the digital industry you can scale up the speed of shipping new features arbitrarily, so instead of firing devs you can just do much more with the same devs.
But what happens if we run out of ideas for what to build, because the AIs build things faster than we can come up with them?
“Technical progress in #AI has slowed.
Established #LLMs aren't improving at the expected rate.
#GPT4 (released in March 2023) was much better than #GPT3 (June 2020). But #GPT5, released over two years later, in 2025, and estimated to cost 10 times more to #train, did not make the same impression.
There's less talk now of inventing #SuperIntelligence and more interest in the less exciting #practical #applications: #coding #software, #Organisational #Workflows.”
#AIBoom / #AIBust / #resources / #water / #DiminishingReturns <https://abc.net.au/news/science/2025-09-26/australia-future-artificial-intelligence-ai-technology-progress/105815470>
💡 Claude Sonnet 4.5: Anthropic's new model for autonomous coding
https://gomoot.com/claude-sonnet-4-5-il-nuovo-modello-di-anthropic-per-il-coding-autonomo/
#ai #anthropic #api #blog #coding #deepseek4 #gpt5 #grok5 #ia #llm #news #opus45 #picks #sonnet45 #tech
Claude Sonnet 4.5 is Anthropic's new AI model for coding, agentic autonomy, and integration with development tools such as VS Code, GitHub, and terminals.
Claude Sonnet 4.5, the new champion of AI coding models
Anthropic's Claude Sonnet 4.5 scored 70.6% on SWE-bench, overtaking GPT-5 to take first place. It can code autonomously for more than 30 hours and is showing real results across a range of industries.
OpenAI rolls out safety routing system, parental controls on ChatGPT
I guess the simple comparison of an LLM with a junior/senior developer is not a good one. For small projects (up to 10k lines of code) it performs like a super-human pro, at least if you treat it fairly.
My latest experience.
I'm building this fun little Tetris-style puzzle game (see screenshot). The shapes you need to place are randomly generated and can be arbitrarily complex. To make it more strategic I wanted to add a scaled-down preview of the next shape. The game has 11k lines of code, JavaScript + Phaser3.
So I gave GPT-5 the task to do this.
And it did it. Flawlessly. On the first attempt. On a codebase it saw for the first time. That's the bit we tend to ignore: we have the whole history of the codebase we built, while it has only the prompt and the code.
It was cool to watch it first grep around to understand where the HUD area is, where the app is creating and rendering shapes etc.
Here's the prompt if you are interested:
"I need a more complicated feature. I want to show a tiny scaled preview of the next shape. The best place is in the middle of the white area on top of the screen - the area where text like "Coins: xxx" on the left side and "Shapes: yyy" on the right side are printed. The shape must be scaled so that it fits into that region. Please implement this feature."