@Andiairlines

Another experiment to figure out how far leading LLMs can go in creating a new app. As Gemini had no issue generating a full Sudoku, I went for something bigger: a game of Go.

The results were impressive. Gemini 2.5 Pro Experimental did best overall, so I focus on it here. Claude 3.7 is just as good. ChatGPT and DeepSeek were clearly worse.

Gemini
- created a fully working Go-playing app from a single simple prompt, including the ko rule, the suicide rule, and a "make random legal move" AI.
- created an AI with a beginner-level understanding of Go in 4 more simple prompts, showing a deep understanding of the task at hand, how to break it down, and how to turn the solution into code.
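
To give an idea of what the first prompt has to cover, here is a minimal sketch of a legality check with the suicide rule, a simple ko rule, and a random-legal-move AI. It is my own illustration, not the code Gemini generated; the board representation (a list of lists of ".", "B", "W") and all function names are assumptions made for this example.

```python
import random

EMPTY, BLACK, WHITE = ".", "B", "W"

def neighbors(size, r, c):
    """Yield the orthogonally adjacent points that lie on the board."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def group_and_liberties(board, r, c):
    """Flood-fill the group containing the stone at (r, c); return (stones, liberties)."""
    color, size = board[r][c], len(board)
    stones, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        cr, cc = stack.pop()
        for nr, nc in neighbors(size, cr, cc):
            if board[nr][nc] == EMPTY:
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                stack.append((nr, nc))
    return stones, liberties

def play(board, r, c, color, previous_board=None):
    """Return the board after the move, or None if the move is illegal."""
    if board[r][c] != EMPTY:
        return None
    size = len(board)
    new = [row[:] for row in board]
    new[r][c] = color
    opponent = WHITE if color == BLACK else BLACK
    # Capture every adjacent opponent group that is left without liberties.
    for nr, nc in neighbors(size, r, c):
        if new[nr][nc] == opponent:
            stones, libs = group_and_liberties(new, nr, nc)
            if not libs:
                for sr, sc in stones:
                    new[sr][sc] = EMPTY
    # Suicide rule: after captures, the played stone's group needs a liberty.
    if not group_and_liberties(new, r, c)[1]:
        return None
    # Simple ko rule: the move may not recreate the position that existed
    # before the opponent's last move (passed in as previous_board).
    if previous_board is not None and new == previous_board:
        return None
    return new

def random_legal_move(board, color, previous_board=None):
    """Pick a random legal point for `color`, or None if there is none (pass)."""
    size = len(board)
    legal = [(r, c) for r in range(size) for c in range(size)
             if play(board, r, c, color, previous_board) is not None]
    return random.choice(legal) if legal else None
```

The ko check here only compares against one earlier position; a full superko rule would need the whole position history.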

I'm talking about prompts like this:

"Wonderful. The AI plays great now. However, it is unable to secure areas with two eyes. How could we implement another feature that checks whether a move helps making eye shape?"

In response, it creates one heuristic that penalizes moves filling eye space and another that rewards moves creating eyes.
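
A rough sketch of what such a pair of heuristics could look like, written by me for illustration and not taken from Gemini's output. It assumes a board stored as a list of lists of ".", "B", "W", treats an "eye" as an empty point whose on-board orthogonal neighbors are all the player's own stones (diagonals and captures are ignored), and uses made-up weights `fill_penalty` and `eye_reward`.

```python
EMPTY = "."

def neighbors(size, r, c):
    """Yield the orthogonally adjacent points that lie on the board."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def is_simple_eye(board, r, c, color):
    """True if (r, c) is empty and every on-board orthogonal neighbor is `color`.

    Simplifying assumption: real eye detection also looks at the diagonals.
    """
    if board[r][c] != EMPTY:
        return False
    return all(board[nr][nc] == color
               for nr, nc in neighbors(len(board), r, c))

def eye_score(board, r, c, color, fill_penalty=-50, eye_reward=20):
    """Heuristic eye-shape score for playing `color` at the empty point (r, c).

    The weights are hypothetical values chosen for this example.
    """
    # Penalize filling one of our own eyes.
    if is_simple_eye(board, r, c, color):
        return fill_penalty
    # Place the stone on a copy (captures are ignored in this sketch).
    after = [row[:] for row in board]
    after[r][c] = color
    # Only points adjacent to the new stone can have become eyes.
    created = sum(1 for nr, nc in neighbors(len(board), r, c)
                  if is_simple_eye(after, nr, nc, color))
    return eye_reward * created
```

A move generator could add `eye_score(...)` to whatever other scores it computes per candidate move and then pick the highest total.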

On its way to the final version, it generated a Python app of 1200 lines of code. Only one line was wrong, with a trivial typo-like mistake.

If you're interested, I've written up in more detail, including screenshots, how the 4 LLMs did: https://medium.com/@karlheinz.agsteiner/developing-a-simple-go-game-with-thinking-llms-a-comparison-3b1646d35ee4

Developing a simple Go game with “thinking” LLMs — a comparison

Those LLMs get better and better, so to judge them, we need harder tasks. Really hard tasks. I started with “Generate a sudoku”, describing how exactly the sudoku should look like, and Gemini 2.5 pro…

Medium

I built my AI-powered homepage from scratch with GitHub Codespaces and GitHub Copilot. Here's my field report: https://new.sap-architect.com/blog/AI-first%20homepage. Looking forward to your thoughts! #SAP #ProductManagement #EnterpriseSoftware #AI #Cloud #github #githubcopilot #amazonQ #aicreated

This resonates well with my recent, very first genAI coding experience:
https://ezyang.github.io/ai-blindspots/
AI Blindspots

Blindspots in LLMs I’ve noticed while AI coding. Sonnet family emphasis. Maybe I will eventually suggest Cursor rules for these problems.

PS, just in case: if you stumble across this and would like to have it as an app that you can use (for free), just say so; maybe I'll find the time.

(It would take some work to add multi-user support and a real DB... mostly work for o1 :-) )

An important piece on cycling safety: https://www.arte.tv/de/videos/118265-007-A/re-toedliche-fahrradunfaelle/

Initiatives like @CriticalMass @CriticalMassMuenchen are making a statement here.

Re: Tödliche Fahrradunfälle (Fatal Bicycle Accidents) - The Full Documentary | ARTE

Cycling is booming, whether as a means of transport, for leisure, or on holiday. At the same time, however, fatal bicycle accidents have risen again over the past decade. In 2023, more than 94,000 cyclists were involved in accidents, 444 of them fatally. How can cyclists be protected, and what can they do themselves to get through traffic without an accident?

ARTE