@Andiairlines

2 Followers
2 Following
10 Posts

Another experiment to figure out how far leading LLMs can go in creating a new app. Since Gemini had no issue generating a full Sudoku, I went for something bigger: a game of Go.

The results were impressive. As Gemini 2.5 Pro Experimental did best overall, I focus on it here. Claude 3.7 is just as good; ChatGPT/DeepSeek are clearly worse.

Gemini
- created a fully working Go-playing app from a single simple prompt, including the ko rule, the suicide rule, and a "make a random legal move" AI.
- created an AI with a beginner's level of Go understanding in 4 more simple prompts, showing deep understanding of the task at hand, how to break it down, and how to turn the solution into code.
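For flavor, here is a minimal sketch of what a legality check covering those two rules can look like. This is my own reconstruction for illustration, not Gemini's generated code; board representation and names are assumptions.

```python
# Minimal Go legality check: occupied points, suicide, and simple ko.
# Board is a list of rows; board[y][x] holds EMPTY, BLACK, or WHITE.
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbors(x, y, size):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny

def group_and_liberties(board, x, y):
    """Flood-fill the group at (x, y); return its stones and liberties."""
    color, size = board[y][x], len(board)
    stones, libs, stack = {(x, y)}, set(), [(x, y)]
    while stack:
        cx, cy = stack.pop()
        for nx, ny in neighbors(cx, cy, size):
            if board[ny][nx] == EMPTY:
                libs.add((nx, ny))
            elif board[ny][nx] == color and (nx, ny) not in stones:
                stones.add((nx, ny))
                stack.append((nx, ny))
    return stones, libs

def is_legal(board, x, y, color, previous_board):
    """Reject occupied points, suicide, and simple-ko repetition."""
    if board[y][x] != EMPTY:
        return False
    trial = [row[:] for row in board]
    trial[y][x] = color
    # Capture any adjacent enemy groups left without liberties.
    enemy = BLACK if color == WHITE else WHITE
    for nx, ny in neighbors(x, y, len(board)):
        if trial[ny][nx] == enemy:
            stones, libs = group_and_liberties(trial, nx, ny)
            if not libs:
                for sx, sy in stones:
                    trial[sy][sx] = EMPTY
    # Suicide: the played group must have a liberty after captures.
    _, libs = group_and_liberties(trial, x, y)
    if not libs:
        return False
    # Simple ko: the new position may not repeat the previous board.
    if previous_board is not None and trial == previous_board:
        return False
    return True
```

Note that captures are resolved before the suicide test, so playing into an enemy group's last liberty stays legal when it captures.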

I'm talking about prompts like this:

"Wonderful. The AI plays great now. However, it is unable to secure areas with two eyes. How could we implement another feature that checks whether a move helps making eye shape?"

After which it created one heuristic to penalize moves that fill eye space and another to reward moves that create eyes.
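A simplified sketch of that kind of eye heuristic (my own illustration, not the generated code; the single-point eye test and the scoring weights are arbitrary assumptions):

```python
# Board is a list of rows; board[y][x] holds EMPTY or a stone color.
EMPTY, BLACK, WHITE = 0, 1, 2

def is_potential_eye(board, x, y, color):
    """Crude eye test: an empty point whose on-board neighbors
    are all stones of `color`."""
    if board[y][x] != EMPTY:
        return False
    size = len(board)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size and board[ny][nx] != color:
            return False
    return True

def eye_score(board, x, y, color):
    """Penalize filling your own eye space; reward moves that
    turn an adjacent empty point into an eye."""
    if is_potential_eye(board, x, y, color):
        return -10.0  # never fill your own eye
    trial = [row[:] for row in board]
    trial[y][x] = color
    bonus = 0.0
    size = len(board)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            if is_potential_eye(trial, nx, ny, color):
                bonus += 2.0
    return bonus
```

Such a score would be added to the move evaluation alongside the other heuristics, steering the AI away from self-destructive eye-filling without any real life-and-death reading.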

On its way to the final version, it generated a Python app of about 1,200 lines of code. Only one line was wrong, with a trivial typo-like mistake.

If you're interested, I've written up in more detail, incl. screenshots, how the 4 LLMs did here: https://medium.com/@karlheinz.agsteiner/developing-a-simple-go-game-with-thinking-llms-a-comparison-3b1646d35ee4

Developing a simple Go game with “thinking” LLMs — a comparison

Those LLMs get better and better, so to judge them, we need harder tasks. Really hard tasks. I started with “Generate a sudoku”, describing exactly how the sudoku should look, and Gemini 2.5 pro…

Medium

@karlheinz_agsteiner

5. **Question-Based**
- *Best for research/exploration* → Unsuitable for direct code gen without follow-ups.

**Best for your case (mood-tracking word cloud):**
- **Hybrid** = *User Story + Technical Spec* (e.g., "As a user, I want [...] [requirements]. Implement with: 1. Canvas 1000x1000px, 2. Mood colors using [...], 3. Non-overlap logic [...]").

IMHO quite a useful guideline

@karlheinz_agsteiner

2. **User Story**
- *Best for aligning code with user needs* → Requires additional technical clarification for full code gen.

3. **Example-Driven**
- *Best for replicating visual/behavioral patterns* → Great for UI code if reference examples are provided.

4. **Goal-Oriented** ("Outcome First")
- *Best for high-level architecture* → Leaves gaps in low-level implementation details.

Tbc

@karlheinz_agsteiner cool, thanks! The first prompt is almost pseudocode. Now I get why you focused more heavily on goal-oriented prompting 🤩 But then there are limits to what one can state as desired behavior. I asked DeepSeek for suitable prompting styles for code generation, and it came back with the following classification:

1. Technical Specification (used here)
- *Best for precise, ready-to-run code* → Ideal for direct implementation with minimal back-and-forth.

tbc

I built my AI-powered homepage from scratch with GitHub Codespaces and GitHub Copilot. Here’s my field report: https://new.sap-architect.com/blog/AI-first%20homepage. Looking forward to your thoughts! #SAP #ProductManagement #EnterpriseSoftware #AI #Cloud #github #githubcopilot #amazonQ #aicreated
This resonates well with my recent, very early genAI coding experience:
https://ezyang.github.io/ai-blindspots/
AI Blindspots

Blindspots in LLMs I’ve noticed while AI coding. Sonnet family emphasis. Maybe I will eventually suggest Cursor rules for these problems.

@karlheinz_agsteiner cool! I would be really interested to see that prompt or similar ones 🤗

PS, just in case: if you stumble across this and would like to have it as an app you can use (free), just say so; maybe I'll find time.

(It would be some work to do multi-user and a real DB... mostly work for o1 :-) )

@schlueter_tom Keep your eyes open in traffic. For years I've wondered why a pink-and-white bicycle has been standing at the roadside in front of the shopping center near my front door. Now I know: it's a ghost bike. Someone was killed there.

An important piece on bicycle safety: https://www.arte.tv/de/videos/118265-007-A/re-toedliche-fahrradunfaelle/

Initiatives like @CriticalMass @CriticalMassMuenchen are making a statement here.

Re: Tödliche Fahrradunfälle - Die ganze Doku | ARTE

Cycling is booming, whether as a means of transport, in leisure time, or on vacation. At the same time, fatal bicycle accidents have risen again over the past decade. In 2023, over 94,000 cyclists were involved in accidents, 444 of them fatally. How can cyclists be protected, and what can they do themselves to get through traffic accident-free?

ARTE