@regehr I’ve had a similar experience. LLMs have been helpful for reviewing and doing simple work on a semi-sophisticated rewriting system I’m building, but when I experimented with letting them work on the core algorithms, they tended to produce code I wouldn’t have written, code that isn’t as clean or robust as I’d like. I’ve found them useful for sketching out a method, but I usually do the actual coding myself and let the model find my typos and subtle bugs when I’m done. It took me a weekend to write a confluence checker that would have taken me maybe 3x as long before. I wrote the code, Opus reviewed it, and I let it write some mundane stuff I didn’t feel like doing, like pretty printers to help me debug. I often wonder what the code looks like for people who primarily vibe code and just shovel tests into the model.