I have not done any vibe coding and have a question for those who have.

Suppose you request a change: adding features, rearranging things based on what you've learned from testing, which is generally what happens once you've been working on something new for a while.

Here's the question: what happens when you ask for a change that requires the codebase to be reorganized?

How did that go? Do the AIs know that's possible or do they just pile on special cases?

@davew I've experimented a little because I feel like I need to know how it works -

The older models would do insane stuff. They would absolutely pile on special cases.

The latest models? It's genuinely a bit like magic. They don't always work but they will often reorganize and optimize a codebase - in a way that makes sense and actually works.

@ben @davew yeah, a year ago and six months ago and even 3 months ago, my answers would’ve been different. The models we’ve had since December and the stuff this month do some really astounding and smart stuff. Plus I use git worktrees now to make sure my shit doesn’t get rocked.
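For anyone who hasn't used them: git worktrees let you check out a second branch in a separate directory, so an agent can churn away without touching your main checkout. A minimal self-contained sketch (the paths and branch name here are just examples):

```shell
# Throwaway repo so the example is self-contained.
rm -rf /tmp/worktree-demo /tmp/worktree-demo-ai
mkdir -p /tmp/worktree-demo && cd /tmp/worktree-demo
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Check out a new branch in a sibling directory for the agent to work in.
git worktree add ../worktree-demo-ai -b ai-experiment

# The main checkout is untouched; both working trees show up here.
git worktree list
```

Once the experiment is merged or abandoned, `git worktree remove ../worktree-demo-ai` and `git branch -D ai-experiment` clean it up.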

I’ve found that the key is the “context window”, which is a magic black box of what the app has been working on for the current session.

If I keep the tool focused on what I’ve been working on in the context window, the results have been pretty good. If I ask it to assess something it hasn’t touched, results get more wild.

@davew I have found that Obra's Superpowers plugin/skills are an immense help for exactly that. The first step is to have Claude Code map the repo and separate instructions for the model (in CLAUDE.md) from instructions for humans (in README.md).
Once the repo is properly described and mapped for Claude, you have your initial context right. Then, for a refactoring or reorg of the code, I use a code-reviewer agent to plan (a Superpowers skill), and only execute the plan once it has been reviewed.
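To make the CLAUDE.md/README.md split concrete, here is a hypothetical minimal CLAUDE.md in the spirit of that workflow; the contents are an illustration I made up, not what the plugin actually generates:

```markdown
# CLAUDE.md — instructions for Claude Code (humans: see README.md)

## Repo map
- src/api/    HTTP handlers, thin layer only
- src/core/   business logic, no I/O
- tests/      unit tests, run with `npm test`

## Rules
- Never edit files under vendor/.
- Run `npm test` before declaring a task done.
- Keep business logic out of src/api/ handlers.
```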
@davew I have tested this approach both at work, on a big legacy Java repo that contained production code with dev spaghetti code lying around, and on personal code projects I had initially developed with ChatGPT's help. With Claude Code I refactored and modernized all my custom Python apps, and it went really smoothly. In both cases I had to babysit the process, but the result was worth it.
@davew Now, if you want to see full implementations of existing specs as plugins for Indiekit, made entirely with Claude Code, these are good candidates:
- Microsub: https://github.com/rmdes/indiekit-endpoint-microsub
- ActivityPub: https://github.com/rmdes/indiekit-endpoint-activitypub
Both went from reading the specs to full-blown implementations after dozens of rounds of testing and incremental changes.

@davew It’s been a mixed bag for me but mostly leans towards success. :) As suggested above, maintaining context with the tools is the key factor.

Responses to follow-up prompts often include snippets of code that, while correct, can be hard to place: it isn't obvious where they need to be applied as a change, addition, or deletion. I find myself prompting, "Please provide the complete implementation with this update included," and that almost always works.

It does feel magical, though. 🪄

@davew I wouldn't recommend it. It's trained on certain practices that are often not the best. Also, refactoring custom, older code usually just means non-working new code that might follow some patterns but doesn't work. It can even create tests (unit tests) that will wipe your (hopefully dev) database. So: moderate success, and often lots of frustration along the way.
(Antigravity and Gemini 3 and 3.1)
@davew With the right instructions they can do the right thing. I did some big refactoring and swapped out libraries. The chance of success is higher with good test coverage and good types.
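The point about test coverage as a success signal can be made concrete: if the behavior is pinned down by tests like these (a made-up example), an agent can refactor the internals freely and run the suite to know it hasn't broken anything:

```python
import unittest


def slugify(title: str) -> str:
    """Turn a post title into a URL slug. The internals are fair game
    for an AI refactor as long as the tests below stay green."""
    return "-".join(word.lower() for word in title.split() if word.isalnum())


class SlugifyContract(unittest.TestCase):
    """The success signal: behavior any refactor must preserve."""

    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_non_alphanumeric_words(self):
        self.assertEqual(slugify("Testing & Types"), "testing-types")


if __name__ == "__main__":
    unittest.main(argv=["slugify_test"], exit=False)
```

The type hints help too: they let the agent (and the type checker) catch a broken refactor before the tests even run.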

@evert @davew All the replies have been good, but this one (tests with a description of the success signal), plus staying in plan mode for a long time, seems to lead to magic town (autonomous runs).

If you did OO right, it really does work nicely with GoF pattern names.

@davew Mostly it breaks, and that's why I still do this part manually…
@davew With the latest models (e.g. Claude Opus 4.6), actually surprisingly well. To the point where writing a bunch of junk is less of an issue, since the model can tidy up after itself.
@davew GPT 5 mini and Gemini via GitHub Copilot tend to pile on. Gemini models usually introduce extra variables or constants and add redundant comments to any code they can. I’ve resorted to mostly using agents for DevOps stuff (configuring Docker, etc.) and chats or edit mode for dealing with frustrating errors, messy code, or trivial yet tedious tasks. (almost all accidental alliteration always sounds advised)
@davew Models with a larger context window should perform much better.
@davew Whenever I am asking for something more complicated than simple, I first ask the machine to do a pre-mortem. I tell it to ask me questions and suggest things. Then we go back and forth about everything. Only when I am satisfied do I let the machine generate the code for me. It has happened that the machine suggested reorganizing the code, or a complete rewrite, and I approved it.

@hananc

Is that it? Once you get to try it out, do you get new ideas? If so, go back to my question.