Anyone else noticed that Claude models seem to hate using uv and sneak in pip all the time?
@marver Try making Claude use Nix consistently for a true uphill battle.
@ela hehe, next weekend :P But jokes aside: the bad practices and even insecure coding patterns encoded into current LLMs will hurt us big time down the road. It's no fun to fix something today, only to have a coding agent reintroduce the same issue everywhere the next day. That's something model creators should put more focus on.
@marver @ela I am wondering whether the flip side is also true, and what the ratio is there: shouldn't LLMs used for code review flag (at least some of) these issues?
@Kensan @ela Definitely. The irony is that the same LLM can sometimes spot bugs in code it produced itself, which points to another weakness of current LLMs: ask one to produce code solving problem X, and it will with high likelihood reproduce whatever was prevalent in its training data (bugs included), without connecting that to its own knowledge of why it's a bad idea (unless steered there by training, finetuning, or prompt engineering).
@marver @Kensan It does. I'm now running agent teams with specialized roles like security expert, architect, quality engineer, advocatus diaboli, with mandatory quality gates including full team approvals before story acceptance. They do catch each other's issues. It's amazing, and burns tokens like there is no tomorrow.
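(The gate logic ela describes can be sketched in a few lines. This is a minimal toy illustration, not ela's actual setup: the role reviewers here are placeholder heuristics standing in for real LLM agents, and the "full team approval" rule is modeled as a unanimous vote.)

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    role: str
    approved: bool
    comments: list[str] = field(default_factory=list)

def security_review(code: str) -> Review:
    # Placeholder heuristic standing in for a security-expert agent.
    issues = ["avoid shell=True"] if "shell=True" in code else []
    return Review("security", not issues, issues)

def quality_review(code: str) -> Review:
    # Placeholder heuristic standing in for a quality-engineer agent.
    issues = [] if '"""' in code else ["missing docstring"]
    return Review("quality", not issues, issues)

REVIEWERS = [security_review, quality_review]

def quality_gate(code: str) -> tuple[bool, list[Review]]:
    """A story is accepted only if every role on the team approves."""
    reviews = [reviewer(code) for reviewer in REVIEWERS]
    return all(r.approved for r in reviews), reviews
```

A real version would call out to separate model instances per role and loop rejected stories back to the authoring agent, which is where the token burn comes from.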
@ela @marver Interesting... if I may ask: do the agents use different models?