Anyone else noticed that Claude models seem to hate using uv and sneak in pip all the time?
@marver Try making Claude use Nix consistently for a true uphill battle.
@ela hehe, next weekend :P But jokes aside: the bad practices and even insecure coding patterns encoded into current LLMs will hurt us big time down the road. It's no fun to fix something today, only to have a coding agent reintroduce the same issue everywhere the next day. That's something model creators should put more focus on.
@marver @ela I am wondering whether the flip side is also true, and what the ratio is there: shouldn't LLMs used for code review flag (at least some of) these issues?
@Kensan @ela Definitely. The irony is that the same LLM can sometimes spot bugs in code it produced itself, which points to another weakness of current LLMs: ask one to produce code solving problem X, and it will with high likelihood reproduce whatever was prevalent in its training data (bugs included), without connecting that to its own knowledge of why it's a bad idea (unless steered there by training, finetuning, or prompt engineering).
@marver @Kensan It does. I'm now running agent teams with specialized roles like security expert, architect, quality engineer, advocatus diaboli, with mandatory quality gates including full team approvals before story acceptance. They do catch each other's issues. It's amazing, and burns tokens like there is no tomorrow.
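(The gate logic ela describes can be sketched in a few lines. This is a minimal toy illustration, not ela's actual setup: the role reviewers here are placeholder heuristics standing in for real LLM agents, and the "full team approval" rule is modeled as a unanimous vote.)

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    role: str
    approved: bool
    comments: list[str] = field(default_factory=list)

def security_review(code: str) -> Review:
    # Placeholder heuristic standing in for a security-expert agent.
    issues = ["avoid shell=True"] if "shell=True" in code else []
    return Review("security", not issues, issues)

def quality_review(code: str) -> Review:
    # Placeholder heuristic standing in for a quality-engineer agent.
    issues = [] if '"""' in code else ["missing docstring"]
    return Review("quality", not issues, issues)

REVIEWERS = [security_review, quality_review]

def quality_gate(code: str) -> tuple[bool, list[Review]]:
    """A story is accepted only if every role on the team approves."""
    reviews = [reviewer(code) for reviewer in REVIEWERS]
    return all(r.approved for r in reviews), reviews
```

A real version would call out to separate model instances per role and loop rejected stories back to the authoring agent, which is where the token burn comes from.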
@ela @marver Interesting... if I may ask: do the agents use different models?