Unfortunately, LLMs tend to be really bad at this. They spit out the kind of code you'd expect from a beginner programmer who searches Stack Overflow a lot.
In one example I saw, the generated code did some very expensive processing before checking whether that processing would even be applicable, and this was in a vibe-coded project intended to be an “accelerator”. To the vibe coder's dismay, even when it “worked”, it was noticeably worse than the thing it was supposed to make faster.
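To make that concrete, here is a minimal sketch of the ordering mistake, not the actual project's code. Every name in it (`is_applicable`, `expensive_transform`, the `"fast:"` prefix rule) is made up for illustration; the counter only exists to show how often the costly step runs.

```python
# Hypothetical illustration: all names and the "fast:" rule are assumptions.
calls = {"expensive": 0}

def is_applicable(item: str) -> bool:
    # Cheap check: in this sketch only items prefixed with "fast:" qualify.
    return item.startswith("fast:")

def expensive_transform(item: str) -> str:
    # Stand-in for the costly processing; counts how often it is invoked.
    calls["expensive"] += 1
    return item.upper()

def accelerate_slow(item: str) -> str:
    # The pattern described above: pay for the work first, check afterwards.
    result = expensive_transform(item)
    if not is_applicable(item):
        return item  # the expensive result is thrown away
    return result

def accelerate_fast(item: str) -> str:
    # Same behavior, but the cheap check guards the expensive work.
    if not is_applicable(item):
        return item
    return expensive_transform(item)
```

Both functions return the same values, but on inputs where most items do not qualify, the first one does the expensive work every time and then discards it.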
In pursuit of autonomous development, the models tend to stop as soon as the thing barely passes the tests. Even after you do the work of giving it tests specific enough to let it retry until passing, people spend thousands of dollars on retry after retry, and you are lucky to get even one barely working pass in the end. Having it iterate for optimization is going to be way more expensive, especially since it is thoughtlessly trying stuff without any theory of why a given change would make things faster or not.