"Observation 3: Current LLMs still fall short in controlling regressions during long-term code maintenance."