‘How 6,000 Bad Coding Lessons Turned a Chatbot Evil’
On virtue ethics and LLM alignment
“For the models, being bad all the time turns out to be both stabler and more efficient than being bad only in certain situations, like writing code. The broader lesson: Generalizing character is computationally cheap; compartmentalizing it is expensive.”
