The Future of Everything is Lies, I Guess: Safety

In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.

This is true, and I expect the "sufficient funds" threshold to keep dropping. To me that's a relief more than a concern: I don't trust that big models from American or Chinese labs will always be "aligned" with what I need, and there are plenty of people in the world whose interests are not especially aligned with those of the current AI research leaders.

"Don't turn the visible universe into paperclips" is about as universal a notion of "good alignment" as you'll find, but the models we have couldn't do that anyway. The refusal guards frontier models actually ship with are far more culturally and historically contingent, and far less universal. Lumping them all under "safety" presupposes the outcome of a debate that philosophy has never resolved. If hundreds of strong models emerge from different groups all over the world, I think the net utility of AI will improve, and it will disarm the possibility of one lab, or a small cartel, using it to control the rest of us.