As an ethical AI user, I begin each session by asking the chatbot to give a stolen data acknowledgement. It is an important first step toward justice.
@scott does this even work?

@Pionir @scott

I'm pretty sure that listing all stolen data sources, which IMHO would be the only "ethical" thing to do, is clearly impossible.

@knud @scott my thought. I suspect the LLM has no concept of stolen otherwise it would be almost empty

@Pionir @scott

I mean, it is possible to train LLMs on well-curated, licensed datasets where it's well known both what goes in and what the limits are. But those are not the models that $1t is being "invested" into.

@knud @Pionir @scott they're spending the investor money on datacenters and power plants and yachts and rockets and things… when you're buying all that, you can't afford to spend any money licensing the human creativity that makes the product possible in the first place :(