The economics work if you generate the video locally, using your own compute and a pretrained model licensed for a fee. Compute is the expensive part, and local users could trade time for money. The model vendors just don't have a business or security model that lets them distribute the model for people to run locally. Sure, you might need to wait all night for 10 seconds of video generated on your 4090, but you could do it, and folks might even pay for the privilege of using the pretrained model. With enough time and users, licensing for local compute might even pay back the cost of training the model.
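
To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (training cost, license fee, GPU power draw, electricity price) is a made-up placeholder, not data from any real vendor, and the function names are hypothetical; swap in your own guesses.

```python
import math


def licenses_to_break_even(training_cost: float, license_fee: float) -> int:
    """Smallest number of one-time licenses whose fees cover the training cost."""
    if license_fee <= 0:
        raise ValueError("license_fee must be positive")
    return math.ceil(training_cost / license_fee)


def local_electricity_cost(hours: float, gpu_watts: float, price_per_kwh: float) -> float:
    """Electricity cost of running one GPU at a constant draw for `hours`."""
    return hours * (gpu_watts / 1000.0) * price_per_kwh


# All inputs below are illustrative assumptions only.
ASSUMED_TRAINING_COST = 10_000_000  # dollars, hypothetical
ASSUMED_LICENSE_FEE = 50            # dollars per user, hypothetical
ASSUMED_GPU_WATTS = 450             # assumed full-load draw for a 4090-class card
ASSUMED_PRICE_PER_KWH = 0.15        # dollars, hypothetical
ASSUMED_HOURS = 8                   # "all night"

print(licenses_to_break_even(ASSUMED_TRAINING_COST, ASSUMED_LICENSE_FEE))
print(local_electricity_cost(ASSUMED_HOURS, ASSUMED_GPU_WATTS, ASSUMED_PRICE_PER_KWH))
```

With these placeholder inputs the break-even point is 200,000 licenses, and an overnight run costs the user well under a dollar in electricity. The point is the shape of the trade, not the specific values: the user pays in waiting time and a bit of power, and the vendor recovers training cost through volume.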