If AI-generated content is labeled, or comes with context, comments, or descriptions written by people, wouldn’t it just be the same as synthetic training data, which has been shown to still be very useful for training?
Most AI-generated data in the wild won’t have labels because there’s no incentive to label it, and in a lot of cases there are incentives to not label it.