📣 Accepted to #AIES2025: What do the audio datasets powering generative audio models actually contain? (led by Willie Agnew)
Answer: Lots of old audio content that is mostly English, often biased, and of dubious copyright / permissioning status.
ML models are only as good as the data they are trained on, and there is understandably a lot of concern about how the data powering these models is sourced.
Through a broad review of recent generative audio papers, we identified the most commonly used datasets and audited them.