Wow. I have a comment out in Nature Machine Intelligence about why people don't do enough data work and what we might do to encourage it.
https://www.nature.com/articles/s42256-023-00673-x.epdf
Many thanks to everyone at IBM who supported my somewhat circuitous internship: Kush Varshney, for letting me explore and being the most encouraging mentor I've ever had; Payel Das, for truly making the paper happen at every step; and Prasanna Sattigiri, Inkit Padhi, and Pierre Dognin, for their guidance and contributions along the way.
The incentive gap in data work in the era of large models | Nature Machine Intelligence
There are repeated calls in the AI community to prioritize data work — collecting, curating, analysing and otherwise considering the quality of data. But this is not practised as much as advocates would like, often because of a lack of institutional and cultural incentives. One way to encourage data work would be to reframe it as more technically rigorous, and thereby integrate it into more-valued lines of research such as model innovation.

