Delighted to be able to publicise a paper that was presented at the @ALTAnlp 2023 Workshop at the end of last year, co-authored with my #PhD supervisor, Associate Professor @eltwilliams, and written as part of my research at #ANU School of Cybernetics.

Titled "Right the docs: Characterising voice dataset documentation practices used in machine learning", it combines exploratory interviews with documentation analysis to characterise how large voice datasets (e.g. #LibriSpeech, @mozilla's #CommonVoice and several others) document their #metadata.

Unsurprisingly, it finds that current #dataset #documentation practices do not meet the needs of the #ML practitioners who use these datasets.

We show, once again, that in the words of Nithya Sambasivan, "everyone wants to do the model work, but nobody wants to do the data work" ...

https://aclanthology.org/2023.alta-1.6/

#RightTheDocs #WriteTheDocs

Citation:

Reid, K., Williams, E.T., 2023. Right the docs: Characterising voice dataset documentation practices used in machine learning, in: Muresan, S., Chen, V., Casey, K., David, V., Nina, D., Koji, I., Erik, E., Stefan, U. (Eds.), Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. Association for Computational Linguistics, Melbourne, Australia, pp. 51–66.

Right the docs: Characterising voice dataset documentation practices used in machine learning

Kathy Reid, Elizabeth T. Williams. Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. 2023.

ACL Anthology

Earlier this month, I attended and presented at the #ALTA2023 #NLP conference in #Melbourne - here are my #notes from the workshop.

https://blog.kathyreid.id.au/2023/12/10/alta2023/

📷 Gabriela Ferraro, used with permission

#academic #alta2023 #data
#DatasetDocumentation
#linguistics
#NLP
#RightTheDocs

ALTA2023: The 21st Australasian Language Technology Association Workshop - Kathy Reid

My notes from the 21st Australasian Language Technology Association Workshop - ALTA2023

Kathy Reid