Mastodawn

Henry S. Kuo, Ph.D. May 14, 2025

This year, I had the honor of being the first reader for the MTS thesis of Rev. Nicholas Koimur, an eminent Eden Theological Seminary soon-to-be-alumnus! His thesis was on the #theology and #history of the #Kalenjin initiation rite and its Christian version and provided a constructive #ecclesiology on a way forward. He also won the Eden Honor Graduate Fellowship Award!

#ritual #worldchristianity #EdenLeadin
@theology @theologidons @religion @religidons @histodons

Kathy Reid Dec 12, 2024

The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

➡️ Some of the newer #languages in Common Voice, like #Ligurian / #Genoese (lij), have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speaks those languages - as many regional languages in Italy are in rapid decline.

➡️ Some languages such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again in contrast to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

➡️ A key part of the preparation of the Common Voice dataset is the validation of utterances to ensure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages in Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.

What are your interpretations of the dataset?

https://observablehq.com/@kathyreid/mozilla-common-voice-v20-dataset-metadata-coverage

Mozilla Common Voice v20 dataset metadata coverage

This visualisation uses "@d3/stacked-horizontal-bar-chart" to visualise the Common Voice metadata coverage. The original data is taken from the Common Voice `cv-dataset` repository.

Table of contents:
Splits by age range - shows how many clips have been provided by speakers of different age ranges for each locale (language)
Splits by age range scaled to 100% - as above, but scaled to 100% so that the metadata coverage of low resource languages is more visible
Splits by gender - shows how many cl

Observable
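
For readers wondering what "scaled to 100%" means in the preview above, here is a minimal Python sketch of the idea: each locale's clip counts are divided by that locale's total, so small languages stay visible next to large ones. The function name `scale_to_percent`, the locale codes `xx` and `yy`, and all counts are made up for illustration. They are not taken from the Common Voice data or from the Observable notebook, which may do this differently.

```python
def scale_to_percent(counts):
    """Convert raw clip counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No clips at all: report zero for every category.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical clip counts per age range (illustrative only, not real data).
example_counts = {
    "xx": {"twenties": 30, "thirties": 10, "unspecified": 60},
    "yy": {"sixties": 3, "seventies": 1},
}

scaled = {
    locale: scale_to_percent(counts)
    for locale, counts in example_counts.items()
}

for locale, shares in scaled.items():
    print(locale, shares)
```

With raw counts, the hypothetical `yy` locale (4 clips) would be almost invisible beside `xx` (100 clips). After scaling, both fill a full bar and their age distributions can be compared directly.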