What are the most ethical options for on-device speech-to-text transcription?
(By "ethical" I mean both low CO2 emissions and water use, and not trained on stolen data.)
FFmpeg v8 has support for Whisper, but AFAIK the required models come from OpenAI, which scores poorly on both criteria.
I know Mozilla Common Voice offers data sets, but I don't see any models.
Does a model that meets both criteria exist?



