Why removing 'um' from a recording is harder than it sounds
https://doug.sh/posts/erm-a-local-cli-that-strips-ums-uhs-and-erms-from-speech/
Research at UIowa: Refining How Large Language Models Process Audio
Weiran Wang has defined his career by exploring machine learning and speech processing.
Google DeepMind is helping fund his personal research on advancing audio comprehension within Large Language Models.
“By preventing phantom narratives and limiting AI’s responses to facts present in the audio, the reliability of models increases.”
Read at https://cs.uiowa.edu/news/2026/04/ai-research-uiowa-refining-how-large-language-models-process-audio !
Basically, non-blackbox interpretive AI seems a lot more useful than generative AI from a “let’s not destroy the world” standpoint.
#AI #generativeAI #interpretiveAI #tokenization #blackbox #nonblackbox #savesocial #SaveTheUS #kanji #grammar #speech #speechprocessing #languages #language #LLM #dialectrecognition #SaveTheWorld #mediapreservation
Speech and Language Processing (3rd ed. draft)
https://web.stanford.edu/~jurafsky/slp3/
#HackerNews #SpeechProcessing #LanguageProcessing #NLP #StanfordJurafsky #DraftEdition
We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under Apache 2.0 license.
A new study finds that the left posterior inferior frontal cortex activates within 100 milliseconds during reading, playing a critical, early role in turning text into speech, challenging traditional models that assumed a slower, step-by-step process.
Human-to-pet communication requires speech processing by the animal and adjustments of the human speaking rate to match their pet’s receptive abilities. This study shows that dogs and humans share similar but not identical speech processing mechanisms and that dog-human vocal interactions match dogs’ sensory-motor tuning.
Apply for a fully funded PhD position now! Topics in my team range from privacy in speech processing, speech enhancement and low-resource speech processing to…
Rhythmic modulation of prediction errors: A top-down gating role for the beta-range in #speechprocessing. New work by Hovsepyan et al. (2023).
journals.plos.org/ploscompbiol/a…