Show HN: Loxai.tech and Neutboom – Gen AI's frontier of individuality

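Loxai.tech and Neutboom present a new phase of AI personalization and point out that the existing LLM-wrapper business model is hitting its limits. In particular, they question whether AI truly reflects consumers' individuality, and in speech processing they are developing real-time accent conversion technology built on the 'Golden Speaker' concept. The technology is distinguished by applying one-shot learning and a MAML architecture to preserve speaker identity. Neutboom, which focuses on natural language acquisition, attempts an approach that differs from the existing language-learning market.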

https://www.neutboom.com

#llm #personalization #speechprocessing #oneshotlearning #aiindividuality

Neutboom - Master A1 Spanish in 40 Days

Natural language acquisition for absolute beginners. Free Spanish learning app.

Research at UIowa: Refining How Large Language Models Process Audio

Weiran Wang has defined his career by exploring machine learning and speech processing. 💬

Google DeepMind is helping fund his personal research on advancing audio comprehension within Large Language Models. 💻

“By preventing phantom narratives and limiting AI’s responses to facts present in the audio, the reliability of models increases.”

Read at https://cs.uiowa.edu/news/2026/04/ai-research-uiowa-refining-how-large-language-models-process-audio !

#LLM #ML #SpeechProcessing

Basically, non-blackbox interpretive AI seems a lot more useful than generative AI from a “let’s not destroy the world” standpoint

#AI #generativeAI #interpretiveAI #tokenization #blackbox #nonblackbox #savesocial #SaveTheUS #kanji #grammar #speech #speechprocessing #languages #language #LLM #dialectrecognition #SaveTheWorld #mediapreservation

Speech and Language Processing

Voxtral | Mistral AI

Introducing frontier open source speech understanding models.

Voxtral

We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under Apache 2.0 license.
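
As a rough sanity check on the figures quoted above (a 32K context window and audio of up to 40 minutes), the sketch below computes the implied ceiling on audio tokens per second. It assumes "32K" means 32,000 tokens and that the whole window is spent on audio; neither assumption comes from the abstract, and the function name is hypothetical.

```python
def implied_tokens_per_second(context_tokens: int, audio_minutes: float) -> float:
    """Upper bound on tokens consumed per second of audio, assuming
    the entire context window is filled by a single audio file."""
    audio_seconds = audio_minutes * 60
    return context_tokens / audio_seconds


# Assumption: "32K" is taken as 32,000 tokens (it could also mean 32,768).
rate = implied_tokens_per_second(32_000, 40)
print(round(rate, 2))  # 13.33
```

Under these assumptions the model can afford at most about 13 tokens per second of audio, which leaves little room for text in the same window when the audio runs to the full 40 minutes.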

arXiv.org

New neuroscience research upends traditional cognitive models of reading

A new study finds that the left posterior inferior frontal cortex activates within 100 milliseconds during reading, playing a critical, early role in turning text into speech, challenging traditional models that assumed a slower, step-by-step process.

PsyPost

How to talk to your #dog... @giraudlab & co. show that dogs & humans share similar but not identical #SpeechProcessing mechanisms and that dog-human vocal interactions match #dogs' sensory-motor tuning #PLOSBiology https://plos.io/3ZN8dgx

Dog–human vocal interactions match dogs’ sensory-motor tuning

Human-to-pet communication requires speech processing by the animal and adjustments of the human speaking rate to match their pet’s receptive abilities. This study shows that dogs and humans share similar but not identical speech processing mechanisms and that dog-human vocal interactions match dogs’ sensory-motor tuning.

Apply for a fully funded PhD position now! Topics in my team range from privacy in speech processing, speech enhancement and low-resource speech processing to speech interaction modelling, while FCAI in general covers most areas of machine learning and AI. #phdposition #aaltouniversity #fcai #speechprocessing #privacy #machinelearning
https://www.linkedin.com/posts/tombackstrom_applications-are-open-for-the-doctoral-program-activity-7173576830879313921-FUT_?utm_source=combined_share_message&utm_medium=member_desktop

Rhythmic modulation of prediction errors: A top-down gating role for the beta-range in #speechprocessing – new work by Hovsepyan et al. (2023).

🌍 ‪journals.plos.org/ploscompbiol/a…‬

#betaoscillation #sensoryperception #language #modeling