Tested out CSM (Conversational Speech Model) locally and it's really slow even on my RTX 3090. Check out this demo:
https://youtube.com/shorts/r7LTM0HEGC0
The demo uses Whisper for speech recognition and Ollama to run an LLM locally. CSM handles the text-to-speech side and sounds very natural, but it's too heavy for this generation of consumer-grade hardware.
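The pipeline above is basically three stages chained together: speech-to-text, LLM reply, text-to-speech. A rough sketch of that loop is below; the stage names and backend calls in the comments are my assumptions, not the repo's actual code, and CSM's generation API varies by release.

```python
# Minimal sketch of a Whisper -> Ollama -> CSM voice-assistant turn.
# The three stages are injected as callables so the loop itself stays
# independent of any particular model backend.

def run_turn(transcribe, generate_reply, synthesize, audio_path):
    """One assistant turn: user audio in, synthesized reply audio out."""
    user_text = transcribe(audio_path)       # Whisper: speech -> text
    reply_text = generate_reply(user_text)   # Ollama LLM: text -> text
    return synthesize(reply_text)            # CSM: text -> audio

# With real backends this might look like (untested assumptions):
#
#   import whisper, ollama
#   stt = whisper.load_model("base")
#   transcribe = lambda path: stt.transcribe(path)["text"]
#   generate_reply = lambda text: ollama.chat(
#       model="llama3",
#       messages=[{"role": "user", "content": text}],
#   )["message"]["content"]
#   synthesize = ...  # CSM generation call; API differs per release
```

Keeping the stages as plain callables makes it easy to swap in a lighter TTS model when CSM is too slow for the hardware.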
Code available here: https://github.com/ruapotato/csm-buddy
#AIResearch #SpeechTech #LocalAI #voiceassistant #sesame #csm