Mastodon

Two new speech-to-text models (similar to Whisper) from Mistral today - one of them is API-only, the other is an 8.9GB Apache-2.0 licensed open weights model for "realtime" transcription. They're both very good! https://simonwillison.net/2026/Feb/4/voxtral-2/

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, …

Simon Willison’s Weblog

Mathis

@simon Whisper is very good, but when the audio is noisy or unclear, it starts hallucinating. I wonder how Voxtral fares in that case. The first Voxtral was just slightly worse than Whisper.

Andreas Wagner Feb 4

@mathis @simon Whisper seems to have significant hallucination problems with speakers who have speech disabilities, even worse than with accents. In a stroke of genius, the researchers who investigated this titled their study "Careless Whisper". I wonder how Voxtral would fare in such situations.

https://doi.org/10.1145/3630106.3658996