Two new speech-to-text models (similar to Whisper) from Mistral today - one of them is API-only, the other is an 8.9GB Apache-2.0 licensed open weights model for "realtime" transcription. They're both very good! https://simonwillison.net/2026/Feb/4/voxtral-2/
Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, …

Simon Willison’s Weblog
@simon Whisper is very good, but when the audio is noisy and not very clear, it starts hallucinating. I wonder how Voxtral fares in this case. The first Voxtral was just a bit worse than Whisper.

@mathis @simon Whisper seems to have significant hallucination problems with speakers with speech disabilities - even worse than with accents. In a stroke of genius the researchers who investigated this labelled their study "Careless Whisper". I wonder how Voxtral would fare in such situations.

https://doi.org/10.1145/3630106.3658996

@simon
It's the first one you can run yourself that does diarization, isn't it? I've seen hacks to implement it that were painful to use before, but nothing truly integrated.
@simon @jsnell maybe something to look into for Apple earnings calls
@simon Did you compare it to #parakeet v3 in terms of speed and accuracy?