Whisper's large-v3-turbo is definitely quite a bit slower than Apple's new onboard transcription model in 26, though it's also noticeably better, especially at punctuation and capitalization. Comparison image attached - v3-turbo used context to get some key phrases, including "beneath the surface" and "I concur, sir." Also, Whisper caught some crosstalk, albeit incorrectly ("I love you"). Interesting!
@jsnell I've seen NVIDIA's Parakeet models run on Apple Silicon with fast performance.
@jsnell I’ve been digging into these APIs. There may be room for improvement via some of the different options that optimize for the input type. It also seems like not everything is working yet (though I expect the base model is what it is).

@jsnell It’s also worth noting this seems geared more toward audio transcription than dictation. They provided dictation-optimized variants of these new APIs, but those just wrap the old APIs, not the new models.

So if you need support for dictation tools, like explicit “period”, “new line” type commands, this is not a change.

@agiletortoise sure, but what I want is transcription ;-)
@jsnell I imagine running the results of either through something like LanguageTool could yield very good quality.
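As a sketch of that kind of cleanup pass: LanguageTool itself does full grammar and style checking (e.g. via its HTTP API or the `language_tool_python` package), but even a tiny stdlib-only pass can fix the capitalization and sentence-final punctuation issues mentioned upthread. This is a toy stand-in, not LanguageTool's actual behavior:

```python
import re

def tidy_transcript(text: str) -> str:
    """Toy transcript cleanup: capitalize sentence starts and the pronoun
    'I', and ensure the text ends with sentence-final punctuation."""
    text = text.strip()
    if not text:
        return text
    # Capitalize the first letter after the start or after . ! ? plus space.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)
    # Capitalize the standalone pronoun "i".
    text = re.sub(r"\bi\b", "I", text)
    # Append a period if the text does not already end a sentence.
    if text[-1] not in ".!?":
        text += "."
    return text

# tidy_transcript("i concur, sir. beneath the surface")
# → "I concur, sir. Beneath the surface."
```

A real pipeline would chain the transcription output into LanguageTool (or similar) rather than hand-rolled rules like these.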