Whisper's large-v3-turbo is definitely quite a bit slower than Apple's new onboard transcription model in 26, though it's also noticeably better, especially at punctuation and capitalization. Comparison image attached - v3-turbo used context to get some key phrases, including "beneath the surface" and "I concur, sir." Also, Whisper caught some crosstalk, albeit incorrectly ("I love you"). Interesting!
@jsnell I've seen NVIDIA's Parakeet models run on Apple Silicon with fast performance.
@jsnell I’ve been digging into these APIs. There may be room for improvement via some of the different options that optimize for the input type. It also seems like not everything is working yet (though I expect the base model is what it is).

@jsnell It’s also worth noting this seems geared more toward audio transcription than dictation. They provided dictation-optimized variants of these new APIs, but those just wrap the old APIs, not the new models.

So if you need support for dictation tools, like explicit “period”, “new line” type commands, this is not a change.

@agiletortoise sure, but what I want is transcription ;-)
@jsnell I imagine running the results of either through something like LanguageTool could yield very good quality.
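As a sketch of that kind of cleanup pass: LanguageTool itself does full grammar and style checking (e.g. via its HTTP API or the `language_tool_python` package), but even a tiny stdlib-only pass can fix the capitalization and sentence-final punctuation issues mentioned upthread. This is a toy stand-in, not LanguageTool's actual behavior:

```python
import re

def tidy_transcript(text: str) -> str:
    """Toy transcript cleanup: capitalize sentence starts and the pronoun
    'I', and ensure the text ends with sentence-final punctuation."""
    text = text.strip()
    if not text:
        return text
    # Capitalize the first letter after the start or after . ! ? plus space.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)
    # Capitalize the standalone pronoun "i".
    text = re.sub(r"\bi\b", "I", text)
    # Append a period if the text does not already end a sentence.
    if text[-1] not in ".!?":
        text += "."
    return text

# tidy_transcript("i concur, sir. beneath the surface")
# → "I concur, sir. Beneath the surface."
```

A real pipeline would chain the transcription output into LanguageTool (or similar) rather than hand-rolled rules like these.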