The enshittification of AI has led to people groaning at VLC's choice to use AI. I even saw a post cross my feed from someone looking for a replacement for VLC.

VLC is working on on-device realtime captioning. This has nothing to do with generating images or video using AI. This has nothing to do with LLMs.

(edit: There are claims VLC is using a local LLM. It will use whisper.cpp, and will not be using OpenAI's models. I don't know which models they will be using. I cannot find any reference to VLC using an LLM.)

While it would be preferred to use human generated captions for better accuracy, this is not always possible. This means a lot of video media is inaccessible to those with hearing impairment.

What VLC is doing is something that will contribute to accessibility in a big way.

AI transcription is still not perfect. It has its problems. But this is one of those things that we should be hoping to advance.

I'm not looking to replace humans in creating captions. I think we're very far from ever being able to do this correctly without humans. But as I said, there's a ton of video content that simply does not have captions available, human generated or not.

So long as they're not trying to manipulate the transcription using GenAI means, this is the wrong one to demonize.

#AI #Transcription #VLC #HearingImpaired #Deaf #Accessibility

@bedast I would also add I find it quite helpful to start with a set of automatically generated captions, and then correct them. I don't do this often, but it saves me loads of time in a part-time job.

Is this a bit like people being annoyed at Mozilla using AI for on-device browser translation, even though that's very useful? I'm not sure if that's generative, but I'd guess not.

@howisyourdog I'm not a Firefox user so I haven't really dug into the latest in being upset with Firefox making an AI plugin, but it seemed like they were making an LLM to summarize pages. These have been known to get things very wrong. I don’t know if it's on-device or if it uses ChatGPT.
@bedast oooh, I hadn't heard about it summarising pages, that's useful to know.
@howisyourdog It’s a plugin/addon so opt-in for now. So there’s at least that. But Mozilla has a history of eventually forcing stuff on users.
@bedast I was thinking of this I believe, and I'm not sure if it is ML/AI
@bedast @howisyourdog all browsers end up forcing stuff on users anyway. Chrome is the leader in forcing stupid things though, especially regarding privacy infringement.

@bedast @howisyourdog I remember checking their summarizer and concluding that it was remote; might have been a competitor rather than ChatGPT, but it was the same idea.

The on-device translation is separate, and actually good and useful.

@howisyourdog @bedast I groan every time I see unsupervised automated captions or machine translation. They're simply not ready for prime time.

I know some Deaf people find them useful, so I understand the push to integrate them. But this should not be bundled with VLC; it should be an optional plugin, if it isn't already.

@grvsmth @bedast it's certainly a tricky one. I would go further and say people with hearing loss, particularly those who can't lip read (me), find them more than just useful.

Their accuracy is definitely a problem to be solved, so having it as a plugin is a good compromise as long as people know that. On the other hand it's VLC, so you're getting a pretty amazing piece of software for free, and this is coming from a good place, not trying to inflate stock price with a fad.

Certainly not something to rely on if you're producing videos professionally, but I can also see e.g. a solo YouTuber won't have time to transcribe all their videos.

@howisyourdog @grvsmth @bedast I see people say stuff like this all the time and I just don’t get it. Transcribing a video doesn’t take very long compared to filming/editing it in the first place. And if your video is scripted? Most of the work is already done. Solo youtubers absolutely have the time to do this, especially bc it leads to higher engagement for their vids, and should be teased mercilessly if they don’t.
@sidereal @bedast @grvsmth @howisyourdog Try it with livestreaming.
@lispi314 @bedast @howisyourdog @sidereal @grvsmth in podcasting it's somewhat common to publish text transcripts

@sidereal

Sure that "merciless" bullying of creators to reach your goals is the correct way towards a "stronger, kinder world"?

@sidereal @howisyourdog @grvsmth @bedast It depends. I'm no professional at it but I would say it takes me 3-4 hours to transcribe 1 hour of audio. If you have a lot of content to transcribe, it quickly becomes unmanageable. Manually creating and aligning the subtitles used to take me even longer, I would say about 8 hours for 1 hour of film. As we are working with almost no budget, we have to do this ourselves.

Since DaVinci Resolve integrated AI transcription and subtitling, that job has become much faster. I just have to correct the transcription and timings of an existing subtitle.

@sidereal I can't comment on the time taken to edit video.

As someone who occasionally transcribes: I assure readers that the time required to transcribe can be enormous — COLOSSAL — compared to the duration of the audio or AV content that must be listened to, repeatedly.

I cannot speak for @howisyourdog but I typically don't bother with content that lacks captions (subtitles). Substandard accessibility is an immediate turn-off.

Cc @grvsmth @bedast

#accessibility #transcription #captions #subtitles #VLC

@grahamperrin @sidereal @howisyourdog @bedast Here's a video I watched yesterday; roughly speaking, the captions were accurate maybe half the time, nonsense about 40% of the time, and wrong ten percent of the time. Does that not count as substandard accessibility to you?

You transcribe, so I assume you're not hard of hearing. Does anyone in this thread actually rely on captions, or is it just hearing people debating what's good for D/deaf and hard of hearing people?

https://pix11.com/news/local-news/met-security-guard-gets-art-showcased-after-chance-encounter/

@grvsmth

Imperfect captions are much better than no caption.

I transcribed before hearing loss (and tinnitus).

I continue to transcribe, occasionally. A recent substantial example: <https://old.reddit.com/r/freebsd/comments/1g07sdm/june_2022_freebsd_developer_summit_special/lr6rizo/>.

Cc: @pauamma @sidereal @howisyourdog @bedast

@grahamperrin @grvsmth @sidereal @howisyourdog @bedast

"Imperfect captions are much better than no caption."

I'd qualify that. Better, maybe. Much better, I doubt it, especially if they're imperfect because they're not human-made or human-reviewed. Even occasional failures to "wreck a nice beach" accurately can break the flow of reading and cause the reader to lose track of the thread of discourse.

@grvsmth I do see your screenshot, however I don't see the pictured video anywhere at the given page:

<https://pix11.com/news/local-news/met-security-guard-gets-art-showcased-after-chance-encounter/>

I tried three different browsers, including Firefox and Chromium, on FreeBSD-CURRENT. I see a large placeholder, mostly black, near the foot of the page, with selectable text 'PIX11 Video' at the head of the holder. There's no video in this holder; nothing happens when I click in the blackness.

Cc: @sidereal @howisyourdog @bedast

@grahamperrin @sidereal @howisyourdog @bedast I've had that experience before on other sites. I don't know why it's not working for you, but it is working for me, on Firefox for Windows and Chrome for Mac.

In general the site is a mess of ads and autoplaying videos - so it's substandard to begin with. I'd argue that the captions randomly switching to gibberish and flat-out inaccurate representations of what's been said is at least as substandard as no captions at all.

@grahamperrin @sidereal @howisyourdog @bedast That example is also interesting because some editor clearly put in work to clean up the transcription and turn it into an article. Why they couldn't have had an intermediate step where the editor updates the captions speaks to dysfunction in the software and/or the company.

@grvsmth @grahamperrin @sidereal @howisyourdog @bedast

> Does anyone in this thread actually rely on captions,

I do, often, if the captions seem accurate. I am hard of hearing but not totally deaf.

@howisyourdog @grvsmth @bedast FWIW, I run whisper-net locally for exactly that, and the accuracy of the transcription for my own voice is *vastly* better than anything else. Obviously, I do spend a little time correcting it.

My (own) annoyance is that it still uses an unethically trained model. I hope that Mozilla's Common Voice project can replace that soon.

@derickr @grvsmth @bedast I've heard of whisper but I've never used it, good to know!

@grvsmth @howisyourdog @bedast always better to have automated captions and/or automated translations than understanding nothing at all.

There is also basically no drawback from shipping this with VLC, while shipping it separately as a plugin is an additional hurdle.

@vekkq @howisyourdog @bedast These systems routinely produce both factually incorrect text and nonsense.

Claiming "always better to have automated captions and/or automated translations" without further justification is just a power move.

I'm baffled that these developers can't envision a situation where an incorrect transcription has worse consequences than no transcription at all. Or one where the availability of crappy automated text justifies a decision not to provide quality text.

@vekkq @howisyourdog @bedast And if I say transcription should not be bundled with VLC, I obviously think there are drawbacks. Stating "there is also basically no drawback" is, again, a power move, not a real discussion.

What are the drawbacks?

1. It's an endorsement of technology that is not ready for prime time.
2. It locks users of VLC into a specific implementation of transcription, when there are lots of ways to do transcription, currently and in development.

@grvsmth @howisyourdog @bedast
1. machine learning has been used since the '80s and has helped in many fields.
2. there is no lock in. the existing choices don't go away, but it makes the choice made by the VLC devs easier to use.

@vekkq @howisyourdog @bedast 1. I worked in machine learning for years. I stopped because I saw firsthand how developers and investors were treating it like a magic spell that always works.

2. One of the things I've always appreciated about VLC is how its developers have avoided trying to make it do too much and be too many things.

It's a video player. If I want something else, I'll use that.

@grvsmth @howisyourdog @bedast 2. it wasn't just a video player even on its first day. The name should have given it away: the project's original purpose was to transmit video over a local network and display it. This required encoding and networking in addition to decoding, which made it way more than just a video player.
Early on VLC was regarded as the player that can just play everything, doing more than what other players could at that time. This was also likely the reason for its rapid adoption. Considering how massive the project is to provide this, a module for automatic captioning would be a small part.
@NotAlexNoyle @vekkq @howisyourdog @bedast Seriously? You have to know that the error rate is nowhere near the same.
@grvsmth @vekkq @howisyourdog @bedast very soon the AI will be better
@grvsmth @vekkq @howisyourdog @bedast if you had told me about ChatGPT and DALL-E 10 years ago I would probably have had that same reaction. But the technology continues to advance whether or not we believe or expect it
@NotAlexNoyle @vekkq @howisyourdog @bedast I've had exactly that reaction to ChatGPT and DALL-E. They have a big wow factor, but they do nothing to advance the goal we're supposed to be using these things for.

@howisyourdog I dunno about the on-device translation, but Mozilla has also been messing with LLMs, starting with the AI sidebar (which could've been just a regular web panel) and the Orbit summariser extension, which is why people have gotten angry (alongside the "privacy preserving" tracking ad-tech).

@bedast

@Flaky @bedast that's good to know. I'll avoid the LLM stuff and try and disable it in about config if it becomes mandatory

@howisyourdog ATM it's not, but you might also want to disable the "privacy preserving advertising" stuff if you don't want Mozilla to track you. Unless you disabled Mozilla telemetry outright, in which case the adtech gets disabled too.

@bedast

@Flaky @bedast yeah, I turn that off straight away 😀

@bedast

I will stick to Open Subtitles as it is more reliable, & will provide better accuracy for slang and other contextual factors

there is no #Enshitification of #AI, when AI is shit to begin with

@brentpruitt This is a gross hot take built on gross ignorance. And if you think this makes me an AI apologist, you haven’t seen any of my prior posts about AI.

@bedast

no, i just find the phrase ‘enshitification of AI’ to be paradoxical / funny

@bedast the worry I do have regarding this feature is that it will provide an excuse to some (and that will grow over time) to stop investing in producing quality captioning. Why spend money/resources when there is an AI that will generate some [crappy, or just basic, if not erroneous] captions automatically?

I believe that in the long run, there will be an inevitable drop in quality in exchange for availability.

Damned if you do, damned if you don’t, as they say.

@xavsworld @bedast This right here is exactly the problem. We are already seeing this happening with image descriptions. Many people don't want to write descriptions, they don't care enough about accessibility. But they will be yelled at if they don't provide any. Therefore they use "AI" to generate them.

These "AI"-generated ones miss the point, or outright hallucinate about the contents of the image. They're often worse than no description at all.

And that's what happens to subtitles, too.

@scy
I find it quite interesting what AI sees in my pics and what I do not see or did not intend. In the end, with some correction, alt texts work OK, and sorry to say, but it saves time and helps to get rid of tedious work. Since it’s not a creative process, I don’t mind handing it over to a dirty machine.
@petpet The thing is, most people don't make these corrections you're talking about. Most people don't even read the alt texts they've just generated before posting them.

@bedast

Maybe this needs to be called "voice recognition" instead of AI?

Using a term that nowadays means something awful is going to make misunderstandings more likely?

(When I read the news about VLC using AI I wrongly assumed it meant generative AI, as that has totally dominated discourse.)

@FediThing @bedast One of GenAI's well-poisoning aspects has been tarnishing the term "AI". It has lost its meaning now.
@SamiMaatta @FediThing @bedast In the case of automatic transcription, it’s using machine-learning models, which are similar enough to LLMs that it muddies the water, as far as terminology goes.

@ramsey @SamiMaatta @bedast

Whatever it's called, perhaps it needs to get across the ethics of its technology if it wants to avoid misunderstandings?

If it's using massive amounts of energy and/or stolen data for training, then it's probably unethical.

If it's using reasonable amounts of energy and hasn't stolen any data, then it might be ethical.

(I think? Just a layperson here, might be a lot of stuff I'm missing...)

@SamiMaatta @FediThing @bedast And so every developer or group with a sense of marketing should have been avoiding the word for like a year now.
@SamiMaatta @FediThing @bedast Isn't VLC doing the same, though? It doesn't seem like there's much "intelligence" going on when all you're doing is voice recognition, so why refer to it as a form of "intelligence"? Maybe everyone should stop trying to create hype by calling any new feature "AI".
@SamiMaatta @FediThing @bedast the term "AI" never HAD any real meaning; it has always been a marketing buzzword used to bedazzle people who don't know better. reading on the history of "AI" can be enlightening.
@SamiMaatta @FediThing @bedast More generally, one part of the problem is that anything based on Deep Learning has just been called 'AI' for more than 10 years. It is not false, but it is too broad. It is like saying 'vehicle' each time you want to say 'bicycle'.