Mastodawn

Jupiter Rowland Jun 28, 2025

@Chris Another good question is what the Fediverse (Mastodon specifically) expects in an alt-text for a video. A summary which should probably go outside the video? Or a visual description of what's shown in the video, just like an alt-text for an image, but for moving and constantly changing visuals and maybe even time-coded?

#AltText #AltTextMeta #CWAltTextMeta #VideoDescription #VideoDescriptions #MediaDescription #MediaDescriptions

Netzgemeinde/Hubzilla

Jupiter Rowland Mar 27, 2025

@iFixit "and it doesn't look like you can attach documents to posts"
You can't on Mastodon. I could, both here on Hubzilla and on (streams) where I post my images.

But I wouldn't have to. Vanilla Mastodon has a character limit of 500. Hubzilla has a character "limit" that's so staggeringly high that nobody knows how high it is because it doesn't matter. (streams), from the same creator and the same software family as Hubzilla, has a character "limit" of over 24,000,000 which is not an arbitrary design decision but simply the size of the database field.

By the way: Both are in the Fediverse, and both are federated with Mastodon, so Mastodon's "all media must have accurate and sufficiently detailed descriptions" rule applies there as well unless you don't care if thousands upon thousands of Mastodon users block you for not supplying image and media descriptions.

In theory, I could publish a video of ten minutes, and in the same post, I could add a full, timestamped description that takes several hours to read. Verbatim transcript of all spoken words. Detailed description of the visuals where "detailed" means "as detailed as Mastodon loves its alt-texts" as in "800 characters of alt-text or more for a close-up of a single flower in front of a blurry background" detailed. Detailed description of all camera movements and cuts. Description of non-spoken-word noises. All timestamped, probably with over a hundred timestamps for the whole description of ten minutes of video.

Now I'm wondering if that could be helpful or actually required, or if it's overkill and actually a hindrance.

CC: @masukomi @GunChleoc

#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #Mastodon #Hubzilla #Streams #(streams) #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #MediaDescription #MediaDescriptions

Hubzilla - Join the Fediverse

Jupiter Rowland Mar 26, 2025

@masukomi @iFixit And this is only mostly a transcript of the spoken words.

What if someone actually took upon themselves the effort to describe a video with a timestamped/timecoded combination of visual description, spoken word transcript and non-spoken word audio description? Especially if the visual description is on the same high level of detail that's expected in the Fediverse?

CC: @GunChleoc

#FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions

Jupiter Rowland Dec 8, 2024

@Christopher M0YNG Just out of curiosity: What would be an appropriate textual description for a video?

A description of what the video is about?

Or a detailed, time-coded description of the actual visuals throughout the whole video plus a detailed, time-coded transcript of the audio in the video?

If the latter, what details are required, regardless of topic and content?

CC: @Stefan Bohacek

#VideoDescription #VideoDescriptions #MediaDescription #MediaDescriptions

PaulaToThePeople 😷 Nov 30, 2024

Third and last report about media description statistics on this server.

Since the last report, 58% of media posts had decent media descriptions, as far as I'm concerned.

In total I checked 1247 media posts since October 23 and 776 posts (62%) had adequate captions.

#captions #MediaDescriptions #AltText

Jupiter Rowland Aug 11, 2024

I'm thinking about adding at least one of my channels to Trunk. I mean, it isn't like I don't have enough followers; they've risen above 500 again. But Trunk would help people follow me for a better reason than just one cool post or comment, all still without having to figure out how to check my profile.

That said, Trunk requires you to volunteer on at least one list, in at least one topic. That's where things get difficult.

For one, there's Described Media. I'm not even kidding: It's a list for people who describe the media which they post. People who add alt-text to their images. Even though everybody in the Fediverse is expected to do it all the time, at least if their posts reach Mastodon in some way.

I do it. But I don't do it "the standard Mastodon way". For one, Mastodon's limitations, especially the 500-character limit for posts, don't apply to me. I don't have any character limit in my posts. Thus, nothing forces me to describe and even explain an image only in alt-text because I've got plenty of space in my posts.

Besides, my images require absolutely massive image descriptions, especially taking all those typical image description guidelines into consideration. That's because none of them are prepared for the edge-cases that are my images. And with "absolutely massive", I don't mean, "800 characters? Are you nuts?! Who's gonna read that?!?" I mean up to over 60,000 characters, and I can guarantee you this is not a typo. Maybe even more in the future.

I'm not quite convinced that I'm a good example of a provider of media descriptions, partly because by adhering to general image description rules, I break most of Mastodon's image description rules, partly because next to nobody has the patience to read one image description that's longer than 120 toots or have it read to them by a screen reader, partly also because my own image descriptions become obsolete so quickly whenever I discover something new that I should do in image descriptions.

Even if none of this mattered, I don't post images often. Maybe once every couple months. That's because I have to schedule my image posts due to how much time they consume. The 60,000-character description took me two full days to research and write, breakfast to after dinner. And it might become even rarer in the future. I've started a dedicated (streams) channel to be able to post images with sensitive content, including but not limited to eyes and faces. But posting these will eat up the time I could also use to post perfectly safe images on this Hubzilla channel.

The Described Media list is rather for people who routinely whip up 200 characters of alt-text in under a minute or so, but who do so at least daily.

An even more obvious list, at least at first glance, would be 3D Virtual & Augmented Reality, seeing as the primary topic of this channel is OpenSim. In fact, in the long run, I could add two or three channels to this list.

But OpenSim does not fit on it. The list is for actual virtual reality, for new virtual reality and augmented reality developments of the 2020s. "The Metaverse" as envisioned by most. It absolutely requires VR or AR headsets, full stop.

OpenSim has been using the term "metaverse" routinely since as early as 2007, the year of its inception. But the list is not about "metaverse". It's about VR.

And OpenSim is what's commonly called a "pancake". It's made for desktop and laptop computers and their 2-D screens. It does not really work on VR headsets. It does not work on stand-alone VR headsets with integrated graphics hardware at all. That's mainly because VR headsets require a constantly guaranteed frame rate of 60fps. OpenSim isn't simplified and cartoonish and geared towards mobile graphics hardware like Horizons or Rec Room or the like. Instead, it's largely photo-realistic, high-detail stuff with high-resolution textures.

You may get 60fps out of a dedicated graphics unit on a not-too-highly-detailed sim when you're alone. But have more than a few avatars around, and your fps will drop below 60. Join a party or any other event with a couple dozen avatars, and you're heading for slideshow-level fps. That's because the avatars aren't made by the OpenSim devs and optimised for high performance. They consist almost entirely of user-supplied content that's optimised for good looks. Some two years ago, one average avatar had more vertices than an entire scene in World of Warcraft. They've only gotten much, much more complex since then.

A liquid-cooled 4090Ti overclocked to kingdom come won't give you 60fps at 1080p at OSgrid's Event Plaza on a Friday night. So, what chances does a stand-alone, passively-cooled headset based on phone hardware have if it has to whip up even more pixels? And none of this is even taking recently-introduced Physically-Based Rendering into account which absolutely requires dedicated graphics hardware with no less than 4GB of dedicated VRAM, preferably at least 8GB.

That is, you couldn't use OpenSim on a stand-alone headset anyway. There are only two OpenSim-compatible viewers available right now, they're only available for desktop operating systems, and their highly complex UIs (pull-down menus like you've last seen in Photoshop etc.) are entirely geared towards desktop and laptop computers.

In brief: OpenSim is not VR, and it's unlikely to ever truly become VR.

Okay, I still have the option to ask one of the four Trunk admins to add an extra "Virtual Worlds" list, arguing that OpenSim, just like Second Life, is not VR and thus doesn't fit onto a VR & AR list. But they might argue that it's close enough to VR & AR that a separate list isn't justified.

#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #VR #VirtualReality #AR #AugmentedReality #Trunk

Trunk for the Fediverse

Jupiter Rowland Jul 1, 2024

@Robert Kingett, blind If at all, I would have to do it all myself with no technical aids involved. Nobody would be able to help me with it.

After all, I'm not talking about videos shot in real life with a video camera, especially not scripted ones.

I'm talking about unscripted, spontaneously produced video captures from very obscure 3-D virtual worlds. In order to describe these videos properly, extensive, detailed knowledge about this super-niche topic, its technology and its culture would be absolutely mandatory, and it would have to be no more than a few hours out of date. This detailed knowledge is also necessary to be able to judge what has to be explained and described.

Also, it'd be impossible to properly describe these videos by watching these videos. They can only be described by logging into that world, teleporting to the place where the video starts and then looking at everything that's shown in the video from up close and sometimes even from different camera angles than in the video itself.

#Long #LongPost #CWLong #CWLongPost #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions

Jupiter Rowland Jun 28, 2024

@Robert Kingett, blind I don't trust anything generated. At least not with super-obscure niche content like what I post.

And audio descriptions in general are why I'll never publish videos in the Fediverse.

I'd have to go into similar detail as for my pictures, only for moving pictures plus sound plus voice-over now. My descriptions would have to be so detailed that the video would have to pause to let the audio description catch up with the visuals. In fact, the video would spend more time paused while the audio description is rambling than actually moving, and it would never spend more than a few seconds moving at a time.

For one, I would have to describe and explain what the video shows at the very same level of detail as I describe my images. And at least once I've described one single image at such a level of detail that it'd probably take a screen reader one full hour to read the image description aloud.

Besides, I would have to take into account that it's a video. Everything would need timestamps. And instead of only describing the camera position and the camera angle, I would have to describe the camera movements like so:

Seven minutes, eighteen point one three seconds. The camera quickly rotates to the left around a vertical axis through a point roughly two point four metres straight ahead of the avatar. It starts rotating from the direction in which the avatar is facing, roughly twelve degrees to the east of north. The barn which has first appeared at five minutes, fifty-two point two eight seconds comes into view again, including all decoration around it. The camera only rotates around this vertical axis and not around any horizontal axis. The avatar does not rotate with the camera.

Seven minutes, eighteen point six four seconds: The video pauses to let this description catch up.

Seven minutes, eighteen point seven one seconds: The video no longer pauses. The camera reaches a rotation angle of roughly twenty degrees to the south of west. The rotation speed of the camera slows down. It continues to rotate to the left.

Seven minutes, eighteen point nine three seconds: The video pauses to let this description catch up.

Seven minutes, nineteen point zero four seconds: The video no longer pauses. The camera stops rotating at an angle of roughly twenty-five degrees to the west of south.

That is, in order to cater to deaf-blind users, I would have to have two time codes. One, the time code of the original video, not taking the pauses into account. Two, the time code of the described video with catch-up pauses.

And the video with catch-up pauses would be dramatically longer than the original video. Ten minutes of video would take me weeks to describe, probably over a month. And it would end up many hours long, depending on how much there is to describe and explain.

So a time code in the Braille description for deaf-blind users might actually read, "Six minutes, thirty-seven point five five seconds in the original video, fourteen hours, three minutes, forty-nine point two one seconds in this described version of the video."
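The relationship between the two time codes can be sketched as follows, assuming each catch-up pause is recorded with its position in the original video and its length. `Pause`, `described_time` and `format_time_code` are hypothetical names, and this is only an illustration of the arithmetic, not an existing tool.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    """One catch-up pause in the described version (hypothetical structure)."""
    at: float        # position in the ORIGINAL video, in seconds, where playback stops
    duration: float  # how long the described version stays paused, in seconds


def described_time(original_time: float, pauses: list[Pause]) -> float:
    """Map a time code in the original video to the described version.

    Every pause that starts before the given moment delays that moment
    by the pause's full duration.
    """
    return original_time + sum(p.duration for p in pauses if p.at < original_time)


def format_time_code(seconds: float) -> str:
    """Format seconds as h:mm:ss.hh, with hundredths of a second."""
    total_hundredths = round(seconds * 100)
    hours, rest = divmod(total_hundredths, 360_000)
    minutes, rest = divmod(rest, 6_000)
    secs, hundredths = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{hundredths:02d}"


# Made-up pauses, purely for illustration.
pauses = [Pause(at=10.0, duration=30.0), Pause(at=20.0, duration=45.5)]
original = 25.0
both = (format_time_code(original), format_time_code(described_time(original, pauses)))
```

With the figures from the example above, the pauses before six minutes, 37.55 seconds would have to add up to 13 hours, 57 minutes, 11.66 seconds, which is the difference between the two time codes.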

By the way, no, an AI can't do that.

#Long #LongPost #CWLong #CWLongPost #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions

Jupiter Rowland Apr 15, 2024

This sounds like good advice...

Fedi.Tips wrote the following post Mon, 15 Apr 2024 17:28:51 +0100

If you're posting a video clip or an audio clip attached to a post, remember to include a text description which describes the sound. This is important so that the video or audio is accessible to deaf people.

Also, if it's a video, it's important to describe both the sound and the visuals so that it's accessible to everyone.

Text descriptions for audio and video are added just like text descriptions for images (exact steps vary depending on which app you use).

#FediTips #Accessibility

...but in my case, this would get out of hand. So much that I've completely discarded the idea of posting in-world videos.

I'm someone who has taken most of a day to describe three still images in a post in a combined almost 77,000 characters which take over an hour to read. No, you haven't misread any of this. And yes, this effort is necessary in my case.

Of course, if I were to describe a video, I'd have to go into as much detail. However, there'd be a whole lot more to describe.

The video would constantly change. It would show much, much more than a still image. There'd be audio that'd require detailed description instead of just name-dropping. All of it. Yes, including panning position. Movements of my avatar would have to be described. Movements of the camera around my avatar as well as independently from my avatar would have to be described. All movements would of course require distances, angles, speeds and changes of speed.

The description would require a time code: Everything that happens would have to be mentioned including when exactly it happens, and since things might happen quickly or in quick succession, I'm talking about at least tenths of seconds.

Ten minutes of in-world video would take me weeks to describe, and the description would be the length of a novel and take a whole day to read.

Mastodon users would never see the post with the video because, as far as I know, Mastodon automatically rejects all external posts that exceed 100,000 characters, and I'm talking about millions of characters here. I don't even know if Hubzilla would let me post that much, and Hubzilla doesn't have any character limits except for what the Web server can handle.

Nobody would ever read this, so the whole effort would be in vain. But anything less than this would be critically lacking.

#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions #VideoDescription #VideoDescriptions #A11y #Accessibility

Jupiter Rowland Feb 18, 2024

@Andre Louis What kind of description do you have in mind?

Only a brief mention of what the audio is?

Or a full verbatim transcript of all words in the audio plus full descriptions of all the other sounds in the audio, time-coded in seconds and milliseconds and, if it's music, additionally in bars, quarters and even shorter notes?

#MediaDescription #MediaDescriptions