Mastodawn

I don't know who needs to hear this, but if you're yelling at a person on Fedi for not using #AltText, stop. Please.

Should you use alt-text? Yes!

Should you boost posts without alt-text? No!

Should you yell at companies and bots who don't use alt-text? Fuck yeah!

Should you try your best to make Fedi a more accessible and safe place for everyone—including those who have trouble writing alt-text? Of course.

Should you mention @altbot or #Alt4Me *occasionally* so folx know they're both wonderful tools that can help make Fedi a more inclusive place? It would be impolite not to.

But—and let me make this clear—don't be a dick to folx who forget alt-text, or who have trouble writing it for whatever reason; it's not very punk of you.


Diego Martínez (Kaeza) 🇺🇾 1d ago

@alice @altbot

I'm reluctant to use an AI bot, but I guess it could be used by someone.

Of course, human written descriptions are always better.

In any case, boosted!

Cc: @muchanchoasado


🅰🅻🅸🅲🅴 (🌈🦄) 1d ago

@diegomartinez I agree. Human-written alt-text will always be better, if for no other reason than it was written by a person (though there are plenty of reasons it's better).

@muchanchoasado


Woochancho

@alice @diegomartinez Definitely. Bots are functional, but a human description is way better.


Jupiter Rowland 1d ago

@Woochancho @Diego Martínez (Kaeza) 🇺🇾 @🅰🅻🅸🅲🅴 (🌈🦄) Especially whenever humans have advantages over LLMs.

When I describe my own original images, I have two advantages.

One, I know much more about the contents of the image than any AI. That's because my original images always show something from extremely obscure 3-D virtual worlds. On top of that, I may add some extra insider knowledge or explain pop-cultural references in the long description in the post if it helps readers understand the image and its descriptions.

Two, the LLM can only look at the image at its limited resolution. That's all it has. In contrast, when I describe my images, I don't just look at them. I look at the real deal in-world at nearly infinite resolution.

For example, an LLM can only generate a description from a picture of a virtual building. But when I describe it, my avatar is in-world, standing right in front of the building whose picture I'm describing. I can move the avatar around, I can move the camera around, I can zoom in on anything. I can correctly identify that four-pixel blob as a strawberry cocktail, whereas the LLM doesn't even notice it's there.

I've actually done two tests using LLaVA. I fed it two images I had previously described myself to see what would happen. It was abysmal. LLaVA hallucinated, misinterpreted things and so forth, not to mention that LLaVA's description, even after I prompted it to write a detailed one, wasn't nearly as detailed as mine.

In one image, there's an OpenSimWorld beacon placed rather prominently in the scenery. LLaVA completely ignored it. I described what it looks like in about 1,000 characters, and then I explained what it is, what OpenSimWorld is and how it works in another 4,000 characters or so.

It's an illusion that AI will soon catch up with any of this.

Oh, by the way: How is an AI supposed to pinpoint exactly where an image was made if the image shows a place of which multiple absolutely identical copies exist? Or if the image has a neutral background that doesn't even hint at where it was made? I can do that with no problem because I remember where I've made the image.

#Long #LongPost #CWLong #CWLongPost #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLaVA #AIVsHuman #HumanVsAI

Netzgemeinde/Hubzilla