I built possibly the earliest same-tab tool for getting AI image descriptions in Aug 2022, using a then newly updated Microsoft AI API. I listened to Disabled folks about it. I've paid close attention to the subject since.

My criticisms of @mozilla plans for Firefox built-in AI alt text generation aren't knee-jerk AI hatred or ignorance. They come from understanding this subject to a breadth and depth that is extremely rare, and from wanting acknowledgement of major issues that are being ignored 🧵…

Boost? 💜

This issue is complex in several ways, and that complexity is contributing to the conflict here. Those saying objectors must not understand things are themselves not understanding the roots of the objections.

The text of @mozilla announcements on social media talked only about AI generated descriptions accessible via screen reader. This is commonly asked for by those needing alt text, and the above tool was built for exactly that reason: it can be useful to fetch an AI description of an image when unable to see it.

🧵…

The post image was a UI for use when writing alt text, and the linked blog confirmed that this was on the roadmap.

Usage patterns by those writing alt text can be readily predicted, because these tools already exist and are fairly widely used. Current patterns of usage often show a lack of any care to edit or even review generated descriptions before posting images with them as alt text to social media.

Objections from alt-text-dependent folks calling out these descriptions as poor have been ignored.

🧵…

The detail that this is specifically about social media posts is important, and it is not widely understood that the description needs and process there are substantially different from those of other page types. Based on existing image posting patterns, I expect Firefox AI description use by alt text writers would be primarily on social media.

There, post context is crucial. The goal is equal access to a post, and that context informs what in the image is important to describe. Sometimes those aspects are abstract.

🧵…

Images posted to social media are very diverse in a number of ways. That the images used to test @mozilla's description AI were, as far as I could tell, entirely stock art is evidence that this diversity is not being accounted for. Test images are often ones unlikely to be posted to social media as is.

The descriptions being generated are also assessed based on the needs of posting images in places other than social media.

The result: assessments of project progress do not consider likely usage.

🧵…

Criticism of this @mozilla image description AI and of others is frequently met with impressive demos, but this repeats a frequent fallacy among AI advocates: AI is often impressive, but the objection is that it is unreliable in many surprising ways. That concern cannot be addressed with just a few examples.

Other remote AIs offering more advanced description produce outputs which are substantially more detailed and advanced, but they fail in shared and unique ways. They also chug power and water.

🧵…

There are still more details, but I'll leave it here.

I believe tools for those writing alt text which include image description AI could contribute to web accessibility.

To actually contribute they need to include and deeply understand the needs of Visually Impaired people relying on alt text. Current tools seem to largely prioritize the time and effort of those writing alt text over equal access to the web.

Questions or comments? I am happy to chat here or via email: [email protected]

💜

Important addition: I misunderstood @mozilla public plans here, and have issued a partial retraction: https://social.alt-text.org/@hannah/112570794161740101

Important partial retraction: I thank @[email protected] for the correction on the @[email protected] image description AI plans criticized by me in these last few days. I and many others, most of whom I saw celebrating, believed that those plans included providing such tools to all users writing alt text in the web browser. They don't. I retract my criticism of Mozilla, because they are already doing what I was asking. I maintain my criticism of alt writer AI tools, and I think Jamie has validated my concerns.

@hannah I think this feature is a crucial one for visually impaired people, and has almost no use for anyone else. It's not clear if it can be disabled and not downloaded, and it has its weight in bytes and perhaps in energy use too (I don't know).
For non-impaired people alt text is sometimes useful when there's no context (I want to know which movie a scene is from, what lake is in the background), but for that perhaps AI is not the best tool.
@hannah Images can have a simply decorative function: in that case alt text could be not only useless, but a waste of time. Or they can be significant: they can even change the meaning or context of the text accompanying them, and in that case I don't know if AI can catch it.

@filobus @hannah The most useful tool for me when writing Alt text is OCR, and I've come across a good extension: https://webextension.org/listing/ocr.html

Of course, although its recognition rate is really high, it still wants proofreading! It saves a lot of time typing though.

For anything that's not text, only the person posting it will know what they want to express with their image.

OCR Image Reader

The 'OCR - Image Reader' extension is designed to simplify optical character recognition (OCR) processing within your browser. After installation, the extension adds a new button to the toolbar area of your browser. When you press this button, the current window switches to the selection mode, allowing you to select a region on the current page. The extension captures the image of the area, and the internal engine, Tesseract.js, extracts the content for you. This engine is a JavaScript-based OCR that supports over 100 languages. When performing a new job, the extension displays the progress of the OCR extraction in a popup window. If you have multiple jobs, you will get equivalent floating windows. It's worth noting that the extension fetches the appropriate language database from the server once. In future usage, the cached database is used.

@filobus @hannah Automated tools are consumers' tools. For producers, they can be used assistively, but never without proofreading.

Same goes for automated translation: just like image description AIs, it can go wrong and have poor grammar too.

@gunchleoc @filobus when I added the OCR cmd to @AltTextCrew@☠️ I did a lot of research into OCR services. Google's advanced OCR API was the most powerful.

It doesn't need to be told the language to extract; instead you give it an image and it tells you the language, with support for different languages in the same image and location data on individual letters.

Proof of concept tool taking an image of a text table and OCRing into an HTML <table> using letter position: https://2d-ocr.glitch.me

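The table reconstruction described above, using per-word position data from OCR, might look roughly like this sketch. This is illustrative mock logic, not the linked tool's actual code or the Google API: words are grouped into rows by vertical position, then ordered left to right.

```python
import html

def boxes_to_table(words, row_tol=10):
    """Group OCR word boxes into an HTML table.

    `words` is a list of (text, x, y) tuples, where (x, y) is the
    top-left corner of the word's bounding box. Words whose y values
    fall within `row_tol` pixels of a row's first word share that row;
    within a row, cells are ordered left to right by x.
    """
    rows = []  # list of (row_y, [(x, text), ...])
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(rows[-1][0] - y) <= row_tol:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    out = ["<table>"]
    for _, cells in rows:
        tds = "".join(f"<td>{html.escape(t)}</td>" for _, t in sorted(cells))
        out.append(f"<tr>{tds}</tr>")
    out.append("</table>")
    return "\n".join(out)

# Example: two words per row, roughly aligned vertically
print(boxes_to_table([("Name", 10, 5), ("Age", 120, 6),
                      ("Ada", 10, 30), ("36", 120, 31)]))
```

A real implementation would also have to infer column boundaries for `colspan` handling; this sketch only recovers rows.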

@hannah I've been working hard to help my publishing team learn to write alt text based on context. As far as I'm concerned they understand the context better than anyone else ever will, because they are picking the images!

The ease with which folks are ready to go all in on AI alt text generation generally concerns me, especially on the publishing and server side—I don't want to bake these bad alternative texts into the product.

Mozilla's client side approach feels like a better option than nothing, but it also feels worrying that some folks might read announcements like this and mistakenly believe they no longer need to write alt text.

Will be watching closely...
@hannah The "it doesn't understand context" problem is also why translators are saying that AI can't do translations either, for purposes other than "give me the gist of this text". As with alt text, it's all about the content producer, not the reader. Provided it's fast, "cheap" (we ignore the cost to the planet, right?) and looks like it means something, that's all that matters.
@hannah Thanks for the thoughtful commentary here. By way of introduction, I'm totally blind and work for Mozilla on the accessibility team. My team doesn't own this project, though we have been consulted since its beginnings. I'll leave it to others to speak about future plans. However, I wanted to clarify a couple of points in case that's helpful.
1. At this stage, as a pilot project, the feature is only being used to provide a starting point for alt text for images that users add to PDF documents using Firefox's PDF reader.
2. We are aware of the harms inaccurate descriptions can cause. We have attempted to mitigate that by prefixing the text with "Generated by AI". This way, the author has to explicitly remove that text in order to provide some assurance that they have verified the accuracy of the description. If they don't, that prefix will remain so that when consumers read the alt text, they will be aware it is AI generated and can adjust their expectations accordingly; e.g. they may choose to be cautious in trusting it.
3. When adding images, many users will simply choose not to provide alt text at all. Our hope is that this tool will raise awareness about alt text and provide a prompt or starting point, with the risk of inaccuracy mitigated to some extent as discussed above.
4. Beyond this, we may experiment with using this to provide alt text to screen readers in Firefox where a web page does not provide alt text. But even here, the text will clearly indicate that it is AI generated to alert the user as to potential inaccuracy.
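The prefix mitigation in point 2 amounts to a small piece of logic. A minimal sketch, with illustrative names and wording that are not Mozilla's actual code:

```python
# Illustrative prefix; the exact wording is Mozilla's choice, not confirmed here.
AI_PREFIX = "Generated by AI: "

def suggest_alt_text(generated_description):
    """Pre-fill the alt text field with a clearly labelled AI suggestion."""
    return AI_PREFIX + generated_description

def is_unreviewed(alt_text):
    """If the prefix survives to publication, the author likely never
    edited the suggestion, so consumers can adjust their trust."""
    return alt_text.startswith(AI_PREFIX)
```

The design choice is that removing the label is itself the act of taking responsibility for the description.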

@jcsteh thanks so much for the thoughtful reply, I'm glad to make these connections.

My concerns are very specifically about this tool being easily used by folks adding alt text to images in general web interactions. While I know that would be a long way off, it is the focus of celebratory discussion I've seen. Those concerns are based on a few things:

(cont)

@jcsteh (cont)

- It will primarily be used for social media
- The descriptions it generates are targeted at PDF images, with radically different description needs
- When I talk to Visually Impaired folks on social media about their interactions with AI usage, they have unanimously said frequent abuse created alt text that was often basically worthless, with the AI in use being an LLM of some sort.
- Social media alt text writing tools require immense care to constrain abuse

(cont)

@jcsteh (cont)

That last point is important. I foresee a "this is V1, we can go from here" gradual improvement approach, but I believe the complexity gap between this usage iteration and the minimum needed for responsible use in writing social media alt text is large in complex ways, and making that jump later, with the feature already live, would meet strong resistance.

I believe releasing any such tool to web writers needs immense care beginning before design; I see none planned, and alarms being ignored.

@hannah FWIW, I entirely agree with your last point here. If there's any way we've possibly miscommunicated about our intentions here, I'd very much like to know about it so I can look into it. But from my perspective, there are no alarms we could ignore here because we have no intent at this stage to deploy this in the way you're suggesting. Unless your concerns extend to how we're using this in the PDF editor or for use by screen reader users in future, in which case I'd love to learn more about those too.
@hannah If we did intend to deploy this in a situation that might be used by social media authors, we would very clearly need to address concerns such as the ones you've raised. Even if we had plans to deploy the tool for that purpose as is, I'd note that some of this is mitigated by our clear prefixing of the image description, and I'd argue folks that would choose not to validate that description would be unlikely to bother removing that prefix; I'm not sure what that would gain them. But that's a supposition not backed up by evidence and that's something we'd need to study carefully if we were to go down such a path.
@hannah When the image description is prefixed, it basically serves as a fallback for a user when they do not have a better tool to describe the image and the author has failed to provide a better description. It becomes equivalent to any other tool a consumer would use to "guess" at an image description, with all the potential inaccuracy that entails. But the key point is that the user understands that the description might be potentially inaccurate because of the prefix.
@jcsteh I think that having such descriptions easily fetchable via screen readers is quite valuable, and making that fairly universal would take relatively little effort and be low risk. Should that be in progress, I see using an equivalently generated description in the page's alt text as adding nothing, while being likely to be abused based on my extensive experience with similar tool usage on social media.
@hannah Putting aside the question of amount of effort (doing this on-device across a plethora of hardware incurs some additional challenges), this assumes that everyone has access to this. As much as I wish everyone used Firefox :), not everyone does. Some other solutions use cloud services, and while that has its place, it is not desirable for some users and use cases. For those users, it does potentially add "another tool in the toolbox". That said, I don't want to get too far into the weeds on this point because its use by authors beyond PDF is not something we have any intent to pursue at this stage.

@jcsteh I am glad to hear that other use is not being pursued right now, and I hope that this functionality is never provided to general browser users (rather than via user-selected extension) until the tool's results can be assessed by Visually Impaired folks across the spread of images it is likely to be used on.

I believe the only current AI capable of doing this well is Google's, and only via a small specialized API. I'm working on a proof of concept tool to demo some of the complexity.

@hannah As far as I'm aware, no one at Mozilla has communicated any intent to deploy this beyond inserted PDF images (again, noting that we do prefix the text) or "in general browsing for users with screen readers"; i.e. to provide a "best guess" at a description for an image where the author provided none. Mozilla is a sprawling organisation, though :), so please correct me if you've seen otherwise... or please let me know if you think something in our blog post, etc. might lead to this assumption.

@jcsteh I recall seeing a direct mention that providing it to those writing alt text on sites was planned, but since I can't find it now, I'll make no claims as to its author being official.

I have seen a pretty universal perception that this feature will be offered that way in the future, and I believe it is irresponsible not to address that celebration with some acknowledgement that doing so takes immense care.

@hannah That's fair. I definitely don't want our intentions to be misconstrued here or for folks to assume we're going to do anything other than move with the utmost care. I will say that I've not personally seen any of the misperception you've mentioned, but I will see what I can do internally to ensure our values are more clearly communicated here.

@jcsteh that is so appreciated. I had a major brain surgery a week ago and have been spending a lot of energy on this because I had the perception that this was on the road map and saw little understanding of potential issues or willingness to listen when I tried to explain those issues or when folks relying on alt text raised alarms.

I should really sleep, but I'm happy to chat more later and if anyone internal would like to speak I can be reached at [email protected]

@hannah I've sent some follow-up internally and will keep tracking it as I can. For now, I too should sleep. Please take care of yourself and rest well.

@jcsteh unrelated but I also wanted to link a discussion of a project under the Mozilla umbrella that I think is worth drawing your attention to if you're willing and able to share it where you think could be useful.

Doing frontend work that was new to me, I used the popular https://developer.mozilla.org HTML and CSS docs and noticed a pattern that could impact accessibility results. The discussion below arrived at an agreed-upon high-level plan, but cancer has made me unable to enact it.

https://github.com/orgs/mdn/discussions/430

MDN Web Docs

The MDN Web Docs site provides information about Open Web technologies including HTML, CSS, and APIs for both Web sites and progressive web apps.


@hannah Thank you. I've been following this from afar, and your explanation makes sense.

As a sighted reader I often enjoy people's alt text descriptions as a kind of Easter Egg, or a hint as to the image's relevance to them.

I've wondered if UI for alt text writers could help here, and if AI could be a part of this.

I'm imagining that when an image is added to a post it would immediately have a visible default description in an editable textfield.

My hypothesis is that a well constructed UI could strongly invite a poster to edit the generated text and replace or improve on it.

This is a testable proposal, I guess: a given UI could be evaluated for how effectively the poster is prompted to improve the description.

But – it's not something I'm going to be building as it's miles from my experience!
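That evaluation idea can be sketched concretely. A hedged illustration, not anyone's actual methodology: compare the generated text with what the poster actually published, using string similarity as a rough proxy for how much editing the UI prompted.

```python
from difflib import SequenceMatcher

def edit_effort(generated, posted):
    """Return a 0..1 score: 0 means the poster published the AI
    suggestion verbatim, values near 1 mean they replaced it."""
    return 1.0 - SequenceMatcher(None, generated, posted).ratio()

def ui_score(pairs):
    """Average edit effort across (generated, posted) pairs, as a
    crude way to compare how well competing UIs prompt editing."""
    return sum(edit_effort(g, p) for g, p in pairs) / len(pairs)
```

A higher average score would suggest the UI succeeded at prompting edits, though it says nothing about whether those edits made the descriptions better.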

@benjohn Based on my experience with populations of social media users using AI for alt text, simply prompting for edits is unlikely to help much, for a number of reasons. I believe there is some low-hanging fruit in possible examples of what more responsible tools might look like, and I am working on a proof of concept one; we'll see if I can continue.

Unfortunately, they require advanced and remote AI to work. I believe Google's advanced specialized API for this is the only possible one ATM.