I built possibly the earliest same-tab tool for getting AI image descriptions in Aug 2022, using a then newly updated Microsoft AI API. I listened to Disabled folks about it. I've paid close attention to the subject since.

My criticisms of @mozilla plans for Firefox's built-in AI alt text generation aren't born of knee-jerk AI hatred or ignorance. They come from understanding this subject with a breadth and depth that is extremely rare, and from wanting acknowledgement of major issues that are being ignored 🧵…

Boost? 💜

This issue is complex in several ways, which contributes to the conflict here. Those saying objectors must not understand the technology are themselves not understanding the roots of the objections.

The text of @mozilla announcements on social media talked only about AI-generated descriptions accessible via screen reader. This is commonly asked for by those needing alt text, and the tool above was built for exactly that reason: it can be useful to fetch an AI description of an image when you are unable to see it.

🧵…

The image in the post, however, showed a UI for use when writing alt text, and the linked blog confirmed that this was on the roadmap.

Usage patterns among those writing alt text can be readily predicted, because these tools already exist and are fairly widely used. Current usage patterns often show no care taken to edit, or even review, generated descriptions before posting images with them as alt text to social media.

Objections from alt-text-dependent folks calling out these descriptions as poor have been ignored.

🧵…

The detail that this is specifically about social media posts is important, and it is not widely understood that the description needs and writing process there are substantially different from those of other page types. Based on existing image-posting patterns, I expect that Firefox's AI descriptions would be used when writing alt text primarily on social media.

There, post context is crucial. The goal is equal access to the post, and that context informs which aspects of the image are important to describe. Sometimes those aspects are abstract.

🧵…

Images posted to social media are very diverse in a number of ways. That the images used to test the @mozilla description AI were, as far as I could tell, entirely stock art is evidence that this diversity is not being accounted for. Test images are often ones unlikely to be posted to social media as-is.

The descriptions being generated are also assessed against the needs of posting images in places other than social media.

The result: assessments of project progress do not consider likely usage.

🧵…

Criticism of this @mozilla image description AI, like criticism of others, is frequently met with impressive demos, but this repeats a common fallacy among AI advocates: AI is often impressive, but the objection is that it is unreliable in many surprising ways. That concern cannot be addressed with just a few examples.

Other, remote AIs offering more advanced description produce outputs that are substantially more detailed, but they fail in both shared and unique ways. They also chug power and water.

🧵…

There are still more details, but I'll leave it here.

I believe tools for those writing alt text that include image-description AI could contribute to web accessibility.

To actually contribute they need to include and deeply understand the needs of Visually Impaired people relying on alt text. Current tools seem to largely prioritize the time and effort of those writing alt text over equal access to the web.

Questions or comments? I am happy to chat here or via email: [email protected]

💜

@hannah Thanks for the thoughtful commentary here. By way of introduction, I'm totally blind and work for Mozilla on the accessibility team. My team doesn't own this project, though we have been consulted since its beginnings. I'll leave it to others to speak about future plans. However, I wanted to clarify a couple of points in case that's helpful.
1. At this stage, as a pilot project, the feature is only being used to provide a starting point for alt text for images that users add to PDF documents using Firefox's PDF reader.
2. We are aware of the harms inaccurate descriptions can cause. We have attempted to mitigate that by prefixing the text with "Generated by AI". This way, the author has to explicitly remove that text in order to provide some assurance that they have verified the accuracy of the description. If they don't, that prefix will remain so that when consumers read the alt text, they will be aware it is AI generated and can adjust their expectations accordingly; e.g. they may choose to be cautious in trusting it.
3. When adding images, many users will simply choose not to provide alt text at all. Our hope is that this tool will raise awareness about alt text and provide a prompt or starting point, with the risk of inaccuracy mitigated to some extent as discussed above.
4. Beyond this, we may experiment with using this to provide alt text to screen readers in Firefox where a web page does not provide alt text. But even here, the text will clearly indicate that it is AI generated to alert the user as to potential inaccuracy.
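The prefix mitigation described in point 2 can be sketched roughly as follows. This is a minimal illustration of the idea, not Mozilla's implementation; the function names and the exact prefix wording are assumptions for the sake of the example:

```python
# Sketch of the "Generated by AI" prefix mitigation (illustrative only;
# names and prefix wording are assumptions, not Firefox's actual code).

AI_PREFIX = "Generated by AI: "

def draft_alt_text(model_description: str) -> str:
    """Prefix a model-generated description so consumers can tell it
    apart from human-verified alt text."""
    return AI_PREFIX + model_description.strip()

def is_author_verified(alt_text: str) -> bool:
    """Treat removal of the prefix as the author's assertion that they
    reviewed and accepted (or rewrote) the description."""
    return not alt_text.startswith(AI_PREFIX)

draft = draft_alt_text("A dog sitting on a beach.")
# The raw draft keeps the prefix, so screen reader users know to be cautious.
print(draft)                      # Generated by AI: A dog sitting on a beach.
print(is_author_verified(draft))  # False
# Only once the author edits the text and removes the prefix does it
# read as a verified, human-reviewed description.
print(is_author_verified("A golden retriever lying on wet sand at sunset."))  # True
```

The design choice here is that the safe state is the default: doing nothing leaves the warning in place, and signalling verification requires a deliberate edit.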

@jcsteh thanks so much for the thoughtful reply, I'm glad to make these connections.

My concerns are very specifically about this tool being easily used by folks adding alt text to images in general web interactions. While I know that would be a long way off, it is the focus of the celebratory discussion I've seen. My concern is based on a few things:

(cont)

@jcsteh (cont)

- It will primarily be used for social media.
- The descriptions it generates are targeted at PDF images, which have radically different description needs.
- When I talk to Visually Impaired folks on social media about AI usage there, they have unanimously said that frequent misuse created alt text that was often basically worthless, the AI in question being an LLM of some sort.
- Social media alt-text writing tools require immense care to constrain abuse.

(cont)

@hannah If we did intend to deploy this in a situation where it might be used by social media authors, we would very clearly need to address concerns such as the ones you've raised. Even if we had plans to deploy the tool for that purpose as-is, I'd note that some of this is mitigated by our clear prefixing of the image description: I'd argue folks who would choose not to validate the description would be unlikely to bother removing the prefix; I'm not sure what that would gain them. But that's a supposition not backed up by evidence, and something we'd need to study carefully if we were to go down such a path.
@hannah When the image description is prefixed, it basically serves as a fallback for a user when they do not have a better tool to describe the image and the author has failed to provide a better description. It becomes equivalent to any other tool a consumer would use to "guess" at an image description, with all the potential inaccuracy that entails. But the key point is that, because of the prefix, the user understands the description might be inaccurate.
@jcsteh I think that having such descriptions easily fetchable via screen readers is quite valuable, and making that fairly universal would take relatively little effort and be low-risk. If that is in progress, I see using an equivalently generated description in the page's alt text as adding nothing, while being likely to be abused, based on my extensive experience with similar tool usage on social media.
@hannah Putting aside the question of amount of effort (doing this on-device across a plethora of hardware incurs some additional challenges), this assumes that everyone has access to this. As much as I wish everyone used Firefox :), not everyone does. Some other solutions use cloud services, and while that has its place, it is not desirable for some users and use cases. For those users, it does potentially add "another tool in the toolbox". That said, I don't want to get too far into the weeds on this point because its use by authors beyond PDF is not something we have any intent to pursue at this stage.

@jcsteh I am glad to hear that other uses are not being pursued right now, and I hope this functionality is never provided to general browser users (rather than via a user-selected extension) until the tool's results can be assessed by Visually Impaired folks on images representative of the spread the tool is likely to be used for.

I believe the only current AI capable of doing this well is Google's, and only via a small specialized API. I'm working on a proof-of-concept tool to demo some of the complexity.