@crankylinuxuser @dzwiedziu @scottjenson @tael
First: such systems do exist. Many Blind users use them regularly. They are quite sophisticated, but current architectures have inherent limits, including inescapable biases and predictable patterns of failure.
There's a fine line here: I'm not opposed to use by Blind users for their own purposes, even if I'll sometimes feel it's warranted to warn about the shortcomings that the boosters always minimize. They've got enough to deal with in this world, and it's absolutely not my place to criticize their choices on any grounds (I've seen Blind users on here raise ethical objections with other Blind users, but that's not my conversation to jump into).
But I am opposed to their use by those who can write alt text themselves: you're effectively offloading the systemic risks onto the disabled people you're ostensibly trying to serve, without giving them a say in the matter and often without any warning.
Some would say: isn't some alt text better than none, even if it's poor quality?
The answer is: no, it's not. Imagine the following scenario: a user posts a picture of a group of Black people at a concert, with the comment "Having fun at the club." They use AI captions, and this time they were too tired to double-check. The AI-generated caption reads "A group of gorillas dancing in a club." (Mislabeling Black people as gorillas has actually happened: Google Photos did exactly this in 2015, so it's absolutely possible.) Now a Blind user, who can only read the caption, thinks this is a joke. They comment in reply "Haha how did they train those gorillas to dance?"
So your choice to offload your alt text work to an AI ends up creating a racist incident *and* making the Blind user look like an aggressive, unapologetic racist, since sighted users mostly won't see the alt text. Because we know ahead of time that the AI will make exactly these kinds of mistakes (and that we can't possibly be vigilant enough to catch them all), it's irresponsible to use it this way, on top of all the orthogonal reasons it's ethically wrong to use most modern LLM systems.