There are a few problems with using LLMs for image descriptions. For one, they're optimized to generate text while ignoring uncertainty. They'll spew text at the suggestion of an image.
There are a few problems with using LLMs for image descriptions. For one, they're optimized to generate text while ignoring uncertainty. They'll spew text at the suggestion of an image.