@
Woochancho @
Diego MartÃnez (Kaeza) 🇺🇾 @
🅰🅻🅸🅲🅴 (🌈🦄) Especially whenever humans have advantages over LLMs.
When I describe my own original images, I have two advantages.
One, I know much more about the contents of the image than any AI. That's because my original images always show something from extremely obscure 3-D virtual worlds. On top of that, I may add some extra insider knowledge or explain pop-cultural references in the long description in the post if it helps understand the image and its descriptions.
Two, the LLM can only look at the image with its limited resolution. That's all it has. In contrast, when I describe my images, I don't just look at the images. I look at the real deal in-world with a nearly infinite resolution.
For example, an LLM can only generate a description from a picture of a virtual building. But when I describe it, my avatar is in-world, standing right in front of the building whose picture I'm describing. I can move the avatar around, I can move the camera around, I can zoom in on anything. I can correctly identify that four-pixel blob as a strawberry cocktail wheras the LLM doesn't even notice it's there.
I've actually done two tests using LLaVA. I've fed it two images I had described myself previously to see what happens. It was abysmal. LLaVA hallucinated, it interpreted stuff wrongly and so forth, not to mention that LLaVA's description, even after being prompted to write a detailed description, wasn't nearly as detailed as mine.
In one image, there's an OpenSimWorld beacon placed rather prominently in the scenery. LLaVA completely ignored it. I described what it looks like in about 1,000 characters, and then I explained what it is, what OpenSimWorld is and how it works in another 4,000 characters or so.
It's an illusion that AI will soon catch up with any of this.
Oh, by the way: How is an AI supposed to pinpoint exactly where an image was made if the image shows a place of which multiple absolutely identical copies exist? Or if the image has a neutral background that doesn't even hint at where it was made? I can do that with no problem because I remember where I've made the image.
#
Long #
LongPost #
CWLong #
CWLongPost #
AltText #
AltTextMeta #
CWAltTextMeta #
ImageDescription #
ImageDescriptions #
ImageDescriptionMeta #
CWImageDescriptionMeta #
AI #
LLaVA #
AIVsHuman #
HumanVsAI