## Summary of Strengths of the Paper
It is short.
## Summary of Weaknesses
It is not short enough.
AI images are getting harder to spot. Google thinks it has a solution. #giftArticle https://wapo.st/3szeZYM
A few months ago the WEF put out surprisingly good recs about #generativeAI, but those leaned heavily on watermarking the sources of data. Now Google has announced the opposite: that it can watermark its own AI outputs. While nice (& no doubt hard), this is nowhere near as important as tracing the data inputs, which matters not only for IP payments but also for issues such as #trustworthiness.
title text: The vaccine stuff seems pretty simple. But if you take a closer look at the data, it's still simple, but bigger. And slightly blurry. Might need reading glasses.
(https://xkcd.com/2806)
(https://www.explainxkcd.com/wiki/index.php/2806)
toXic is live now. rip blue birdie.
Can (text) LLMs reason about images, if they get a textual description of them? Yes, sort of!
Says Sherzod Hakimov, in "Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks" (ACL Findings)
https://arxiv.org/abs/2305.13782
2/4
Large language models have demonstrated robust performance on various language tasks using zero-shot or few-shot learning paradigms. While being actively researched, multimodal models that can additionally handle images as input have yet to catch up in size and generality with language-only models. In this work, we ask whether language-only models can be utilised for tasks that require visual input -- but also, as we argue, often require a strong reasoning component. Similar to some recent related work, we make visual information accessible to the language model using separate verbalisation models. Specifically, we investigate the performance of open-source, open-access language models against GPT-3 on five vision-language tasks when given textually-encoded visual information. Our results suggest that language models are effective for solving vision-language tasks even with limited samples. This approach also enhances the interpretability of a model's output by providing a means of tracing the output back through the verbalised image content.
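A minimal sketch of the general idea described in the abstract: verbalise the image with a separate captioning model, then let a text-only LLM reason over the caption. This is not the paper's actual pipeline; the model names, prompt, and file path below are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): image -> text -> text-only LLM.
from transformers import pipeline

# 1) "Verbalisation" step: an off-the-shelf captioning model turns the image into text.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("photo.jpg")[0]["generated_text"]  # path is a placeholder

# 2) Reasoning step: the language-only model sees nothing but the caption and the question.
llm = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = (
    f"Image description: {caption}\n"
    "Question: What is the person in the image most likely doing next?\n"
    "Answer:"
)
print(llm(prompt, max_new_tokens=50)[0]["generated_text"])
```

Because the LLM only ever sees the verbalised description, its answer can be traced back to that text, which is the interpretability benefit the abstract mentions.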
WIRED covered the recent partnership between GPT-4 and "Be My Eyes", and included a nice discussion with Danna Gurari, who has been leading the VizWiz workshop at CVPR for the past 5 years, where we challenge computer vision researchers to work on problems in accessibility -- and, yes, it stems back to the VizWiz paper from almost 13 years ago!
https://www.wired.com/story/ai-gpt4-could-change-how-blind-people-see-the-world/