A quick thread on #AIhype and other issues in yesterday's Gemini release:

#1 -- What an utter lack of transparency. Researchers from multiple groups, including @meg and @timnitGebru when they were at Google, have been calling for clear and thorough documentation of training data & trained models since 2017.

In Bender & Friedman 2018, we put it like this:

/1

In the tech report, there is half a page describing the data (mostly its preprocessing). What a farce:

https://twitter.com/JesseDodge/status/1732444597593203111?s=20

And Google can't even be bothered to cite Drs. @meg and @timnitGebru's work:

https://twitter.com/_alialkhatib/status/1732425933016179064?t=OdFK1fod5ncLGg5H7W6snQ&s=09

/2

Today Google released Gemini with a 60-page report in which they repeatedly say the training data is key ("We find that data quality is critical to a highly-performing model"), while providing almost no information about how it was made, how it was filtered, or its contents.

More lack of transparency: they state "Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity," but they provide no link where anyone can inspect the methodology and results of those evaluations.

/3

@emilymbender Google said it planned to release more details when Gemini Ultra arrives in 2024. (Gemini Pro and Nano have already arrived now.)