Mastodawn

Carl T. Bergstrom Nov 16, 2022

Let's take a look. Galactica can generate wikipedia articles, supposedly.

So let's see what they look like. Here's one for Brandolini's law, the principle that bullshit takes another of magnitude less effort create than to clean up.

Left: Galactica's attempt at creating a wikipedia entry
https://galactica.org/?prompt=wiki+article+on+brandolini%27s+law

Right: The actual wikipedia entry
https://en.wikipedia.org/wiki/Brandolini%27s_law

wiki article on brandolini's law - Galactica

Carl T. Bergstrom Nov 16, 2022

Here's the kicker. It's not that Galactica picked the wrong law. It is that the Padua economist to whom Galactica attributes the law, Gianni Brandolini, DOES NOT EXIST.

Galactica's phrasing of the law itself? That does not exist either. No one has ever said that phrase online (rather a surprise, tbh).

Galactica doesn't let us "access and manipulate what we know about the universe." It generates *pure bullshit* — which, incidentally, will be orders of magnitude more difficult to clean up.

Juliet Unpronouncablelastname Nov 17, 2022

Carl T. Bergstrom

UW researcher Robert Wolfe pointed out to me that there is a fundamental category mistake in how #galactica is being pitched.

This is not a machine learning system that is designed to represent scientific facts, models, and the structures that associate them. (There are other research efforts that attempt to do that.) This is a large language model that is designed to produce semantically plausible text using scientific terms and conforming to our expectations for various technical formats.

Carl T. Bergstrom Nov 17, 2022

This is why, when I called it a bullshit generating machine, I was using the term bullshit in its technical sense. Philosopher Harry Frankfurt explained, in On Bullshit, that bullshit is speech intended to be persuasive without concern for the truth. For Frankfurt, the difference between a liar and a bullshitter Is this a liar knows the truth and is trying to lead you elsewhere where is the bullshitter either doesn’t know or doesn’t care wants to sound like they know what they’re talking about.

Carl T. Bergstrom Nov 17, 2022

That’s more or less exactly what a large language model like this does.

It is trained to produce text it seems like it was written by a competent person. In this case #galactica also uses a technical vocabulary, frequent citation, structured argumentation, numbers, etc. to create a veneer of legitimacy—all tools frequently employed in the sort of new-school bullshit that we treat in our book.

it doesn’t care about facts. It has no representation of them beyond their semantic relations.

Prof. Emily M. Bender(she/her)Nov 17, 2022

@ct_bergstrom I think even "semantic relations" is in fact an overstatement. It's all about textual distribution and nothing more.

Carl T. Bergstrom Nov 17, 2022

@emilymbender Thank you. Really hoping we can discuss this tomorrow.

Prof. Emily M. Bender(she/her)Nov 17, 2022

@ct_bergstrom I think we might be hard pressed to talk about anything else!

Douglas King Nov 17, 2022

@ct_bergstrom
@emilymbender

So it's really expensive mad libs?

@emilymbender @ct_bergstrom but textual relations do track semantic relations, no? that’s why LSA models in cognitive science that basically track co-occurrence of text are such good predictors of various semantic similarity tasks/relationships. So not semantic relationships pee se, but the two things aren’t wholly distinct.

Prof. Emily M. Bender(she/her)Nov 17, 2022

@UlrikeHahn @ct_bergstrom

I wrote a paper so I could stop having this argument:

https://aclanthology.org/2020.acl-main.463/

See in particular section 7.

Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

Emily M. Bender, Alexander Koller. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.

ACL Anthology

@emilymbender @ct_bergstrom aha! will read and ponder - thank you!

ok, I'm back, having gone off and read your paper (in particular section 7) and I don't think it conflicts with what I said and I think that matters for the whole discussion, so I will try and spell this out.
First, I was trying to say (and will say) that LLMs *contain information* about semantic relationships - which is not the same as saying that they have access to fully fledged semantics (hence I said "not semantic relationships per se"). I've always thought 1/

2/ of co-occurrence statistics (and related distributional information) as "footprints of use" not use per se (so I totally agree with your section 7!). There is something missing there, but what's missing is also missing in *other* computational approaches for dealing with language and that doesn't preclude there being empirical questions about whether one system or approach is better than another . I very much like @ct_bergstrom's description of the system as a

3/ BS machine, and I think the observation that it works differently in important respects from other current systems for dealing with scientific text is also apt. But those systems also don't have effectors that give them symbol grounding.

But we can still have meaningful discussion about whether their functionality and performance is better or worse.

All of which is to say that I think it is entirely right to point out the limitations of distributional knowledge

@emilymbender @ct_bergstrom
4/ but I disagree on how far those kinds of in principle arguments go.
We can think about the Collins Dictionary of English and agree that it contains specifications of intensional relationships between concepts, and we can agree that actual human speakers have (extensional) knowledge that goes beyond that.
But it is, to my mind, an empirical and open question *how much* such knowledge is required.
And, relatedly, it is not clear a priori how far a given system

@emilymbender @ct_bergstrom
5/ can get in practice without it.

With respect to BS versus true statements about the world this boils down to the relationship between coherence and correspondence . And that depends on the specific coherence constraints.

All of this, I think, is why cognitive scientists currently have considerable interest in the kinds of reasoning and inference LLMs can support, even though they understand what such systems do and do not capture about meaning.

6/all of which is a long winded way of trying to make the point that in principle considerations (e.g., predictive or distributional models as 'category errors') go less far than one might think, imo, even if that diagnosis is taken as correct.

apologies for wading in here, and please feel free to ignore.

Kristin Branson Nov 17, 2022

@ct_bergstrom I think of these models as a step beyond the "bag of words" models that were popular for text in the early 2000s. The main difference is that LLMs have decent local temporal structure, in particular they produce correct grammar and even sub-topic similarity, which is not nothing & is a success for ML! I appreciate @emilymbender and colleagues' work to help us not be fooled by these models. It is a new type of second-order BS that we are not used to filtering!

Mike Edwards Nov 17, 2022

@ct_bergstrom So it's a terrible knowledge engine, but possibly a champion Balderdash player? Mostly kidding, but I spent some time designing Balderdash-type games, teaching (human) students format, diction, genre conventions apart from content and... this seems like it's working like that?

Ag Terrane Nov 17, 2022

@ct_bergstrom Oh, so it's a management-speak generator?😀

Andrew Jon Thomson Nov 17, 2022

@ct_bergstrom if you like that, get a load of this:

A galactica article explaining that birds are not real

🇳𝗮ꜟ𝖼𝘩 Nov 17, 2022

@ct_bergstrom I tried to get it to write my bio, and it came up with gibberish that was almost as bad as the actual bio.

Ruth Aylett Nov 17, 2022

@ct_bergstrom But if it is using ML technology then it has no SEMANTIC relations at all, that is the problem. It has textual association which is not in the least the same thing. LLMs do not do meaning and therein lies their problem.

Jeremi M Gosney

Nov 17, 2022

@ct_bergstrom
I tried it on myself. This is hilarious but also highly offensive. I would never use rainbow tables!

Likely a coincidence, but 2011 is when I built the first Brutalis, and 2012 is when I joined Hashcat and founded Terahash.

Dave Vanness Nov 17, 2022

@ct_bergstrom Reminds me of some of Neal Stephenson's speculative fiction - especially the versions in Anathem and Fall (or Dodge in Hell), where malicious actors spike the Internet (or parallel universe version of the internet) with bullshit - the more realistic, the harder it is to clean up.

whitney Nov 17, 2022

@ct_bergstrom our writing program director got upset when I made this point (that LLMs are bullshit generators; I think she just didn’t like the word “bullshit”) at our last program meeting, where we were talking about students using AI text generation. I tried to make it clear that I was saying “bullshit” in the technical sense but couldn’t remember the philosopher’s name on the spot!

Tom Williams Nov 17, 2022

@ct_bergstrom this is spot on. I made this argument in a talk I gave at UW Robotics in Feb or Mar 2020 (albeit w.r.t. GPT2) right before everything locked down, and I now try to work it into my invited talks whenever I can 😅 I think it's an evocative, accurate, and necessary framing.

Grendel Dec 5, 2022

@ct_bergstrom

Presenting this technology in this fashion is beyond unethical, it's immoral.

This is on par with cigarette companies telling people they were good for you.

June T. Michael Dec 6, 2022

@ct_bergstrom I'm acquainted with quite a number of teachers and I was not prepared for the talk that goes "You'll need to be prepared that students might hand in auto-generated bullshit as an assignment."
(Yes, some kids still use Facebook, at least in Austria.)

Mike Taylor 🦕Nov 17, 2022

@ct_bergstrom In other words, bullshit.

Daniel Schultz™Nov 17, 2022

@ct_bergstrom Are you sure it's not Galatea 2.2? https://www.amazon.com/Galatea-2-2-Richard-Powers/dp/0312423136

Tim Waring Nov 17, 2022

@ct_bergstrom so dangerous!