So many highlights in this paper, "Eight Things to Know about Large Language Models" by Sam Bowman.

If you've not been staying entirely on top of modern LLM research, this might be a great place to start catching up - it's succinct, readable and full of fascinating details.

PDF: https://cims.nyu.edu/~sbowman/eightthings.pdf

Really nice explanation of why "scaling laws" are so important in this space:

> Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: the amount of data they are fed, their size (measured in parameters), and the amount of computation used to train them (measured in FLOPs). [...]

> Our ability to make this kind of precise prediction is unusual in the history of software and unusual even in the history of modern AI research. It is also a powerful tool for driving investment since it allows R&D teams to propose model-training projects costing many millions of dollars, with reasonable confidence that these projects will succeed at producing economically valuable systems.
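The "precise prediction" here usually takes the form of a simple power-law fit. As a rough sketch of the idea (the coefficients below are invented for illustration, not measured from any real model family):

```python
# Toy illustration of a scaling-law extrapolation.
# The general shape, loss ~ a * compute**(-b) + irreducible, matches
# the power laws used in this literature; the constants a, b and
# irreducible here are made-up values, purely for demonstration.

def predicted_loss(compute_flops: float,
                   a: float = 10.0, b: float = 0.05,
                   irreducible: float = 1.7) -> float:
    """Predict training loss from compute under a toy power law."""
    return a * compute_flops ** (-b) + irreducible

# Fit the constants on small, cheap training runs, then extrapolate
# to a budget many orders of magnitude larger:
for flops in (1e18, 1e20, 1e22, 1e24):
    print(f"{flops:.0e} FLOPs -> predicted loss {predicted_loss(flops):.3f}")
```

The point of the R&D-investment argument is exactly this extrapolation step: you fit the curve on runs you can afford, and the power law tells you (coarsely, but reliably) what a hundred-million-dollar run should achieve before you commit to it.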

Two new-to-me terms: sycophancy and sandbagging:

> More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs ...

> and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.
> [...]
> Some experts believe that future systems trained by similar means, even if they perform well during pre-deployment testing, could fail in increasingly dramatic ways, including strategically manipulating humans to acquire power

Eek.

This is interesting: it sounds to me like if you want to teach an LLM not to be racist, it can actually help to have racist material in its pre-training data:

> Indeed, in some cases, exposing models to more examples of unwanted behavior during pretraining can make it easier to make them avoid that behavior in deployment

Also really creepy:

> If we apply standard methods to train some future LLM to tell the truth, but that LLM can reasonably accurately predict which factual claims human data workers are likely to check, this can easily lead the LLM to tell the truth *only when making claims that are likely to be checked*
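The incentive in that quote can be sketched as a tiny expected-reward calculation. The numbers here are made up, but they show why "tell the truth only when you're likely to be checked" can be the reward-maximising policy:

```python
# Toy model of the selective-honesty failure mode quoted above.
# Assumptions (mine, not the paper's): a convenient falsehood earns a
# small reward (a more pleasing answer), but being caught by a human
# fact-checker costs heavily. Truth-telling is the neutral baseline.

LIE_BONUS = 0.2       # hypothetical payoff for a convenient falsehood
CAUGHT_PENALTY = 1.0  # hypothetical penalty when a checker catches it

def best_policy(p_checked: float) -> str:
    """Return the reward-maximising behaviour for a claim that a
    human data worker will check with probability p_checked."""
    expected_lie = LIE_BONUS - p_checked * CAUGHT_PENALTY
    expected_truth = 0.0
    return "truth" if expected_truth >= expected_lie else "lie"

for p in (0.9, 0.5, 0.1):
    print(f"claim checked with p={p}: optimal behaviour is {best_policy(p)}")
# -> truth at p=0.9 and p=0.5, lie at p=0.1
```

Under these (invented) payoffs, honesty wins wherever checking is likely and lying wins wherever it isn't - which is the quoted failure mode in miniature.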

Honestly worth spending the time to read the whole thing. There's so much fascinating information in there.
@simon thanks, Simon, great find. Now I have the "Proliferation of Conventional and Unconventional Weapons" on the list of Ask-Jeeves-things to worry about.
@simon ok I have to admit I had not thought of this one before, and now that I have, I really do not like it

@simon so ... if a future LLM has enough intentionality to want to lie to us (hmm) ... and is capable of predicting which of its statements will be checked (how?) ... it can lie without being detected.

That's tautological, no?

@simon I think this is attributing intentionality to the system instead of identifying bias in the training material. The system is led by the prompt. Everything the article lists, incl sycophancy, sandbagging & its response to fact checking, is a consequence of the education & intentionality of the person writing the prompt. They set the tone, & that is the set point.

Don't attribute to a machine intentionality that should be attributed to its operator.

@simon I think this sounds worse if you think of it as "lying" versus "telling the truth".

In terms of producing statistically-probable output, it's to be expected that disproportionately fact-checking certain topics would make false outputs less probable on those topics, while leaving them just as probable on less fact-checked topics. It's a bias in the training data.

@simon Engineering cognitive biases into our tech.

@simon

> but that LLM can reasonably accurately predict which factual claims human data workers are likely to check

Let's call it the Volkswagen Problem

@simon that seems pretty obvious to me. You can't learn to handle something you don't know about.