So many highlights in this paper "Eight Things to Know about Large Language Models" by Sam Bowman

If you've not been staying entirely on top of modern LLM research, this might be a great place to start catching up: it's succinct, readable, and full of fascinating details.

PDF: https://cims.nyu.edu/~sbowman/eightthings.pdf

Really nice explanation of why "scaling laws" are so important in this space:

> Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: the amount of data they are fed, their size (measured in parameters), and the amount of computation used to train them (measured in FLOPs). [...]

> Our ability to make this kind of precise prediction is unusual in the history of software and unusual even in the history of modern AI research. It is also a powerful tool for driving investment since it allows R&D teams to propose model-training projects costing many millions of dollars, with reasonable confidence that these projects will succeed at producing economically valuable systems.
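The "precise prediction" here comes from simple power laws fit to training runs. A minimal sketch of what such a law looks like, using the published Chinchilla fit from Hoffmann et al. (2022) — the constants are that paper's estimates, and this is purely illustrative, not a way to predict any specific model's performance:

```python
def predicted_loss(params: float, tokens: float) -> float:
    """Predicted pre-training loss for a model with `params` parameters
    trained on `tokens` tokens, per the Chinchilla power-law form
    L(N, D) = E + A/N^alpha + B/D^beta."""
    E = 1.69                  # irreducible loss of the data distribution
    A, alpha = 406.4, 0.34    # parameter-count term
    B, beta = 410.7, 0.28     # dataset-size term
    return E + A / params**alpha + B / tokens**beta

# Scaling up both model size and training data lowers the predicted loss:
small = predicted_loss(1e9, 20e9)     # ~1B params, ~20B tokens
large = predicted_loss(70e9, 1.4e12)  # ~70B params, ~1.4T tokens
assert large < small
```

The point the paper is making is that a team can fit these constants on cheap small-scale runs, then extrapolate the curve to justify a multi-million-dollar training run before committing to it.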

Two new-to-me terms: sycophancy and sandbagging:

> More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs ...

> and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.
> [...]
> Some experts believe that future systems trained by similar means, even if they perform well during pre-deployment testing, could fail in increasingly dramatic ways, including strategically manipulating humans to acquire power

Eek.

This is interesting: it sounds to me like if you want to teach an LLM not to be racist, it can actually help to have racist material in its initial pre-training data:

> Indeed, in some cases, exposing models to more examples of unwanted behavior during pretraining can make it easier to make them avoid that behavior in deployment

Also really creepy:

> If we apply standard methods to train some future LLM to tell the truth, but that LLM can reasonably accurately predict which factual claims human data workers are likely to check, this can easily lead the LLM to tell the truth *only when making claims that are likely to be checked*

@simon ok I have to admit I had not thought of this one before, and now that I have, I really do not like it