In this paper we explore the evaluation of LLM capabilities. We present measurements of GPT-4 performance on several deterministic tasks; each task involves a basic calculation and takes as its input parameter some element drawn from a large, well-defined population (e.g., count the elements in a list, multiply two $k$-digit numbers). We examine several conditions per task and perform enough trials that statistically significant differences can be detected. This allows us to investigate the sensitivity of task accuracy to both query phrasing and the input-parameter population. We find that seemingly trivial modifications to the task prompt or input population can yield differences far larger than sampling effects can explain. For example, performance on a simple list-counting task varies with query phrasing and list length, but also with list composition (i.e., the thing to be counted) and object frequency (e.g., accuracy when an element accounts for $\approx$ 50\% of a list differs from accuracy when it accounts for $\approx$ 70\%). We conclude that efforts to quantify LLM capabilities easily succumb to the language-as-fixed-effect fallacy, in which experimental observations are improperly generalized beyond what the data supports. A consequence appears to be that intuitions formed through interactions with humans are a very unreliable guide to which input modifications should ``make no difference'' to LLM performance.
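The abstract's claim that observed gaps are "far larger than can be explained by sampling effects" comes down to a significance test on accuracy rates. Here is a minimal sketch, not taken from the paper, of one standard way to run that check: a two-proportion z-test comparing success counts for the same task under two prompt phrasings. The counts below are hypothetical, chosen only to illustrate the calculation.

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test: is the accuracy gap between
    condition A and condition B larger than sampling noise would explain?"""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pool the two samples to estimate the standard error under the
    # null hypothesis that both conditions share one true accuracy.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical trial counts: one counting task, two prompt phrasings.
z, p = two_proportion_ztest(successes_a=412, n_a=500, successes_b=351, n_b=500)
print(f"z = {z:.2f}, p = {p:.4f}")  # z ~ 4.5: a gap this size is not noise
```

With 500 trials per condition, an accuracy gap of roughly 82\% vs. 70\% yields $z \approx 4.5$, far beyond conventional significance thresholds, which is the sense in which enough trials let differences between prompt phrasings be detected reliably.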
The recent emergence of "damages" as the plural of "damage" - when "damage" already served as its own plural and "damages" used to be reserved solely for legal damages - bugs me a bit. "Not responsible for any damages caused" should be "not responsible for any damage caused."
I'm also not a fan of "learnings" and "trainings", which never used to appear with an 's' at the end.
But for international English, those ships appear to have sailed (and I understand how they happened, linguistically).
OldManShakesFistAtPidgins.gif
Tycho's Meta-Law of Inversion of Sufficient Advancement:
When inverted, any sufficiently advanced "any sufficiently advanced X is indistinguishable from Y" law ... is also true.
Examples:
"Any sufficiently advanced technology is indistinguishable from magic" (Clarke's third law)
vs.
"Any sufficiently advanced magic is indistinguishable from technology"
"Any sufficiently advanced incompetence is indistinguishable from malice" (Grey's Law)
vs.
"Any sufficiently advanced malice is indistinguishable from incompetence"
etc.
Inversion should always reveal a different kind of wisdom - or at least food for thought. :D
(That second example has specific application in the security space - think about it.)
New Longread: Layoffs in Responsible AI teams.
It starts:
Wendy Grossman asks “what about all those AI ethics teams that Silicon Valley companies are disbanding? Just in the last few weeks, these teams have been axed or cut at Microsoft and Twitch...” and I have a theory.
My theory is informed by a conversation that I had with Michael Howard, maybe 20 years ago. I was, at the time, a big proponent of code reviews, and I asked about Microsoft’s practices. He said, “oh, they don’t scale, we don’t do things that don’t scale.” (Or something like that. It was a long time ago.) After I joined the SDL team and we started working together, I saw the tremendous focus that the team had on bugs. (My first day on the job included an all-hands, where I saw GeorgeSt present how many bugs the Secure Windows Initiative had managed through the Vista process.)