"Goodhart’s Law is really a statement about the process of trying to make policy based on proxy measures of “internal states of complex systems” which are not themselves directly observable. "

Got to this piece from a link from Ben Recht and it finally nailed something that's always bothered me about the constant reference to Goodhart's law across software.

https://backofmind.substack.com/p/goodhart-as-epistemologist

goodhart as epistemologist

what did that "law" really say?

Dan Davies - "Back of Mind"

First of all, I actually think people usually mean to reference behavioral incentive change, in which case you should probably say Campbell's Law; second of all, neither of these "laws" ever meant that measurement can't be meaningful

"So the message of Goodhart’s Law is that if you’re setting targets, they ought to target the thing that you care about, not something which you believe to be related to it, no matter how much easier that intermediate thing is to measure."

Of course, many people come to these conversations so convinced that no shared meaningful measurement is ever possible that they would prefer to say: all measures are always going to be subject to Goodhart's law

I don't believe that, though. I think for most of us, if you're going to take a medicine you will want to know how many people were helped or died after that medicine was administered. Measurement is necessary. I do understand we may have reasonable disagreements about the when and how

I explicitly think that we should take seriously the way that measurement operates as a social lever and USE THAT

This is not remotely controversial if you're someone who's worked in policy. Here is an example:

https://www.urban.org/policy-centers/cross-center-initiatives/kids-context/projects/kids-share-analyzing-federal

"Children can’t vote and they can’t lobby for public resources, but their well-being and development affect the future economic and social health of the country. Children also can’t work their way out of poverty, so the government has a special calling to protect them."

"Public investments are used to educate children; promote their health, safety, and well-being; ensure their basic needs are met; and help protect their families from financial hardship. These investments come in the form of direct spending on programs that serve kids and through tax benefits that offer their families financial assistance.

Determining how government spends money, and who benefits, reveals our priorities."

And quant is part of this work.

@grimalkina I agree that output metrics can be useful, either as inputs to a decision tree or as incentive levers. I always used Goodhart's law as a warning against overfitting. Also as a humorous exaggeration, in a similar vein as the https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule. Is that not correct? Is that not how others use it?

@grimalkina I think the key graf of Goodhart’s original observation that goes carefully uninterrogated is the words “for control purposes”. In his context, the next obvious question might have been “control of what”, but a more sanguine interpretation today might be, “of who”.
@mhoye yes totally. That's actually why I like Campbell's law because it explicitly names the distortion of social processes
@grimalkina (vaguely related: Davies’ writing has been so consistently good for ages. @brainwane put together some extensive collections for Metafilter a while ago - I’ll link them up when I’m off the train - but if it ever crosses your desk I would love to know what you think of The Unaccountability Machine.)

@mhoye @brainwane I've only read sections of it!!! Post turning in the final-final.final.for-real copy of my own book I have to read it in full :)))
@grimalkina (In that light, the desire to handwave away the invocation of math might deserve some sympathy, even if it’s misguided, given people’s collective experience of the worst of Taylorism.)
@grimalkina it is amazing how much institutional paralysis comes from just a collective experience of coercion and punishment as managerial tools.
@mhoye that's fine imho, but if that's ALL we ever talk about, I think we obscure moments when it's very handy for abuse and discrimination when certain things are rendered illegible and invisible. For instance, the current US admin's erasure of racial and gender data will render many effects illegible, which makes it difficult for people to advocate for policy change and show the true cost of things. I have faced many such situations where the "human side" (for lack of a better word) is...
@mhoye ...attacked even by the people who otherwise agree with the goals, because they can't help but bikeshed forever on this, and have a holy panic about measurement that isn't always warranted. Yeah, of course, OF COURSE sometimes it's warranted, but not every loudmouth white guy making a million dollars a year should spend his entire career never being accountable to every outcome because of an abstract ideal about how "humans are unmeasurable" that magically only seems to refer to him, yk?

@grimalkina @mhoye

It comes down to something like "feedback control isn't evil, evil people doing the controlling is evil".

In an ideal world of cooperation and non-domination we'd all be talking all the time about measurements we were making and how we should respond to them. ("Hey, the percentage of people who are at risk of homelessness is increasing, let's do something!") Putting the response/control in the hands of a tiny few self-centered people is the bad idea, not the act of measuring

@grimalkina yeah, this is a very strong point. Productivity metrics and consequences for thee, ineffable and unmeasurable qualities for me.

@grimalkina I think it’s useful to interrogate Campbell’s “the more subject it will be to corruption pressures […] the social processes it is intended to monitor”

It seems to me that “corruption” is an unexamined word here. A thing can only be corrupted relative to a definition, which suggests that corruption can only happen in a poorly defined system with freedoms available to the actors in it which cannot be constrained.

Which brings up the question of who gets to make the definition!

@grimalkina if making the definition is a marker of power, but the actions of entities in the system are not overdetermined, it follows that both “corruption” and resistance are not abnormal but are ongoing and regular processes.

The (inaccurate) invocation of Goodhart’s (Campbell’s) law can be understood as a desire to talk about misaligned incentives and benefits between the measure-definers and the measured.

Or conversely to indicate conditions necessary for such corruption to happen.

@arvindv what does "the actions of entities in the system are not overdetermined" mean?
@grimalkina people / institutions subject to policies having the freedom to respond to them in more than one way
@grimalkina also I’m not entirely making a point as much as gesturing towards maybe a way of identifying conditions under which it makes sense to think about measure corruption
@arvindv not totally sure what point you're making, but sure! If you read more of the examples he offers I think it pretty immediately becomes about the issues of operationalization and power involved (eg, examples of academic achievement)

@grimalkina

link broken for me, missing a t at the end of the URL. Working link here:

https://backofmind.substack.com/p/goodhart-as-epistemologist

@dlakelan oops thank you - fixed!

@dlakelan ty for catching that! I was checking which of Davies' Goodhart posts it was.

Dan Davies is a pretty regular read for me, and I appreciate Cat's pointer and observation.

@grimalkina

I'm not sure if the quoted text is a good assessment, even taking what Goodhart actually said.

Suppose you turn on cruise control on your car, and then drive over a hilly freeway with varying wind speed and so forth. As you drive, the position of the accelerator pedal will be a primary *cause* of your speed. But your speed will have essentially zero correlation with that pedal's position, because your speed will be essentially constant while the pedal moves up and down

@grimalkina

So it's not just that proxy measures make it difficult to control something. Sometimes proxy measures are perfectly fine, but even without proxies, it's just that controlling that thing removes the statistical regularity, because it makes the controlled thing constant while varying the control variable.
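
A rough way to see the cruise control point in numbers is a toy simulation. This is only a sketch: the car model, the constants, and the PI controller gains below are all made up for illustration and are not from the linked article. It compares a flat road where the driver varies the pedal by hand against a hilly road with the controller on, and fits a least-squares slope of speed on pedal position in each case.

```python
import math

def simulate(controlled, n_steps=20000, dt=0.1, target=30.0):
    """Toy car: dv/dt = gain*pedal - drag*v - hill(t). Returns (pedals, speeds)."""
    gain, drag = 5.0, 0.1   # hypothetical plant constants
    kp, ki = 1.0, 0.5       # hypothetical PI controller gains
    v = target
    integral = (drag * target / gain) / ki  # start the controller at equilibrium
    pedals, speeds = [], []
    for i in range(n_steps):
        t = i * dt
        if controlled:
            hill = 2.0 * math.sin(0.05 * t)         # hilly road pushes on the car
            error = target - v
            integral += error * dt
            pedal = kp * error + ki * integral      # feedback moves the pedal
        else:
            hill = 0.0                              # flat road, no controller
            pedal = 0.6 + 0.3 * math.sin(0.02 * t)  # driver varies the pedal by hand
        v += (gain * pedal - drag * v - hill) * dt
        pedals.append(pedal)
        speeds.append(v)
    return pedals, speeds

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

if __name__ == "__main__":
    for label, controlled in [("manual, flat road", False), ("cruise control, hills", True)]:
        pedals, speeds = simulate(controlled)
        print(label, "| slope of speed on pedal:", round(slope(pedals, speeds), 3),
              "| speed std:", round(std(speeds), 3))
```

In the manual case the fitted slope is large, because the pedal really does drive the speed. With the controller on, the pedal still moves a lot but the speed barely does, so the fitted slope collapses toward zero even though the causal mechanism is unchanged. With a real (imperfect) controller the correlation will not be literally zero, which is why the sketch reports the slope and the spread of speed rather than claiming an exact value.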

@dlakelan I'm not sure which quote you're critiquing here?

@grimalkina

"Goodhart’s Law is really a statement about the process of trying to make policy based on proxy measures of “internal states of complex systems” which are not themselves directly observable. "

This one

@dlakelan @grimalkina the first thing that came to mind when I read this was black hole physics (internal states that are not directly observable)
@dlakelan I mean, there are plenty of statistical models that are capable of handling more complex relationships than linear positive correlations? There IS a true causal relationship in what you're describing you just need a better theory of the mechanism to measure it correctly?

@grimalkina

Oh yeah, for sure. I'm just saying that if you have a good feedback system with a good mechanism, it will destroy many statistical correlations that were observable in the data before you turned it on. That's more or less to be expected, and I think that's what Goodhart was trying to say

@dlakelan oh yeah, that's fair. It's hard to talk/write about this because people move so immediately into the realm of "what people are inferring about measures" and away from the comments about actual analysis. At any rate I agree with the "ought to target the thing you care about" as a counter to criticisms that all gamification will always render measures useless. But take your point that it's a simplistic summary of the actual law!
@dlakelan and in general your example of cruise control is a great one, thank you for that, it reminds me a lot of the work we had to do when I was an efficacy scientist to convince school districts that "kept the failure rate steady despite increasing adversity" could count as a big educational success in the context of all that was happening for learners, as much as "test scores went up" in other cases

@grimalkina

Oh for sure. Use it please! It's not my example; I heard it on the internet somewhere. If you keep the car speed constant while you go from flat ground to a steep incline, that's already a big indicator your car is "succeeding", and I think it should be understandable to the education audience.

@grimalkina

This point made in the linked article is good though. You HAVE to have measurements, and you HAVE to respond to them. The only alternative is to be passive and useless. The key questions should always be things like "how does this measurement relate to the thing I care about" and maybe "how does that relationship change with time and adversary action?"

@grimalkina

Another one I heard and I think is correct, is that basically **all** measurements are indirect measurements. When I look at a volt-meter needle I'm measuring the position of the needle, not the voltage of the circuit. Even if it's an LCD I'm measuring the position of the dark spots on the display. And then, maybe your voltage is actually the voltage across a resistor which lets you infer current, and that current infers magnetic field, and that magnetic field infers position of...

@dlakelan yes this is very familiar to social scientists :). We seem to have a very explicit awareness of operationalization that other areas are allowed to elide.

@grimalkina

A servo motor, and that servo motor infers position of a valve, and the valve position infers water flow, and the water flow infers power generation of a turbine, and the power generation of a turbine infers current in another circuit, and that infers voltage on a transmission line, and all of that lets you decide if you should cut off a circuit breaker which relates to the safety of a hospital operating room... etc etc even in physical science or engineering it's all indirect

@grimalkina

The difference in engineered systems is that we get to design the system so that the relationship has a reasonably "flat" transfer function, that is the position of the needle tells us a lot of low noise information about the power transmission line.

@grimalkina The best guidance about using metrics to monitor & control software processes came from Tom DeMarco, namely: “You cannot measure a professional & expect them to act professionally.”

As soon as a measure is used to measure *people*, it loses its value as a measure. Measure processes, not people.

@causticmsngo do you therefore not believe in any individual promotion process? Or individually firing someone who is discriminatory to their colleagues? There are forms of measurement of human behavior involved in those decisions.
@grimalkina No, what I mean & what my experience seems to confirm is that if you **intend your measure to control a process** you must ensure that measure is not applied to a person. Otherwise, that person will modify their behavior to optimize the metric, which will destroy its value as a process control measure. (1/3)
Suppose we measure "defects/LOC”. This can be a useful measure to compare across modules or over time. Changes in that metric may tell you something useful or may not. Often that module A is just different than module B. (2/3)
However, if that measure is used to incentivize human behavior (e.g. “team A has a higher 'defects/LOC' than team B, therefore team B is a 'better' team”) then you lose its value to control the process or system under development. It degrades to a performance metric applied to people, most likely unfairly because it obscures why those values are different in the first place. (3/3)
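
For anyone who wants the "defects/LOC" idea in concrete form, here is a minimal sketch of tracking it per module over time, as a process measure rather than a ranking of people. The `ModuleSnapshot` type, the function names, and the numbers are all hypothetical and only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleSnapshot:
    """One observation of a module at a point in time (hypothetical schema)."""
    module: str
    defects: int
    loc: int  # lines of code

def defects_per_kloc(snapshot):
    """Defect density, scaled to defects per 1000 lines of code."""
    if snapshot.loc <= 0:
        raise ValueError("loc must be positive")
    return 1000.0 * snapshot.defects / snapshot.loc

def density_trend(history):
    """Change in defect density between consecutive snapshots of one module."""
    densities = [defects_per_kloc(s) for s in history]
    return [later - earlier for earlier, later in zip(densities, densities[1:])]

if __name__ == "__main__":
    # Made-up history for a single module.
    history = [
        ModuleSnapshot("parser", defects=4, loc=2000),
        ModuleSnapshot("parser", defects=6, loc=2000),
        ModuleSnapshot("parser", defects=6, loc=4000),
    ]
    print(density_trend(history))
```

In the made-up history, the density falls in the last step only because the module doubled in size, not because any defect was fixed. That is the "may tell you something useful or may not" caveat: the number flags a change worth looking at, and the explanation still has to come from looking at the module.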