In the end, I think the integration of ML into deductive social science is inevitable. The only remaining question is whether or not we do the meta-theoretical work to ensure that the use of ML produces reliable and cumulative scientific insights. If we don’t, I think it will be disastrous for social science. If we do, I think ML will transform the social sciences for the better.
In the chapter I spell this all out a lot more thoughtfully, provide examples of folks already deductively testing predictability hypotheses, and derive some additional methodological implications for testing predictability hypotheses using observational data.
In other words, if x and y correspond closely to the theoretical constructs of interest (big “if”) and there is a plausible null hypothesis against which to test the predictability hypothesis, machine learning can be incredibly valuable for deductive social science. In fact, when testing a statement of the form “x affects y”, it may be IRRESPONSIBLE to restrict yourself to OLS and other “interpretable” models.
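One way to supply such a null is a permutation test. This is sketched here as an assumption, not as the chapter's own procedure. Shuffling y in the training data destroys any x-to-y mapping. If the held-out R^2 of the real fit beats nearly all of the shuffled fits, that counts as evidence that some f() maps x to y. The simulated sin(2x) data, the k-nearest-neighbors estimator, and the helper names `r2` and `permutation_test` are all hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Out-of-sample R^2: 1 - SSE / total sum of squares around the test mean."""
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def permutation_test(x_train, y_train, x_test, y_test, k=25, n_perm=500, seed=0):
    """Compare held-out R^2 of a k-NN fit against a null where x carries no information about y."""
    rng = np.random.default_rng(seed)
    # Neighbors depend only on x, so compute them once and reuse them under every permutation
    dist = np.abs(x_test[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    observed = r2(y_test, y_train[nearest].mean(axis=1))
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_shuffled = rng.permutation(y_train)  # break any x -> y link in the training data
        null[i] = r2(y_test, y_shuffled[nearest].mean(axis=1))
    # Add-one correction so the p-value is never exactly zero
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=1000)
y = np.sin(2 * x) + rng.normal(scale=0.5, size=1000)  # hypothetical nonlinear f(x)
obs, p = permutation_test(x[:500], y[:500], x[500:], y[500:])
print(f"held-out R^2 = {obs:.2f}, permutation p = {p:.3f}")
```

Note that the null here is "x is useless for predicting y," not "b = 0." A model can refute it even when no single slope coefficient summarizes the relationship.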
That means that if you refuse a priori to use an estimator more complex than OLS or ANOVA, you will often fail to estimate this optimal f() and will commit many Type II errors: you won’t find support for a predictability hypothesis that is in fact true.
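To make the Type II error point concrete, here is a hedged sketch in Python/NumPy on simulated data. Several choices are illustrative assumptions: the U-shaped f(x) = x^2, the helper names, and k-nearest neighbors as the "flexible" estimator. A straight line finds almost no relationship, while the more flexible estimator recovers most of the predictable variance on held-out data.

```python
import numpy as np

def r2(y_true, y_pred):
    """Out-of-sample R^2: 1 - SSE / total sum of squares around the test mean."""
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def ols_predict(x_train, y_train, x_test):
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    b, a = np.polyfit(x_train, y_train, deg=1)
    return a + b * x_test

def knn_predict(x_train, y_train, x_test, k=25):
    # Average y over the k nearest training points (a simple flexible estimator)
    dist = np.abs(x_test[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=2000)
y = x ** 2 + rng.normal(scale=1.0, size=2000)  # hypothetical U-shaped f(x)
train, test = slice(0, 1000), slice(1000, 2000)

r2_ols = r2(y[test], ols_predict(x[train], y[train], x[test]))
r2_knn = r2(y[test], knn_predict(x[train], y[train], x[test]))
print(f"OLS R^2 = {r2_ols:.2f}, k-NN R^2 = {r2_knn:.2f}")
```

Because f is symmetric around zero, the best-fitting line is nearly flat. An OLS-only analysis would conclude that x tells us little about y, even though y is highly predictable from x.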
Now, importantly, when you’re testing a predictability hypothesis on observational data, ML models may be the most appropriate tool for the job. The key intuition is that you want to estimate the f() that generates the most accurate predictions of y from x, just as we would want the best-fitting b from an OLS model.
Just as our typical social science theory was agnostic to the exact value of b above, many theories don’t really care what the specific f() is, only that there exists some f() that consistently maps x to y. I call these “x affects y” claims “predictability hypotheses”.
There are theories that are even less specific than “x increases/decreases y”. Take the Sapir-Whorf thesis or either side of the nature/nurture debate. These and many other theories in the social sciences imply what you might describe as an “x affects y” relationship. In this case, I argue that we can use any old estimator to find a mapping between x and y such that “y = f(x)”.
Most contemporary quantitative social science posits theories of the form “x increases y” or “x decreases y”. In this case, we may use OLS to estimate a formula “Y = a + bX”. The theory doesn’t imply a specific b, but it does imply that b is either greater than or less than zero. The model helps us provide evidence for that claim.
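Here is a minimal sketch of that workflow, assuming Python with NumPy and simulated data. The function name `ols_slope_test` and the data-generating values are hypothetical. The sketch fits Y = a + bX by least squares and checks whether the t-statistic for b clears a conventional threshold in the predicted direction.

```python
import numpy as np

def ols_slope_test(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, t-statistic for b)."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance of (a, b)
    t_b = coef[1] / np.sqrt(cov[1, 1])
    return coef[0], coef[1], t_b

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 0.5 * x + rng.normal(size=500)  # hypothetical data in which x increases y
a, b, t = ols_slope_test(x, y)
print(f"b = {b:.2f}, t = {t:.1f}")
```

The theory only predicts the sign of b. The evidence is therefore whether b is reliably greater than zero, not whether it lands near 0.5.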
Theories vary in their specificity. “E = mc^2” is a very specific theory. It posits a very precise relationship between variables. Given m (and the constant c), I can plug the numbers into the formula and provide you an exact estimate of E. Social science theories are rarely this specific.
(Quick shout-out: other folks, such as the amazing @LauraNelson, have extensively and thoughtfully discussed ML’s application to inductive social science.)