@markigra @rlmcelreath it’s definitely worth saying, as I do in the chapter, that knowing f() is more scientifically valuable than only knowing that there is some f(). In the chapter, I say something along the lines of: all else equal, E = mc^2 should be preferred over “E is predictable from m and c”. In social science, however, we don’t always think there’s a single law that covers all social universes. In that case, predictability hypotheses might make more sense to pursue.
@markigra @rlmcelreath thanks for engaging with the chapter! I completely agree that the flexible estimators we’re talking about (e.g., deep learning models) don’t give us any information about the f() that maps x to y—that’s one of their biggest drawbacks. But, what I argue is that the ability of these models to predict can be (depending on the study specifics) strong evidence for the existence of some f(). For certain theoretical statements, that’s all that matters.
In the end, I think the integration of ML into deductive social science is inevitable. The only remaining question is whether or not we do the meta-theoretical work to ensure that the use of ML produces reliable and cumulative scientific insights. If we don’t, I think it will be disastrous for social science. If we do, I think ML will transform the social sciences for the better.
In the chapter I spell this all out a lot more thoughtfully, provide examples of folks already deductively testing predictability hypotheses, and derive some additional methodological implications for testing predictability hypotheses using observational data.
In other words, if x and y strongly correspond to the theoretical construct of interest (big “if”) and there is a plausible null to refute the predictability hypothesis, machine learning can be incredibly valuable for deductive social science. In fact, when testing a statement of the form “x affects y”, it may be IRRESPONSIBLE to restrict yourself to OLS and other “interpretable” models.
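One way to make that “plausible null” concrete is a permutation test (this is my own toy sketch, not a procedure from the chapter, and the data are made up): fit a flexible estimator to the real (x, y) pairs and to label-shuffled pairs, and only count the predictability hypothesis as supported if prediction on the real pairs clearly beats the shuffled null. Here a simple k-nearest-neighbors regressor stands in for whatever ML estimator you’d actually use.

```python
import math
import random

random.seed(1)

# Hypothetical data: y is a nonlinear function of x plus noise.
n = 300
xs = [random.uniform(0, 1) for _ in range(n)]
ys = [math.exp(-5 * (x - 0.5) ** 2) + random.gauss(0, 0.05) for x in xs]

def knn_cv_mse(xs, ys, k=10):
    """Leave-one-out mean-squared error of a kNN regressor."""
    total = 0.0
    for i in range(len(xs)):
        # k nearest training points to x_i, excluding x_i itself
        nearest = sorted((abs(xs[j] - xs[i]), ys[j])
                         for j in range(len(xs)) if j != i)[:k]
        pred = sum(y for _, y in nearest) / k
        total += (ys[i] - pred) ** 2
    return total / len(xs)

real_mse = knn_cv_mse(xs, ys)

# Null distribution: shuffling y destroys any x -> y mapping, so this is
# what "predictive accuracy" looks like when no f() exists.
null_mses = []
for _ in range(20):
    shuffled = ys[:]
    random.shuffle(shuffled)
    null_mses.append(knn_cv_mse(xs, shuffled))

# Share of null fits that predict at least as well as the real fit.
p = sum(m <= real_mse for m in null_mses) / len(null_mses)
print(f"real MSE: {real_mse:.4f}  best null MSE: {min(null_mses):.4f}  p: {p}")
```

The design choice worth noting: the null is defined by the data (via permutation), not by a parametric assumption, which is what lets a black-box estimator support a refutable test.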
That means that if you a priori refuse to use an estimator more complex than OLS or ANOVA, you will often fail to estimate this optimal f() and will generate a lot of Type II errors—you won’t find support for a predictability hypothesis that is in fact true.
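A toy illustration of that Type II error (mine, not from the chapter): if y depends on x through a symmetric nonlinearity, OLS estimates a near-flat line and does no better than predicting the mean, while even a very simple flexible estimator (kNN, standing in for fancier ML) predicts y well out of sample.

```python
import math
import random

random.seed(0)

# Simulated data: y = sin(x)^2 + noise. x strongly predicts y, but the
# relationship is symmetric, so the best-fitting line is nearly flat.
n = 400
xs = [random.uniform(-3, 3) for _ in range(n)]
ys = [math.sin(x) ** 2 + random.gauss(0, 0.1) for x in xs]
train_x, test_x = xs[:300], xs[300:]
train_y, test_y = ys[:300], ys[300:]

# OLS by hand: slope b = cov(x, y) / var(x), intercept a = mean(y) - b*mean(x)
mx = sum(train_x) / len(train_x)
my = sum(train_y) / len(train_y)
b = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
     / sum((x - mx) ** 2 for x in train_x))
a = my - b * mx

def knn_predict(x, k=10):
    """Average y over the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(preds):
    return sum((p - y) ** 2 for p, y in zip(preds, test_y)) / len(test_y)

mse_null = mse([my] * len(test_y))          # predict the mean: "no f() found"
mse_ols = mse([a + b * x for x in test_x])  # linear model
mse_knn = mse([knn_predict(x) for x in test_x])

print(f"null: {mse_null:.3f}  OLS: {mse_ols:.3f}  kNN: {mse_knn:.3f}")
```

On this data OLS performs about as badly as the mean-only baseline—it would lead you to reject a predictability hypothesis that is in fact true—while kNN recovers most of the signal.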
Now, importantly, when you’re testing a predictability hypothesis on observational data, ML models may be the most appropriate tool for the job. The key intuition is that you want to estimate the f() that generates the most accurate predictions of y from x, just as we would want the best-fitting b from an OLS model.
Similar to how our typical social science theory was agnostic to the exact value of b above, many theories don’t really care what the specific f() is—just that there exists some f() that consistently maps x to y. I call these “x affects y” claims “predictability hypotheses”.
There are theories that are even less specific than “x increases/decreases y”. Take the Sapir-Whorf thesis or either side of the nature/nurture debate. These and many other theories in the social sciences imply what you might describe as an “x affects y” relationship. In this case, I argue that we can use any old estimator to find a mapping between x and y such that “y = f(x)”.