A lot of #MachineLearning and #PredictiveModelling in #statistics is based on minimisation of loss with respect to a training data set. This assumes that the training data set as a whole is representative of the potential testing sets. It follows that loss minimisation is not an appropriate approach (or way of conceptualising the problem) in problems where the training data sets are not representative of the potential testing sets. (As a working title, let's call this issue "radical nonstationarity".)
I recently read Javed & Sutton (2024), "The Big World Hypothesis and its Ramifications for Artificial Intelligence" (https://web.archive.org/web/20250203053026/https://openreview.net/forum?id=Sv7DazuCn8), and think it describes a superset of this issue of radical nonstationarity. I strongly recommend this paper for motivating why loss minimisation with respect to a training data set might not always be appropriate.
Imagine an intelligent agent existing over time in a "big world" environment. Each observation records information about a single interaction of the agent with its environment, and each observation records only the locally observable part of the environment. The agent may be moving between locations in the environment that are radically different with respect to the predictive relationships that hold there, and the variables that are predictive of the outcome of interest may vary between observations. Nonetheless, there is some predictive information that an intelligent agent could exploit. The case where everything is totally random and unpredictable is of no interest when the focus of research is an intelligent agent. In such a world, minimising loss with respect to the history of all observations seen by the agent, or even a sliding window of recent history, seems irrelevant to the point of obtuseness.
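To make that scenario concrete, here is a toy sketch (Python; all names hypothetical, and a linear rule per location is just the simplest stand-in for "radically different predictive relationships"):

```python
import numpy as np

rng = np.random.default_rng(0)

def big_world_stream(n_steps, n_locations=5, dim=3):
    """Toy 'big world' stream: each location has its own predictive
    rule; the agent wanders between locations, so the mapping from
    observed features to outcome changes with position."""
    weights = rng.normal(size=(n_locations, dim))  # one rule per location
    loc = 0
    for _ in range(n_steps):
        if rng.random() < 0.05:                    # occasionally move
            loc = rng.integers(n_locations)
        x = rng.normal(size=dim)                   # locally observable features
        y = x @ weights[loc] + 0.1 * rng.normal()  # locally predictable outcome
        yield loc, x, y                            # loc shown for clarity only;
                                                   # the agent need not observe it
```

A single model fit by minimising loss over the whole stream (or over a sliding window that spans several moves) is mis-specified for every location at once, even though each location is highly predictable on its own.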
One possible approach to this issue might be for the agent to determine, on a per-observation basis, the subset of past observations that are most relevant to making a prediction for the current observation. Then loss minimisation might play some role in determining or using that subset. However, that use of a dynamically determined training set is not the same thing as loss minimisation with respect to a statically given training set.
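As one minimal sketch of that idea (again with hypothetical names; nearest-neighbour distance in feature space is only one crude proxy for "relevance"), the agent could refit on a per-query subset of its history:

```python
import numpy as np

def predict_from_relevant_subset(history_X, history_y, x_query, k=20):
    """Predict for the current observation by (1) selecting the k past
    observations judged most relevant to x_query and (2) minimising
    squared loss on that dynamically chosen subset only."""
    # (1) Relevance here = proximity in feature space (a crude proxy).
    dists = np.linalg.norm(history_X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    X_local, y_local = history_X[nearest], history_y[nearest]

    # (2) Loss minimisation, but only over the relevant subset:
    # closed-form least squares with an intercept column.
    A = np.column_stack([np.ones(len(X_local)), X_local])
    w, *_ = np.linalg.lstsq(A, y_local, rcond=None)
    return w[0] + x_query @ w[1:]
```

Note that loss minimisation still appears here, but only inside the per-observation step; there is no single fixed training set.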
I am trying to find pointers to scholarly literature that discusses this issue (i.e. situations where minimisation of loss with respect to some "fixed" training set is inappropriate). My problem is that I am struggling to come up with search terms to find them. So:
* Please suggest search terms that might help me find this literature
* Please provide pointers to relevant papers
#PhilosophyOfStatistics #PhilosophyOfMachineLearning #CognitiveRobotics #MathematicalPsychology #MathPsych #CognitiveScience #CogSci #CognitiveNeuroscience #nonstationarity #LossMinimisation