@ag3dvr @WorldImagining @PessoaBrain @neuralreckoning @NicoleCRust @albertcardona @matthewcobb @WiringtheBrain @elduvelle @clarepress I don't strongly disagree with your particulars. I frame things a little differently in my head.
First, I think the Wr construction is specific to artificial neural net lineages, and unlikely to be adequate at the level of the whole brain (where this thread started), even if it is a decent model in more limited settings. There are too many regions with dynamics that are inelegantly described by rate functions (e.g. with strong transient responses, long inactivation, rebound bursting in thalamus). With a small abuse of history, I include Wilson-Cowan in that set of ANN models in this context.
I come from a dynamical systems background, so I'm generally ok including, say, gating variables or something like them as an intrinsic part of state. Of course, one could try to expand to a W\tilde{r} where the "rate" variable now includes things other than spike rates (and in fact we are sorta pursuing models of that kind), but I don't see a strong argument that
( dr/dt=sigma(Wr) )
is sufficiently universal or necessary as a brain model; it could just be
( dx/dt=f(x) )
the universal-approximation theorems notwithstanding.
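To make the contrast concrete, here is a minimal sketch (all names, equations, and parameters are illustrative toys, not a claim about any particular circuit): a rate unit whose entire state is r, versus a general state model where a slow gating variable rides along inside x and cannot be recovered from a rate readout alone.

```python
import numpy as np

def sigma(u):
    return np.tanh(u)

# Rate form: the state is the rate vector r alone.
def rate_rhs(r, W):
    return sigma(W @ r)  # dr/dt = sigma(W r)

# General form: the state x = (v, h) bundles a voltage-like variable v
# with a toy slow gating variable h that accumulates only while v is
# hyperpolarized (a cartoon of, e.g., T-current de-inactivation).
def general_rhs(x):
    v, h = x
    dv = -v + 3.0 * h * max(-v, 0.0)              # return to rest, boosted by h
    dh = ((1.0 if v < -0.5 else 0.0) - h) / 20.0  # h builds up when hyperpolarized
    return np.array([dv, dh])

def euler(rhs, x0, dt, steps):
    """Plain forward-Euler integration of dx/dt = rhs(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * rhs(x)
    return x
```

The point is only structural: in the second system, projecting the state down to a single rate-like variable discards h, and no function of v alone reproduces the dynamics.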
As a technicality, I don't strongly distinguish stochastic from deterministic Markov properties. So "distinct outputs for identical inputs" has to be interpreted in the probabilistic sense.
Second, I don't think it makes much sense to ask if the brain is Markovian. Models may or may not have the Markov property; it is not a property of physical systems. The argument is just what we already discussed. Every (>1D) system that can be described by a (reasonable) dynamical model can be approximated with either a Markovian or a non-Markovian model via a suitable transformation of (state, update_operator), at least if one is willing to work on function spaces or similar abstractions.
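A standard toy illustration of that transformation (my example, with made-up coefficients): an AR(2) process is non-Markovian if you call the scalar x_t the state, but becomes exactly Markovian once you enlarge the state to z_t = (x_t, x_{t-1}) and swap the update operator for a matrix.

```python
import numpy as np

# AR(2): x_t = a1*x_{t-1} + a2*x_{t-2} (+ noise). Knowing x_{t-1}
# alone does not determine x_t, so with scalar state the model is
# non-Markovian...
a1, a2 = 1.2, -0.5

def ar2_step(x_prev, x_prev2):
    return a1 * x_prev + a2 * x_prev2

# ...but the augmented state z_t = (x_t, x_{t-1}) is Markovian:
# z_{t+1} = A @ z_t (plus noise in the first coordinate).
A = np.array([[a1, a2],
              [1.0, 0.0]])

def z_step(z):
    return A @ z
```

Same system, two models: Markov or not is a property of the (state, update_operator) pair we chose, not of the process itself.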
A practical example where this kind of flexibility might matter is the construction of delay-embedding models with truncation after some number of terms. One trades off the dimension of the tracked state against how far into the future the model's predictions remain accurate. It is not always obvious that maximizing "Markovianness" is the best choice.
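A minimal delay-embedding helper (illustrative; `dim` and `tau` are exactly the knobs being traded off) can be written as:

```python
import numpy as np

def delay_embed(s, dim, tau=1):
    """Row t of the output is (s_t, s_{t+tau}, ..., s_{t+(dim-1)*tau}).

    Larger `dim` tracks more history (a bigger state), which can extend
    how far ahead predictions stay accurate; truncating at a small `dim`
    keeps the state low-dimensional at the cost of leaving some history
    unmodeled.
    """
    s = np.asarray(s)
    n = len(s) - (dim - 1) * tau
    return np.column_stack([s[i * tau : i * tau + n] for i in range(dim)])
```

Fitting a one-step predictor on these rows yields an (approximately) Markovian model whose state dimension is exactly `dim`.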
So I see the useful kinds of questions as being along the lines of "What do we need to track as 'state' to make a Markovian model that does a good job of matching data?", or "How do we transform a non-Markovian model that is a good fit into an interpretable Markovian model?". That is, we prefer Markov, just as we prefer linearity, and prefer modularity, and prefer... but these are aspirations for models that are good enough, not to be conflated with the "real" properties of the physical system.
We should not ask whether a system has this property. We should ask what a good choice of state is. I think that is consistent with your statements, except that we put a different amount of value on Wr as a framing device.
Edit: Typo, and fighting the Latex interpreter.