@elduvelle_neuro Here are some thoughts in response to some of your questions (with some responses to @Andrewpapale sprinkled in. Also tagging @dimokaramanlis in case you're still interested)
We trained the rats on both strategies from the start: each training session exposed them to forced-choice trials of both types. We also essentially gave them do-over trials early in training: consecutive trials started at the same arm until they made the correct choice (instead of random start-arm assignment). It took the rats anywhere from 1 to 6 weeks to learn the task (I think the average was about 10 training days), usually plus an additional few days after surgery recovery.
An important thing to keep in mind is that the block durations are based on a running tally of correct choices. We decided that if they got 12 of their last 15 trials correct, they had learned the current strategy. We claimed, however, that the actual learning process occurred earlier in the block, before they were consistently making correct choices at the block switch.
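For concreteness, the running-tally rule above can be sketched as a sliding window over trial outcomes. This is just my minimal reading of the 12/15 criterion; the function name and defaults are mine, not from the paper.

```python
from collections import deque

def criterion_trial(outcomes, window=15, criterion=12):
    """Return the index of the trial at which the block-switch criterion
    is first met: at least `criterion` correct choices among the last
    `window` trials. Sketch of the 12/15 running-tally rule; returns
    None if the criterion is never reached."""
    recent = deque(maxlen=window)  # holds the most recent outcomes only
    for i, correct in enumerate(outcomes):
        recent.append(bool(correct))
        if sum(recent) >= criterion:
            return i
    return None
```

So a rat that is correct on every trial reaches criterion on trial 12 (index 11), while a rat alternating correct/incorrect never reaches it.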
That's where the Maggi et al. algorithm came in. I'm going to direct folks to their paper for more details because it's a bit of work to explain. What I will say is that each strategy is evaluated independently of all other strategies, so there's no constraint that the posteriors sum to 1 on a trial-by-trial basis. I think this makes sense: different strategies are not, in general, mutually exclusive (though some are, like go east and go west, which do indeed sum to 1).
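To give a flavor of the independent-evaluation idea, here is a toy decayed-Beta update for a single strategy. This is only my sketch of the general approach, not the Maggi et al. implementation; the decay rate `gamma` and the function name are assumptions of mine, and their paper is the authoritative source.

```python
def update_strategy(alpha, beta, consistent, gamma=0.9):
    """One decayed Beta-evidence update for a single strategy, tracked
    independently of all other strategies (toy sketch in the spirit of
    Maggi et al.; gamma is an assumed evidence-decay rate).
    `consistent` is True if the current choice matched the strategy."""
    alpha = gamma * alpha + (1.0 if consistent else 0.0)  # evidence for
    beta = gamma * beta + (0.0 if consistent else 1.0)    # evidence against
    prob = alpha / (alpha + beta)  # posterior mean P(strategy in use)
    return alpha, beta, prob
```

Because each strategy gets its own (alpha, beta) pair updated this way, the per-strategy probabilities need not sum to 1 across strategies on any given trial.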
It's also worth noting that I had designed this task with the idea that strategy learning should be disentangled from task structure (e.g. block switches) prior to this algorithm's development, so I was very lucky that it came out when it did! (Although a number of other algorithms with the same premise already existed in some form, this one was just extremely generalizable and easy to implement).
Anyway, the accuracy declines to pre-learning-point levels within 7 trials of the learning point because most rats finish their block by then. If they haven't, they often seem to think they have and start trying something new (that's anecdotal; I didn't analyze it directly).
Regarding the circularity of definitions:
The cause and effect between VTE, accuracy, and flexibility are, as mentioned, not well addressed here (for the most part); we just claim they are related. I actually think we did a pretty good job of avoiding circularity, though. VTE is defined exclusively by trajectory shape and does not rely on any other behavioral measure. The learning point is defined by strategy likelihood, which is defined exclusively by choice history and *not* choice outcome. Accuracy is defined exclusively by choice outcome. Flexibility, being derived from strategy likelihood, is also not defined by choice outcome or VTE occurrence. None of these measures were forced to align, but they did (usually).
What I think is most interesting is that these measures did not always align. Some VTE trials resulted in incorrect choices, sometimes flexibility spiked before the learning point, sometimes VTE occurred while perseverating on a prior strategy, etc. That's really the big point: there were enough instances of these measures falling out of sync that we figured it was not quite right to consider them the same way at all times, which led us to conclude that there were multiple types of VTE.
I did look for evidence of multiple VTE types based on trajectory shape alone, and I do think there is more work that could be done there, but the results weren't consistent enough across rats to show clear trends, like differences in shape for deliberative vs. uncertain/indecisive VTE.
Also, I have not had good luck with the IdPhi methods. Maybe because our maze has no walls, so trajectories can be kind of ... snakey? Hard to say. But that's why we used clustering on PCA projections of the trajectory shapes. I'm pretty pleased with how well it has worked, but acknowledge that it's not as straightforward. I would gladly have used an IdPhi-based classification if I were more confident in its labeling!
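For anyone unfamiliar with IdPhi, the standard measure integrates the absolute change in heading angle across a choice-point pass. Here is a bare-bones version of that idea; the smoothing and velocity-estimation details used in real pipelines (and whatever thresholds define a VTE "pass") are omitted, and the function name is mine.

```python
import numpy as np

def idphi(x, y):
    """Integrated absolute angular velocity of the heading (IdPhi sketch).
    x, y: position samples through the choice point. The heading angle
    phi is estimated from successive displacements, unwrapped to avoid
    +/-pi jumps, and its absolute changes are summed. Real pipelines
    smooth the trajectory first; that step is omitted here."""
    dx, dy = np.diff(x), np.diff(y)
    phi = np.unwrap(np.arctan2(dy, dx))  # heading angle per step
    return float(np.sum(np.abs(np.diff(phi))))
```

A straight run gives IdPhi of 0, while a single right-angle turn contributes about pi/2, so back-and-forth head sweeps during VTE drive the value up.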
For the neural data, I did my best lol. I agree that 3 rats is a small n, but, with lockdowns, illnesses, and pressure to graduate continually mounting - it is what it is. Plus I knew I was hoping to switch fields and didn't want to have unpublished data lingering in the back of my mind.
My advisors and I agreed that following the recommendations of Saravanan et al. (2020), using the hierarchical bootstrap to control our false-positive rate for non-independent samples, was a (hopefully) good compromise for low-n ephys analysis, but it's a decision worth critiquing. I tried to limit our conclusions to things that supplemented the behavioral story we were telling, and tried to be careful not to overstate our claims. Fair to take it with a grain of salt, though.
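For those who haven't used it, the core of the hierarchical bootstrap is resampling at each level of nesting rather than pooling trials. A minimal two-level sketch (rat, then trial; a real ephys analysis would typically add a session level between them, and this is my toy illustration, not the paper's code):

```python
import random

def hierarchical_bootstrap_means(data, n_boot=1000, seed=0):
    """Two-level hierarchical bootstrap of the mean (toy sketch of the
    Saravanan et al. 2020 recipe). `data` maps rat -> list of trial
    values. Each iteration resamples rats with replacement, then
    resamples each chosen rat's trials with replacement, so the
    bootstrap distribution respects the non-independence of trials
    within a rat."""
    rng = random.Random(seed)
    rats = list(data)
    means = []
    for _ in range(n_boot):
        vals = []
        for rat in rng.choices(rats, k=len(rats)):   # level 1: rats
            trials = data[rat]
            vals.extend(rng.choices(trials, k=len(trials)))  # level 2: trials
        means.append(sum(vals) / len(vals))
    return means
```

The spread of the returned means then gives an uncertainty estimate that doesn't pretend every trial is an independent sample.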
I like the question about sequences. I think hippocampal theta sequences are constantly occurring during navigation and, as in Kay et al. (2020), switch between representing possible options (at least prior to decisions). We know mPFC and HPC sequences sometimes correlate, and my guess is that those correlations would be stronger during deliberative VTE. I bet the mPFC either doesn't form reliable sequences or its rhythms don't sync with HPC on uncertain/indecisive VTE. I'm not sure whether the "synchrony" would be theta- or beta-based, though; I had some prelim data suggesting that trial-level mPFC-HPC coherence is best aligned with choice points in the beta band, while theta coherence fluctuated on longer timescales (multi-trial).
The last thing I'll comment on is the sensory vs deliberative VTE idea. I think taking in sensory info could be an obvious source of VTE for some sensory discrimination tasks. But I will say that our sensory environment was pretty lackluster, and never changed. If the rats were really just looking for visual cues, I would not expect VTE rates to fluctuate in any orderly way, but they do for a variety of tasks on these mazes. Just my 2 cents!