Flexible decision-making is related to strategy learning, vicarious trial and error, and medial prefrontal rhythms during spatial set-shifting

Peer-reviewed scientific journal publishing basic neuroscience research in the areas of neuronal plasticity, learning and memory

my (personal) summary and then comments for this #JournalClub

any corrections, comments, additional questions are welcome, especially from the first author @jessetm

1/7
The main goal of this paper is to test whether #VTEs (Vicarious Trial-and-Error) and medial prefrontal cortex LFP relate to navigation behaviour parameters such as behavioural flexibility, performance and strategy use, during allocentric* navigation.

VTEs are a behaviour that rodents and humans show at choice points: they look alternately at the different available options before choosing one (check video below). They have been studied mostly during response-based tasks (when the subjects have to learn a body-oriented response or sequence of responses to the reward). From that research, two possible roles for VTEs have been suggested: deliberation (weighing up the available options) or uncertainty (hesitation).

The current paper aims to test which is the most likely role of these two, by having a task involving a lot of deliberation and a lot of uncertainty (protocol explained below).
The main conclusion is that both VTE types actually exist, which means VTEs should not just be interpreted as a marker of behavioural flexibility or deliberation. There are also some interesting findings about different LFP rhythms in the medial prefrontal cortex being stronger during different types of behaviours (explained below).
(*) (allocentric = based on an external reference frame, like the Water maze task)

1/7
cc: @Andrewpapale, @drdrowland, feel free to add your comments /questions anywhere you want!

2/7
The task: rats had to find the rewarded arm on a plus maze in two different types of tasks:

  • A place task where the goal was always at the same location within a block but the rat left from a different start, requiring flexible trajectories to the goal
  • An alternation task where the goal alternated between two locations and the start also alternated (so, an allocentric version of the classical alternation task)
  • Each sequence of sessions consisted of 3 blocks, each with a different task rule (place to one goal, place to the other, alternation)

2/7

3/7
Some important definitions that are used throughout the paper
yes, these might be quite detailed, writing this helped me understand them

  • strategy likelihood: trial-by-trial time series of the likelihood of each strategy, computed with an existing algorithm (Maggi et al., 2024) that compares the rat's decisions with a model of a perfect decision-maker using each strategy.
  • learning point: trial when the target strategy became the most likely (it splits blocks into exploration and exploitation)
  • flexibility score: absolute difference in strategy likelihoods from trial t − 1 to trial t, summed across strategies then normalized by median absolute deviation - in other words, it should be higher if the strategy used changed and lower if the rat keeps using the same strategy
  • flexible periods (different from flexibility score) = trials around the learning point, not trials at the end of a block, trials with flexibility score in top 60%
  • choice accuracy: same as performance or choice outcome
  • VTE: Vicarious Trial-and-Error, detected following method of Kidder et al., 2024: head is tracked via DeepLabCut, trajectories are aligned and scaled then projected in principal component space and clustered in two clusters using "hierarchical agglomerative clustering". Additional VTEs were found with another measure (combination of z-ln(idphi) and position crossing criteria).
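
A rough sketch of how the flexibility score and learning point defined above could be computed from per-trial strategy likelihoods. This is not the paper's code: the function names are hypothetical, the normalization is assumed to divide by the median absolute deviation of the summed-change series itself, and the learning point is assumed to be the first trial on which the target strategy is strictly the most likely.

```python
from statistics import median

def flexibility_scores(likelihoods):
    """likelihoods: dict mapping strategy name -> list of per-trial likelihoods.

    Returns one score per trial t >= 1: the absolute change in likelihood
    from trial t-1 to t, summed across strategies, divided by the median
    absolute deviation (MAD) of that summed series (assumed normalization).
    """
    n_trials = len(next(iter(likelihoods.values())))
    raw = [
        sum(abs(series[t] - series[t - 1]) for series in likelihoods.values())
        for t in range(1, n_trials)
    ]
    med = median(raw)
    mad = median([abs(x - med) for x in raw])
    if mad == 0:
        return raw  # fallback (assumption): leave unnormalized if MAD is zero
    return [x / mad for x in raw]

def learning_point(likelihoods, target):
    """First trial index where the target strategy is strictly the most likely."""
    for t, value in enumerate(likelihoods[target]):
        others = [s[t] for name, s in likelihoods.items() if name != target]
        if all(value > o for o in others):
            return t
    return None  # target never became the most likely strategy
```

With this definition the score is high on trials where the likelihoods shift a lot (a strategy change) and near zero when the rat keeps using the same strategy.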

3/7

Tracking subjects’ strategies in behavioural choice experiments at trial resolution

A new Bayesian algorithm for tracking subjects’ choice strategies on every trial reveals when subjects learn and what they tried while doing so, providing strong evidence that reward- and loss-driven exploration change independently.

eLife

4/7
Summary of the main findings on VTEs & flexibility score:

  • VTEs are more likely to happen during correct trials
  • VTEs are equally likely to happen for both strategies (even though the alternation strategy was apparently easier)
  • VTEs are not more numerous at the end of a block (when rat knows the new rule) than at the start (maybe because it would take more trials for the behaviour to become automatized?)
  • VTEs are increased around learning points, supporting the deliberation hypothesis
  • flexibility score also increases around learning points
  • VTEs during incorrect trials are associated with lower flexibility scores
  • trials with VTEs during flexible periods have increased choice accuracy compared to trials with VTEs during inflexible periods -> interpreted as two types of VTEs, one reflecting deliberation and one reflecting uncertainty. (is the same difference in performance obtained for trials without VTEs?)

4/7
(I will add alt-text to these in a bit)

5/7 Some results from the mPFC (anterior prelimbic cortex) LFP recordings:

  • decreased gamma power on correct trials
  • increased beta and theta during VTE trials
  • increased gamma post-learning point (= exploitation mode)
  • no significant difference in any band depending on trial outcome (but remember that the time window analysed was around the choice point; it is possible that the rat has already made its decision, and it is likely that activity differences would be seen at the reward location)

5/7

6/7
In summary, performance, VTE rate and flexibility score all increased around learning points, VTEs are more frequent during correct trials, and the rats are likely to be in flexible mode when they are doing a VTE during a correct trial. They can also do VTEs during incorrect & inflexible trials (e.g. sticking to a strategy).

=> Coming back to our original question about the role of VTEs (deliberation or uncertainty): this shows that the two types occur on different trials and that VTEs are not necessarily related to behavioural flexibility!

I am not sure how to summarise the LFP results at this stage but you can have a look at the very detailed discussion in the paper!

6/7

7/7 My comments and questions

Overall, I really appreciate this paper which, in my opinion, addresses some of the hard questions of spatial cognition in a pretty robust manner. I like that the tasks used are allocentric and quite demanding because these are the kinds of tasks I’m interested in and they are likely to engage the hippocampus. It’s also nice to see that the rats were pretty good at the task. And I think VTEs are fascinating and we really don’t know enough about them at this stage. It is nice to see this co-existence of two types of VTEs, and it reminds us not to over-interpret everything (VTEs =/= flexibility).

Some of the measures were a little hard to understand at first read, and some of the results might appear a bit circular (the link between VTEs and performance, VTEs and learning points, learning points and performance... which is the cause and which is the consequence??) but all the information is available for the reader to make up their own mind.

I have some questions, mostly for the author (@jessetm) but anyone should feel free to answer:

1). How long did the rats take to learn each task and then to do task switches?
2). Fig 2b, how come the performance ("accuracy") drops so quickly after the learning point? Shouldn't it be high for at least 5-10 more trials after the learning point?
3). One of the clearest results is that trials with VTEs are more likely to be correct than those without VTEs. Since this is a visual task I wonder if this simply shows that rats need to gather visual info (looking around) to know where the goal is, and it might not have much to do with actual deliberation.
4). Were the VTEs different-looking (e.g. stronger or weaker movement) for deliberative vs uncertain VTEs? What about number of hesitations for a given VTE (left-right, left-right-left etc.)?
5). Would we expect to see theta sequences with similar properties for the deliberative VTEs vs the uncertainty VTEs??
6). Related: it seems that the mPFC theta is higher on VTE trials, but is that the case for both types of VTE?
7). Shouldn’t fig 8 have some form of multiple comparison correction across those 12 tests (maybe it doesn’t apply here for some reason)?

Thank you!

7/7 THE END (for me)

@elduvelle_neuro Here are some thoughts in response to some of your questions (with some responses to @Andrewpapale sprinkled in). Also tagging @dimokaramanlis in case you're still interested.

We trained the rats on both strategies from the start. Each training session exposed them to forced-choice trials of both types. We also essentially gave them do-over trials early in training. Consecutive trials started at the same arm until they made the correct choice (instead of random start arm assignment). It took the rats anywhere from 1 - 6 weeks to learn the task (I think the average was about 10 training days). Usually an additional few days after surgery recovery as well.

An important thing to keep in mind is that the block durations are based on a running tally of correct choices. We decided that if they got 12/15 trials correct, they had learned the current strategy. We claimed, however, that the actual learning process occurred earlier in the block, before they were consistently making correct choices at the block switch.
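
To make the block criterion concrete, here is a minimal sketch of a running tally. It assumes "12/15" means at least 12 correct choices within the most recent 15 trials (a sliding window) and that the block can end before 15 trials have been run; the function name and defaults are hypothetical, not taken from the paper.

```python
from collections import deque

def block_switch_trial(outcomes, needed=12, window=15):
    """Return the index of the trial on which the criterion is first met.

    outcomes: iterable of booleans (True = correct choice).
    Criterion (assumed): at least `needed` correct choices among the
    last `window` trials. Returns None if it is never met.
    """
    recent = deque(maxlen=window)  # automatically drops the oldest trial
    for trial, correct in enumerate(outcomes):
        recent.append(bool(correct))
        if sum(recent) >= needed:
            return trial
    return None
```

Under this rule the block ends several trials after the rat starts choosing consistently, which is why a learning point estimated from strategy likelihoods can fall well before the block switch.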

That's where the Maggi et al. algorithm came in. I'm going to direct folks to their paper for more details because it's a bit of work to explain. What I will say is that each strategy is evaluated independently of all other strategies, so, no constraint that the posteriors sum to 1 on a trial-by-trial basis. I think this makes sense - different strategies are not, in the general sense, necessarily orthogonal (even though some are, like go east and go west, which do indeed sum to 1).
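
As a simplified illustration of why independently evaluated strategies need not sum to 1, the sketch below tracks each strategy with its own Beta posterior over "the choice was consistent with this strategy". It is not the published implementation: the evidence decay rate, the uniform prior, the use of the posterior mean, and all names are assumptions made for the example.

```python
def track_strategy(consistent, decay=0.9, prior=(1.0, 1.0)):
    """Per-trial posterior mean that one strategy is in use.

    consistent: list of booleans, True if the choice on that trial
    matched what the strategy would have chosen.
    decay and prior are illustrative assumptions.
    """
    successes = failures = 0.0
    means = []
    for x in consistent:
        # Older evidence is down-weighted so the estimate can follow changes.
        successes = decay * successes + (1.0 if x else 0.0)
        failures = decay * failures + (0.0 if x else 1.0)
        alpha = prior[0] + successes
        beta = prior[1] + failures
        means.append(alpha / (alpha + beta))
    return means

# Hypothetical choice sequence on a two-goal maze: E, W, E, W, E, W
choices = ["E", "W", "E", "W", "E", "W"]
go_east = track_strategy([c == "E" for c in choices])
go_west = track_strategy([c == "W" for c in choices])
# Alternation is only defined from the second trial onward.
alternate = track_strategy(
    [choices[i] != choices[i - 1] for i in range(1, len(choices))]
)
```

Here go_east and go_west are exact complements, so their estimates sum to 1 on every trial, while alternate is scored on its own evidence and ends up high at the same time as the other two sit near 0.5.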

It's also worth noting that I had designed this task with the idea that strategy learning should be disentangled from task structure (e.g. block switches) prior to this algorithm's development, so I was very lucky that it came out when it did! (Although a number of other algorithms with the same premise already existed in some form, this one was just extremely generalizable and easy to implement).

Anyway, the accuracy declines to pre-learning point levels within 7 trials of the learning point because most rats finish their block by then. If they haven't, they often seem to think they have and start trying something new (that's anecdotal, I didn't analyze it directly).

Regarding the circularity of definitions:
The cause and effect between VTE, accuracy, and flexibility are, as mentioned, not well addressed here (for the most part). We just claim they are related. I actually think we did a pretty good job of avoiding circularity, though. VTE is defined exclusively by trajectory shape and does not rely on any other behavioral measure. Learning point is defined by strategy likelihood, which is defined exclusively by choice history and *not* choice outcome. Accuracy is defined exclusively by choice outcome. Flexibility, being derived from strategy likelihood, is also not defined by choice outcome or VTE occurrence. None of these things were forced to align, but they did (usually).

What I think is most interesting is that these measures did not always align. Some VTE trials result in incorrect choices, sometimes flexibility spiked before the learning point, sometimes VTE occurred while perseverating on a prior strategy, etc. That's really the big point: there were enough instances of these measures falling out of sync that we figured it was not quite right to consider them the same way at all times. That led us to conclude that there were multiple types of VTE.

I did look for evidence of multiple VTE types based on trajectory shape alone, and do think there is more work that could be done there, but it wasn't consistent enough across rats to see any clear trends like differences in shape for deliberative vs uncertain/indecisive VTE.

Also, I have not had good luck with the IdPhi methods. Maybe because our maze has no walls, so trajectories can be kind of ... snakey? Hard to say. But that's why we used clustering on PCA projections of the trajectory shapes. I'm pretty pleased with how well it has worked, but acknowledge that it's not as straightforward. I would gladly have used an IdPhi-based classification if I were more confident in its labeling!

For the neural data, I did my best lol. I agree that 3 rats is a small n, but, with lockdowns, illnesses, and pressure to graduate continually mounting - it is what it is. Plus I knew I was hoping to switch fields and didn't want to have unpublished data lingering in the back of my mind.

My advisors and I agreed that following the recommendations of Saravanan, 2020 - using the hierarchical bootstrap to control our false positive rate for non-independent samples - was a (hopefully) good compromise for low-n ephys analysis, but, it's a decision worth critiquing. I tried to limit our conclusions to things that supplemented the behavioral story we were telling, and tried to be careful not to overstate our claims. Fair to take it with a grain of salt, though.
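
For readers unfamiliar with the idea, a hierarchical bootstrap resamples with replacement at each level of the data hierarchy, so that trials from the same animal are not treated as independent samples. The sketch below assumes just two levels (rats, then trials within a rat) and uses the mean as the statistic; the function name, the number of levels, and the statistic are illustrative assumptions, not the paper's analysis.

```python
import random
from statistics import mean

def hierarchical_bootstrap(data, n_boot=1000, seed=0):
    """data: dict mapping rat id -> list of per-trial values.

    Returns a list of n_boot bootstrap means. Each replicate first
    resamples rats with replacement, then resamples trials with
    replacement within each selected rat.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    rats = list(data)
    boot_means = []
    for _ in range(n_boot):
        sample = []
        for rat in rng.choices(rats, k=len(rats)):  # level 1: rats
            trials = data[rat]
            sample.extend(rng.choices(trials, k=len(trials)))  # level 2: trials
        boot_means.append(mean(sample))
    return boot_means
```

Two conditions can then be compared by running this on each and asking how often a bootstrap mean from one condition exceeds one from the other, rather than pooling all trials into a single test.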

I like the question about sequences. I think hippocampal theta sequences are constantly occurring during navigation and, as in Kay 2020, switch between representing possible options (at least prior to decisions). We know mPFC and HPC sequences sometimes correlate, and my guess is that those correlations would be stronger during deliberative VTE. I bet the mPFC either doesn't form reliable sequences or its rhythms don't sync with HPC on uncertain/indecisive VTE. I'm not sure if the "synchrony" would be theta or beta based, though - I had some prelim data to suggest that trial-level mPFC-HPC coherence is best aligned with choice points in the beta band, while theta coherence fluctuated on longer timescales (multi-trial).

The last thing I'll comment on is the sensory vs deliberative VTE idea. I think taking in sensory info could be an obvious source of VTE for some sensory discrimination tasks. But I will say that our sensory environment was pretty lackluster, and never changed. If the rats were really just looking for visual cues, I would not expect VTE rates to fluctuate in any orderly way, but they do for a variety of tasks on these mazes. Just my 2 cents!

@jessetm
All this makes perfect sense! Thanks for taking the time to answer so clearly and thoroughly :)
@Andrewpapale @dimokaramanlis