Flexible decision-making is related to strategy learning, vicarious trial and error, and medial prefrontal rhythms during spatial set-shifting

Peer-reviewed scientific journal publishing basic neuroscience research in the areas of neuronal plasticity, learning and memory

This ⬆️ is happening tomorrow, Friday 22nd Nov. Feel free to read the paper and add your comments or questions any time in our asynchronous discussion!
#JournalClub
I'm wrapping something else up rn, but I've read it and will add my comments soon! Anyone else should feel free to add comments or questions in answer to the main post!

my (personal) summary and then comments for this #JournalClub

any corrections, comments, additional questions are welcome, especially from the first author @jessetm

1/7
The main goal of this paper is to test whether #VTEs (Vicarious Trial-and-Error) and medial prefrontal cortex LFP relate to navigation behaviour parameters such as behavioural flexibility, performance and strategy use, during allocentric* navigation.

VTEs are a behaviour that rodents and humans do at choice points, looking alternately at the different available options before choosing one (check video below). They have been studied mostly during response-based tasks (when the subjects have to learn a body-oriented response or sequence of responses to the reward). From that research, two possible roles for VTEs have been suggested: deliberation (weighing up the available options) or uncertainty (hesitation).

The current paper aims to test which is the most likely role of these two, by having a task involving a lot of deliberation and a lot of uncertainty (protocol explained below).
The main conclusion is that both VTE types actually exist, which means VTEs should not just be interpreted as a marker of behavioural flexibility or deliberation. There are also some interesting findings about different LFP rhythms in the medial prefrontal cortex being stronger during different types of behaviours (explained below).
(*) (allocentric = based on an external reference frame, like the Water maze task)

1/7
cc: @Andrewpapale, @drdrowland, feel free to add your comments /questions anywhere you want!

2/7
The task: rats had to find the rewarded arm on a plus maze in two different types of tasks:

  • A place task where the goal was always at the same location within a block but the rat left from a different start, requiring flexible trajectories to the goal
  • An alternation task where the goal alternated between two locations and the start also alternated (so, an allocentric version of the classical alternation task)
  • Each sequence of sessions consisted in 3 blocks, each with a different task rule (place to one goal, place to the other, alternation)

2/7

3/7
Some important definitions that are used throughout the paper
yes, these might be quite detailed, writing this helped me understand them

  • strategy likelihood: trial by trial time series of strategy likelihood, using an existing algorithm (Maggi et al., 2024), this relies on comparing the rat's decisions with a model of a perfect decision-maker using each strategy.
  • learning point: trial when the target strategy became the most likely (it splits blocks into exploration and exploitation)
  • flexibility score: absolute difference in strategy likelihoods from trial t − 1 to trial t, summed across strategies then normalized by median absolute deviation - in other words, it should be higher if the strategy used changed and lower if the rat keeps using the same strategy
  • flexible periods (different from flexibility score) = trials around the learning point, not trials at the end of a block, trials with flexibility score in top 60%
  • choice accuracy: same as performance or choice outcome
  • VTE: Vicarious Trial-and-Error, detected following method of Kidder et al., 2024: head is tracked via DeepLabCut, trajectories are aligned and scaled then projected in principal component space and clustered in two clusters using "hierarchical agglomerative clustering". Additional VTEs were found with another measure (combination of z-ln(idphi) and position crossing criteria).
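
To make the flexibility score definition above concrete, here is a minimal sketch in plain Python. It is not the paper's code: the function name `flexibility_scores` and the toy likelihood values are made up for illustration, and it assumes that "normalized by median absolute deviation" means dividing the summed changes by the MAD of those changes across trials.

```python
from statistics import median

def flexibility_scores(likelihoods):
    """likelihoods: one list per trial, holding one likelihood per strategy.

    Returns one score per trial transition (trial t-1 -> trial t).
    """
    # Absolute change in each strategy's likelihood, summed across strategies.
    raw = [
        sum(abs(now - before) for now, before in zip(curr, prev))
        for prev, curr in zip(likelihoods, likelihoods[1:])
    ]
    # Median absolute deviation of the summed changes (assumed normaliser).
    centre = median(raw)
    mad = median(abs(r - centre) for r in raw)
    if mad == 0:
        return raw  # no spread to normalise by, so return the raw changes
    return [r / mad for r in raw]

# Toy example with two strategies: the rat drifts, then switches at the 4th trial.
toy = [
    [0.90, 0.10],
    [0.85, 0.15],
    [0.75, 0.25],
    [0.25, 0.75],  # big change in both likelihoods
    [0.10, 0.90],
    [0.05, 0.75],
]
scores = flexibility_scores(toy)
peak_transition = scores.index(max(scores))
```

In this toy series the score peaks at the transition into the switch trial and stays low while the rat sticks with one strategy, which is the intuition given in the bullet.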

3/7

Tracking subjects’ strategies in behavioural choice experiments at trial resolution

A new Bayesian algorithm for tracking subjects’ choice strategies on every trial reveals when subjects learn and what they tried while doing so, providing strong evidence that reward- and loss-driven exploration change independently.

eLife

4/7
Summary of the main findings on VTEs & flexibility score:

  • VTEs are more likely to happen during correct trials
  • VTEs are equally likely to happen for both strategies (even though the alternation strategy was apparently easier)
  • VTEs are not more numerous at the end of a block (when rat knows the new rule) than at the start (maybe because it would take more trials for the behaviour to become automatized?)
  • VTEs are increased around learning points, supporting the deliberation hypothesis
  • flexibility score also increases around learning points
  • VTEs during incorrect trials are associated with lower flexibility scores
  • trials with VTEs during flexible periods have increased choice accuracy compared to trials with VTEs during inflexible periods -> interpreted as two types of VTEs, one reflecting deliberation and one reflecting uncertainty. (is the same difference in performance obtained for trials without VTEs?)

4/7
(I will add alt-text to these in a bit)

5/7 Some results from the mPFC (anterior prelimbic cortex) LFP recordings:

  • decreased gamma power on correct trials
  • increased beta and theta during VTE trials
  • increased gamma post-learning point (= exploitation mode)
  • no significant difference in any band depending on trial outcome (but remember that the time window analysed was around the choice point; it is possible that the rat has already made its decision, and it is likely that activity differences would be seen at the reward location)

5/7

6/7
In summary, performance, VTE rate and flexibility score all increased around learning points, VTEs are more present during correct trials, and the rats are likely to be in flexible mode when they are doing a VTE during a correct trial. They can also do VTEs during incorrect & inflexible trials (e.g. sticking to a strategy).

=> Coming back to our original question about the role of VTEs – deliberation or uncertainty – this shows that the two types are there on different trials and that VTEs are not necessarily related to behavioural flexibility!

I am not sure how to summarise the LFP results at this stage but you can have a look at the very detailed discussion in the paper!

6/7

7/7 My comments and questions

Overall, I really appreciate this paper which, in my opinion, addresses some of the hard questions of spatial cognition in a pretty robust manner. I like that the tasks used are allocentric and quite demanding because these are the kinds of tasks I’m interested in and they are likely to engage the hippocampus. It’s also nice to see that the rats were pretty good at the task. And I think VTEs are fascinating and we really don’t know enough about them at this stage – it is nice to see this co-existence of two types of VTEs, and it reminds us not to over-interpret everything (VTEs =/= flexibility).

Some of the measures were a little hard to understand at first read, and some of the results might appear a bit circular (the link between VTEs and performance, VTEs and learning points, learning points and performance... which is the cause and which is the consequence??) but all the information is available for the reader to make up their own mind.

I have some questions, mostly for the author (@jessetm) but anyone should feel free to answer:

1). How long did the rats take to learn each task and then to do task switches?
2). Fig 2b, how come the performance ("accuracy") drops so quickly after the learning point? Shouldn't it be high for at least 5-10 more trials after the learning point?
3). One of the clearest results is that trials with VTEs are more likely to be correct than those without VTEs. Since this is a visual task I wonder if this simply shows that rats need to gather visual info (looking around) to know where the goal is, and it might not have much to do with actual deliberation.
4). Were the VTEs different-looking (e.g. stronger or weaker movement) for deliberative vs uncertain VTEs? What about number of hesitations for a given VTE (left-right, left-right-left etc.)?
5). Would we expect to see theta sequences with similar properties for the deliberative VTEs vs the uncertainty VTEs??
6). Related: it seems that the mPFC theta is higher on VTE trials, but is that the case for both types of VTE?
7). Shouldn’t fig 8 have some form of multiple comparison correction across those 12 tests (maybe it doesn’t apply here for some reason)?

Thank you!

7/7 THE END (for me)

Thanks for the writeup @elduvelle_neuro and @jessetm for the great work!

I did not read the paper in depth for a journal club, but I was completely unaware of the VTE literature and I find it fascinating.

I think we are observing similar phenomena in freely moving mice performing perceptual decision-making tasks. Some mice have inherent biases towards turning to a particular side. When a stimulus is presented that indicates the opposite side is correct, they may start turning towards their favorite side, and quickly "overwrite" the initial decision by making a strong head turn to the correct side.

We think that this head-turning signal indicates stimulus detection by the mice, and we can use it to decode trial outcome quite decently. I was very intrigued to see the same in the rat data of @jessetm

@dimokaramanlis Thanks for the input! I am fascinated by VTEs and always appreciate more examples of them. I don't suppose you have a video..

I wondered about mice VTEs some time ago and indeed @adredish sees them in their restaurant row task: https://neuromatch.social/@adredish/112518586553330280

And if I remember well some of the earlier studies of #VTEs were done in a visual discrimination task. To learn more on them, check this great review: Vicarious trial and error also from #RedishLab

@jessetm

Redish Lab (@adredish@neuromatch.social)

@elduvelle_neuro Mice definitely do #VTE (Vicarious Trial and Error) behaviors. We've seen them in our Restaurant Row task. (Take a look at the various @[email protected] papers. They VTE in the offer zone as they learn to precommit to avoid the sunk costs.) Primates do as well, but in primates they are generally saccade-fixate-saccade sequences rather than actual head movements. #DecisionMaking #SpatialCognition

Neuromatch Social

Very useful resources, I definitely have some reading to do...

No video at hand right now, but I'll compile one eventually as a visual sanity check for our method of separating high- from low-velocity head turns.

@dimokaramanlis @elduvelle_neuro @jessetm

It is important to distinguish the computations going on within the subject.

Technically, #VTE is a behavior of pausing attentively, orienting and re-orienting (as originally defined by Gentry, Muenzinger, and Tolman in the 1930s). But what we really care about are the neurophysiological computations that behavior is reflecting at the moment.
(We want to take the subject's POV, not the experimenter's.)

There are currently (to my knowledge) two computations that have been shown to create attentive pause and reorientation behaviors (i.e. VTE).

(1) Sensory discrimination --- the subject is turning to focus sensory signals on sensory receptors.

(2) Planning systems --- the subject pauses to send hippocampal sweeps down the two options. What causes the reorientation is unknown, but I suspect that it is action chains that are halted.

Both the timing of each of these (in what situations they appear) and the neural circuits involved in each are different.

See the paper @elduvelle cites (https://www.nature.com/articles/nrn.2015.30) for a review, including a discussion of this issue. A good example of the distinction is in Bett et al (https://www.frontiersin.org/journals/behavioral-neuroscience/articles/10.3389/fnbeh.2012.00070/full).

PS. Yes, mice and rats both definitely do VTE. Primates do a thing called SFS (saccade-fixate-saccade), which appears similar (likely with two meanings, much like rodents).

Vicarious trial and error - Nature Reviews Neuroscience

Sometimes when rats come to a location where a choice has to be made, they pause and look around, a behaviour that has been termed 'vicarious trial and error' (VTE). Redish reviews this behaviour and its underlying neurophysiology, and argues that VTE is probably the behavioural phenotype of a deliberative process.

Nature
@adredish @dimokaramanlis @jessetm @elduvelle we should all do a symposium on VTEs at some point :)
@elduvelle_neuro First off just want to thank you so much for the thoughtful engagement with the work. I will try and provide some responses soon when I can find a bit of time, but, generally, I'm happy that people seem to be taking things in the way I was hoping they would be interpreted! @Andrewpapale @dimokaramanlis

@jessetm no pressure! And please don't hesitate to correct me if I wrote anything wrong, which is totally possible! :)

@Andrewpapale @dimokaramanlis

@elduvelle_neuro Here are some thoughts in response to some of your questions (with some responses to @Andrewpapale sprinkled in. Also tagging @dimokaramanlis in case you're still interested)

We trained the rats on both strategies from the start. Each training session exposed them to forced-choice trials of both types. We also essentially gave them do-over trials early in training. Consecutive trials started at the same arm until they made the correct choice (instead of random start arm assignment). It took the rats anywhere from 1 - 6 weeks to learn the task (I think the average was about 10 training days). Usually an additional few days after surgery recovery as well.

An important thing to keep in mind is that the block durations are based on a running tally of correct choices. We decided that if they got 12/15 trials correct, they had learned the current strategy. We claimed, however, that the actual learning process occurred earlier in the block, before they were consistently making correct choices at the block switch.
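
For anyone who wants to play with the block-switch rule, here is a rough sketch of a 12/15 running criterion. It assumes a sliding window over the most recent 15 trials, which is my reading of "running tally" and not a confirmed detail of the protocol; `block_switch_trial` and the example outcome lists are hypothetical.

```python
from collections import deque

def block_switch_trial(outcomes, needed=12, window=15):
    """Return the 0-based index of the first trial at which at least
    `needed` of the last `window` trials were correct, or None."""
    recent = deque(maxlen=window)  # oldest trial drops out automatically
    for i, correct in enumerate(outcomes):
        recent.append(bool(correct))
        if sum(recent) >= needed:
            return i
    return None

# Three early errors followed by a run of correct choices.
slow_start = [0, 0, 0] + [1] * 12
# Strictly alternating correct/incorrect never reaches 12 of 15.
chance_like = [0, 1] * 10

switch_slow = block_switch_trial(slow_start)
switch_chance = block_switch_trial(chance_like)
```

Under this rule the criterion is only met several trials after behaviour has become consistently correct, which fits the point that the learning itself happens earlier in the block than the block switch.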

That's where the Maggi et al. algorithm came in. I'm going to direct folks to their paper for more details because it's a bit of work to explain. What I will say is that each strategy is evaluated independently of all other strategies, so, no constraint that the posteriors sum to 1 on a trial-by-trial basis. I think this makes sense - different strategies are not, in the general sense, necessarily orthogonal (even though some are, like go east and go west, which do indeed sum to 1).
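
The point about posteriors not having to sum to 1 can be illustrated with a toy version of the idea. This is a simplified sketch, not the published implementation: it assumes each strategy gets its own Beta posterior with a uniform prior and exponentially decaying evidence, and reports the MAP estimate. The names `track_strategy` and `decay`, the decay value and the choice sequence are assumptions for illustration only.

```python
def track_strategy(matches, decay=0.9):
    """matches: per-trial booleans, True if the choice fits the strategy.

    Returns the per-trial MAP estimate of P(strategy), tracked
    independently of every other strategy.
    """
    successes = failures = 0.0
    estimates = []
    for m in matches:
        # Older evidence is down-weighted before the new trial is added.
        successes = decay * successes + (1.0 if m else 0.0)
        failures = decay * failures + (0.0 if m else 1.0)
        # Mode of Beta(1 + successes, 1 + failures), i.e. uniform prior.
        estimates.append(successes / (successes + failures))
    return estimates

# Hypothetical sequence of chosen arms (east / west).
choices = list("EWEWEWEE")
prev, curr = choices[:-1], choices[1:]

go_east = track_strategy([c == "E" for c in curr])
go_west = track_strategy([c == "W" for c in curr])
alternate = track_strategy([c != p for p, c in zip(prev, curr)])

east_plus_west = [e + w for e, w in zip(go_east, go_west)]
east_plus_alt = [e + a for e, a in zip(go_east, alternate)]
```

Here `go_east` and `go_west` are exact complements, so their estimates add up to 1 on every trial, while `go_east` and `alternate` are not mutually exclusive and their sum is free to land anywhere.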

It's also worth noting that I had designed this task with the idea that strategy learning should be disentangled from task structure (e.g. block switches) prior to this algorithm's development, so I was very lucky that it came out when it did! (Although a number of other algorithms with the same premise already existed in some form, this one was just extremely generalizable and easy to implement).

Anyway, the accuracy declines to pre-learning point levels within 7 trials of the learning point because most rats finish their block by then. If they haven't, they often seem to think they have and start trying something new (that's anecdotal, I didn't analyze it directly).

Regarding the circularity of definitions:
The cause and effect between VTE, accuracy, and flexibility are, as mentioned, not well addressed here (for the most part). We just claim they are related. I actually think we did a pretty good job of avoiding circularity, though. VTE is defined exclusively by trajectory shape and does not rely on any other behavioral measure. Learning point is defined by strategy likelihood, which is defined exclusively by choice history and *not* choice outcome. Accuracy is defined exclusively by choice outcome. Flexibility, being derived from strategy likelihood, is also not defined by choice outcome, or VTE occurrence. None of these things were forced to align, but they did (usually).

What I think is most interesting is that these measures did not always align. Some VTE trials result in incorrect choices, sometimes flexibility spiked before the learning point, sometimes VTE occurred while perseverating on a prior strategy, etc. That's really the big point - there were enough instances of these measures falling out of sync that we figured it was not quite right to consider them the same way at all times. Which led us to conclude that there were multiple types of VTE.

I did look for evidence of multiple VTE types based on trajectory shape alone, and do think there is more work that could be done there, but it wasn't consistent enough across rats to see any clear trends like differences in shape for deliberative vs uncertain/indecisive VTE.

Also, I have not had good luck with the IdPhi methods. Maybe because our maze has no walls, so trajectories can be kind of ... snakey? Hard to say. But that's why we used clustering on PCA projections of the trajectory shapes. I'm pretty pleased with how well it has worked, but acknowledge that it's not as straightforward. I would gladly have used an IdPhi-based classification if I were more confident in its labeling!
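
For readers who have not met IdPhi before, here is a rough sketch of one common formulation: the absolute change in movement heading, summed along a single pass through the choice point. It is not the code used in the paper, it assumes evenly sampled position data, and `idphi` and the two toy paths are hypothetical. In practice the value is usually log-transformed and z-scored across passes before thresholding.

```python
import math

def idphi(xs, ys):
    """Integrated absolute change in heading along one tracked pass."""
    # Heading of each movement step between consecutive tracked positions.
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
    ]
    total = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap into [-pi, pi) so crossing +/-pi is not counted as a full turn.
        change = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        total += abs(change)
    return total

# A straight run through the choice point versus a zigzagging one.
straight = idphi([0, 1, 2, 3, 4], [0, 0, 0, 0, 0])
zigzag = idphi([0, 1, 2, 3, 4], [0, 1, 0, 1, 0])
```

A measure like this rewards any change in heading, so a path that meanders smoothly on an open maze can score as high as a real pause-and-reorient, which may be one reason shape-based clustering behaves differently.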

For the neural data, I did my best lol. I agree that 3 rats is a small n, but, with lockdowns, illnesses, and pressure to graduate continually mounting - it is what it is. Plus I knew I was hoping to switch fields and didn't want to have unpublished data lingering in the back of my mind.

My advisors and I agreed that following the recommendations of Saravanan, 2020 - using the hierarchical bootstrap to control our false positive rate for non-independent samples - was a (hopefully) good compromise for low-n ephys analysis, but, it's a decision worth critiquing. I tried to limit our conclusions to things that supplemented the behavioral story we were telling, and tried to be careful not to overstate our claims. Fair to take it with a grain of salt, though.
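
As a rough picture of what a hierarchical bootstrap does, here is a two-level sketch: resample rats with replacement, then resample trials within each drawn rat, and repeat. It is a simplified illustration under stated assumptions, not the analysis code from the paper. The real data would likely have more levels (rat, session, trial), and the function name, the seed and the per-trial values below are made up.

```python
import random
from statistics import mean

def hierarchical_bootstrap(data, n_boot=1000, seed=0):
    """data: dict mapping rat id -> list of per-trial values.

    Returns n_boot bootstrap estimates of the overall mean.
    """
    rng = random.Random(seed)
    rats = list(data)
    boot_means = []
    for _ in range(n_boot):
        sample = []
        # Level 1: resample rats with replacement.
        for rat in rng.choices(rats, k=len(rats)):
            trials = data[rat]
            # Level 2: resample trials within the drawn rat.
            sample.extend(rng.choices(trials, k=len(trials)))
        boot_means.append(mean(sample))
    return boot_means

# Made-up per-trial values for three rats, purely for illustration.
toy_data = {
    "rat1": [0.2, 0.4, 0.3],
    "rat2": [0.5, 0.6, 0.7, 0.55],
    "rat3": [0.1, 0.15],
}
boot = sorted(hierarchical_bootstrap(toy_data, n_boot=500))
interval = (boot[12], boot[487])  # roughly the central 95% of 500 resamples
```

Because whole rats are resampled at the top level, trials from the same animal are not treated as independent observations, so the spread of `boot` reflects between-rat variability as well as trial noise.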

I like the question about sequences. I think hippocampal theta sequences are constantly occurring during navigation and, as in Kay 2020, switch between representing possible options (at least prior to decisions). We know mPFC and HPC sequences sometimes correlate, and my guess is that those correlations would be stronger during deliberative VTE. I bet the mPFC either doesn't form reliable sequences or its rhythms don't sync with HPC on uncertain/indecisive VTE. I'm not sure if the "synchrony" would be theta or beta based, though - I had some prelim data to suggest that trial-level mPFC-HPC coherence is best aligned with choice points in the beta band, while theta coherence fluctuated on longer timescales (multi-trial).

The last thing I'll comment on is the sensory vs deliberative VTE idea. I think taking in sensory info could be an obvious source of VTE for some sensory discrimination tasks. But I will say that our sensory environment was pretty lackluster, and never changed. If the rats were really just looking for visual cues, I would not expect VTE rates to fluctuate in any orderly way, but they do for a variety of tasks on these mazes. Just my 2 cents!

@jessetm
All this makes perfect sense! Thanks for taking the time to answer so clearly and thoroughly :)
@Andrewpapale @dimokaramanlis

@elduvelle_neuro @jessetm

I'll read it and I can leave comments here next Friday

@elduvelle_neuro Oh wow! For some reason I am nervous about this, even though the whole point of publishing is to have people carefully read the work haha. I will try to follow along with the conversation and answer questions or offer whatever insights I can. Thanks for taking an interest!

@jessetm oh no you have nothing to worry about! 🤗 The mastodon crowd is very chill. This is more a way for us to make ourselves read a new paper & have a nice discussion rather than trying to find flaws in it!

If you're around to answer potential questions (doesn't have to be instantly) that will certainly be the cherry on the cake! 🙏

Edited because I mixed up your reviewers for this paper (unknown) and for your other VTE paper (where reviewers are public)