Welcome to the Sched instance for CNS*2020 Online! Please read the instruction document for detailed information on CNS*2020.
Sunday, July 19 • 9:00pm - 10:00pm
P57: Motor Cortex Encodes A Value Function Consistent With Reinforcement Learning

Venkata Tarigoppula, John Choi, John Hessburg, David McNiel, Brandi Marsh, Joseph Francis

Reinforcement learning (RL) theory provides a simple model that can help explain many animal behaviors. RL models have successfully described neural activity during learning in multiple brain regions and at several spatiotemporal scales, from single units up to hemodynamics, in animals including humans. A key component of RL is the value function, which captures the expected, temporally discounted reward from a given state. A reward prediction error occurs when there is a discrepancy between the value predicted by the value function and the actual reward, and this error is used to drive learning. The value function can also be modified by the animal’s knowledge and certainty of its environment. Here we show that bilateral primary motor cortical (M1) neural activity in non-human primates (rhesus and bonnet macaques of either sex) encodes a value function in line with temporal difference RL. M1 responds to the delivery of unpredictable reward (unconditional stimulus, US) and shifts its value-related response earlier in a trial, becoming predictive of expected reward, when reward is predictable due to the presence of an explicit cue (conditional stimulus, CS). This shift is observed in tasks performed manually or observed passively, and in tasks without an explicit CS but with a predictable temporal reward environment. M1 also encodes the expected reward value in a CS-US task with multiple reward levels. We extend the microstimulus temporal difference RL model (MSTD), reported to accurately capture RL-related dopaminergic activity, to account for both phasic and tonic M1 reward-related neural activity in a multitude of tasks, during manual as well as observational trials. These findings have implications for autonomously updating brain-machine interfaces.
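
The shift described above, from a response at reward delivery to a response at the predictive cue, can be illustrated with a minimal temporal difference sketch. This is not the MSTD model from the abstract: it assumes a simpler tapped-delay-line state representation (one weight per time step since CS onset), and the function name `run_td`, the trial layout, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def run_td(n_trials=500, n_steps=20, cs_step=5, us_step=15,
           alpha=0.1, gamma=0.98, reward=1.0):
    """TD(0) on a CS-US trial with a tapped-delay-line representation.

    Assumption: value is fixed at zero before the CS, because nothing
    predicts CS onset. Returns the prediction-error trace of the first
    trial, that of the last trial, and the learned weights.
    """
    w = np.zeros(n_steps)  # w[t] is the learned value at time step t

    def value(t):
        # Only time steps from CS onset onward carry a learned value.
        return w[t] if t >= cs_step else 0.0

    first = last = None
    for trial in range(n_trials):
        delta = np.zeros(n_steps)
        for t in range(1, n_steps):
            r = reward if t == us_step else 0.0
            # Reward prediction error: reward plus discounted new value,
            # minus the value predicted one step earlier.
            delta[t] = r + gamma * value(t) - value(t - 1)
            if t - 1 >= cs_step:
                w[t - 1] += alpha * delta[t]  # update the earlier prediction
        if trial == 0:
            first = delta.copy()
        last = delta
    return first, last, w

first, last, w = run_td()
print(int(np.argmax(first)))  # 15: on the first trial the error peaks at the US
print(int(np.argmax(last)))   # 5: after learning the error peaks at the CS
```

In this sketch the error at the US decays toward zero as the reward becomes predicted, while the error at CS onset grows toward the discounted reward value, which is the qualitative pattern the abstract reports for the value-related M1 response.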


Venkata Tarigoppula

Biomedical Engineering, University of Melbourne

Sunday July 19, 2020 9:00pm - 10:00pm CEST
Slot 06