Rafal Bogacz

Much evidence suggests that some dopaminergic neurons respond to unexpected rewards, and computational models have suggested that these neurons encode reward prediction error, which drives learning about rewards. However, these models do not explain the recently observed diversity of dopaminergic responses, nor the function of dopamine in action planning, which is evident from the movement difficulties seen in Parkinson’s disease. The presented work aims to extend existing models to account for these data. It proposes that a more complete description of dopaminergic activity can be achieved by combining reinforcement learning with elements of other recently proposed theories, including active inference.
The presented model describes how the basal ganglia network infers the actions required to obtain reward using Bayesian inference. The model assumes that the likelihood of reward given an action is encoded by the goal-directed system, while the prior probability of making a particular action in a given context is provided by the habit system. It is shown how the inference of the optimal action can be achieved through minimization of free energy, and how this inference can be implemented in a network with an architecture bearing a striking resemblance to the known anatomy of the striato-dopaminergic circuit. In particular, this network includes nodes encoding prediction errors, which are connected with other nodes in the network in a way resembling the “ascending spiral” structure of dopaminergic connections.
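The inference described above can be illustrated with a toy example. The following is a minimal sketch, not the model from the paper: it assumes a small discrete set of actions, and the function names and numerical values are hypothetical. It combines a likelihood of reward given each action (the goal-directed system's role) with a prior over actions (the habit system's role), and checks the standard result that the variational free energy is lowest at the exact posterior, where it equals the negative log evidence.

```python
import math

def infer_action_posterior(likelihood, prior):
    """Bayes' rule over a discrete set of actions.

    likelihood[i]: P(reward | action i), the goal-directed contribution (assumed values).
    prior[i]:      P(action i | context), the habit contribution (assumed values).
    Returns the posterior P(action | reward) and the evidence P(reward).
    """
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence

def free_energy(q, likelihood, prior):
    """Variational free energy F(q) = sum_a q(a) * [log q(a) - log P(reward, a)]."""
    return sum(
        qa * (math.log(qa) - math.log(l * p))
        for qa, l, p in zip(q, likelihood, prior)
        if qa > 0
    )

# Hypothetical numbers for three candidate actions.
likelihood = [0.2, 0.9, 0.5]  # how likely each action is to yield the reward
prior = [0.6, 0.1, 0.3]       # how habitual each action is in this context

posterior, evidence = infer_action_posterior(likelihood, prior)
best_action = posterior.index(max(posterior))

# Free energy at the posterior equals -log(evidence); any other belief scores higher.
f_posterior = free_energy(posterior, likelihood, prior)
f_prior = free_energy(prior, likelihood, prior)
```

In this toy setting the selected action is neither the most habitual one nor the one most likely to be rewarded in isolation, but the one that best balances the two sources of information.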
In the proposed model, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. These prediction errors are equal to differences between rewards and expectations in the goal-directed system, and to differences between the chosen and habitual actions in the habit system. The prediction errors enable learning about rewards resulting from actions and habit formation. During action planning, the expectation of reward in the goal-directed system arises from formulating a plan to obtain that reward. Thus, dopaminergic neurons in this system provide feedback on whether the current motor plan is sufficient to obtain the available reward, and they facilitate action planning until a suitable plan is found. The presented models account for dopaminergic responses during movements and for the effects of dopamine depletion on behaviour, and they make several experimental predictions.
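The two kinds of prediction error described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the learning rules from the paper: actions are treated as a single scalar intensity, both systems use a plain delta rule, and the names `update` and `learning_rate` as well as all numerical values are hypothetical.

```python
def goal_directed_error(reward, expected_reward):
    """Goal-directed system: difference between obtained and expected reward."""
    return reward - expected_reward

def habit_error(chosen_action, habitual_action):
    """Habit system: difference between the chosen and the habitual action."""
    return chosen_action - habitual_action

def update(estimate, error, learning_rate=0.1):
    """Delta-rule update (an assumed, simplified learning rule)."""
    return estimate + learning_rate * error

expected_reward = 0.0  # initial reward expectation of the goal-directed system
habitual_action = 0.0  # initial habitual action intensity

# Repeatedly choose the same action intensity and receive the same reward.
for _ in range(200):
    chosen_action = 1.0
    reward = 2.0
    expected_reward = update(
        expected_reward, goal_directed_error(reward, expected_reward)
    )
    habitual_action = update(
        habitual_action, habit_error(chosen_action, habitual_action)
    )
```

With repetition, both errors shrink towards zero: the reward expectation converges to the delivered reward, and the habit converges to the action that keeps being chosen, which is one simple way to picture habit formation.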
The full paper describing this work is available at:
https://elifesciences.org/articles/53262