Venkata Tarigoppula, John Choi, John Hessburg, David McNiel, Brandi Marsh, Joseph Francis

Reinforcement learning (RL) theory provides a simple model that can help explain many animal behaviors. RL models have successfully described neural activity in multiple brain regions and at several spatiotemporal scales, from single units up to hemodynamics, during learning in animals, including humans. A key component of RL is the value function, which captures the expected, temporally discounted reward from a given state. A reward prediction error occurs when there is a discrepancy between the value function and the actual reward, and this error is used to drive learning. The value function can also be modified by the animal's knowledge and certainty of its environment. Here we show that bilateral primary motor cortical (M1) neural activity in non-human primates (Rhesus and Bonnet macaques of either sex) encodes a value function in line with temporal difference RL. M1 responds to the delivery of unpredictable reward (the unconditional stimulus, US), and shifts its value-related response earlier in a trial, becoming predictive of expected reward, when reward is made predictable by an explicit cue (the conditional stimulus, CS). This is observed in tasks performed manually or observed passively, and in tasks without an explicit CS but with a predictable temporal reward environment. M1 also encodes the expected reward value in a multiple-reward-level CS-US task. Here we extend the microstimulus temporal difference RL model (MSTD), previously reported to accurately capture RL-related dopaminergic activity, to account for both phasic and tonic reward-related M1 neural activity across a multitude of tasks, during both manual and observational trials. This information has implications for autonomously updating brain-machine interfaces.
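
The sketch below is a minimal, illustrative microstimulus TD(lambda) learner of the kind the abstract refers to (after Ludvig, Sutton & Kehoe), shown only to make the value-function and reward-prediction-error terminology concrete. All parameter values, function names, and the simulated CS-US trial structure are assumptions for illustration and are not the implementation or settings used in this study.

```python
import numpy as np

# Illustrative microstimulus TD(lambda) learner. A stimulus leaves a decaying
# memory trace; Gaussian "microstimuli" tile the trace height and serve as
# features for a linear value function learned by temporal-difference updates.
N_MICRO = 10          # microstimuli per stimulus (CS and US); assumed value
TRACE_DECAY = 0.985   # per-step decay of the stimulus memory trace; assumed
SIGMA = 0.08          # width of each Gaussian microstimulus; assumed
GAMMA = 0.98          # temporal discount factor; assumed
ALPHA = 0.01          # learning rate; assumed
LAMBDA = 0.95         # eligibility-trace decay; assumed

CENTERS = np.linspace(1.0, 0.0, N_MICRO)   # microstimulus centers on trace height

def microstimuli(trace):
    """Gaussian basis activations for one stimulus, scaled by its trace height."""
    return trace * np.exp(-(trace - CENTERS) ** 2 / (2 * SIGMA ** 2))

def run_trial(w, cs_step=10, us_step=60, n_steps=80, reward=1.0):
    """One CS-US trial; returns per-step TD (reward prediction) errors and weights."""
    traces = np.zeros(2)                 # memory traces for CS and US
    e = np.zeros_like(w)                 # eligibility traces
    prev_x = np.zeros_like(w)
    deltas = []
    for t in range(n_steps):
        traces *= TRACE_DECAY
        if t == cs_step:
            traces[0] = 1.0              # CS onset resets its trace
        if t == us_step:
            traces[1] = 1.0              # US (reward delivery) onset
        x = np.concatenate([microstimuli(traces[0]), microstimuli(traces[1])])
        r = reward if t == us_step else 0.0
        # Reward prediction error: reward plus discounted new value minus old value.
        delta = r + GAMMA * (w @ x) - (w @ prev_x)
        e = GAMMA * LAMBDA * e + prev_x  # accumulate eligibility for past features
        w = w + ALPHA * delta * e        # TD update of the value-function weights
        prev_x = x
        deltas.append(delta)
    return np.array(deltas), w

w = np.zeros(2 * N_MICRO)
for trial in range(200):
    deltas, w = run_trial(w)

# With training, the large prediction error at reward delivery shrinks and a
# predictive response emerges at the cue, the shift described in the abstract.
print("largest |TD error| at step:", int(np.argmax(np.abs(deltas))))
```
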