CNS*2020 Online
Monday, July 20 • 8:00pm - 9:00pm
P46: Faster Gradient Descent Learning


Ho Ling Li, Mark van Rossum

Back-propagation is a popular machine learning algorithm that uses gradient descent to train neural networks for supervised learning. In stochastic gradient descent, a cost function C is minimized by adjusting the weights w_ij as Δw_ij = −η(∂C/∂w_ij) at every training sample. However, learning with back-propagation can be very slow. A number of algorithms have been developed to speed up convergence and improve the robustness of learning. One approach is to start with a high learning rate and anneal it to lower values towards the end of learning. Other approaches combine past updates with the current weight update, such as momentum [1] and Adam [2]. These algorithms are now standard in most machine learning studies, but they are complicated to implement biologically.
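For illustration, the plain stochastic gradient descent update and the momentum variant mentioned above can be sketched as follows (a minimal NumPy sketch; the function names, hyperparameter values, and the toy quadratic cost are our own illustrative choices, not from the poster):

```python
import numpy as np

def sgd_step(w, grad, eta=0.1):
    """Plain SGD: w <- w - eta * dC/dw, with one uniform learning rate eta."""
    return w - eta * grad

def momentum_step(w, grad, velocity, eta=0.1, beta=0.9):
    """Momentum [1]: the update mixes past updates into the current one."""
    velocity = beta * velocity - eta * grad
    return w + velocity, velocity

# Toy example: minimize C(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, w)
# After 100 steps, w has decayed toward the minimum at zero.
```

With a constant eta, every synapse is updated at the same rate; the poster's contribution below replaces this uniform eta with a synapse-specific one.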

Inspired by synaptic competition in biology, we have developed a simple, local gradient descent optimization algorithm that reduces training time without any demand on past information. Our algorithm works similarly to the traditional gradient descent used in back-propagation, except that instead of a uniform learning rate across all synapses, the learning rate depends on the current connection weight of each individual synapse and on the L2 norm of the weights of each neuron.

Our algorithm encourages neurons to form strong connections to a handful of neurons in their neighbouring layers by assigning a higher learning rate η_ij to synapses with larger weights w_ij: Δw_ij = −η₀ (|w_ij| + α)/(‖w_j‖ + α) (∂C/∂w_ij), where i indexes the post-synaptic neurons and j indexes the pre-synaptic neurons. The parameter α is set in a range of values such that at the beginning of training α > ‖w_j‖ ≫ |w_ij|, so that all synapses have a learning rate close to η₀. As learning progresses, the learning rate of large synapses stays close to η₀, while the learning rate of small synapses decreases. Here, ‖w_j‖ sums over all the post-synaptic weights of a pre-synaptic neuron, leading each pre-synaptic neuron to form strong connections to only a limited number of post-synaptic neurons. However, our algorithm also works when this term is replaced with ‖w_i‖, which instead promotes every post-synaptic neuron to form strong connections to a small number of pre-synaptic neurons. We note that the proposed modulation of learning can easily be imagined to occur in biology, as it only requires post-synaptic factors and requires no memory.
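A single step of this weight-dependent update rule can be sketched in NumPy as follows (the function name, matrix layout, and parameter values are our own illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def competitive_step(W, grad, eta0=0.1, alpha=10.0):
    """One weight update with synapse-specific learning rates:
        eta_ij = eta0 * (|w_ij| + alpha) / (||w_j|| + alpha).

    W[i, j] connects pre-synaptic neuron j to post-synaptic neuron i, so
    ||w_j|| is the L2 norm of column j (all outgoing weights of neuron j).
    """
    col_norm = np.linalg.norm(W, axis=0, keepdims=True)  # ||w_j|| per column
    eta = eta0 * (np.abs(W) + alpha) / (col_norm + alpha)
    return W - eta * grad

# With alpha much larger than the weight norms, all learning rates start
# near eta0; as weights grow, large synapses keep learning near eta0 while
# small synapses slow down, producing the competition described above.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(3, 4))
grad = rng.normal(size=(3, 4))
W_new = competitive_step(W, grad)
```

Only the current weights and gradient enter the update, so no history needs to be stored, in contrast to momentum or Adam.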

We have tested our algorithm with back-propagation networks with one hidden layer of 100 units, classifying the MNIST handwritten digit dataset with 96% accuracy. Compared to networks equipped with the best constant learning rate, networks train 24% faster with our algorithm. The improvement is even greater with smaller networks: with 50 units in the hidden layer, our algorithm shortens the training time by 40% relative to the best constant learning rate. Preliminary results also show that our algorithm is comparable to Adam for the small networks we have tested. Thus, our algorithm demonstrates the feasibility of a local and biologically plausible gradient descent optimization algorithm that requires only online information.

[1] Plaut D, Nowlan S, Hinton G. Experiments on Learning by Back Propagation. 1986. Technical Report CMU-CS-86-126.

[2] Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. 2014 Dec. arXiv:1412.6980.


Ho Ling Li

School of Psychology, University of Nottingham
My research interests are synaptic plasticity and memory formation, studied through computational modelling, as well as biologically-inspired machine learning algorithms.

Monday July 20, 2020 8:00pm - 9:00pm CEST
Slot 05