A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
Open Access
- 10 October 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (10) , e1000180
- https://doi.org/10.1371/journal.pcbi.1000180
Abstract
Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how behaviorally relevant adaptive changes in complex networks of spiking neurons could be achieved in a self-organizing manner through local synaptic plasticity. However, the capabilities and limitations of this learning rule could so far be tested only through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allow us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They can also learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker. In this experiment, monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit-assignment problem. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity. Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic of cortical networks of neurons but has no analogue in currently existing artificial computing systems. In addition, our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics.
Author Summary
A major open problem in computational neuroscience is to understand how learning, i.e., behaviorally relevant modifications in the central nervous system, can be explained on the basis of experimental data on synaptic plasticity. Spike-timing-dependent plasticity (STDP) is a rule for changes in the strength of an individual synapse that is supported by experimental data from a variety of species. However, it is not clear how this synaptic plasticity rule can produce meaningful modifications in networks of neurons. Only if one takes into account that consolidation of synaptic plasticity requires a third signal, such as changes in the concentration of a neuromodulator (that might, for example, be related to rewards or expected rewards), can meaningful changes in the structure of networks of neurons occur. We provide in this article an analytical foundation for such reward-modulated versions of STDP that predicts when this type of synaptic plasticity can produce functionally relevant changes in networks of neurons. In particular, we show that seemingly inexplicable experimental data on biofeedback, where a monkey learnt to increase the firing rate of an arbitrarily chosen neuron in the motor cortex, can be explained on the basis of this new learning theory.
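The idea that STDP becomes effective only when gated by a third, reward-related signal can be illustrated with a small simulation. The following is a minimal sketch, not the model analyzed in the article: it assumes a commonly used eligibility-trace formulation in which each pre/post spike pairing leaves a slowly decaying trace at the synapse, and the weight changes in proportion to the product of that trace and a reward signal. The function name, the exponential traces, and all parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`, `tau_c`, `learning_rate`) are illustrative assumptions that do not appear in the text above.

```python
import math


def reward_modulated_stdp(pre_spikes, post_spikes, reward, n_steps,
                          dt=1.0, a_plus=1.0, a_minus=1.0,
                          tau_plus=20.0, tau_minus=20.0, tau_c=500.0,
                          learning_rate=0.01):
    """Return the total weight change of one synapse over n_steps steps.

    pre_spikes, post_spikes: sets of integer step indices with a spike.
    reward: function mapping a step index to the reward signal d(t).
    All parameter values are hypothetical and chosen for illustration.
    """
    pre_trace = 0.0     # low-pass filtered presynaptic spikes
    post_trace = 0.0    # low-pass filtered postsynaptic spikes
    eligibility = 0.0   # decaying record of recent STDP pairings
    dw = 0.0

    decay_plus = math.exp(-dt / tau_plus)
    decay_minus = math.exp(-dt / tau_minus)
    decay_c = math.exp(-dt / tau_c)

    for step in range(n_steps):
        # All traces decay exponentially between events.
        pre_trace *= decay_plus
        post_trace *= decay_minus
        eligibility *= decay_c

        is_pre = step in pre_spikes
        is_post = step in post_spikes

        # Post-before-pre pairings lower the eligibility trace,
        # pre-before-post pairings raise it (standard STDP window).
        if is_pre:
            eligibility -= a_minus * post_trace
        if is_post:
            eligibility += a_plus * pre_trace

        # Update spike traces after the pairing terms, so simultaneous
        # pre and post spikes do not pair with each other.
        if is_pre:
            pre_trace += 1.0
        if is_post:
            post_trace += 1.0

        # The weight changes only while a reward signal is present.
        dw += learning_rate * eligibility * reward(step) * dt

    return dw


def delayed_reward(step):
    """Hypothetical reward pulse arriving well after the spike pairing."""
    return 1.0 if 200 <= step < 300 else 0.0


if __name__ == "__main__":
    # Pre spike at step 10, post spike at step 15, reward much later.
    print(reward_modulated_stdp({10}, {15}, delayed_reward, 400))
    # Same pairing without any reward.
    print(reward_modulated_stdp({10}, {15}, lambda step: 0.0, 400))
```

Under these assumptions, a pre-before-post pairing followed by a delayed reward increases the weight, the reversed spike order decreases it, and the same pairing without reward leaves the weight unchanged. The slow eligibility time constant `tau_c` is what lets a reward arriving hundreds of steps later still be credited to the pairing.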