Reinforcement Learning
Reinforcement learning refers to a type of machine learning algorithm in which an agent explores an environment and, at the end, receives a reward, which may be either positive or negative. In effect, the agent is told whether it was right or wrong, but not how. Examples include playing a game of chess (you don't know whether you've won or lost until the very end) or waiting tables in a restaurant (the waitress must wait until the end of the meal to find out whether or not she receives a tip).
Sewell (2006)
"The most typical way to train a classifier is to present an input, compute its tentative category label, and use the known target category label to improve the classifier. For instance, in optical character recognition, the input might be an image of a character, the actual output of the classifier the category label “R,” and the desired output a “B.” In reinforcement learning or learning with a critic, no desired signal is given; instead, the only teaching feedback is that the tentative category is right or wrong. This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong. In pattern classification, it is most common that such reinforcement is binary—either the tentative decision is correct or it is not. How can the system learn from such nonspecific feedback?"
Duda, Hart and Stork (2001), page 17
"The problem of reinforcement learning [...] is the most general of the three categories. Rather than being told what to do by a teacher, a reinforcement learning agent must learn from reinforcement. [The term reward [...] is a synonym for reinforcement.] For example, the lack of a tip at the end of the journey (or a hefty bill for rear-ending the car in front) gives the agent some indication that its behavior is undesirable. Reinforcement learning typically includes the subproblem of learning how the environment works."
Russell and Norvig (2003), page 650
"In many domains the learner is not told the correct response; it only receives reward or punishment. Often reward is substantially delayed. For example, in chess you only find out whether you won or lost at the very end. Learning in such domains is called reinforcement learning. Reinforcement learning can be very powerful: the current world champion backgammon player is a neural network that learnt by self-play using reinforcement learning techniques. However, reinforcement learning is poorly understood in comparison to supervised learning. We are looking for better models of reinforcement learning. In the meantime, we have developed some new techniques for learning in delayed reward games. They are currently being tested on backgammon and chess."
unknown origin
"Reinforcement learning refers to a class of problems in machine learning which postulate an agent exploring an environment in which the agent perceives its current state and takes actions. The environment, in return, provides a reward (which can be positive or negative). Reinforcement learning algorithms attempt to find a policy for maximizing cumulative reward for the agent over the course of the problem."
Wikipedia (2006)
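The definitions above share the same ingredients: an agent, states, actions, a (possibly delayed) reward, and a policy that maximizes cumulative reward. A minimal sketch of one standard algorithm, Q-learning (Watkins and Dayan, 1992, listed in the bibliography below), on a toy five-state corridor where the only feedback is a delayed terminal reward; the environment, parameter values and names here are illustrative assumptions, not taken from any of the quoted sources:

```python
# Minimal off-policy Q-learning sketch (illustrative only): an agent on a
# five-state corridor learns that walking right leads to the terminal reward.
import random

N_STATES = 5          # corridor states 0..4; reward only on reaching state 4
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA = 0.5, 0.9

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)          # explore uniformly at random
        s2, r, done = step(s, a)
        # Q-learning update: the delayed terminal reward is the only
        # feedback, and it propagates backwards through the Q-values
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# the greedy policy extracted from Q: move right (+1) in every state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Note how this matches the quotations: no teacher ever says which action was correct; a single scalar reward arrives only at the end of each episode, and the agent nonetheless recovers the optimal policy.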
Bibliography
ACKLEY, D. and M. LITTMAN, 1992. Interactions between learning and evolution . Artificial Life II. [Cited by 241 ] (17.12/year)
ANDERSON, C.W., 1987. Strategy learning with multilayer connectionist representations . Proceedings of the Fourth International Workshop on Machine …. [Cited by 93 ] (4.88/year)
ASADA, M., et al., 1995. Vision-based reinforcement learning for purposive behavior acquisition. Robotics and Automation, 1995. Proceedings., 1995 IEEE …. [Cited by 56] (5.06/year)
ASADA, M., et al. , 1996. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning . Machine Learning. [Cited by 155 ] (15.38/year)
ASADA, M., E. UCHIBE and K. HOSODA, 1999. … for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and … . Artificial Intelligence. [Cited by 58 ] (8.20/year)
BAIRD, L. and A.W. MOORE, 1999. Gradient descent for general reinforcement learning . Advances in Neural Information Processing Systems. [Cited by 101 ] (14.27/year)
BAIRD, L.C., 1995. Residual algorithms: Reinforcement learning with function approximation . Proceedings of the Twelfth International Conference on …. [Cited by 176 ] (15.89/year)
BARTO, A.G. and S. MAHADEVAN, 2003. Recent Advances in Hierarchical Reinforcement Learning . Discrete Event Dynamic Systems. [Cited by 80 ] (26.00/year)
BARTO, A.G., R.S. SUTTON and P.S. BROUWER, 1981. Associative search network: A reinforcement learning associative memory . Biological Cybernetics. [Cited by 74 ] (2.95/year)
BARTO, A.G., S.J. BRADTKE and S.P. SINGH, 1995. Learning to act using real-time dynamic programming . Artificial Intelligence. [Cited by 434 ] (39.18/year)
BERTSEKAS, D.P. and J.N. TSITSIKLIS, 1996. Neuro-Dynamic Programming. Athena Scientific. [Cited by 1192] (118.29/year)
BERTSEKAS, D.P., 1987. Dynamic programming: deterministic and stochastic models. Prentice-Hall, Inc. Upper Saddle River, NJ, USA. [Cited by 605 ] (31.71/year)
BOYAN, J.A. and A.W. MOORE, 1995. Generalization in reinforcement learning: Safely approximating the value function . Advances in Neural Information Processing Systems. [Cited by 191 ] (17.24/year)
BOYAN, J.A. and M.L. LITTMAN, 1994. Packet routing in dynamically changing networks: A reinforcement learning approach . Advances in Neural Information Processing Systems. [Cited by 146 ] (12.09/year)
BRADTKE, S.J. and M.O. DUFF, 1994. Reinforcement learning methods for continuous-time Markov Decision Problems . NIPS. [Cited by 84 ] (6.96/year)
BRADTKE, S.J., 1993. Reinforcement learning applied to linear quadratic regulation . Advances in Neural Information Processing Systems. [Cited by 50 ] (3.82/year)
CAMERER, C.F. and T.H. HO, 1999. Experience-weighted attraction learning in games: A unifying approach . Econometrica. [Cited by 104 ] (14.70/year)
CHAPMAN, D. and L.P. KAELBLING, 1991. Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of the Twelfth International Joint Conference on …. [Cited by 147 ] (9.75/year)
CHRISMAN, L., 1992. Reinforcement learning with perceptual aliasing: the perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), AAAI, Menlo Park …. [Cited by 146] (10.37/year)
CLAUS, C. and C. BOUTILIER, 1998. The dynamics of reinforcement learning in cooperative multiagent systems . Proceedings of the Fifteenth National Conference on …. [Cited by 235 ] (29.10/year)
CRITES, R. and A. BARTO, 1996. Improving elevator performance using reinforcement learning . Proceedings NIPS. [Cited by 259 ] (25.70/year)
CRITES, R.H. and A.G. BARTO, 1998. Elevator Group Control Using Multiple Reinforcement Learning Agents . Machine Learning. [Cited by 71 ] (8.79/year)
DŽEROSKI, S., L. DE RAEDT and K. DRIESSENS, 2001. Relational Reinforcement Learning. Machine Learning. [Cited by 115] (22.65/year)
DAYAN, P. and B.W. BALLEINE, 2002. Reward, motivation and reinforcement learning . Neuron. [Cited by 73 ] (17.91/year)
DAYAN, P. and G.E. HINTON, 1993. Feudal reinforcement learning . Advances in Neural Information Processing Systems. [Cited by 145 ] (11.09/year)
DIETTERICH, T.G., 1999. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition . Arxiv preprint cs.LG/9905014. [Cited by 221 ] (31.23/year)
DIETTERICH, T.G. The MAXQ method for hierarchical reinforcement learning. Machine Learning: Proceedings of the Fifteenth International …. [Cited by 72] (?/year)
DOYA, K., 2000. Reinforcement Learning in Continuous Time and Space. Neural Computation. [Cited by 95] (15.63/year)
EREV, I. and A.E. ROTH, 1998. Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed … . The American Economic Review. [Cited by 441 ] (54.60/year)
GAMBARDELLA, L.M. and M. DORIGO, 1995. Ant-Q: A reinforcement learning approach to the traveling salesman problem . Proceedings of ML-95, Twelfth International Conference on …. [Cited by 125 ] (11.28/year)
GROSSBERG, S. and J.W. MERRILL, 1992. A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Brain Res Cogn Brain Res. [Cited by 86] (6.11/year)
GULLAPALLI, V., 1990. A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks. [Cited by 111] (6.90/year)
GULLAPALLI, V., J.A. FRANKLIN and H. BENBRAHIM, 1994. Acquiring robot skills via reinforcement learning . Control Systems Magazine, IEEE. [Cited by 66 ] (5.46/year)
HEGER, M., 1994. Consideration of risk in reinforcement learning . Proceedings of the Eleventh International Conference on …. [Cited by 40 ] (3.31/year)
HOLROYD, C.B. and M.G.H. COLES, 2002. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related … . Psychological Review. [Cited by 207 ] (50.77/year)
HU, J. and M.P. WELLMAN, 1998. Multiagent reinforcement learning: Theoretical framework and an algorithm . Proceedings of the Fifteenth International Conference on …. [Cited by 277 ] (34.30/year)
HUMPHRYS, M., 1997. Action selection methods using reinforcement learning . [Cited by 93 ] (10.25/year)
JAAKKOLA, T., S.P. SINGH and M.I. JORDAN, 1995. Reinforcement learning algorithm for partially observable Markov decision problems . Advances in Neural Information Processing Systems. [Cited by 126 ] (11.38/year)
KAELBLING, L.P., 1993. Learning in embedded systems. MIT Press Cambridge, MA, USA. [Cited by 295 ] (22.56/year)
KAELBLING, L.P., M.L. LITTMAN and A.W. MOORE, 1996. Reinforcement Learning: A Survey . Arxiv preprint cs.AI/9605103. [Cited by 1325 ] (131.49/year)
KEARNS, M. and S. SINGH, 2002. Near-Optimal Reinforcement Learning in Polynomial Time . Machine Learning. [Cited by 79 ] (19.38/year)
KELLEY, A.E., S.L. SMITH-ROE and M.R. HOLAHAN, 1997. Response-reinforcement learning is dependent on N-methyl-D-aspartate receptor activation in the … . Proceedings of the National Academy of Sciences. [Cited by 155 ] (17.08/year)
KITANO, H., et al. , 1997. RoboCup: The Robot World Cup Initiative . Proceedings of the first international conference on …. [Cited by 287 ] (31.62/year)
LAUER, M. and M. RIEDMILLER. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. Proceedings of the International Conference on Machine Learning. [Cited by 54] (?/year)
LIN, C.T. and C.S.G. LEE, 1993. Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems. Fuzzy Systems, 1993., Second IEEE International Conference …. [Cited by 113] (8.64/year)
LIN, L. and T. MITCHELL, 1992. Memory Approaches to Reinforcement Learning in Non-Markovian Domains . [Cited by 51 ] (3.62/year)
LIN, L.J., 1992. Reinforcement learning for robots using neural networks . [Cited by 170 ] (12.08/year)
LIN, L.J., 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching . Machine Learning. [Cited by 318 ] (22.59/year)
LITTMAN, M. and J. BOYAN, 1993. A Distributed Reinforcement Learning Scheme for Network Routing . cs.rutgers.edu. [Cited by 69 ] (5.28/year)
LITTMAN, M.L. and C. SZEPESVARI, 1996. A generalized reinforcement-learning model: Convergence and applications . Proceedings of the 13th International Conference on Machine …. [Cited by 65 ] (6.45/year)
LITTMAN, M.L., 1994. Markov games as a framework for multi-agent reinforcement learning . Proceedings of the Eleventh International Conference on …. [Cited by 449 ] (37.18/year)
MAHADEVAN, S. and J. CONNELL, 1992. Automatic programming of behavior-based robots using reinforcement learning . Artificial Intelligence. [Cited by 373 ] (26.50/year)
MAHADEVAN, S., 1996. Average reward reinforcement learning: Foundations, algorithms, and empirical results . Machine Learning. [Cited by 94 ] (9.33/year)
MATARIC, M.J., 1997. Reinforcement Learning in the Multi-Robot Domain . Autonomous Robots. [Cited by 182 ] (20.05/year)
MCCALLUM, A.K., 1996. Reinforcement Learning with Selective Perception and Hidden State . [Cited by 198 ] (19.65/year)
MCCALLUM, R.A., 1995. Instance-based utile distinctions for reinforcement learning with hidden state . Proceedings of the Twelfth International Conference on …. [Cited by 101 ] (9.12/year)
MOORE, A.W. and C.G. ATKESON, 1993. Prioritized sweeping: Reinforcement learning with less data and less time . Machine Learning. [Cited by 153 ] (11.70/year)
MOORE, A.W. and C.G. ATKESON, 1995. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state- … . Machine Learning. [Cited by 194 ] (17.51/year)
MORIARTY, D.E. and R. MIIKKULAINEN, 1996. Efficient reinforcement learning through symbiotic evolution . Machine Learning. [Cited by 131 ] (13.00/year)
MORIARTY, D.E., A.C. SCHULTZ and J.J. GREFENSTETTE, 1999. Evolutionary Algorithms for Reinforcement Learning . JAIR. [Cited by 70 ] (9.89/year)
PARR, R. and S. RUSSELL, 1998. Reinforcement learning with hierarchies of machines . Advances in Neural Information Processing Systems. [Cited by 167 ] (20.68/year)
PENG, J. and B. BHANU, 1998. Closed-loop object recognition using reinforcement learning . Pattern Analysis and Machine Intelligence, IEEE Transactions …. [Cited by 61 ] (7.55/year)
PUTERMAN, M.L., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc. New York, NY, USA. [Cited by 995 ] (82.39/year)
RENNIE, J. and A. MCCALLUM, 1999. Using reinforcement learning to spider the web efficiently . Proceedings of the Sixteenth International Conference on …. [Cited by 128 ] (18.09/year)
RING, M.B., 1994. Continual Learning in Reinforcement Environments. [Cited by 52] (4.31/year)
RUMMERY, G.A. and M. NIRANJAN, 1994. On-line Q-learning using connectionist systems . Cambridge University Engineering Department, CUED/F-INENG/TR. [Cited by 190 ] (15.73/year)
RUMMERY, G.A., 1995. Problem solving with reinforcement learning . Unpublished doctoral dissertation, Cambridge University. [Cited by 56 ] (5.06/year)
SANDHOLM, T.W. and R.W. CRITES, 1995. Multiagent Reinforcement Learning in the Iterated Prisoner's Dilemma . cs.wustl.edu. [Cited by 139 ] (12.55/year)
SANTAMARIA, J.C., R.S. SUTTON and A. RAM, 1997. Experiments with reinforcement learning in problems with continuous state and action spaces . Adaptive Behavior. [Cited by 104 ] (11.46/year)
SCHWARTZ, A., 1993. A reinforcement learning method for maximizing undiscounted rewards. Proceedings of the Tenth International Conference on Machine …. [Cited by 92 ] (7.04/year)
SINGH, S. and D. BERTSEKAS, 1997. Reinforcement learning for dynamic channel allocation in cellular telephone systems . Advances in Neural Information Processing Systems. [Cited by 115 ] (12.67/year)
SINGH, S., et al. , 2000. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms . Machine Learning. [Cited by 99 ] (16.29/year)
SINGH, S.P. and R.S. SUTTON, 1996. Reinforcement learning with replacing eligibility traces . Machine Learning. [Cited by 171 ] (16.97/year)
SINGH, S.P., 1992. Reinforcement learning with a hierarchy of abstract models . Proceedings of the Tenth National Conference on Artificial …. [Cited by 55 ] (3.91/year)
SINGH, S.P., T. JAAKKOLA and M.I. JORDAN, 1995. Reinforcement learning with soft state aggregation . Advances in Neural Information Processing Systems. [Cited by 85 ] (7.67/year)
SMART, W.D. and L.P. KAELBLING, 2000. Practical reinforcement learning in continuous spaces . Proceedings of the Seventeenth International Conference on …. [Cited by 84 ] (13.82/year)
STONE, P. and M. VELOSO, 1999. Team-partitioned, opaque-transition reinforcement learning . Proceedings of the third annual conference on Autonomous …. [Cited by 67 ] (9.47/year)
STONE, P. and R.S. SUTTON, 2001. Scaling reinforcement learning toward RoboCup soccer . Proceedings of the Eighteenth International Conference on …. [Cited by 77 ] (15.17/year)
SUBRAMANIAN, D., P. DRUSCHEL and J. CHEN, 1997. Ants and reinforcement learning: A case study in routing in dynamic networks . Proceedings of IJCAI-97, International Joint Conference on …. [Cited by 94 ] (10.36/year)
SUTTON, R., 1984. Temporal credit assignment in reinforcement learning . [Cited by 239 ] (10.83/year)
SUTTON, R.S. and A.G. BARTO, 1998. Reinforcement Learning: An Introduction. MIT Press. [Cited by 2790] (345.43/year)
SUTTON, R.S., 1991. Reinforcement learning architectures for animats . Proceedings of the first international conference on …. [Cited by 101 ] (6.70/year)
SUTTON, R.S., 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding . Advances in Neural Information Processing Systems. [Cited by 327 ] (32.45/year)
SUTTON, R.S., et al. , 2000. Policy gradient methods for reinforcement learning with function approximation . Advances in Neural Information Processing Systems. [Cited by 153 ] (25.18/year)
SUTTON, R.S., A. BARTO and R.J. WILLIAMS, 1992. Reinforcement learning is direct adaptive optimal control . IEEE Control Systems Magazine. [Cited by 72 ] (5.11/year)
SUTTON, R.S., D. PRECUP and S. SINGH, 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning . Artificial Intelligence. [Cited by 237 ] (33.49/year)
TAN, M., 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents . Proceedings of the Tenth International Conference on Machine …. [Cited by 269 ] (20.57/year)
THRUN, S. and A. SCHWARTZ, 1995. Finding structure in reinforcement learning . Advances in Neural Information Processing Systems. [Cited by 75 ] (6.77/year)
THRUN, S.B., 1992. Efficient Exploration In Reinforcement Learning . [Cited by 110 ] (7.81/year)
WATKINS, C.J.C.H. and P. DAYAN, 1992. Q-learning . Machine Learning. [Cited by 861 ] (61.16/year)
WERBOS, P.J., 1990. A menu of designs for reinforcement learning over time. MIT Press Series in Neural Network Modeling and …. [Cited by 82] (5.10/year)
WHITEHEAD, S.D. and D.H. BALLARD, 1990. Active perception and reinforcement learning . Proceedings of the seventh international conference (1990) …. [Cited by 95 ] (5.91/year)
WHITEHEAD, S.D., 1991. A complexity analysis of cooperative mechanisms in reinforcement learning. Proc. AAAI. [Cited by 107 ] (7.10/year)
WHITEHEAD, S.D., 1992. Reinforcement learning for the adaptive control of perception and action . [Cited by 56 ] (3.98/year)
WHITLEY, D., et al. , 1993. Genetic reinforcement learning for neurocontrol problems . Machine Learning. [Cited by 92 ] (7.04/year)
WILLIAMS, R.J., 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning . Machine Learning. [Cited by 243 ] (17.26/year)
ZHANG, W. and T.G. DIETTERICH, 1995. A reinforcement learning approach to job-shop scheduling . Proceedings of the Fourteenth International Joint Conference …. [Cited by 133 ] (12.01/year)