Reinforcement Learning

Reinforcement learning is a type of machine learning in which an agent explores an environment and, at the end, receives a reward, which may be positive or negative. In effect, the agent is told whether it was right or wrong, but not how. Examples include a game of chess (you don't know whether you have won or lost until the very end) and a waitress in a restaurant (she has to wait until the end of the meal to find out whether she receives a tip).
Sewell (2006)
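
As a rough illustration of reward arriving only at the end, the sketch below credits every state visited during an episode with the final outcome and keeps a running average per state. The averaging scheme (a Monte Carlo-style estimate) is one possible choice, not something the definition above prescribes, and the state names and episodes are hypothetical.

```python
from collections import defaultdict

def update_values(values, counts, visited_states, final_reward):
    """Credit each distinct state seen in the episode with the end-of-episode reward."""
    for state in set(visited_states):
        counts[state] += 1
        # Running mean of the final rewards observed after visiting this state.
        values[state] += (final_reward - values[state]) / counts[state]

values = defaultdict(float)  # estimated value of each state
counts = defaultdict(int)    # number of episodes in which each state appeared

# Hypothetical chess-like episodes: (states visited, reward given only at the end).
episodes = [
    (["open", "attack", "checkmate"], +1.0),  # a win
    (["open", "defend", "blunder"], -1.0),    # a loss
    (["open", "attack", "checkmate"], +1.0),  # another win
]

for states, reward in episodes:
    update_values(values, counts, states, reward)

for state in sorted(values):
    print(f"{state}: {values[state]:+.2f}")
```

States that appear only in winning episodes end up with a high estimate, states that appear only in losing ones with a low estimate, and the shared opening state lands in between, even though no individual move was ever labelled right or wrong.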

"The most typical way to train a classifier is to present an input, compute its tentative category label, and use the known target category label to improve the classifier. For instance, in optical character recognition, the input might be an image of a character, the actual output of the classifier the category label “R,” and the desired output a “B.” In reinforcement learning or learning with a critic, no desired signal is given; instead, the only teaching feedback is that the tentative category is right or wrong. This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong. In pattern classification, it is most common that such reinforcement is binary—either the tentative decision is correct or it is not. How can the system learn from such nonspecific feedback?"
Duda, Hart and Stork (2001), page 17
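
One simple way to learn from such right-or-wrong feedback can be sketched as follows. This is an epsilon-greedy scheme chosen for illustration, not necessarily the approach Duda, Hart and Stork go on to describe; the glyph identifiers, label set and parameter values are all assumptions.

```python
import random

def critic(target, guess):
    """Binary critic: reports only whether the guess was right, never the correct label."""
    return guess == target

def learn_with_critic(samples, labels, rounds=200, epsilon=0.1, step=0.5, seed=0):
    """Learn a label for each input using nothing but the critic's right/wrong signal."""
    rng = random.Random(seed)
    # score[x][label] tracks how well `label` has fared for input x so far.
    score = {x: {label: 0.0 for label in labels} for x, _ in samples}
    for i in range(rounds):
        x, target = samples[i % len(samples)]  # cycle through the inputs
        if rng.random() < epsilon:
            guess = rng.choice(labels)         # occasionally try a random label
        else:
            guess = max(labels, key=lambda label: score[x][label])
        reward = 1.0 if critic(target, guess) else -1.0
        score[x][guess] += step * (reward - score[x][guess])
    return {x: max(labels, key=lambda label: score[x][label]) for x in score}

# Hypothetical inputs: identifiers standing in for character images.
samples = [("glyph_1", "B"), ("glyph_2", "R"), ("glyph_3", "P")]
learned = learn_with_critic(samples, labels=["B", "P", "R"])
print(learned)
```

The learner never sees the desired label directly. It only finds out which of its own guesses were judged wrong, and has to try alternatives until one is judged right.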

"The problem of reinforcement learning, [...] is the most general of the three categories. Rather than being told what to do by a teacher, a reinforcement learning agent must learn from reinforcement. [The term reward [...] is a synonym for reinforcement.] For example, the lack of a tip at the end of the journey (or a hefty bill for rear-ending the car in front) give the agent some indication that its behavior is undesirable. Reinforcement learning typically includes the subproblem of learning how the environment works."
Russell and Norvig (2003), page 650

"In many domains the learner is not told the correct response, it only receives reward or punishment. Often reward is substantially delayed. For example in chess you only find out whether you won or lost at the very end. Learning in such domains is called reinforcement learning. Reinforcement learning can be very powerful: The current world champion backgammon player is a neural network that learnt by self-play using reinforcement learning techniques. However, reinforcement learning is poorly understood in comparison to supervised learning. We are looking for better models of reinforcement learning. In the meantime, we have developed some new techniques for learning in delayed reward games. They are currently being tested on backgammon and chess."
unknown origin

"Reinforcement learning refers to a class of problems in machine learning which postulate an agent exploring an environment in which the agent perceives its current state and takes actions. The environment, in return, provides a reward (which can be positive or negative). Reinforcement learning algorithms attempt to find a policy for maximizing cumulative reward for the agent over the course of the problem."
Wikipedia (2006)
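
The loop described in this definition (perceive a state, take an action, receive a reward, improve the policy) can be sketched with tabular Q-learning, which is one of many algorithms of this kind and is not named in the quotation. The corridor environment, the reward values, the discount factor `gamma` and the other parameters are assumptions made for the example.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0; reaching cell 4 ends the episode with reward +10, and every other
# step costs -1.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 10.0, True
    return next_state, -1.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the learned action-value table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}

policy = greedy_policy(train())
print(policy)
```

In this sketch the policy is simply the greedy choice over the learned table; after training it should move right in every cell, which is the behaviour that maximizes cumulative (here, discounted) reward in this corridor.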

Links

Bibliography