For the last year or more, we have put all our efforts into developing a novel Reinforcement Learning algorithm for temporal pattern learning. In our approach, we use a hierarchy of self-organizing maps to encode sequences of motion primitives. More details of this work are available in the publications section, and more work will be out as soon as we finish the review.
To make a long story short, in this work we want our beloved finger-enabled Aldebaran Nao to reproduce a tactile gesture pattern, e.g., a stroke. The input is the observations from the joint space plus the fingertip feedback, and the robot has to learn the actions that reproduce the pattern. When reproducing patterns we focus on three factors, which answer the following questions:
- What I've done so far: how well does it fit with what I've learned during training?
- What I currently see/observe: does it relate to the states I've seen during training?
- From where I am now: how can I reach a state that will give me more reward?
Even though at first glance these three factors should be more than enough to reproduce a temporal pattern successfully, this is not the case when the actions are perturbed or simply fail to complete, e.g., when an obstacle prevents the robotic arm from moving.
Our specific problem arose when our Nao was correctly reproducing the learned pattern and a perturbation in the motion occurred (a rather large one: we moved the arm to a state far away). The robot could not handle it. Our algorithm detected the sudden change, but because of the long matches of historical observations accumulated during the correct actions, it was not able to recover the motion from the new state.
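To make the failure concrete, here is a toy sketch, not our actual algorithm, that scores candidate states by a weighted sum of the three factors above. The `Candidate` fields, the weights, and the numbers are all hypothetical; the only point is that a history term accumulated over many steps can outvote a strong signal from the current observation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate state the robot could move towards."""
    name: str
    history_match: float      # fit of past observations, accumulated over many steps
    observation_match: float  # similarity to the current observation, in [0, 1]
    reward: float             # reward expected from this state


def score(c: Candidate, w_hist: float = 1.0, w_obs: float = 1.0, w_rew: float = 1.0) -> float:
    # Illustrative weighted sum of the three factors (the weights are assumptions).
    return w_hist * c.history_match + w_obs * c.observation_match + w_rew * c.reward


def best(candidates, w_hist: float = 1.0) -> Candidate:
    # Pick the highest-scoring candidate for a given history weight.
    return max(candidates, key=lambda c: score(c, w_hist=w_hist))


# Right after a large perturbation: the observation clearly points to "B",
# but "A" still carries 40 steps' worth of good historical matches.
a = Candidate("A", history_match=40.0, observation_match=0.1, reward=1.0)
b = Candidate("B", history_match=2.0, observation_match=0.95, reward=1.0)

print(best([a, b]).name)              # A: history outvotes the observation
print(best([a, b], w_hist=0.0).name)  # B: ignoring history follows the observation
```

Dropping the history weight to zero lets the observation win, which is what we would want right after a large perturbation; the hard part is knowing when to do that.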
The missing link is probably surprise. According to dictionary.com, surprise is the state of discovering something suddenly or unexpectedly. Hey... story of a robot's life! When its environment changes, a robot either realizes it too late to adapt or never realizes it at all, and you know the catastrophic results...
In his book [1], Izard described surprise among other emotions. He argued that it results from an unexpected event or stimulus, which leaves the mind momentarily blank, as if normal thinking processes had stopped. Ortony and Partridge [2] highlighted the problems of artificial systems in this respect and suggested that a surprise mechanism is “a crucial component of general intelligence”.
In a really interesting paper (look for the "Encoding Surprise" section), Peters [3] models surprise serially as the absolute difference between the expectation and the current observation. Even more interesting, though, is that the paper explains the idea of homeostasis:
"For all biological systems some states are more conducive to life than others, and the basic biological function is to preserve homeostasis by moving from less conducive states to more conducive ones."
"In other words, it does not even need to keep telling itself that things are still the same. It is not surprising then, that over time individual neurons and even semi-discrete neural systems exhibit reduction in response to unvarying stimuli (Day 1972)."
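A minimal sketch of the two ideas in this passage, under our own assumptions. Surprise is the absolute difference between expectation and observation; for vector observations such as joint angles plus fingertip readings we sum the per-dimension differences (an L1 norm), which is our choice rather than something the paper prescribes. The `Habituation` class is a toy rule, with a hypothetical `decay` parameter, in which the response to an unvarying stimulus shrinks every step, echoing the reduced neural response described in the quote.

```python
def surprise(expected, observed):
    """Absolute difference between expectation and observation.

    For vectors we sum per-dimension absolute differences (an L1 norm);
    this aggregation is an assumption, not taken from Peters' paper.
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return sum(abs(e - o) for e, o in zip(expected, observed))


class Habituation:
    """Toy habituation: the gain on an unchanged stimulus decays every step."""

    def __init__(self, decay: float = 0.5):
        self.decay = decay      # hypothetical per-step decay factor
        self.gain = 1.0
        self.previous = None

    def respond(self, stimulus) -> float:
        stimulus = list(stimulus)
        if stimulus == self.previous:
            self.gain *= self.decay  # same input again: respond less
        else:
            self.gain = 1.0          # new input: full response restored
        self.previous = stimulus
        return self.gain


h = Habituation(decay=0.5)
print([h.respond([1.0, 2.0]) for _ in range(3)])  # [1.0, 0.5, 0.25]
print(h.respond([3.0, 2.0]))                       # 1.0
print(surprise([0.0, 1.0], [0.0, 3.0]))            # 2.0
```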
What I think is happening to our Nao is that the historical match of actions acts as homeostasis (all these good historical matches build up momentum), and unless something surprises our robot, it cannot recover. This surprise would have to let our algorithm ignore the historical match massively built up over time and compare the expected state with the current observation, and... and what? Start afresh? Wait for a few steps to see whether it was just noise and I am back on track? Is there any systematic way to solve this problem, or do we follow problem-specific approaches to decay the importance of history?
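One possible, problem-specific answer to these questions, sketched under our own assumptions: keep accumulating the history match, but when surprise stays above a `threshold` for `patience` consecutive steps, treat it as a real perturbation rather than noise and multiply the accumulated history by a `decay` factor (0.0 means starting afresh). The class name and all three parameters are hypothetical and would need tuning per task; this is not a systematic solution.

```python
class SurpriseGatedHistory:
    """Accumulated history match that fades after sustained surprise (hypothetical)."""

    def __init__(self, threshold: float = 1.0, patience: int = 3, decay: float = 0.0):
        self.threshold = threshold  # surprise level regarded as unexpected
        self.patience = patience    # consecutive surprising steps before acting
        self.decay = decay          # factor applied to history (0.0 = start afresh)
        self.history = 0.0
        self.surprised_steps = 0

    def step(self, expected, observed, match: float) -> float:
        # Surprise as the summed absolute difference between expectation and observation.
        s = sum(abs(e - o) for e, o in zip(expected, observed))
        self.surprised_steps = self.surprised_steps + 1 if s > self.threshold else 0
        if self.surprised_steps >= self.patience:
            # Surprise persisted, so it is probably not noise: let history fade.
            self.history *= self.decay
            self.surprised_steps = 0
        self.history += match
        return self.history


g = SurpriseGatedHistory(threshold=1.0, patience=3, decay=0.0)
for _ in range(10):
    g.step([0.0, 0.0], [0.0, 0.0], match=1.0)  # on track: history grows to 10
g.step([0.0, 0.0], [5.0, 0.0], match=0.0)      # single spike: ignored as noise
print(g.step([0.0, 0.0], [0.0, 0.0], match=1.0))  # 11.0
for _ in range(3):
    last = g.step([0.0, 0.0], [5.0, 0.0], match=0.0)  # sustained perturbation
print(last)  # 0.0
```

Waiting `patience` steps addresses the "was it just noise?" question at the cost of reacting a few steps late, and a non-zero `decay` would keep part of the learned context instead of starting afresh.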
References
- [1] Izard, C. E. (1991). The psychology of emotions. Springer.
- [2] Ortony, A., & Partridge, D. (1987). Surprisingness and expectation failure: what’s the difference? In J. McDermott (Ed.), Proceedings of the 10th International Joint Conference on Artificial Intelligence, Volume 1 (pp. 106-108). Morgan Kaufmann.
- [3] Peters, M. W. (n.d.). Towards Artificial Forms of Intelligence, Creativity, and Surprise. Science.
Extra Links
* http://www.ifaamas.org/Proceedings/aamas2010/pdf/02%20Extended%20Abstracts/Red/R-17.pdf