Summary

Jean Piaget laid out the foundations of constructivism, and Drescher (1991) presented a mathematical formulation that remains a landmark in learning theories. Chaput (2004), in his PhD thesis, developed the Constructivist Learning Architecture (CLA) to build sensorimotor schemas in a hierarchical network. CLA reimplements Drescher's Schema Mechanism more efficiently, allowing it to scale to more complex problems, such as simulated robot control.

In my PhD, under the supervision of Dr Torbjorn Dahl, I developed a Hierarchical SOM-based Encoding algorithm that features a lossless memory model and reward-oriented decision making driven by reinforcing feedback. The algorithm follows the unitary-store memory models, which suggest that short-term memory consists of brief activations of long-term memory representations (Jonides et al. (2008) offer a comprehensive review). Its relation to CLA and to models of infant cognition allows us to argue that it is also a computational model of constructivism. Beyond that, it supports complex physical humanoid robot control and novel skill acquisition by exploiting the demonstrated skills through Reinforcement Learning. Addressing various limitations effectively led to reengineering a novel algorithm for sequential decision making. The computational model is validated first with experiments in abstract domains and then by learning complex physical humanoid robot control. The experiments feature multiple hidden states, null velocities, and action execution in noisy continuous environments. The algorithm additionally supports the reproduction of novel skills based on reward-oriented action execution. Finally, simulated experiments demonstrate how, from a range of simple skills, complex swimming behaviour emerges in a biologically inspired robotic salamander.

Background

Cohen et al. (2002) proposed a learning theory that relies on six Information Processing Principles (IPP). Collectively, these principles describe a hierarchical system that shares common features with elements of infant cognition that have been hypothesized and confirmed in neuropsychological experiments. Chaput (2004), in his PhD thesis, further develops this learning theory and introduces a computational model, the Constructivist Learning Architecture (CLA), based on a hierarchy of Kohonen Self-Organizing Maps. CLA adheres to the six IPP, which are essentially a restatement of Piaget's constructivism theory. CLA was able to reproduce and verify empirical results from causality experiments in abstract domains, and it also demonstrated learning capabilities in a simulated mobile robot experiment. However, CLA was never extended to complex physical robot control or advanced sequential decision making.

Hierarchical SOM-based Encoding

In my PhD, I present a novel algorithm for encoding sequential sensor and actuator data in a dynamic, hierarchical neural network that can grow to accommodate the length of the observed interactions.

Temporally Augmented SOM

An augmentation of the traditional Kohonen SOM is proposed that maintains a node-specific activation at each step. Similar models have been proposed in the literature; however, a different configuration is introduced here that is both computationally efficient and offers lossless encoding. The winning node increases its activation by a fixed value A = 2^D, where D is the capacity of the short-term memory, while the remaining nodes decay their activation by half. The memory model is lossless in the sense that the activations (and reactivations) of the nodes are encoded as uniquely represented values. The activation and decay processes are based on binary operations, which improves the computational efficiency of the model. The proposed SOM retains the advantage of topological preservation of the mapping while capturing temporal relationships at the same time. The output of the SOM is now two-fold: a 2D representation of the high-dimensional input, and the trace of past activations of individual neurons.
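As a minimal sketch, assuming the update reduces to integer bit operations (the function name and the toy trace below are illustrative, not the thesis implementation), the activation rule can be written as:

```python
import numpy as np

def update_activations(activations, winner, depth):
    """One step of the temporally augmented SOM memory update.

    `activations` holds one integer trace per node, `winner` is the index
    of the best-matching node for the current input, and `depth` is the
    short-term memory capacity D.
    """
    # Every node decays its activation by half: a single right bit-shift.
    activations >>= 1
    # The winner is then reinforced by the fixed value A = 2^D.
    activations[winner] += 1 << depth
    return activations

# A toy trace over 4 steps with 3 nodes and D = 4. Reading a node's
# activation in binary recovers exactly which of the last D steps it won,
# which is the sense in which the memory is lossless.
acts = np.zeros(3, dtype=np.int64)
for w in [0, 1, 0, 2]:
    acts = update_activations(acts, w, depth=4)
# acts == [10, 4, 16]; each past win survives as its own set bit.
```

Because the decay is a right shift and the reinforcement sets a single high bit, no two activation histories within the last D steps collapse to the same value.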

Hierarchy of Temporally Augmented SOM

Virtually stacking temporally augmented SOMs and feeding the sparse activation map of one SOM as input to the SOM above allows nodes on the higher levels to encode temporal information about past activations. Hence, the weights of a node represent the long-term memory of the architecture, and the activations of the long-term memory represent the short-term memory.

The bottom level SOM discretizes the observation and action space, and the higher levels encode sequences of activations below. During reproduction, the action selection is influenced by 1) the immediate observation and its proximity to each of the nodes at the bottom level, 2) the proximity of past observations to the demonstrations, expressed as a comparison between the short-term memory on one level and the long-term memory sequences on the level above, and 3) the discounted reward expectation of each decision according to its proximity to the goal.
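The three influences above can be sketched as a single scoring function. The equal weightings, the Euclidean distances, and all names below are illustrative assumptions rather than the exact formulation in the thesis:

```python
import numpy as np

def action_score(obs, node_weights, stm, ltm_seq, q_value,
                 w_obs=1.0, w_seq=1.0, w_reward=1.0):
    """Score one candidate bottom-level node during action selection.

    Combines: (1) proximity of the current observation to the node's
    weights, (2) agreement between the short-term memory trace and a
    stored long-term memory sequence, and (3) the node's discounted
    reward expectation.
    """
    obs_proximity = -np.linalg.norm(obs - node_weights)   # influence (1)
    seq_proximity = -np.linalg.norm(stm - ltm_seq)        # influence (2)
    return (w_obs * obs_proximity
            + w_seq * seq_proximity
            + w_reward * q_value)                         # influence (3)

# Choose between two candidate nodes for a toy 2-D observation: the first
# matches both the observation and the demonstrated sequence, so it wins
# despite a lower reward expectation.
obs = np.array([0.2, 0.8])
stm = np.array([1.0, 0.0])                                # current trace
candidates = [
    (np.array([0.0, 1.0]), stm, np.array([1.0, 0.0]), 0.5),
    (np.array([0.9, 0.1]), stm, np.array([0.0, 1.0]), 0.9),
]
best = max(range(len(candidates)),
           key=lambda i: action_score(obs, *candidates[i]))
```

The additive combination makes the trade-off explicit: a high reward expectation can only override the demonstration match when the sequence mismatch penalty is small.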


Beyond current neural networks

The Hierarchical SOM-based Encoding follows a different training and reproduction paradigm from traditional Recurrent Neural Networks (RNN). While RNNs perform incremental updates on the weights of hundreds or thousands of neurons, e.g., using back-propagation through time, the presented algorithm performs single-step updates of the weights on the higher levels according to the activation map of the level below. Hence, training times have been on the order of a few seconds on a 2011 i7 processor, with no parallelization optimizations in the implementation of the algorithm. A complex decision-making algorithm, though, is responsible for delivering the aforementioned features, such as hidden state identification and novel skill reproduction.
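A rough sketch of such a single-step higher-level update, in contrast to iterative gradient descent; the full-copy assignment (an effective learning rate of 1) and all names here are illustrative assumptions:

```python
import numpy as np

def single_step_update(higher_weights, lower_activation_map):
    """Update one higher-level SOM node from the activation map below.

    The best-matching higher-level node is found by distance, and its
    weights are replaced in a single assignment -- no gradients, no
    back-propagation through time.
    """
    dists = np.linalg.norm(higher_weights - lower_activation_map, axis=1)
    winner = int(np.argmin(dists))
    higher_weights[winner] = lower_activation_map
    return winner

rng = np.random.default_rng(0)
W = rng.random((4, 6))    # 4 higher-level nodes over a 6-node level below
amap = rng.random(6)      # sparse activation map produced by the level below
w = single_step_update(W, amap)
# After a single call, the winning node stores the observed map exactly,
# which is why training completes in seconds rather than epochs.
```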

This algorithm does not suffer from the vanishing gradient problem seen in traditional RNN architectures (though not in Long Short-Term Memory networks). The hierarchy permanently stores the observed sequences. Their effect on future decision making is either reinforced, if the reproduction is faithful to the demonstrations, or inhibited, if mismatches occur between the short-term and long-term memory.

The presented algorithm does not require a washout time to initialize the network before making predictions, which is common to both Echo State Networks and Long Short-Term Memory networks. This feature is demonstrated in a simulated robot experiment where, from the very first observation, the algorithm is able to reproduce the demonstrated skills.

Compared to the current state of the art in SOMs for robot control, and unlike other RNN approaches that employ a SOM separately to discretize the input space, the complete learning and reproduction algorithm is integrated in a hierarchy of SOMs.

Finally, the following experiments present results that have not previously been approached by neural networks in general, or by SOM-based approaches in particular.


Experiments

Five experiments have been specifically designed to evaluate the performance of the Hierarchical SOM-based Encoding algorithm, each highlighting a different aspect of the algorithm.

An abstract chain walk domain with an imposed hidden state is used to demonstrate the hidden state identification and the reward-oriented behaviour of the algorithm in a tractable domain.
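A minimal sketch of what such a domain looks like; the 5-state chain and the particular observation aliasing are illustrative assumptions, not the exact domain used in the thesis:

```python
# States 1 and 3 emit the same observation "B", so an agent that sees only
# the current observation cannot tell them apart: the hidden state can
# only be resolved from the memory of the recent sequence.
OBSERVATION = {0: "A", 1: "B", 2: "C", 3: "B", 4: "D"}

def step(state, action):
    """Move left (-1) or right (+1) along the chain, clamped at the ends."""
    return min(max(state + action, 0), 4)

# Walking right from state 0: the two "B" observations are distinguished
# only by what preceded them.
state, trace = 0, []
for _ in range(4):
    state = step(state, +1)
    trace.append(OBSERVATION[state])
# trace == ["B", "C", "B", "D"]
```

An agent whose short-term memory spans at least two steps can disambiguate the two "B" states, which is exactly what the hierarchical encoding provides.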


An Aldebaran Nao humanoid robot operating in the physical world learns by demonstration how to reproduce a figure-eight pattern with its left arm in the air. Please note that in the physical robot experiments the motion is segmented and delayed, because the controller operates in a perceive-act loop. Hence, it was forced to execute a command and allow sufficient time before receiving the next observation; otherwise, the robot would not have had sufficient time to execute the actions.


A collection of three experiments is demonstrated, where a humanoid robot learns to perform a tactile gesture, in particular a stroke on another humanoid robot, with no perturbations, recoverable perturbations, and irrecoverable perturbations.


A double-tap tactile gesture is demonstrated to a humanoid robot. The task includes multiple hidden states, given that the agent relies only on the angular positions of the joints and not on velocities or accelerations. The algorithm demonstrates that it is able either to reproduce the demonstrated skill or to perform a novel skill, i.e., a single tap, when acting in a more reward-oriented manner.


The final experiment uses Ijspeert et al.'s (2007) simulated salamander swimming in an aquatic environment in Webots. The salamander uses the original Central Pattern Generator controller as a source of demonstrations, and the agent then learns to swim considering only the initial feedback from the proximity sensors in a periodic execution of one cycle for each skill. As a result, a complex behaviour emerges where the salamander learns how to swim and avoid the walls of the aquarium.


Our Publications

Most of the work described above is either published or in the final stage of publication in the following papers.

Submitted: Georgios Pierris and Torbjørn S. Dahl, Emerging Complex Swimming Behavior on Robotic Salamander through Demonstration, IEEE International Conference on Robotics and Automation (ICRA), 2015

Georgios Pierris and Torbjorn S. Dahl, A developmental perspective on humanoid skill learning using a hierarchical SOM-based encoding, 2014 International Joint Conference on Neural Networks (IJCNN), pp. 708-715, 6-11 July 2014, doi: 10.1109/IJCNN.2014.6889900

Georgios Pierris and Torbjorn S. Dahl, Humanoid Tactile Gesture Production using a Hierarchical SOM-based Encoding, IEEE Transactions on Autonomous Mental Development, vol. 6, no. 2, pp. 153-167, June 2014, doi: 10.1109/TAMD.2014.2313615

Georgios Pierris and Torbjorn S. Dahl, Reinforcement Learning for Temporal Patterns using a Discrete Hierarchical Self-Organizing Map, poster, CogSys 2012

Georgios Pierris and Torbjorn S. Dahl, Compressed Sparse Code Hierarchical SOM on learning and reproducing gestures in humanoid robots, 2010 IEEE RO-MAN, pp. 330-335, 13-15 Sept. 2010, doi: 10.1109/ROMAN.2010.5598654


References

Cohen, Leslie B., Harold H. Chaput, and Cara H. Cashon. A constructivist model of infant cognition. Cognitive Development 17.3 (2002): 1323-1343.

Drescher, Gary L. Made-up minds: a constructivist approach to artificial intelligence. MIT press, 1991.

Harold H. Chaput, The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots, Ph.D. thesis, Computer Science Department, University of Texas at Austin, Technical Report TR-04-34, August 2004.

Ijspeert, Auke Jan, et al. From swimming to walking with a salamander robot driven by a spinal cord model. Science 315.5817 (2007): 1416-1420.

Jonides, John, et al. The mind and brain of short-term memory. Annu. Rev. Psychol. 59 (2008): 193-224.