Existing monolithic artificial neural network architectures are not sufficient to cope with large, complex problems. A better approach is to build large-scale heterogeneous networks from both supervised and unsupervised learning modules. In these architectures, an unsupervised learning algorithm, such as the k-means algorithm, decomposes the overall task, and a supervised learning algorithm, such as one based on gradient descent, solves each subtask.
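The decompose-then-solve idea can be illustrated with a minimal sketch. It assumes scalar inputs, plain Lloyd-style k-means, and one small linear model per subdomain trained by batch gradient descent as a stand-in for a supervised network module. All function names and parameter values are hypothetical and chosen only for illustration; this is not the partitioning algorithm investigated here.

```python
import random

def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on scalars; returns the k centers."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def fit_linear_gd(points, center, lr=0.5, epochs=2000):
    """Fit y ~ w * (x - center) + b by batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in points:
            err = w * (x - center) + b - y
            gw += err * (x - center)
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def train(xs, ys, k):
    """Unsupervised step partitions the inputs; supervised step fits each part."""
    centers = kmeans_1d(xs, k)
    buckets = [[] for _ in range(k)]
    for x, y in zip(xs, ys):
        buckets[nearest(x, centers)].append((x, y))
    experts = [fit_linear_gd(pts, c) if pts else (0.0, 0.0)
               for pts, c in zip(buckets, centers)]
    return centers, experts

def predict(x, centers, experts):
    """Route x to its subdomain and evaluate that subdomain's model."""
    j = nearest(x, centers)
    w, b = experts[j]
    return w * (x - centers[j]) + b
```

On a target such as y = |x| over [-1, 1], two subdomains let two linear modules cover a function that a single linear module cannot, which is the motivation for the decomposition.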

We have investigated heterogeneous architectures based on a novel k-means partitioning algorithm whose partitioning process takes into account the input distribution as well as the structures of the goal and network functions. We have also added two new mechanisms to our k-means algorithm. The first mechanism biases the partitioning process toward an optimal distribution of the approximation errors across the subdomains, which leads to a consistently lower overall approximation error. The second mechanism adjusts the learning rate dynamically to match the instantaneous characteristics of a problem: the learning rate is large at first, allowing rapid convergence, and then decreases in magnitude as the adaptation converges. This results in a lower residual error and also makes the new k-means algorithm viable for non-stationary situations.
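The behavior of a dynamically adjusted learning rate can be sketched as follows. The update rule for the rate is not stated here, so the sketch substitutes a simple heuristic as an assumption: for each center, the rate is the ratio of the running mean of the signed displacement to the running mean of its magnitude. That ratio is close to 1 while a center is still travelling toward its data and becomes small once displacements cancel out around it. The function name and the parameters `beta` and `eta_min` are hypothetical.

```python
import random

def adaptive_online_kmeans(stream, centers, beta=0.05, eta_min=0.01):
    """Online k-means on scalars with a per-center adaptive learning rate.

    Assumed heuristic (not taken from the source): the rate for the winning
    center is |running mean of d| / running mean of |d|, where d = x - center.
    """
    centers = list(centers)
    k = len(centers)
    signed = [0.0] * k   # running mean of the signed displacement d
    mag = [1e-12] * k    # running mean of |d|; tiny start avoids division by zero
    rates = []
    for x in stream:
        # Competitive step: only the nearest center is updated.
        j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
        d = x - centers[j]
        signed[j] = (1 - beta) * signed[j] + beta * d
        mag[j] = (1 - beta) * mag[j] + beta * abs(d)
        # Rate lies in [eta_min, 1]: large during a consistent drift,
        # small once the displacements average out.
        eta = max(eta_min, abs(signed[j]) / mag[j])
        centers[j] += eta * d
        rates.append(eta)
    return centers, rates
```

Because the rate rises again whenever the displacements become consistently one-sided, a center under this heuristic can follow a shift in the input distribution, which is the property that matters in non-stationary settings.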

We evaluate the performance and complexity of these heterogeneous architectures and compare them to homogeneous radial basis function architectures and to multilayer perceptrons trained by the error back-propagation algorithm. The evaluation shows that the heterogeneous architectures achieve higher performance with lower system complexity on the Mackey-Glass time series prediction problem and on a handwritten capital letter recognition task.



