A NEW CONSTRUCTION ALGORITHM OF EFFICIENT RADIAL BASIS FUNCTION NEURAL NET CLASSIFIER AND ITS APPLICATION TO CODES IDENTIFICATION

F. Belloir, A. Fache and A. Billat
Laboratoire d'Automatique et de Microélectronique
Université de Reims Champagne-Ardenne, B.P. 1039, 51687 Reims, France
Tel : +33.(0)220.127.116.11.16, Email : email@example.com

ABSTRACT

In this paper we present a new, simple algorithm for constructing a Radial Basis Function (RBF) neural net based classifier. Its major advantage is that it requires nothing other than the training set to work (no learning step, threshold or other parameters as in other methods). Despite its simplicity, we show on five benchmark datasets that this algorithm provides a robust and efficient classifier. These two properties make the proposed algorithm very attractive. We also describe an application of an RBF classifier built in this way to data obtained in a buried codes identification project. Finally, we compare the results with other recent recognition techniques such as fuzzy pattern recognition.

1. INTRODUCTION

RBF networks have been extensively studied in the past. They consist of three layers: an input layer, a hidden layer and an output layer. The input layer corresponds to the input feature space and the output layer to the pattern classes. The whole architecture is therefore fixed once the hidden layer and the weights between the hidden and output layers have been determined. For an input vector $X = [x_1 \dots x_n]^T \in \mathbb{R}^n$ and $N_h$ hidden neurons, the $l$-th activation function $\varphi_l(\cdot)$ is characterised by a centre $C_l \in \mathbb{R}^n$ and possibly a width $\sigma_l$, $l = 1, \dots, N_h$. The general equation of an output neuron $j$ is given by:

$$ s_j(X) = \sum_{l=1}^{N_h} w_{lj}\,\varphi_l(X) + b_j $$

where $w_{lj} \in \mathbb{R}$ is the weight between hidden neuron $l$ and output neuron $j$, and $b_j$ is an optional bias. As activation function we use here the hypergaussian

$$ \varphi_l(X) = \exp\left(-\frac{\|X - C_l\|^2}{2\sigma_l^2}\right). $$

When the RBF network is used as a classifier, $X$ is a vector of attributes to be classified and each output $s_j(X)$ represents the membership of $X$ in the class $\Omega_j$. So, with $m$ disjoint classes, the RBF classifier has $m$ outputs. To assign the pattern $X$ to a class, these outputs can be used directly by taking the one that gives the largest membership, although other decision rules can also be used.

Different methods for constructing such classifiers have been presented, but most of the time the algorithmic complexity is high. Here, we present a very simple algorithm drawn directly from the intrinsic working of an RBF net based classifier.

2. ALGORITHM PRESENTATION

2.1 Purpose

The aim of the algorithm is to iteratively subdivide each of the $m$ basic classes, which are disjoint but not necessarily convex, into a set of convex regions called clusters. In the RBF network, each cluster is represented by a hidden neuron, and each output $s_j$ realises the union of some of them in order to form the corresponding class $\Omega_j$.

The proposed algorithm can be considered "fully self-organised". It determines a minimal number of local units needed to represent all the classes known from the learning set. At the same time, it places them in such a way that the receptive field induced by each hidden neuron covers the attribute space optimally, in some sense. Each receptive field is controlled by a scale factor, the width $\sigma$ of the neuron, which is automatically adjusted according to the closest classes. So, from the learning set alone and after a number of iterations proportional to the number of neurons created, the algorithm gives the size and structure of the RBF net. Finally, it suffices to apply a least mean squares technique to determine the weights $w_{lj}$. The RBF classifier is then completely defined and can be used in a decision-making task.
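The output equation and the largest-membership decision rule given in the introduction can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `rbf_outputs` and `classify`, and the toy network with two hidden neurons and two classes, are hypothetical and do not come from the paper.

```python
import math

def rbf_outputs(x, centres, widths, weights, biases):
    """Return [s_j(x)] with s_j(x) = sum_l w_lj * phi_l(x) + b_j."""
    # Hypergaussian activation of each hidden neuron l.
    phi = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * s ** 2))
        for c, s in zip(centres, widths)
    ]
    # weights[l][j] is the weight between hidden neuron l and output j.
    return [
        sum(w_l[j] * p for w_l, p in zip(weights, phi)) + b
        for j, b in enumerate(biases)
    ]

def classify(x, centres, widths, weights, biases):
    """Decision rule: index of the output with the largest membership."""
    s = rbf_outputs(x, centres, widths, weights, biases)
    return max(range(len(s)), key=s.__getitem__)

# Hypothetical toy network (assumption): one hidden neuron per class.
centres = [(0.0, 0.0), (2.0, 2.0)]
widths = [1.0, 1.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

label_a = classify((0.1, -0.2), centres, widths, weights, biases)
label_b = classify((1.9, 2.2), centres, widths, weights, biases)
```

Here each class is represented by a single cluster, so the weight matrix is simply the identity. In a network with several clusters per class, the weights would be the ones obtained by least squares.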
The whole construction procedure runs without any parameter to set up.

We assume a learning set of $N$ patterns $X_k$, $k = 1$ to $N$, whose classes are known and taken among $m$ disjoint classes $\Omega_j$, $j = 1$ to $m$. During the iterations, a cluster $l$ is characterised by its centre $C_l$ and is spatially limited in the feature space by a hyperball whose radius is proportional to the width $\sigma_l$. These clusters are placed in two different ways. When they belong to different classes, they are disjoint. Otherwise, they may overlap in order to cover the largest region of space with a small number of hyperballs. The union of the volumes delimited by the hyperballs representing the class $\Omega_j$ is denoted $R_j$. The algorithm adds new clusters until each point $X_k$ of the learning set is included in at least one cluster of its own class. The method necessarily converges since, in the worst case where the data cannot be globally partitioned, there will be one cluster per point.

2.2 Algorithm Description

Step 1 : (Initialisation) Define $m$ centres $C_l$, each being the centre of gravity of the points $X_i \in \Omega_j$:

$$ C_l^0 = \frac{1}{\mathrm{Card}(\Omega_j)} \sum_{X_i \in \Omega_j} X_i, \quad l = j = 1 \text{ to } m. $$
Step 2 : (Width definition) The width $\sigma_l$ of neuron $l$ is defined as half the distance between its centre $C_l$ and the closest centre of another class:

$$ \sigma_l = \frac{1}{2} \min_{C_i \notin \Omega_j} \|C_l - C_i\|. $$

Step 3 : (Search for an isolated point) We search for the point $X_k \notin R_j$ that is at the maximum distance from $R_j$:

$$ X_k = \arg\max_{X_i \in \Omega_j} \|X_i - R_j\| \quad \text{and} \quad \min_{C_l \in \Omega_j} \|X_i - C_l\| > \sigma_l. $$

If there is no such point, we go to step 4; otherwise the point $X_k$ creates a new centre defining the class $\Omega_j$. A K-means clustering algorithm is applied to adjust the positions of the centres. Then we go back to step 2.

Step 4 : (Learning) The network weights $w_{lj}$, which mathematically realise a non-convex union of the clusters defining each class, are computed by a least squares method. For a point $X_k$ belonging to $\Omega_j$, the desired outputs are $s_i(X_k) = \delta(i,j)$, $i = 1$ to $m$.

2.3 Algorithm Illustration

To illustrate the working of the construction algorithm, we describe its application to an example with three classes. As the construction algorithm uses the centre of gravity to define the first centres in the initialisation step, we chose this example to show that even if the computed centres of gravity do not lie in the right classes, the algorithm remains accurate.

Figure 1. Representation of the three classes and of the first three neurons created with their width (plot title: "Iteration #0 : Initialisation 3 class -> 3 neurons").

Figure 1 shows the three classes of the chosen example, the result of the initialisation step and the creation of the first three corresponding neurons with their respective widths as defined in step 2 of the algorithm. The result of the first iteration is presented in Figure 2. The first iteration can be described as follows. We search for the point which is at the maximum distance from one of the three neurons; a new neuron corresponding to the class of the found point is then created. The K-means clustering algorithm is applied to adjust the positions of the two neuron centres of that class. Finally, the widths of all four neurons are computed once again.

Figure 2. Representation of the algorithm first iteration result (plot title: "Iteration #1 : 4 neurons after K-means and Insertion of a new centre").

The final clustering is presented in Figure 3; it is reached after 11 iterations. 14 neurons are needed to obtain the linear separation of the three classes.

Figure 3. Representation of the final clustering (plot title: "Final clustering after 11 iterations").

The last step of the algorithm, which realises the non-convex union of the clusters of each class, is shown in Figure 4. The border lines shown correspond to a membership value of 0.5.
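Steps 1 to 4 of Section 2.2 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the hyperball radius is taken equal to the width, the distance of a point to the region of its class is taken as its smallest signed distance to the hyperballs of that class, K-means is a plain Lloyd refinement restricted to the class that received the new centre, and a cap of one centre per pattern guards against endless growth. All function names and the XOR-like toy data are illustrative, and at least two classes are assumed.

```python
import numpy as np

def widths(centres, labels):
    """Step 2: half the distance to the closest centre of another class."""
    dist = np.linalg.norm(centres[:, None] - centres[None], axis=2)
    dist[labels[:, None] == labels[None, :]] = np.inf  # ignore same-class centres
    return 0.5 * dist.min(axis=1)

def kmeans(points, centres, n_iter=20):
    """Lloyd iterations refining the centres of a single class."""
    centres = centres.copy()
    for _ in range(n_iter):
        nearest = np.linalg.norm(points[:, None] - centres[None], axis=2).argmin(axis=1)
        for k in range(len(centres)):
            if np.any(nearest == k):  # an empty cluster keeps its centre
                centres[k] = points[nearest == k].mean(axis=0)
    return centres

def hidden(X, centres, sigma):
    """Gaussian activations plus a constant column for the bias b_j."""
    sq = ((X[:, None] - centres[None]) ** 2).sum(axis=2)
    phi = np.exp(-sq / (2.0 * np.maximum(sigma, 1e-12) ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

def build_rbf(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    # Step 1: one centre per class, at its centre of gravity.
    centres = np.array([X[y == c].mean(axis=0) for c in classes])
    labels = classes.copy()
    while True:
        sigma = widths(centres, labels)  # Step 2
        # Step 3: signed distance of each point to the hyperballs of its class.
        gap = np.linalg.norm(X[:, None] - centres[None], axis=2) - sigma
        gap[y[:, None] != labels[None, :]] = np.inf
        gap = gap.min(axis=1)            # > 0 means outside every own-class ball
        k = gap.argmax()
        if gap[k] <= 0 or len(centres) >= len(X):  # all covered, or safety cap
            break
        centres = np.vstack([centres, X[k]])
        labels = np.append(labels, y[k])
        own = labels == y[k]
        centres[own] = kmeans(X[y == y[k]], centres[own])
    # Step 4: least squares fit of the weights to the targets delta(i, j).
    targets = (y[:, None] == classes[None, :]).astype(float)
    W, *_ = np.linalg.lstsq(hidden(X, centres, sigma), targets, rcond=None)
    return classes, centres, sigma, W

def predict(model, X):
    classes, centres, sigma, W = model
    scores = hidden(np.asarray(X, dtype=float), centres, sigma) @ W
    return classes[scores.argmax(axis=1)]

# Toy data (assumption): both classes share the same centre of gravity.
corners = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}
offsets = [-0.1, 0.0, 0.1]
X = np.array([[cx + dx, cy + dy] for (cx, cy) in corners
              for dx in offsets for dy in offsets])
y = np.array([lab for lab in corners.values() for _ in range(9)])
model = build_rbf(X, y)
```

The toy data is chosen in the spirit of the illustration above: the initial centres of gravity fall outside their own classes, and the loop ends with one cluster per group of points.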
For the three-class example of Figures 1 to 4, we obtain a correct classification rate of 100%.

Figure 4. Representation of the border lines for a membership value of 0.5 (plot title: "Isoclines for a belonging level of 50%").

3. BENCHMARKS

3.1 Databases

Our objective was not to develop a specialised classifier that only works well on our data; on the contrary, we wanted a general, high-performance classifier usable on many kinds of pattern recognition problems. So, before using this algorithm in our application, we tested it on a set of databases. The comparison has been made with the results of a European project named ELENA. The databases used in this project cover a large range of domains. There are artificial databases, namely the clouds, concentric and gaussian datasets in several dimensions, and also real databases such as the Iris and speech recognition datasets (all the files can be downloaded at ftp.dice.ucl.ac.be/pub/neural-nets/elena/databases).

The gaussian database has two classes, with 2500 patterns per class and 8 attributes. The two classes overlap completely and have the same centre of gravity. The theoretical Bayes error for this dataset is 9%.

The concentric database has two classes and two attributes, with 2500 patterns per class. The sample distribution is uniform, there is no overlap and the boundaries are non-linear. Its theoretical error is 0%.

The clouds database has the same constitution as the concentric one, but the sample distribution is gaussian. The first class is the sum of three different gaussian distributions and the second is a single gaussian distribution. There is a significant overlap between the two classes and its theoretical error is 9.66%.

The two real databases are Fisher's "Iris" and the "phoneme" database. The "Iris" database is very famous in pattern recognition and numerous references are available. It is composed of three classes in 4 dimensions, with 50 patterns per class. The "phoneme" database was used in the European ROARS Esprit project and is difficult to classify. It is composed of two classes in 5 dimensions and contains 5404 patterns, 3818 for the first class and only 1586 for the second.

For the comparison, we use the Holdout method averaged over five different partitions of the original database. The original database is split into two independent sets, a learning set and a test set, each containing half of the available patterns (for each particular trial, the patterns in each partition are the same from one classifier to another).

3.2 Classifiers

We compare the results given by the classifier built with our algorithm with three other neural classifiers (MLP, LVQ, IRVQ) and with a reference one, the KNN.

The "K Nearest Neighbor" classifier is a very classical one. It is used here as a reference for the best estimate of the theoretical Bayes error. It can be shown that the nearest-neighbor rule gives an error rate at least equal to the minimum possible one, the Bayes error, and that, with an infinite number of samples, its error rate is never worse than twice the Bayes error.

The first neural classifier compared in our study is the MultiLayer Perceptron classifier. This network, combined with the backpropagation algorithm, is the most widely known in the neural network community. The second is the Learning Vector Quantization classifier. Proposed by Kohonen, it is a simple adaptive method of vector quantization: a finite number of prototypes, each labelled with a class identifier, are chosen in the input space. The last neural classifier we compare with the proposed classifier is the IRVQ one.
The IRVQ classifier has been developed in the framework of the ELENA project. It is a suboptimal Bayesian classifier based on radial Gaussian kernels. It uses an iterative unsupervised learning method based on vector quantization to obtain a low-memory kernel density estimator while keeping sufficiently accurate estimates of the probability densities. More information about these classifiers can be found in the references.

3.3 Results

The comparison is performed using, for each classifier, the optimal parameters for each particular database, except for our classifier, which has no parameter to set. The RBF classifier built with the proposed construction algorithm is not the best classifier on every database. As shown in Figure 5, over the 5 databases our RBF classifier obtains the best result three times and the second best result once. The only disappointing result is for the clouds database: we obtain an error percentage of 13.6%, whereas the theoretical Bayes error is 9.66% and the best classifier in this case, the IRVQ one, obtains 11.7%. The two classes of this database overlap too much for the clusters generated by our algorithm to be united efficiently.
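The Holdout protocol of Section 3.1 (five partitions, each splitting the database into a learning half and a test half, with the same partitions reused for every classifier) can be sketched as follows. This is an illustration under stated assumptions: the seeded shuffles stand in for the fixed partitions, `fit_1nn` is a simple nearest-neighbour stand-in for whichever classifier is evaluated, and the toy patterns are not taken from the ELENA databases.

```python
import random

def holdout_errors(patterns, labels, fit, n_partitions=5, seed=0):
    """Error rate on the test half for each of n_partitions random splits."""
    errors = []
    for p in range(n_partitions):
        idx = list(range(len(patterns)))
        # A fixed seed per partition keeps the split identical across classifiers.
        random.Random(seed + p).shuffle(idx)
        half = len(idx) // 2
        learn, test = idx[:half], idx[half:]
        predict = fit([patterns[i] for i in learn], [labels[i] for i in learn])
        wrong = sum(predict(patterns[i]) != labels[i] for i in test)
        errors.append(wrong / len(test))
    return errors

def fit_1nn(train_x, train_y):
    """Stand-in classifier (assumption): label of the nearest learning pattern."""
    def predict(x):
        best = min(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])),
        )
        return train_y[best]
    return predict

# Toy data (assumption): two well separated groups of 20 patterns each.
patterns = [(i * 0.01, 0.0) for i in range(20)] + [(5.0 + i * 0.01, 1.0) for i in range(20)]
labels = [0] * 20 + [1] * 20

errors = holdout_errors(patterns, labels, fit_1nn)
mean_error = sum(errors) / len(errors)
```

Passing a different `fit` function with the same `seed` evaluates another classifier on exactly the same five partitions, which is what makes the per-database comparison fair.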
Figure 5. Representation of the classifier results for the 5 chosen databases (plot title: "Benchmarks Result"; horizontal axis: Databases, namely Concentric, Clouds, Gauss8D, Phoneme and Iris; vertical axis: Error Percentage; one bar per classifier: RBF, KNN, MLP, LVQ, IRVQ).

We also notice that the results obtained with the Holdout method averaged over five different experiments show that the developed RBF classifier is particularly robust: the difference between the best and the worst error percentage is very small, less than 10% of the average value.

All these results show that our new RBF classifier construction algorithm yields a high-performance, robust and general classifier which can be applied to many kinds of pattern recognition problems with very good results.

4. CODE IDENTIFICATION

The general purpose of our application is to detect and reliably identify different buried metallic codes with a smart eddy current sensor. The data are collected by a flat-coil metal locator based on the induction balance principle. This detector is connected to a mobile measurement system which controls the data sampling. A code is built from a succession of different metal pieces separated by empty spaces. The different codes are obtained by combining different sizes of metallic parts and empty spaces. Because of the similarity of the codes and the non-linear response of the locator with the burying depth, the classification problem is not simple. That is why we have developed intelligent methods to solve it. Our first methods were based on fuzzy logic theory and the Kohonen SOM. As the SOM algorithm gave disappointing results, we replaced it with the newly proposed RBF algorithm. The methods based on fuzzy logic theory are the well-known Fuzzy Pattern Matching (FPM) and the distributed rules (DR) developed among others by Ishibuchi. A comparison is made between these different methods and the proposed RBF classifier.

For a burying depth of up to 80 cm, we obtain the results given in Table 1. The built RBF classifier performs better than the others, again without any specialisation of the construction algorithm.

Method              RBF     SOM     FPM     DR
Error percentage    6.2     11.3    8.3     7.1

Table 1. Results of code misclassification for the 4 pattern recognition methods implemented.

5. CONCLUSION

The use of incremental RBF networks has already been studied, but here we have presented a new, simple incremental or "self-organised" RBF network algorithm which can be used in many domains without any parameter to set. With this algorithm, we have tried to translate the working of the RBF network as simply as possible. The results show that the RBF classifier, built in the simple way we have developed, is very robust and particularly efficient in a wide range of pattern recognition problems.

6. REFERENCES

Bishop C.M., "Neural Networks for Pattern Recognition", Clarendon Press, Oxford, 1995.

Poggio T. and Girosi F., "Networks for Approximation and Learning", Proceedings of the IEEE, Vol. 78, pp. 1481-1497, 1990.

Hwang Y.-S. and Bang S.-Y., "An Efficient Method to Construct a Radial Basis Function Neural Network Classifier", Neural Networks, Vol. 10, No. 8, pp. 1495-1503, 1997.

Bianchini M., Frasconi P. and Gori M., "Learning Without Local Minima in Radial Basis Function Networks", IEEE Transactions on Neural Networks, Vol. 6, No. 3, pp. 749-756, 1995.

Guerin-Dugue A. and others, "Deliverable R3-B4-P Task B4 : Benchmarks", Technical Report, ELENA Enhanced Learning for Evolutive Neural Architecture, ESPRIT Basic Research Project Number 6891, 1995.

Belloir F., Klein F. and Billat A., "Pattern Recognition Methods for Identification of Metallic Codes Detected by Eddy Current Sensor", Signal and Image Processing (SIP'97), Proceedings of the IASTED International Conference, pp. 293-297, 1997.

Grabisch M. and Sugeno, "A Comparison of some Methods of Fuzzy Classification on Real Data", Proc. of IIZUKA'92, pp. 659-662, Iizuka, Japan, July 1992.

Ishibuchi H., Nosaki K. and Tanaka H., "Selecting Fuzzy If-Then Rules for Classification Problems Using Genetic Algorithms", IEEE Transactions on Fuzzy Systems, Vol. 3, No. 3, 1995.

Fritzke B., "Transforming Hard Problems into Linearly Separable Ones with Incremental Radial Basis Function Networks", in M.J. Van Der Heyden, J. Mrsic-Flögel and K. Weigel (eds), HELNET International Workshop on Neural Networks, Proceedings Volume I/II (1994/1995), VU University Press, 1996.