Jacek Biesiada¹, Wlodzislaw Duch²,³, Adam Kachel¹, Krystian Maczka¹, and Sebastian Palucha¹.
¹Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.
²School of Computer Engineering, Nanyang Technological University, Singapore.
³Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
A comparison of several feature ranking methods applied to artificial and real datasets is presented. Six ranking methods based on entropy and statistical indices, including chi-square and Pearson's correlation coefficient, are considered. Estimating mutual information and other indices with the Parzen window method gives results similar to those obtained by discretization based on the separability index, but these results depend strongly on the smoothing parameter. The quality of the highest-ranked feature subsets is evaluated using decision tree, Naive Bayes and nearest-neighbour classifiers. Significant differences are found in some cases, but no single index works best for all datasets and all classifiers. To be sure that the feature subset giving the highest accuracy has been selected, the use of many different indices is recommended.
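The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions. The mutual information I(X; C) between one continuous feature and the class is estimated by placing Gaussian Parzen windows on the class-conditional densities p(x|c), then averaging log p(x|c)/p(x) over the training samples (a resubstitution estimate). The absolute Pearson correlation is shown as a second ranking index. The function names (gaussian_kernel, parzen_mutual_information, pearson, rank_features), the Gaussian kernel and the toy data are hypothetical illustrations, not the authors' code; sigma plays the role of the smoothing parameter.

```python
import math
from collections import Counter


def gaussian_kernel(u: float, sigma: float) -> float:
    """1-D Gaussian Parzen kernel; sigma is the (assumed) smoothing parameter."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_mutual_information(x, y, sigma):
    """Resubstitution estimate (in nats) of I(X; C) for a continuous feature x
    and discrete class labels y, with p(x|c) estimated by Parzen windows."""
    n = len(x)
    counts = Counter(y)
    priors = {c: k / n for c, k in counts.items()}
    by_class = {c: [xi for xi, yi in zip(x, y) if yi == c] for c in counts}
    total = 0.0
    for xi, yi in zip(x, y):
        # Class-conditional densities p(x_i | c), each a Parzen window average.
        cond = {
            c: sum(gaussian_kernel(xi - xj, sigma) for xj in pts) / len(pts)
            for c, pts in by_class.items()
        }
        # Mixture density p(x_i) = sum_c P(c) p(x_i | c).
        px = sum(priors[c] * cond[c] for c in cond)
        # I(X; C) = E[log p(x | c) / p(x)], averaged over the training samples.
        total += math.log(cond[yi] / px)
    return total / n


def pearson(x, y):
    """Pearson's linear correlation coefficient between feature x and labels y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def rank_features(columns, y, score):
    """Return feature names sorted from the highest to the lowest score."""
    scores = {name: score(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up toy data: one informative feature and one pure-noise feature.
    y = [0] * 6 + [1] * 6
    columns = {
        "noise": [0.5, 1.5, -0.5, 1.0, 0.0, 2.0] * 2,
        "informative": [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 3.0, 2.8, 3.2, 3.1, 2.9, 3.3],
    }
    for sigma in (0.5, 10.0):
        mi = parzen_mutual_information(columns["informative"], y, sigma)
        print("sigma =", sigma, "MI(informative) =", round(mi, 4))
    print(rank_features(columns, y, lambda x, c: parzen_mutual_information(x, c, 0.5)))
    print(rank_features(columns, y, lambda x, c: abs(pearson(x, c))))
```

Running the sketch with sigma = 0.5 and sigma = 10.0 shows the estimated mutual information of the informative feature shrinking as the window widens, a small illustration of why Parzen-based indices are sensitive to the smoothing parameter. Because the estimate is built from the same samples that define the windows, it is optimistic for small sigma; a held-out or leave-one-out variant would be a natural refinement.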
Reference: Biesiada J, Duch W, Kachel A, Maczka K, Palucha S (2005), Feature ranking methods based on information entropy with Parzen windows.
In: Proceedings of the 9th International Conference on Research in Electrotechnology and Applied Informatics (REI'05), 31.08-3.09.2005, Katowice-Kraków, Poland, Vol. I, pp. 109-119.