Feature ranking methods based on information entropy with Parzen windows

Presented at the International Conference on Research in Electrotechnology and Applied Informatics, 31.08-3.09.2005, Katowice, Poland.

Jacek Biesiada¹, Wlodzislaw Duch²,³, Adam Kachel¹, Krystian Maczka¹, and Sebastian Palucha¹.
¹Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland;
²School of Computer Engineering, Nanyang Technological University, Singapore;
³Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.

Abstract.

A comparison of several feature ranking methods applied to artificial and real datasets is presented. Six ranking methods based on entropy and statistical indices, including chi-square and Pearson's correlation coefficient, are considered. The Parzen window method for estimating mutual information and other indices gives results similar to those obtained with discretization based on the separability index, but the results depend strongly on the smoothing parameter. The quality of the highest-ranked feature subsets is evaluated using decision tree, Naive Bayes, and nearest neighbour classifiers. Significant differences are found in some cases, but no single index works best for all data and all classifiers. To be sure that the subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
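
As an illustration of the kind of index discussed in the abstract, the following is a minimal Python sketch of a Parzen window estimate of the mutual information between one continuous feature and a class label, used to rank features. It is not taken from the paper: the Gaussian kernel, the resubstitution estimate of the conditional entropy, and the names parzen_mutual_information, rank_features and the smoothing parameter h are all assumptions made for this example.

```python
import numpy as np


def parzen_mutual_information(x, y, h):
    """Estimate I(X; C) in bits for one continuous feature x and labels y.

    Assumption (not from the paper): Gaussian Parzen windows of width h,
    with H(C|X) averaged over the training points themselves.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = x.shape[0]

    # Class entropy H(C) from empirical class frequencies.
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n
    h_class = -np.sum(priors * np.log2(priors))

    # Pairwise Gaussian kernel values; the normalising constant cancels
    # in the posterior below, so it is omitted.
    diff = (x[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * diff ** 2)

    # Kernel mass contributed by each class at every sample point.
    class_mask = (y[None, :] == classes[:, None]).astype(float)  # (k, n)
    class_mass = kernel @ class_mask.T                            # (n, k)
    posterior = class_mass / class_mass.sum(axis=1, keepdims=True)

    # Conditional entropy H(C|X), treating 0 * log(0) as 0.
    safe = np.where(posterior > 0, posterior, 1.0)
    h_cond = -np.mean(np.sum(posterior * np.log2(safe), axis=1))

    return h_class - h_cond


def rank_features(X, y, h):
    """Return feature indices sorted by decreasing estimated MI, and the scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [parzen_mutual_information(X[:, j], y, h) for j in range(X.shape[1])]
    )
    return np.argsort(-scores), scores
```

In this sketch a very small h lets each sample's own kernel dominate the posterior, so even an irrelevant feature can receive a score close to the class entropy, while a very large h flattens all scores towards zero. This is one simple way to see why such estimates depend strongly on the smoothing parameter.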

Reference: Biesiada J, Duch W, Kachel A, Maczka K, Palucha S (2005), Feature ranking methods based on information entropy with Parzen windows.
In: Proceedings of the 9th International Conference on Research in Electrotechnology and Applied Informatics (REI'05), 31.08-3.09.2005, Katowice-Kraków, Poland, Vol. I, pp. 109-119.
