### Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter

1. Division of Computer Studies, Department of Electrotechnology, The Silesian University of Technology, Katowice, Poland.
2. Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
3. School of Computer Engineering, Nanyang Technological University, Singapore.

Abstract.

An information filtering algorithm based on the Pearson $\chi^2$ test has been implemented and tested on feature selection tasks. The test is frequently used in biomedical data analysis and should be applied only to nominal (discretized) features. The algorithm has a single parameter: the statistical confidence level at which two distributions are considered identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
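The abstract does not spell out the test statistic or the ranking criterion, so the following is only a minimal sketch of how a redundancy filter of this kind might look, not the authors' implementation. It assumes the standard two-sample $\chi^2$ statistic for binned data with equal sample counts, $\chi^2 = \sum_i (f_i - g_i)^2 / (f_i + g_i)$, assumes that all features share the same discretized value coding, and assumes that a relevance ranking is supplied by the caller. The names `chi2_sf`, `two_sample_chi2`, `redundancy_filter` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(stat, dof):
    """Upper-tail probability P(X >= stat) for a chi-square variable with dof degrees of freedom."""
    a, x = dof / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    # Power series for the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    for n in range(1, 100000):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    p_lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - p_lower))


def two_sample_chi2(x, y):
    """Compare the value frequencies of two nominal features of equal length.

    Returns the statistic sum((f_i - g_i)^2 / (f_i + g_i)) and the degrees of
    freedom (number of distinct values minus one; assumed convention).
    """
    if len(x) != len(y):
        raise ValueError("features must have the same number of samples")
    fx, fy = Counter(x), Counter(y)
    values = set(fx) | set(fy)
    stat = sum((fx[v] - fy[v]) ** 2 / (fx[v] + fy[v]) for v in values)
    dof = max(len(values) - 1, 1)
    return stat, dof


def redundancy_filter(features, ranking, alpha=0.05):
    """Greedy filter: walk the ranking and drop a feature when the test cannot
    distinguish its distribution from that of an already selected feature.

    features: dict mapping feature name -> sequence of nominal values
    ranking:  feature names ordered from most to least relevant (assumed given)
    alpha:    significance level; the single tunable parameter of this sketch
    """
    selected = []
    for name in ranking:
        redundant = False
        for kept in selected:
            stat, dof = two_sample_chi2(features[name], features[kept])
            if chi2_sf(stat, dof) > alpha:  # distributions not distinguishable
                redundant = True
                break
        if not redundant:
            selected.append(name)
    return selected


# Toy data (made up for illustration): "b" has the same value frequencies as "a".
features = {
    "a": [0] * 20 + [1] * 20,
    "b": [0, 1] * 20,
    "c": [0] * 40,
}
print(redundancy_filter(features, ["a", "b", "c"]))  # ['a', 'c']
```

In this sketch a larger `alpha` makes the filter less aggressive, because fewer feature pairs pass as having identical distributions. A statistics library would normally replace the hand-written `chi2_sf`; it is included here only to keep the example self-contained.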

Preprint for comments in PDF, 85 KB.

Reference: Biesiada J, Duch W, Feature Selection for High-Dimensional Data: A Pearson Redundancy Based Filter.
Lecture Notes in Computer Science, Vol. xxx, pp. xxx-yyy, 2007

