Statlog Datasets: comparison of results

Computational Intelligence Laboratory | Department of Informatics | Nicolaus Copernicus University

4 x 4 digit dataset | Australian credit dataset | Chromosome | Credit management | German credit | Heart | Image segmentation | Karhunen-Loeve digits | Letters (16 moments) | Satellite image | Vehicle dataset |

Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), StatLog project - whole book!
More results for medical and other data.


A note of caution: comparison of different classifiers is not an easy task. Before you get into ranking of methods using the numbers presented in tables below please note the following facts.

Many of the results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
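To give a feeling for how little a single number says, here is a minimal sketch that attaches a binomial standard error to an error rate measured on a test set of known size. It assumes the test cases are independent and covers only the sampling variability of the test set, not the variability caused by a different training partition, so it should be read as a lower bound on the uncertainty. The function names are illustrative, not taken from any library.

```python
import math

def error_standard_error(error_rate, n_test):
    """Binomial standard error of an error rate measured on n_test independent cases."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_test)

def approx_interval(error_rate, n_test, z=1.96):
    """Approximate 95% interval (normal approximation), clipped to [0, 1]."""
    half = z * error_standard_error(error_rate, n_test)
    return max(0.0, error_rate - half), min(1.0, error_rate + half)

# A test error of 0.020 on 5000 test cases (the Credit management test-set size).
se_large = error_standard_error(0.020, 5000)
low, high = approx_interval(0.020, 5000)

# The same calculation for one fold of 69 cases (690 observations, 10 folds)
# with an error of about 0.14: the standard error is roughly twenty times larger.
se_small = error_standard_error(0.14, 69)

print(f"5000 cases: se={se_large:.4f}, interval=({low:.4f}, {high:.4f})")
print(f"69 cases:   se={se_small:.4f}")
```

With a few dozen test cases per fold the uncertainty is already several percent, which is comparable to the differences between many of the methods in the tables below.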

Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that crossvalidation is safer, cf:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.

Crossvalidation (CV) tests are also not ideal. Theoretically about 2/3 of the results should be within a single standard deviation of the average, and 95% of the results should be within two standard deviations, so in a 10-fold crossvalidation you should very rarely see results that differ from the average by more than 2 standard deviations. Running CV several times may also give you different answers. The search for the best estimator continues. Cf:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
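The rule of thumb about one and two standard deviations can be checked on the per-fold results of any crossvalidation run. The sketch below does this for a list of fold errors; the numbers are made up for illustration and do not come from any of the tables, and the function name is hypothetical.

```python
import statistics

def summarize_folds(fold_errors):
    """Return mean, sample std, and the fraction of folds within 1 and 2 stds of the mean."""
    mean = statistics.mean(fold_errors)
    std = statistics.stdev(fold_errors)  # sample standard deviation (divides by n - 1)
    n = len(fold_errors)
    within_1 = sum(abs(e - mean) <= std for e in fold_errors) / n
    within_2 = sum(abs(e - mean) <= 2 * std for e in fold_errors) / n
    return mean, std, within_1, within_2

# Hypothetical per-fold test errors from a single 10-fold run (made-up numbers).
fold_errors = [0.13, 0.16, 0.12, 0.17, 0.14, 0.15, 0.10, 0.19, 0.13, 0.16]
mean, std, within_1, within_2 = summarize_folds(fold_errors)
print(f"mean={mean:.3f} std={std:.3f} within 1 std: {within_1:.0%}, within 2 std: {within_2:.0%}")
```

With only ten folds these fractions are themselves very coarse, which is one reason why repeating the whole CV procedure several times is informative.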

Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal. Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
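For readers who have not computed a ROC curve before, the following sketch shows one common way to obtain it for a two-class problem: sort the cases by classifier score, sweep the decision threshold, and record the false positive and true positive rates. It is a plain-Python illustration on made-up labels and scores, assuming that a higher score means "more likely positive"; none of the classifiers in the tables were evaluated this way here.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels: 1 for positive, 0 for negative; scores: higher means more positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up example: three positive and three negative cases.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")
```

The whole list of points, not just the area, is what would ideally be reported for each method, together with its variation over data partitions.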


Credit management

Statlog version, 2 classes, 7 attributes, no. of training cases=15000, no. of test cases=5000;
Unfortunately this data is not public; does anyone know where to find it?

Algorithm Error (Train) Error (Test) who
Discrim 0.031 0.033 statlog
Quadisc 0.051 0.050 statlog
Logdisc 0.031 0.030 statlog
SMART 0.021 0.020 statlog
ALLOC80 0.033 0.031 statlog
k-NN 0.028 0.088 statlog
CASTLE 0.051 0.047 statlog
CART FD FD statlog
IndCART 0.010 0.025 statlog
NewID 0.000 0.033 statlog
AC2 0.000 0.030 statlog
Baytree 0.002 0.028 statlog
NaiveBay 0.041 0.043 statlog
CN2 0.000 0.032 statlog
C4.5 0.014 0.046 statlog
ITrule 0.041 0.046 statlog
Cal5 0.018 0.023 statlog
Kohonen 0.037 0.043 statlog
DIPOL92 0.020 0.020 statlog
Backprop 0.020 0.023 statlog
RBF 0.033 0.031 statlog
LVQ 0.024 0.040 statlog
Default 0.051 0.047 statlog


Australian credit dataset

Statlog dataset, 2 classes, 14 attributes, 690 observations, class distribution 55.5%, 44.5%.
37 missing values, A1: 12, A2: 12, A4: 6, A5: 6, A6: 9, A7: 9, A14: 13
10-fold cross-validation.

Algorithm Error (Train) Error (Test) who
Cal5 0.132 0.131 statlog
k-NN,k=18,manh,std --- 0.136 KG
ITrule 0.162 0.137 statlog
w-NN,k=18,manh, simplex,std --- 0.138 KG
SVM Gauss -- 0.138±0.041 C=0.1, s=0.01, over 460 SV
Discrim 0.139 0.141 statlog
DIPOL92 0.139 0.141 statlog
SSV 3 nodes --- 0.142±0.040 Ghostminer, WD, uses 5 features.
SSV 3 nodes --- 0.145±0.035 Ghostminer, WD, uses F8 only!
C4.5 -- 0.145±0.007 statlog
CART 0.145 0.145 statlog
RBF 0.107 0.145 statlog
SVM lin -- 0.148±0.030 C=1, over 190 SV
CASTLE 0.144 0.148 statlog
NaiveBay 0.136 0.151 statlog
SVM Gauss -- 0.152±0.032 C=1, over 290 SV
IndCART 0.081 0.152 statlog
k-NN k=11,std,eucl -- 0.152 KG
Backprop 0.087 0.154 statlog
C4.5 0.099 0.155 statlog
k-NN,k=11,fec.sel, eucl,std -- 0.156 KG
SMART 0.090 0.158 statlog
Baytree 0.000 0.171 statlog
k-NN -- 0.181 statlog
NewID 0.000 0.181 statlog
AC2 0.000 0.181 statlog
LVQ 0.065 0.197 statlog
ALLOC80 0.194 0.201 statlog
CN2 0.001 0.204 statlog
Quadisc 0.185 0.207 statlog
Default 0.440 0.440 statlog
Kohonen Failed Failed statlog


4 x 4 digit dataset


Statlog dataset, 10 classes, 16 attributes, (train,test) = (9000,9000) observations

Algorithm Error (Train) Error (Test) who
Discrim 0.111 0.114 statlog
Quadisc 0.052 0.054 statlog
Logdisc 0.079 0.086 statlog
SMART 0.096 0.104 statlog
ALLOC80 0.066 0.068 statlog
k-NN 0.016 0.047 statlog
CASTLE 0.180 0.170 statlog
CART 0.180 0.160 statlog
IndCART 0.011 0.154 statlog
NewID 0.080 0.150 statlog
AC2 * 0.155 statlog
Baytree 0.015 0.140 statlog
NaiveBay 0.220 0.233 statlog
CN2 0.000 0.134 statlog
C4.5 0.041 0.149 statlog
ITrule * 0.222 statlog
Cal5 0.118 0.220 statlog
Kohonen 0.051 0.075 statlog
DIPOL92 0.065 0.072 statlog
Backprop 0.072 0.080 statlog
RBF 0.080 0.083 statlog
LVQ 0.040 0.061 statlog
Default 0.900 0.900 statlog


Karhunen-Loeve digits

Statlog dataset, 10 classes, 40 attributes, (train,test) = (9000,9000) observations
Unfortunately this data is not public; does anyone know where to find it?

Algorithm Error (Train) Error (Test) who
Discrim 0.070 0.075 statlog
Quadisc 0.016 0.025 statlog
Logdisc 0.032 0.051 statlog
SMART 0.043 0.057 statlog
ALLOC80 0.000 0.024 statlog
k-NN 0.000 0.020 statlog
CASTLE 0.126 0.135 statlog
CART FD FD statlog
IndCART 0.003 0.170 statlog
NewID 0.000 0.162 statlog
AC2 0.000 0.168 statlog
Baytree 0.006 0.163 statlog
NaiveBay 0.205 0.223 statlog
CN2 0.036 0.180 statlog
C4.5 0.050 0.180 statlog
ITrule * 0.216 statlog
Cal5 0.128 0.270 statlog
Kohonen FD FD statlog
DIPOL92 0.030 0.039 statlog
Backprop 0.041 0.049 statlog
RBF 0.048 0.055 statlog
LVQ 0.011 0.026 statlog
Cascade 0.063 0.075 statlog
Default 0.900 0.900 statlog


Vehicle dataset

Statlog dataset, 4 classes, 18 attributes, 846 observations, 9-fold cross-validation
 
 
Algorithm Error (Train) Error (Test) who
Discrim 0.202 0.216 statlog
Quadisc 0.085 0.150 statlog
Logdisc 0.167 0.192 statlog
SMART 0.062 0.217 statlog
ALLOC80 0.000 0.173 statlog
k-NN -- 0.275 statlog
k-NN,k=4,manh,std - 0.272 KG
k-NN,k=4, manh,fec. sel,std - 0.283 KG
w-NN,k=4,manh,std,simplex - 0.287 KG
CASTLE 0.545 0.505 statlog
CART 0.284 0.235 statlog
IndCART 0.047 0.298 statlog
NewID 0.030 0.298 statlog
AC2 * 0.296 statlog
Baytree 0.079 0.271 statlog
NaiveBay 0.519 0.558 statlog
CN2 0.018 0.314 statlog
C4.5 0.065 0.266 statlog
ITrule * 0.324 statlog
Kohonen 0.115 0.340 statlog
DIPOL92 0.079 0.151 statlog
Backprop 0.168 0.207 statlog
RBF 0.098 0.307 statlog
LVQ 0.171 0.287 statlog
Cascade 0.263 0.280 statlog
Default 0.750 0.750 statlog


Letters

Statlog dataset, 26 classes, 16 attributes, (train,test) = (15000,5000) observations
 
 
Algorithm Error (Train) Error (Test) who
ALLOC80 0.065 0.064 statlog
k-NN 0.000 0.068 statlog
LVQ 0.057 0.079 statlog
Quadisc 0.101 0.113 statlog
CN2 0.021 0.115 statlog
Baytree 0.015 0.124 statlog
NewID 0.000 0.128 statlog
IndCART 0.010 0.130 statlog
C4.5 0.042 0.132 statlog
DIPOL92 0.167 0.176 statlog
RBF 0.220 0.233 statlog
Logdisc 0.234 0.234 statlog
CASTLE 0.237 0.245 statlog
AC2 0.000 0.245 statlog
Kohonen 0.218 0.252 statlog
Cal5 0.158 0.253 statlog
SMART 0.287 0.295 statlog
Discrim 0.297 0.302 statlog
Backprop 0.323 0.327 statlog
NaiveBay 0.516 0.529 statlog
ITrule 0.585 0.594 statlog
Default 0.955 0.960 statlog
CART FD FD statlog


Chromosome dataset

Statlog dataset, 24 classes, 16 attributes, (train,test) = (20000,20000) observations; unfortunately the dataset is not public! Does anyone know where to find it?

Algorithm Error (Train) Error (Test) who
Discrim 0.073 0.107 statlog
Quadisc 0.046 0.084 statlog
Logdisc 0.079 0.131 statlog
SMART 0.082 0.128 statlog
ALLOC80 0.192 0.253 statlog
k-NN 0.000 0.123 statlog
CASTLE 0.129 0.178 statlog
CART FD FD statlog
IndCART 0.007 0.173 statlog
NewID 0.000 0.176 statlog
AC2 0.000 0.234 statlog
Baytree 0.034 0.164 statlog
NaiveBay 0.260 0.324 statlog
CN2 0.010 0.150 statlog
C4.5 0.038 0.175 statlog
ITrule 0.681 0.697 statlog
Cal5 0.142 0.244 statlog
Kohonen 0.109 0.174 statlog
DIPOL92 0.049 0.091 statlog
Backprop FD FD statlog
RBF 0.087 0.129 statlog
LVQ 0.067 0.121 statlog
Default 0.956 0.956 statlog


Satellite image (SatImage)

Statlog dataset, 6 classes, 36 attributes, (train,test)=(4435,2000) observations
 
 
Algorithm Error (Train) Error (Test) who
k-NN 0.089 0.094 statlog
k-NN,k=2,3, eucl - 0.097 KG
LVQ 0.048 0.105 statlog
DIPOL92 0.051 0.111 statlog
RBF 0.111 0.121 statlog
ALLOC80 0.036 0.132 statlog
CART 0.079 0.138 statlog
IndCART 0.023 0.138 statlog
Backprop 0.112 0.139 statlog
Baytree 0.020 0.147 statlog
CN2 0.010 0.150 statlog
C4.5 0.040 0.150 statlog
NewID 0.067 0.150 statlog
Cal5 0.125 0.151 statlog
Quadisc 0.106 0.155 statlog
AC2 * 0.157 statlog
SMART 0.123 0.159 statlog
Logdisc 0.119 0.163 statlog
Discrim 0.149 0.171 statlog
Kohonen 0.101 0.179 statlog
Cascade 0.112 0.163 statlog
CASTLE 0.186 0.194 statlog
Default 0.758 0.769 statlog
ITrule FD FD statlog


Image segmentation

Statlog dataset, 7 classes, 11 attributes, 2310 observations, 10-fold cross-validation
 
 
Algorithm Error (Train) Error (Test) who
Discrim 0.112 0.116 statlog
Quadisc 0.155 0.157 statlog
Logdisc 0.098 0.109 statlog
SMART 0.039 0.052 statlog
ALLOC80 0.033 0.030 statlog
k-NN -- 0.077 statlog
k-NN,k=1,eucl - 0.035 KG
k-NN,k=1,manh - 0.028 KG
CASTLE 0.108 0.112 statlog
CART 0.005 0.040 statlog
IndCART 0.012 0.045 statlog
NewID 0.000 0.034 statlog
AC2 0.000 0.031 statlog
Baytree 0.000 0.033 statlog
NaiveBay 0.260 0.265 statlog
CN2 0.003 0.043 statlog
C4.5 0.013 0.040 statlog
ITrule 0.445 0.455 statlog
Cal5 0.042 0.062 statlog
Kohonen 0.046 0.067 statlog
DIPOL92 0.021 0.039 statlog
Backprop 0.028 0.054 statlog
RBF 0.047 0.069 statlog
LVQ 0.019 0.046 statlog
Default 0.760 0.760 statlog


Datasets with costs


Heart disease

Statlog dataset, 2 classes, 13 attributes, 270 observations, 9-fold cross-validation.
Algorithms in italics have not incorporated costs.

The table below gives the misclassification costs for the heart disease dataset.
The columns represent the predicted class and the rows the true class.

Cost matrix:
Absence Presence
Absence 0 1
Presence 5 0
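The sketch below shows how such a cost matrix turns a confusion matrix into a single figure, assuming that the Error columns for the datasets with costs report the average cost per case (the page does not define the measure, so this is an assumption). The confusion matrix counts are made up for illustration, and the function name is hypothetical.

```python
# Cost matrix from the table above: rows = true class, columns = predicted class,
# in the order (absence, presence).
COST = [[0, 1],
        [5, 0]]

def average_cost(confusion, cost=COST):
    """Mean misclassification cost per case.

    confusion[i][j] = number of cases of true class i predicted as class j.
    """
    total_cases = sum(sum(row) for row in confusion)
    total_cost = sum(confusion[i][j] * cost[i][j]
                     for i in range(len(cost))
                     for j in range(len(cost[i])))
    return total_cost / total_cases

def error_rate(confusion):
    """Plain fraction of misclassified cases, ignoring costs."""
    total_cases = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total_cases - correct) / total_cases

# Hypothetical confusion matrix for 270 cases (made-up counts, for illustration only).
confusion = [[130, 20],   # true absence: 130 correct, 20 predicted as presence
             [15, 105]]   # true presence: 15 predicted as absence, 105 correct

print(f"error rate   = {error_rate(confusion):.3f}")
print(f"average cost = {average_cost(confusion):.3f}")
```

Because missing a true presence costs five times as much as a false alarm, the average cost can be far larger than the plain error rate, and two classifiers with the same error rate can have quite different costs.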

 
Algorithm Error (Train) Error (Test) who
k-NN,k=30,eucl,std - 0.344 KG
NaiveBay 0.351 0.374 statlog
Discrim 0.315 0.393 statlog
Logdisc 0.271 0.396 statlog
ALLOC80 0.394 0.407 statlog
Quadisc 0.274 0.422 statlog
CASTLE 0.374 0.441 statlog
Cal5 0.330 0.444 statlog
CART 0.463 0.452 statlog
Cascade 0.207 0.467 statlog
k-NN 0.000 0.478 statlog
SMART 0.264 0.478 statlog
DIPOL92 0.429 0.507 statlog
ITrule * 0.515 statlog
Baytree 0.111 0.526 statlog
Default 0.560 0.560 statlog
Backprop 0.381 0.574 statlog
LVQ 0.140 0.600 statlog
IndCART 0.261 0.630 statlog
Kohonen 0.429 0.693 statlog
AC2 0.000 0.744 statlog
CN2 0.206 0.767 statlog
RBF 0.303 0.781 statlog
C4.5 0.439 0.781 statlog
NewID 0.000 0.844 statlog
k-NN,k=1,eucl,std - 0.725 KG


German credit

Statlog dataset, 2 classes, 24 attributes, 1000 observations, 10-fold cross-validation
Algorithms in italics have not incorporated costs.

The table below illustrates the cost matrix for the German credit dataset. The columns are the predicted class and the rows the true class.

good bad
good 0 1
bad 5 0
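With unequal costs the natural baseline is no longer "predict the most frequent class" but "predict the class that is cheapest on average". The sketch below computes the expected cost of each constant prediction. The 70%/30% good/bad split used here is an assumption, since the class distribution is not stated on this page; under that assumption the cheaper constant prediction costs 0.7 per case, which agrees with the Default row (0.700) in the table below.

```python
def constant_prediction_costs(priors, cost):
    """Expected cost per case of always predicting each class.

    priors[i] = fraction of cases whose true class is i;
    cost[i][j] = cost of predicting class j when the true class is i.
    """
    n_classes = len(priors)
    return [sum(priors[i] * cost[i][j] for i in range(n_classes))
            for j in range(n_classes)]

classes = ["good", "bad"]
cost = [[0, 1],   # true good: predicted good, predicted bad
        [5, 0]]   # true bad:  predicted good, predicted bad
priors = [0.7, 0.3]  # assumed good/bad split; not stated on this page

costs = constant_prediction_costs(priors, cost)
best = min(range(len(costs)), key=costs.__getitem__)
for name, value in zip(classes, costs):
    print(f"always predict {name}: expected cost {value:.3f}")
print(f"cheapest constant prediction: {classes[best]}")
```

Under these assumptions, always predicting the minority class "bad" is cheaper than always predicting "good", and any method whose test cost exceeds this baseline does worse than making no use of the attributes at all.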

 
Algorithm Error (Train) Error (Test) who
Discrim 0.509 0.535 statlog
Quadisc 0.431 0.619 statlog
Logdisc 0.499 0.538 statlog
SMART 0.389 0.601 statlog
ALLOC80 0.597 0.584 statlog
k-NN 0.000 0.694 statlog
k-NN,k=17, eucl,std - 0.411 KG
CASTLE 0.582 0.583 statlog
CART 0.581 0.613 statlog
IndCART 0.069 0.761 statlog
NewID 0.000 0.925 statlog
AC2 0.000 0.878 statlog
Baytree 0.126 0.778 statlog
NaiveBay 0.600 0.703 statlog
CN2 0.000 0.856 statlog
C4.5 0.640 0.985 statlog
ITrule * 0.879 statlog
Cal5 0.600 0.603 statlog
Kohonen 0.689 1.160 statlog
DIPOL92 0.574 0.599 statlog
Backprop 0.446 0.772 statlog
RBF 0.848 0.971 statlog
LVQ 0.229 0.963 statlog
Default 0.700 0.700 statlog


Włodzisław Duch, last modification 6.02.2010