Statlog Datasets: comparison of results
4 x 4 digit dataset
Australian credit dataset
Chromosome
Credit management
German credit
Heart
Image segmentation
Karhunen-Loeve digits
Letters (16 moments)
Satellite image
Vehicle dataset
Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), StatLog project - whole book!
More results for medical and other data.
A note of caution: comparing different classifiers is not an easy task. Before ranking methods by the numbers presented in the tables below, please note the following facts.
Many results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
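To get a feeling for how much a single number can be trusted, one can at least estimate the sampling error caused by the finite size of the test set. The sketch below assumes that test cases are independent and uses the usual binomial standard error; the function name is made up for this illustration, and the two examples reuse test errors and test-set sizes from the tables on this page. It does not capture the variation caused by changing the training partition, which is usually larger.

```python
import math

def binomial_std_error(error_rate: float, n_test: int) -> float:
    """Standard error of an error rate measured on n_test independent cases."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_test)

# Test errors and test-set sizes taken from the tables on this page.
examples = {
    "Credit management, DIPOL92": (0.020, 5000),
    "Satellite image, k-NN": (0.094, 2000),
}

for name, (err, n) in examples.items():
    se = binomial_std_error(err, n)
    print(f"{name}: {err:.3f} +/- {se:.3f} (approx. 95% interval +/- {1.96 * se:.3f})")
```

Under these assumptions, differences of a few tenths of a percent between methods on a test set of a few thousand cases are within the sampling noise.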
Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that cross-validation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection.
In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.
Cross-validation (CV) tests are also not ideal. Theoretically about 2/3 of results should be within a single standard deviation of the average, and 95% of results should be within two standard deviations, so in a 10-fold cross-validation you should very rarely see results that are better or worse than the average by more than two standard deviations. Running CV several times may also give you different answers. The search for the best estimator continues. Cf.:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
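The fold-to-fold spread and the run-to-run variation mentioned above are easy to see in a small experiment. The sketch below is only an illustration under stated assumptions: the data are synthetic (two overlapping 1-D Gaussian classes), a nearest-class-mean rule stands in for a real classifier, and all function names are invented for this example. It is not one of the statistical tests proposed in the papers cited above.

```python
import random
import statistics

def make_data(n, rng):
    """Two overlapping 1-D Gaussian classes (synthetic, for illustration only)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        x = rng.gauss(1.0 if label else -1.0, 1.5)
        data.append((x, label))
    return data

def nearest_mean_error(train, test):
    """Fit a nearest-class-mean rule on `train`; return its error rate on `test`."""
    means = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means[c] = sum(xs) / len(xs)
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

def cross_validate(data, k, rng):
    """Return the list of k fold error rates for one random partition."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        errors.append(nearest_mean_error(train, folds[i]))
    return errors

rng = random.Random(0)
data = make_data(300, rng)
for run in range(3):
    # Each run uses a different random partition of the same data.
    errs = cross_validate(data, 10, rng)
    mean, std = statistics.mean(errs), statistics.stdev(errs)
    outside = sum(1 for e in errs if abs(e - mean) > 2 * std)
    print(f"run {run}: error {mean:.3f} +/- {std:.3f}, folds beyond 2 std: {outside}")
```

Reporting the mean together with the standard deviation over folds, and repeating the partitioning a few times, gives a more honest picture than a single number.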
Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal. Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
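For readers unfamiliar with ROC curves, here is a minimal sketch of how one is computed for a two-class problem, assuming the classifier outputs a score where a higher value means "more likely positive". The scores and labels are hypothetical and do not come from any table on this page; the function names are invented for this illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: higher means "more likely positive"; labels: 1 = positive, 0 = negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores for 8 test cases (not from any table here).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(f"AUC = {auc(curve):.3f}")
```

Each point on the curve corresponds to one decision threshold, so the curve shows the whole range of trade-offs between false alarms and missed cases instead of the single operating point that an error rate describes.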
Credit management
Statlog version, 2 classes, 7 attributes, no. of training cases=15000, no. of test cases=5000.
Unfortunately these data are not public; does anyone know where to find them?
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.031 | 0.033 | statlog |
Quadisc | 0.051 | 0.050 | statlog |
Logdisc | 0.031 | 0.030 | statlog |
SMART | 0.021 | 0.020 | statlog |
ALLOC80 | 0.033 | 0.031 | statlog |
k-NN | 0.028 | 0.088 | statlog |
CASTLE | 0.051 | 0.047 | statlog |
CART | FD | FD | statlog |
IndCART | 0.010 | 0.025 | statlog |
NewID | 0.000 | 0.033 | statlog |
AC2 | 0.000 | 0.030 | statlog |
Baytree | 0.002 | 0.028 | statlog |
NaiveBay | 0.041 | 0.043 | statlog |
CN2 | 0.000 | 0.032 | statlog |
C4.5 | 0.014 | 0.046 | statlog |
ITrule | 0.041 | 0.046 | statlog |
Cal5 | 0.018 | 0.023 | statlog |
Kohonen | 0.037 | 0.043 | statlog |
DIPOL92 | 0.020 | 0.020 | statlog |
Backprop | 0.020 | 0.023 | statlog |
RBF | 0.033 | 0.031 | statlog |
LVQ | 0.024 | 0.040 | statlog |
Default | 0.051 | 0.047 | statlog |
Australian credit dataset
Statlog dataset, 2 classes, 14 attributes, 690 observations, class distribution 55.5%, 44.5%.
37 missing values, A1: 12, A2: 12, A4: 6, A5: 6, A6: 9, A7: 9, A14: 13
10-fold cross-validation.
Algorithm | Error (Train) | Error (Test) | who |
Cal5 | 0.132 | 0.131 | statlog |
k-NN,k=18,manh,std | --- | 0.136 | KG |
ITrule | 0.162 | 0.137 | statlog |
w-NN,k=18,manh, simplex,std | --- | 0.138 | KG |
SVM Gauss | -- | 0.138±0.041 | C=0.1, s=0.01, over 460 SV |
Discrim | 0.139 | 0.141 | statlog |
DIPOL92 | 0.139 | 0.141 | statlog |
SSV 3 nodes | --- | 0.142±0.040 | Ghostminer, WD, uses 5 features. |
SSV 3 nodes | --- | 0.145±0.035 | Ghostminer, WD, uses F8 only! |
C4.5 | -- | 0.145±0.007 | statlog |
CART | 0.145 | 0.145 | statlog |
RBF | 0.107 | 0.145 | statlog |
SVM lin | -- | 0.148±0.030 | C=1, over 190 SV |
CASTLE | 0.144 | 0.148 | statlog |
NaiveBay | 0.136 | 0.151 | statlog |
SVM Gauss | -- | 0.152±0.032 | C=1, over 290 SV |
IndCART | 0.081 | 0.152 | statlog |
k-NN k=11,std,eucl | -- | 0.152 | KG |
Backprop | 0.087 | 0.154 | statlog |
C4.5 | 0.099 | 0.155 | statlog |
k-NN,k=11,fec.sel, eucl,std | -- | 0.156 | KG |
SMART | 0.090 | 0.158 | statlog |
Baytree | 0.000 | 0.171 | statlog |
k-NN | -- | 0.181 | statlog |
NewID | 0.000 | 0.181 | statlog |
AC2 | 0.000 | 0.181 | statlog |
LVQ | 0.065 | 0.197 | statlog |
ALLOC80 | 0.194 | 0.201 | statlog |
CN2 | 0.001 | 0.204 | statlog |
Quadisc | 0.185 | 0.207 | statlog |
Default | 0.440 | 0.440 | statlog |
Kohonen | Failed | Failed | statlog |
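The "Default" row in these tables is presumably the error of always predicting the most frequent class. A minimal sketch of that calculation, using the class distribution quoted for this dataset (55.5% / 44.5%); the function name is made up, and the second call assumes ten equally frequent classes purely as an illustration.

```python
def default_error(class_fractions):
    """Error rate of always predicting the most frequent class."""
    return 1.0 - max(class_fractions)

# Class distribution quoted above for this dataset: 55.5% / 44.5%.
print(f"two classes: {default_error([0.555, 0.445]):.3f}")

# Ten equally frequent classes (an assumption, used only as a second check).
print(f"ten balanced classes: {default_error([0.1] * 10):.3f}")
```

The first value (0.445) is close to the 0.440 listed in the table; the small difference may come from rounding or from the partitioning used. Any method whose test error is not clearly below this baseline has learned nothing useful.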
4 x 4 digit dataset
Statlog dataset, 10 classes, 16 attributes, (train,test) = (9000,9000) observations
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.111 | 0.114 | statlog |
Quadisc | 0.052 | 0.054 | statlog |
Logdisc | 0.079 | 0.086 | statlog |
SMART | 0.096 | 0.104 | statlog |
ALLOC80 | 0.066 | 0.068 | statlog |
k-NN | 0.016 | 0.047 | statlog |
CASTLE | 0.180 | 0.170 | statlog |
CART | 0.180 | 0.160 | statlog |
IndCART | 0.011 | 0.154 | statlog |
NewID | 0.080 | 0.150 | statlog |
AC2 | * | 0.155 | statlog |
Baytree | 0.015 | 0.140 | statlog |
NaiveBay | 0.220 | 0.233 | statlog |
CN2 | 0.000 | 0.134 | statlog |
C4.5 | 0.041 | 0.149 | statlog |
ITrule | * | 0.222 | statlog |
Cal5 | 0.118 | 0.220 | statlog |
Kohonen | 0.051 | 0.075 | statlog |
DIPOL92 | 0.065 | 0.072 | statlog |
Backprop | 0.072 | 0.080 | statlog |
RBF | 0.080 | 0.083 | statlog |
LVQ | 0.040 | 0.061 | statlog |
Default | 0.900 | 0.900 | statlog |
Karhunen-Loeve digits
Statlog dataset, 10 classes, 40 attributes, (train,test) = (9000,9000) observations
Unfortunately these data are not public; does anyone know where to find them?
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.070 | 0.075 | statlog |
Quadisc | 0.016 | 0.025 | statlog |
Logdisc | 0.032 | 0.051 | statlog |
SMART | 0.043 | 0.057 | statlog |
ALLOC80 | 0.000 | 0.024 | statlog |
k-NN | 0.000 | 0.020 | statlog |
CASTLE | 0.126 | 0.135 | statlog |
CART | FD | FD | statlog |
IndCART | 0.003 | 0.170 | statlog |
NewID | 0.000 | 0.162 | statlog |
AC2 | 0.000 | 0.168 | statlog |
Baytree | 0.006 | 0.163 | statlog |
NaiveBay | 0.205 | 0.223 | statlog |
CN2 | 0.036 | 0.180 | statlog |
C4.5 | 0.050 | 0.180 | statlog |
ITrule | * | 0.216 | statlog |
Cal5 | 0.128 | 0.270 | statlog |
Kohonen | FD | FD | statlog |
DIPOL92 | 0.030 | 0.039 | statlog |
Backprop | 0.041 | 0.049 | statlog |
RBF | 0.048 | 0.055 | statlog |
LVQ | 0.011 | 0.026 | statlog |
Cascade | 0.063 | 0.075 | statlog |
Default | 0.900 | 0.900 | statlog |
Vehicle dataset
Statlog dataset, 4 classes, 18 attributes, 846 observations, 9-fold cross-validation
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.202 | 0.216 | statlog |
Quadisc | 0.085 | 0.150 | statlog |
Logdisc | 0.167 | 0.192 | statlog |
SMART | 0.062 | 0.217 | statlog |
ALLOC80 | 0.000 | 0.173 | statlog |
k-NN | -- | 0.275 | statlog |
k-NN,k=4,manh,std | - | 0.272 | KG |
k-NN,k=4, manh,fec. sel,std | - | 0.283 | KG |
w-NN,k=4 manh,std simplex | - | 0.287 | KG |
CASTLE | 0.545 | 0.505 | statlog |
CART | 0.284 | 0.235 | statlog |
IndCART | 0.047 | 0.298 | statlog |
NewID | 0.030 | 0.298 | statlog |
AC2 | * | 0.296 | statlog |
Baytree | 0.079 | 0.271 | statlog |
NaiveBay | 0.519 | 0.558 | statlog |
CN2 | 0.018 | 0.314 | statlog |
C4.5 | 0.065 | 0.266 | statlog |
ITrule | * | 0.324 | statlog |
Kohonen | 0.115 | 0.340 | statlog |
DIPOL92 | 0.079 | 0.151 | statlog |
Backprop | 0.168 | 0.207 | statlog |
RBF | 0.098 | 0.307 | statlog |
LVQ | 0.171 | 0.287 | statlog |
Cascade | 0.263 | 0.280 | statlog |
Default | 0.750 | 0.750 | statlog |
Letters (16 moments)
Statlog dataset, 26 classes, 16 attributes, (train,test) = (15000,5000) observations
Algorithm | Error (Train) | Error (Test) | who |
ALLOC80 | 0.065 | 0.064 | statlog |
k-NN | 0.000 | 0.068 | statlog |
LVQ | 0.057 | 0.079 | statlog |
Quadisc | 0.101 | 0.113 | statlog |
CN2 | 0.021 | 0.115 | statlog |
Baytree | 0.015 | 0.124 | statlog |
NewID | 0.000 | 0.128 | statlog |
IndCART | 0.010 | 0.130 | statlog |
C4.5 | 0.042 | 0.132 | statlog |
DIPOL92 | 0.167 | 0.176 | statlog |
RBF | 0.220 | 0.233 | statlog |
Logdisc | 0.234 | 0.234 | statlog |
CASTLE | 0.237 | 0.245 | statlog |
AC2 | 0.000 | 0.245 | statlog |
Kohonen | 0.218 | 0.252 | statlog |
Cal5 | 0.158 | 0.253 | statlog |
SMART | 0.287 | 0.295 | statlog |
Discrim | 0.297 | 0.302 | statlog |
Backprop | 0.323 | 0.327 | statlog |
NaiveBay | 0.516 | 0.529 | statlog |
ITrule | 0.585 | 0.594 | statlog |
Default | 0.955 | 0.960 | statlog |
CART | FD | FD | statlog |
Chromosome
Statlog dataset, 24 classes, 16 attributes, (train,test) = (20000,20000) observations.
Unfortunately the dataset is not public; does anyone know where to find it?
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.073 | 0.107 | statlog |
Quadisc | 0.046 | 0.084 | statlog |
Logdisc | 0.079 | 0.131 | statlog |
SMART | 0.082 | 0.128 | statlog |
ALLOC80 | 0.192 | 0.253 | statlog |
k-NN | 0.000 | 0.123 | statlog |
CASTLE | 0.129 | 0.178 | statlog |
CART | FD | FD | statlog |
IndCART | 0.007 | 0.173 | statlog |
NewID | 0.000 | 0.176 | statlog |
AC2 | 0.000 | 0.234 | statlog |
Baytree | 0.034 | 0.164 | statlog |
NaiveBay | 0.260 | 0.324 | statlog |
CN2 | 0.010 | 0.150 | statlog |
C4.5 | 0.038 | 0.175 | statlog |
ITrule | 0.681 | 0.697 | statlog |
Cal5 | 0.142 | 0.244 | statlog |
Kohonen | 0.109 | 0.174 | statlog |
DIPOL92 | 0.049 | 0.091 | statlog |
Backprop | FD | FD | statlog |
RBF | 0.087 | 0.129 | statlog |
LVQ | 0.067 | 0.121 | statlog |
Default | 0.956 | 0.956 | statlog |
Satellite image
Statlog dataset, 6 classes, 36 attributes, (train,test) = (4435,2000) observations
Algorithm | Error (Train) | Error (Test) | who |
k-NN | 0.089 | 0.094 | statlog |
k-NN,k=2,3, eucl | - | 0.097 | KG |
LVQ | 0.048 | 0.105 | statlog |
DIPOL92 | 0.051 | 0.111 | statlog |
RBF | 0.111 | 0.121 | statlog |
ALLOC80 | 0.036 | 0.132 | statlog |
CART | 0.079 | 0.138 | statlog |
IndCART | 0.023 | 0.138 | statlog |
Backprop | 0.112 | 0.139 | statlog |
Baytree | 0.020 | 0.147 | statlog |
CN2 | 0.010 | 0.150 | statlog |
C4.5 | 0.040 | 0.150 | statlog |
NewID | 0.067 | 0.150 | statlog |
Cal5 | 0.125 | 0.151 | statlog |
Quadisc | 0.106 | 0.155 | statlog |
AC2 | * | 0.157 | statlog |
SMART | 0.123 | 0.159 | statlog |
Logdisc | 0.119 | 0.163 | statlog |
Discrim | 0.149 | 0.171 | statlog |
Kohonen | 0.101 | 0.179 | statlog |
Cascade | 0.112 | 0.163 | statlog |
CASTLE | 0.186 | 0.194 | statlog |
Default | 0.758 | 0.769 | statlog |
ITrule | FD | FD | statlog |
Image segmentation
Statlog dataset, 7 classes, 11 attributes, 2310 observations, 10-fold cross-validation
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.112 | 0.116 | statlog |
Quadisc | 0.155 | 0.157 | statlog |
Logdisc | 0.098 | 0.109 | statlog |
SMART | 0.039 | 0.052 | statlog |
ALLOC80 | 0.033 | 0.030 | statlog |
k-NN | -- | 0.077 | statlog |
k-NN,k=1, eucl | - | 0.035 | KG |
k-NN,k=1, manh | - | 0.028 | KG |
CASTLE | 0.108 | 0.112 | statlog |
CART | 0.005 | 0.040 | statlog |
IndCART | 0.012 | 0.045 | statlog |
NewID | 0.000 | 0.034 | statlog |
AC2 | 0.000 | 0.031 | statlog |
Baytree | 0.000 | 0.033 | statlog |
NaiveBay | 0.260 | 0.265 | statlog |
CN2 | 0.003 | 0.043 | statlog |
C4.5 | 0.013 | 0.040 | statlog |
ITrule | 0.445 | 0.455 | statlog |
Cal5 | 0.042 | 0.062 | statlog |
Kohonen | 0.046 | 0.067 | statlog |
DIPOL92 | 0.021 | 0.039 | statlog |
Backprop | 0.028 | 0.054 | statlog |
RBF | 0.047 | 0.069 | statlog |
LVQ | 0.019 | 0.046 | statlog |
Default | 0.760 | 0.760 | statlog |
Heart
Statlog dataset, 2 classes, 13 attributes, 270 observations, 9-fold cross-validation.
Algorithms in italics have not incorporated costs.
The table below shows the misclassification costs for the heart disease dataset. The columns represent the predicted class and the rows the true class.
Cost matrix | Absence | Presence |
Absence | 0 | 1 |
Presence | 5 | 0 |
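With a cost matrix, the quantity being compared is presumably the average misclassification cost per case rather than the plain error rate. A minimal sketch of that calculation follows. It assumes the rows of the cost matrix are in the same order as the columns (absence, then presence); the confusion matrix counts are made up for illustration and are not a reported result, and the function name is invented.

```python
# Cost matrix from the text: rows = true class, columns = predicted class.
# Assumption: rows are ordered like the columns (absence, presence).
CLASSES = ["absence", "presence"]
COST = [[0, 1],   # true absence:  predicting presence costs 1
        [5, 0]]   # true presence: predicting absence costs 5

def average_cost(confusion, cost):
    """Average misclassification cost per case.

    confusion[i][j] = number of cases of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    total = sum(confusion[i][j] * cost[i][j]
                for i in range(len(cost)) for j in range(len(cost)))
    return total / n

# Hypothetical confusion matrix for 270 cases (made-up counts, not a reported result).
confusion = [[130, 20],
             [15, 105]]
print(f"average cost = {average_cost(confusion, COST):.3f}")
print(f"plain error  = {(20 + 15) / 270:.3f}")
```

Because a missed case of heart disease costs five times as much as a false alarm, the average cost can be much larger than the error rate, and a method that ignores costs can rank well on error but poorly on cost.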
Algorithm | Error (Train) | Error (Test) | who |
k-NN,k=30,eucl,std | - | 0.344 | KG |
NaiveBay | 0.351 | 0.374 | statlog |
Discrim | 0.315 | 0.393 | statlog |
Logdisc | 0.271 | 0.396 | statlog |
ALLOC80 | 0.394 | 0.407 | statlog |
Quadisc | 0.274 | 0.422 | statlog |
CASTLE | 0.374 | 0.441 | statlog |
Cal5 | 0.330 | 0.444 | statlog |
CART | 0.463 | 0.452 | statlog |
Cascade | 0.207 | 0.467 | statlog |
k-NN | 0.000 | 0.478 | statlog |
SMART | 0.264 | 0.478 | statlog |
DIPOL92 | 0.429 | 0.507 | statlog |
ITrule | * | 0.515 | statlog |
Baytree | 0.111 | 0.526 | statlog |
Default | 0.560 | 0.560 | statlog |
Backprop | 0.381 | 0.574 | statlog |
LVQ | 0.140 | 0.600 | statlog |
IndCART | 0.261 | 0.630 | statlog |
Kohonen | 0.429 | 0.693 | statlog |
AC2 | 0.000 | 0.744 | statlog |
CN2 | 0.206 | 0.767 | statlog |
RBF | 0.303 | 0.781 | statlog |
C4.5 | 0.439 | 0.781 | statlog |
NewID | 0.000 | 0.844 | statlog |
k-NN,k=1,eucl,std | - | 0.725 | KG |
German credit
Statlog dataset, 2 classes, 24 attributes, 1000 observations, 10-fold cross-validation
Algorithms in italics have not incorporated costs.
The table below illustrates the cost matrix for the German credit dataset. The columns are the predicted class and the rows the true class.
 | good | bad |
good | 0 | 1 |
bad | 5 | 0 |
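The cost matrix also determines which decision is cheapest for a given case. The sketch below computes the cost of the two trivial "always predict one class" rules and the minimum-expected-cost decision for a single case. The class distribution (70% good, 30% bad) is an assumption that is not stated on this page, and the function names are invented for this illustration.

```python
# Cost matrix from the text: COST[true][predicted].
COST = {"good": {"good": 0, "bad": 1},
        "bad":  {"good": 5, "bad": 0}}

def cost_of_always_predicting(label, priors):
    """Average cost per case when every case is assigned to `label`."""
    return sum(priors[true] * COST[true][label] for true in priors)

def min_cost_decision(p_bad):
    """Pick the class with the lower expected cost given P(bad) for one case."""
    expected = {
        "good": p_bad * COST["bad"]["good"],
        "bad": (1.0 - p_bad) * COST["good"]["bad"],
    }
    return min(expected, key=expected.get)

# Assumed class distribution (70% good, 30% bad); not stated on this page.
priors = {"good": 0.7, "bad": 0.3}
for label in ("good", "bad"):
    print(f"always '{label}': cost {cost_of_always_predicting(label, priors):.3f}")

# With these costs the break-even point is P(bad) = 1/6.
for p in (0.10, 0.20, 0.50):
    print(f"P(bad) = {p:.2f} -> predict {min_cost_decision(p)}")
```

Under the assumed class distribution, always predicting "bad" costs 0.700 per case, which agrees with the Default row in the table below. The break-even probability of 1/6 shows why a classifier tuned for plain accuracy (threshold 1/2) can do badly on this cost measure.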
Algorithm | Error (Train) | Error (Test) | who |
Discrim | 0.509 | 0.535 | statlog |
Quadisc | 0.431 | 0.619 | statlog |
Logdisc | 0.499 | 0.538 | statlog |
SMART | 0.389 | 0.601 | statlog |
ALLOC80 | 0.597 | 0.584 | statlog |
k-NN | 0.000 | 0.694 | statlog |
k-NN,k=17, eucl,std | - | 0.411 | KG |
CASTLE | 0.582 | 0.583 | statlog |
CART | 0.581 | 0.613 | statlog |
IndCART | 0.069 | 0.761 | statlog |
NewID | 0.000 | 0.925 | statlog |
AC2 | 0.000 | 0.878 | statlog |
Baytree | 0.126 | 0.778 | statlog |
NaiveBay | 0.600 | 0.703 | statlog |
CN2 | 0.000 | 0.856 | statlog |
C4.5 | 0.640 | 0.985 | statlog |
ITrule | * | 0.879 | statlog |
Cal5 | 0.600 | 0.603 | statlog |
Kohonen | 0.689 | 1.160 | statlog |
DIPOL92 | 0.574 | 0.599 | statlog |
Backprop | 0.446 | 0.772 | statlog |
RBF | 0.848 | 0.971 | statlog |
LVQ | 0.229 | 0.963 | statlog |
Default | 0.700 | 0.700 | statlog |