Datasets used for classification: comparison of results

Cite as: W. Duch, Datasets used for classification: comparison of results. Department of Informatics , Nicolaus Copernicus University, 2010. PDF preprint

Before using any new dataset it should be described here!
Results from the Statlog project are here.
Logical rules derived for data are here.


Medical: Appendicitis | Breast cancer (Wisconsin) | Breast Cancer (Ljubljana) | Diabetes (Pima Indian) | Heart disease (Cleveland) | Heart disease (Statlog version) | Hepatitis | Hypothyroid | Hepatobiliary disorders |

Other datasets: Ionosphere | Satellite image dataset (Statlog version) | Sonar | Telugu Vowel | Vowel | Wine | Other data: Glass, DNA |
More results for Statlog datasets.


A note of caution: comparing different classifiers is not an easy task. Before ranking methods using the numbers presented in the tables below, please note the following facts. The datasets analyzed here are relatively small and the classification methods are simple, but not all tasks require large deep learning systems; sometimes simpler is better.

Many of the results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.

Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that crossvalidation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.

Crossvalidation tests (CV) are also not ideal. Theoretically about 2/3 of results should be within a single standard deviation from the average, and 95% of results should be within two standard deviations, so in a 10-fold crossvalidation you should very rarely see results that differ from the average by more than two standard deviations. Running CV several times may also give you different answers. The search for the best estimator continues. Cf.:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
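
The fold-to-fold spread discussed above can be illustrated with a minimal Python sketch of repeated stratified crossvalidation. The function names (`stratified_folds`, `repeated_cv`) and the toy majority-class "classifier" are illustrative assumptions, not the procedure used for the tables below; the 85 + 21 class counts are borrowed from the Appendicitis data only to have a concrete class distribution.

```python
import random
import statistics

def stratified_folds(labels, k, rng):
    """Split indices into k folds, keeping class proportions roughly equal."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:  # deal each class round-robin over the folds
            folds[pos % k].append(i)
            pos += 1
    return folds

def repeated_cv(X, y, fit, predict, k=10, repeats=10, seed=0):
    """Return mean and standard deviation of the per-fold accuracies."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
            accuracies.append(hits / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Toy usage: a majority-class rule on 85 + 21 labels (hypothetical example).
def fit_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

labels = [0] * 85 + [1] * 21
features = [[0.0]] * len(labels)  # placeholder features, unused by the majority rule
mean_acc, std_acc = repeated_cv(features, labels, fit_majority, predict_majority)
print(f"{100 * mean_acc:.1f} +/- {100 * std_acc:.1f}")
```

Even this constant classifier shows a non-zero standard deviation across folds, simply because the folds cannot all have identical class proportions.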

Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal.
Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
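
As an illustration of what a ROC-based report would require, here is a minimal Python sketch that builds ROC points from classifier scores and computes the area under the curve. The function names and the four-case toy example are assumptions made for illustration; none of the results in the tables below were produced this way.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate) from classifier scores.

    labels: 1 for the positive class, 0 for the negative class; both classes
    are assumed to be present. A higher score means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, area_under_curve(pts))
```

Repeating this over crossvalidation folds would give a family of curves, from which a variance estimate could be derived.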

Our results are usually obtained with the GhostMiner package, developed in our group.
Some publications with results are on my page.

TuneIT, Testing Machine Learning & Data Mining Algorithms - Automated Tests, Repeatable Experiments, Meaningful Results.


Appendicitis.

106 vectors, 8 attributes, two classes (85 acute a. +21 other, or 80.2+19.8%), data from Shalom Weiss;
Results obtained with the leave-one-out test, % of accuracy given
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA

Method Accuracy % Reference
PVM (logical rules) 89.6 Weiss, Kapouleas
C-MLP2LN (logical rules) 89.6± ? our
k-NN, stand. Manhattan, k=8,9,22-25; k=4,5, stand. Euclid, f2+f4 removed 88.7± 6.0 our (WD/KG)
9-NN, stand. Euclid 87.7 our (KG)
RIAC (prob. inductive) 86.9 Hamilton et al.
1-NN, stand. Euclid, f2+f4 removed 86.8 our (WD/KG)
MLP+backpropagation 85.8 Weiss, Kapouleas
CART, C4.5 (dec. trees) 84.9 Weiss, Kapouleas
FSM 84.9 our (RA)
Bayes rule (statistical) 83.0 Weiss, Kapouleas

For 90% accuracy and p=0.95 confidence level 2-tailed bounds are: [82.8%,94.4%]
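
The bounds quoted above are numerically consistent with a Wilson score interval for a binomial proportion measured on the 106 cases of this dataset. The page does not name the formula, so treating it as a Wilson interval is an assumption; the function name `wilson_interval` and the choice z = 1.96 are likewise illustrative.

```python
import math

def wilson_interval(accuracy, n, z=1.96):
    """Two-tailed Wilson score interval for an accuracy measured on n cases.

    z = 1.96 corresponds to a 95% confidence level (an assumed choice).
    """
    denom = 1.0 + z * z / n
    center = (accuracy + z * z / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n
                         + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(0.90, 106)
print(f"[{100 * low:.1f}%, {100 * high:.1f}%]")
```

The same function can be applied to the other datasets by substituting their accuracy and number of cases.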

S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.

For C-MLP2LN (logical rules) the leave-one-out accuracy is only estimated, since the rules are similar to those of PVM.
3 crisp logical rules, overall  91.5% accuracy

Results for 10-fold stratified crossvalidation

Method Accuracy % Reference
NBC+WX+G(WX) ??.5± 7.7 TM-GM
NBC+G(WX) ??.2± 6.7 TM-GM
kNN auto+G(WX) Eucl. ??.2± 6.7 TM-GM
C-MLP2LN 89.6 our logical rules
20-NN, stand. Eucl., f 4,1,7 89.3± 8.6 our (KG); feature sel. from CV on the whole data set
SSV beam leaves 88.7± 8.5 WD
SVM linear C=1 88.1± 8.6 WD
6-NN, stand. Eucl. 88.0± 7.9 WD
SSV default 87.8± 8.7 WD
SSV beam pruning 86.9± 9.8 WD
kNN, k=auto, Eucl 86.7± 6.6 WD
FSM, a=0.9, Gauss, cluster 86.1± 8.8 WD-GM
NBC 85.9± 10.2 TM-GM
VSS 1 neuron, 4 it 84.9± 7.4 WD/MK
SVM Gauss C=32, s=0.1 84.4± 8.2 WD
MLP+BP (Tooldiag) 83.9 Rafał Adamczak
RBF (Tooldiag) 80.2 Rafał Adamczak

Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).


Wisconsin breast cancer.

From UCI repository, 699 cases, 9 attributes, two classes, 458 (65.5%) & 241 (34.5%).
Results obtained with the leave-one-out test, % of accuracy given.

F6 has 16 missing values, removing these vectors leaves 683 examples.

Method Accuracy % Reference
FSM 98.3 our (RA)
3-NN stand. Manhattan 97.1 our (KG)
21-NN stand. Euclidean 96.9 our (KG)
C4.5 (decision tree) 96.0 Hamilton et al.
RIAC (prob. inductive) 95.0 Hamilton et al.

H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.

Results obtained with the 10-fold crossvalidation, 16 vectors with F6 values missing removed, 683 samples left, % of accuracy given.

Method Accuracy % Reference
Naive MFT 97.1 Opper, Winther, L-1-O est. 97.3
SVM Gauss, C=1,s=0.1 97.0± 2.3 WD-GM
SVM (10xCV) 96.9 Opper, Winther
SVM lin, opt C 96.9± 2.2 WD-GM, same with Minkowski kernel
Cluster means, 2 prototypes 96.5± 2.2 MB
Default, majority 65.5 --

Results obtained with the 10-fold crossvalidation, % of accuracy given, all data, missing values handled in different ways.

Method Accuracy % Reference
NB + kernel est 97.5± 1.8 WD, WEKA, 10X10CV
SVM (5xCV) 97.2 Bennett and Blue
kNN with DVDM distance 97.1 our (KG)
GM k-NN, k=3, raw, Manh 97.0± 2.1 WD, 10X10CV
GM k-NN, k=opt, raw, Manh 97.0± 1.7 WD, 10CV only
VSS, 8 it/2 neurons 96.9± 1.8 WD/MK; 98.1% train
FSM-Feature Space Mapping 96.9± 1.4 RA/WD, a=.99 Gaussian
Fisher linear discr. anal 96.8 Ster, Dobnikar
MLP+BP 96.7 Ster, Dobnikar
MLP+BP (Tooldiag) 96.6 Rafał Adamczak
LVQ 96.6 Ster, Dobnikar
kNN, Euclidean/Manhattan f. 96.6 Ster, Dobnikar
SNB, semi-naive Bayes (pairwise dependent) 96.6 Ster, Dobnikar
SVM lin, opt C 96.4± 1.2 WD-GM, 16 missing with -10
VSS, 8 it/1 neuron! 96.4± 2.0 WD/MK, train 98.0%
GM IncNet 96.4± 2.1 NJ/WD; FKF, max. 3 neurons
NB - naive Bayes (completely independent) 96.4 Ster, Dobnikar
SSV opt nodes, 3CV int 96.3± 2.2 WD/GM; training 96.6± 0.5
IB1 96.3± 1.9 Zarndt
DB-CART (decision tree) 96.2 Shang, Breiman
GM SSV Tree, opt nodes BFS 96.0± 2.9 WD/KG (beam search 94.0)
LDA - linear discriminant analysis 96.0 Ster, Dobnikar
OC1 DT (5xCV) 95.9 Bennett and Blue
RBF (Tooldiag) 95.9 Rafał Adamczak
GTO DT (5xCV) 95.7 Bennett and Blue
ASI - Assistant I tree 95.6 Ster, Dobnikar
MLP+BP (Weka) 95.4± 0.2 TW/WD
OCN2 95.2± 2.1 Zarndt
IB3 95.0± 4.0 Zarndt
MML tree 94.8± 1.8 Zarndt
ASR - Assistant R (RELIEF criterion) tree 94.7 Ster, Dobnikar
C4.5 tree 94.7± 2.0 Zarndt
LFC, Lookahead Feature Constr binary tree 94.4 Ster, Dobnikar
CART tree 94.4± 2.4 Zarndt
ID3 94.3± 2.6 Zarndt
C4.5 (5xCV) 93.4 Bennett and Blue
C 4.5 rules 86.7± 5.9 Zarndt
Default, majority 65.5 --
QDA - quadratic discr anal 34.5 Ster, Dobnikar

For 97% accuracy and p=0.95 confidence level 2-tailed bounds are: [95.5%,98.0%]

K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997

N. Shang, L. Breiman, ICONIP'96, p.133

B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995


Breast Cancer (Ljubljana data)

From UCI repository (restricted):  286 instances, 201 no-recurrence-events (70.3%), 85 recurrence-events (29.7%);
9 attributes, between 2-13 values each, 9 missing values

Results - 10xCV? Sometimes methodology was unclear;
difficult, noisy data, some methods are below the base rate (70.3%).

Method Accuracy, % test Reference
C-MLP2LN/SSV single rule 76.2± 0.0 WD/K. Grabczewski, stable rule
SSV Tree rule 75.7± 1.1 WD, av. from 10x10CV
MML Tree 75.3± 7.8 Zarndt
SVM Gauss, C=1, s =0.1 73.8± 4.3 WD, GM
MLP+backprop 73.5± 9.4 Zarndt
SVM Gauss, C, s opt 72.4± 5.1 WD, GM
IB1 71.8± 7.5 Zarndt
CART 71.4± 5.0 Zarndt
ODT trees 71.3± 4.2 Blanchard
SVM lin, C=opt 71.0± 4.7 WD, GM
UCN 2 70.7± 7.8 Zarndt
SFC, Stack filters 70.6± 4.2 Porter
Default, majority 70.3± 0.0 ============
SVM lin, C=1 70.0± 5.6 WD, GM
C 4.5 rules 69.7± 7.2 Zarndt
Bayes rule 69.3± 10.0 Zarndt
C 4.5 69.2± 4.9 Blanchard
Weighted networks  68-73.5 Tan, Eshelman
IB3 67.9± 7.7 Zarndt
ID3 rules 66.2± 8.5 Zarndt
AQ15 66-72 Michalski et al.
Inductive 65-72  Clark, Niblett

For 78% accuracy and p=0.95 confidence level 2-tailed bounds are: [72.9%,82.4%]


They used leave-one-out tests and obtained:
MLP+backprop: 75.7% train, 71.5% test;
Bayes 75.9% train, 71.8% test,
CART & PVM 77.4% train, 77.1% test;
k-NN 65.3 test


Hepatitis.

From UCI repository, 155 vectors, 19 attributes,
Two classes, die with 32 (20.6%), live with 123 (79.4%).
Many missing values! F18 has 67 missing values, F15 has 29, F17 has 16 and other features between 0 and 11.

Results obtained with the leave-one-out test, % of accuracy given

Method Accuracy % Reference
21-NN, stand Manhattan 90.3 our (KG)
FSM 90.0 our (RA)
14-NN, stand. Euclid 89.0 our (KG)
LDA 86.4 Weiss & K
CART (decision tree) 82.7 Weiss & K
MLP+backprop  82.1 Weiss & K

MLP, CART, LDA results from (check it?) S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990.
Other results are our own.

Results obtained with the 10-fold crossvalidation, % of accuracy given; our results with stratified crossvalidation, other results - who knows? Differences for this dataset are rather small, 0.1-0.2%.

Method Accuracy % Reference
Weighted 9-NN 92.9± ? Karol Grudziński
18-NN, stand. Manhattan 90.2± 0.7 Karol Grudziński
FSM with rotations  89.7± ? Rafał Adamczak
15-NN, stand. Euclidean 89.0± 0.5 Karol Grudziński
VSS 4 neurons, 5 it 86.5± 8.8 WD/MK, train 97.1
FSM without rotations  88.5 Rafał Adamczak
LDA, linear discriminant analysis 86.4 Ster & Dobnikar
Naive Bayes and Semi-NB 86.3 Ster & Dobnikar
IncNet 86.0 Norbert Jankowski
QDA, quadratic discriminant analysis 85.8 Ster & Dobnikar
1-NN 85.3± 5.4 Ster & Dobnikar, std added by WD
VSS 2 neurons, 5 it 85.1± 7.4 WD/MK, train 95.0
ASR 85.0 Ster & Dobnikar
Fisher discriminant analysis 84.5 Ster & Dobnikar
LVQ 83.2 Ster & Dobnikar
CART (decision tree) 82.7 Ster & Dobnikar
MLP with BP 82.1 Ster & Dobnikar
ASI 82.0 Ster & Dobnikar
LFC 81.9 Ster & Dobnikar
RBF (Tooldiag) 79.0 Rafał Adamczak
MLP+BP (Tooldiag) 77.4 Rafał Adamczak

Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Do our good results reflect superior handling of missing values?
Duch W, Grudziński K (1998) A framework for similarity-based methods. Second Polish Conference on Theory and Applications of Artificial Intelligence, Lodz, 28-30 Sept. 1998, pp. 33-60
Weighted kNN: Duch W, Grudzinski K and Diercksen G.H.F (1998) Minimal distance neural methods. World Congress of Computational Intelligence, May 1998, Anchorage, Alaska, IJCNN'98 Proceedings, pp. 1299-1304


Statlog version of Cleveland Heart disease.

13 attributes (extracted from 75), no missing values.
270=150+120 observations selected from the 303 cases (Cleveland Heart).

Attribute Information:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

Attribute types: Real: 1,4,5,8,10,12; Ordered: 11; Binary: 2,6,9; Nominal: 7,3,13
Classes: absence (1) or presence (2) of heart disease.

In the Statlog experiments on heart data a cost (risk) matrix was used with 9-fold crossvalidation, and only cost values are given.
Results below were obtained with 10-fold crossvalidation, % of accuracy given, no risk matrix.

Method Accuracy % Reference
Lin SVM 2D QCP 85.9± 5.5 MG, 10xCV
kNN auto+WX ??.8± 5.6 TM GM 10xCV
SVM Gauss+WX+G(WX), C=1 s=2-5 ??.8± 6.4 TM GM 10xCV
SVM lin, C=0.01 84.9± 7.9 WD, GM 10x(9xCV)
SFM, G(WX), default C=1 ??± 5.1 TM, GM 10xCV
Naive-Bayes 84.5± 6.3 TM, GM 10xCV
Naive-Bayes 83.6 RA, WEKA
SVML default C=1 82.5± 6.4 TM, GM 10xCV
K* 76.7 WEKA, RA
IB1c 74.0 WEKA, RA
1R 71.4 WEKA, RA
T2 68.1 WEKA, RA
MLP+BP 65.6 ToolDiag, RA 
FOIL 64.0 WEKA, RA
RBF 60.0 ToolDiag, RA
InductH 58.5 WEKA, RA
Base rate (majority classifier) 55.7
IB1-4 50.0 ToolDiag, RA

Results for Heart and other Statlog datasets are collected here.


Cleveland heart disease.

From UCI repository, 303 cases, 13 attributes (4 cont, 9 nominal), 7 vectors with missing values ?
2 (no, yes) or 5 classes (no, degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degree 1-4.

Results obtained with the leave-one-out test, % of accuracy given, 2 classes used.

Method Accuracy % Reference
LDA 84.5 Weiss ?
25-NN, stand, Euclid 83.6± 0.5 WD/KG repeat??
C-MLP2LN 82.5 RA, estimated?
FSM 82.2 Rafał Adamczak
MLP+backprop 81.3 Weiss ?
CART 80.8 Weiss ?

MLP, CART, LDA: where are these results from?
Other results - our own.

Results obtained with the 10-fold crossvalidation, % of accuracy given.
Ster & Dobnikar reject 6 vectors (leaving 297) with missing values.
We use all 303 vectors, replacing missing values by the means for their class; in kNN we have used the Statlog convention, 297 vectors.
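
Replacing missing values by class means can be sketched in Python as follows. This is a minimal illustration, assuming missing entries are encoded as `None` and all features are numeric; the function name `impute_by_class_mean` is hypothetical and this is not the GhostMiner implementation.

```python
def impute_by_class_mean(rows, labels):
    """Fill None entries with the mean of that feature within the same class.

    Assumes every (class, feature) pair has at least one observed value;
    otherwise a KeyError is raised.
    """
    totals, counts = {}, {}
    for row, y in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                totals[(y, j)] = totals.get((y, j), 0.0) + value
                counts[(y, j)] = counts.get((y, j), 0) + 1
    filled = []
    for row, y in zip(rows, labels):
        filled.append([
            value if value is not None else totals[(y, j)] / counts[(y, j)]
            for j, value in enumerate(row)
        ])
    return filled

rows = [[1.0, None], [3.0, 4.0], [5.0, 10.0], [None, 20.0]]
labels = [0, 0, 1, 1]
print(impute_by_class_mean(rows, labels))
```

Because the class label is used, in a crossvalidation setting the means would normally be computed on the training part only.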

Method Accuracy % Reference
IncNet+transformations 90.0 Norbert Jankowski; check again!
28-NN, stand, Euclid, 7 features 85.1± 0.5 WD/KG
LDA 84.5 Ster & Dobnikar
Fisher discriminant analysis 84.2 Ster & Dobnikar
k=7, Euclid, std 84.2± 6.6   WD, GhostMiner
16-NN, stand, Euclid 84± 0.6 WD/KG
FSM, 82.4-84% on test only 84.0 Rafał Adamczak
k=1:10, Manhattan, std 83.8± 5.3 WD, GhostMiner
Naive Bayes 82.5-83.4 Rafał; Ster, Dobnikar
SNB 83.1 Ster & Dobnikar
LVQ 82.9 Ster & Dobnikar
GTO DT (5xCV) 82.5 Bennett and Blue
kNN, k=19, Euclidean 82.1± 0.8 Karol Grudziński
k=7, Manhattan, std 81.8± 10.0 WD, GhostMiner
SVM (5xCV) 81.5 Bennett and Blue
kNN (k=1? raw data?) 81.5 Ster & Dobnikar
MLP+BP (standardized) 81.3 Ster, Dobnikar, Rafał Adamczak
Cluster means, 2 prototypes 80.8± 6.4 MB
CART 80.8 Ster & Dobnikar
RBF (Tooldiag, standardized) 79.1 Rafał Adamczak
Gaussian EM, 60 units 78.6 Stensmo & Sejnowski
ASR 78.4 Ster & Dobnikar
C4.5 (5xCV) 77.8 Bennett and Blue
IB1c (WEKA) 77.6 Rafał Adamczak
QDA 75.4 Ster & Dobnikar
LFC 75.1 Ster & Dobnikar
ASI 74.4 Ster & Dobnikar
K* (WEKA) 74.2 Rafał Adamczak
OC1 DT (5xCV) 71.7 Bennett and Blue
1 R (WEKA) 71.0 Rafał Adamczak
T2 (WEKA) 69.0 Rafał Adamczak
FOIL (WEKA) 66.4 Rafał Adamczak
InductH (WEKA) 61.3 Rafał Adamczak
Default, majority 54.1 == baserate ==
C4.5 rules 53.8± 5.9 Zarndt
IB1-4 (WEKA) 46.2 Rafał Adamczak

For 85% accuracy and p=0.95 confidence level 2-tailed bounds are: [80.5%,88.6%]

Results obtained with BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Magnus Stensmo and Terrence J. Sejnowski, A Mixture Model System for Medical and Machine Diagnosis, Advances in Neural Information Processing Systems 7 (1995) 1077-1084

Kristin P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997

Other results for this dataset (methodology sometimes uncertain):
D. Wettschereck, averaging 25 runs with 70% train and 30% test, variants of k-NN with different metric functions and scaling.

David Aha & Dennis Kibler - From UCI repository past usage

Method Accuracy % Reference
k-NN, Value Distance Metric (VDM) 82.6 D. Wettschereck
k-NN, Euclidean 82.4± 0.8 D. Wettschereck
k-NN, Variable Similarity Metric 82.4 D. Wettschereck
k-NN, Modified VDM 83.1 D. Wettschereck
Other k-NN variants < 82.4 D. Wettschereck
k-NN, Mutual Information 81.8 D. Wettschereck
CLASSIT (hierarchical clustering) 78.9 Gennari, Langley, Fisher
NTgrowth (instance-based) 77.0 Aha & Kibler
C4 74.8 Aha & Kibler
Naive Bayes 82.8± 1.3 Friedman et al., 5xCV, 296 vectors

Gennari, J.H., Langley, P, Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.

Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163


Diabetes.

From the UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances: 500 (65.1%) negative (class 1) and 268 (34.9%) positive (class 2) tests for diabetes.
All patients were females at least 21 years old of Pima Indian heritage.

Attributes used:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)

Results obtained with the 10-fold crossvalidation, % of accuracy given; Statlog results are with 12-fold crossvalidation

Method Accuracy %  Reference
Logdisc 77.7 Statlog
IncNet 77.6 Norbert Jankowski
DIPOL92 77.6 Statlog
Linear Discr. Anal. 77.5-77.2 Statlog; Ster & Dobnikar
SVM, linear, C=0.01 77.5± 4.2 WD-GM, 10XCV averaged 10x
SVM, Gauss, C, sigma opt 77.4± 4.3 WD-GM, 10XCV averaged 10x
SMART 76.8 Statlog
GTO DT (5xCV) 76.8 Bennett and Blue
kNN, k=23, Manh, raw, W 76.7± 4.0 WD-GM, feature weighting 3CV
kNN, k=1:25, Manh, raw 76.6± 3.4 WD-GM, most cases k=23
ASI 76.6 Ster & Dobnikar
Fisher discr. analysis 76.5 Ster & Dobnikar
MLP+BP 76.4 Ster & Dobnikar
MLP+BP 75.8± 6.2 Zarndt
LVQ 75.8 Ster & Dobnikar
LFC 75.8 Ster & Dobnikar
RBF 75.7 Statlog
NB 75.5-73.8 Ster & Dobnikar; Statlog
kNN, k=22, Manh 75.5 Karol Grudziński
MML 75.5± 6.3 Zarndt
FSM stand. 5 feat. 75.4± 4.9 WD, 10x10 test, CC>0.15
SNB 75.4 Ster & Dobnikar
BP 75.2 Statlog
SSV DT 75.0± 3.6 WD-GM, SSV BS, node 5CV MC
kNN, k=18, Euclid, raw 74.8± 4.8 WD-GM
CART DT 74.7± 5.4 Zarndt
CART DT 74.5 Statlog
DB-CART 74.4 Shang & Breiman
ASR 74.3 Ster & Dobnikar
FSM standard 74.1± 1.1 WD, 10x10 test
ODT, dyadic trees 74.0± 2.3 Blanchard
Cluster means, 2 prototypes 73.7± 3.7 MB
SSV DT 73.7± 4.7 WD-GM, SSV BS, node 10CV strat
SFC, stacking filters 73.3± 1.9 Porter
C4.5 DT 73.0 Statlog
C4.5 DT 72.7± 6.6 Zarndt
Bayes 72.2± 6.9 Zarndt
C4.5 (5xCV) 72.0 Bennett and Blue
CART 72.8 Ster & Dobnikar
Kohonen 72.7 Statlog
C4.5 DT 72.1± 2.6 Blanchard (averaged over 100 runs)
kNN 71.9 Ster & Dobnikar
ID3 71.7± 6.6 Zarndt
IB3 71.7± 5.0 Zarndt
IB1 70.4± 6.2 Zarndt
kNN, k=1, Euclid, raw 69.4± 4.4 WD-GM
kNN 67.6 Statlog
C4.5 rules 67.0± 2.9 Zarndt
OCN2 65.1± 1.1 Zarndt
Default, majority 65.1  
QDA 59.5 Ster, Dobnikar

For 77.7% accuracy and p=0.95 confidence level 2-tailed bounds are: [74.6%,80.5%]

Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Other results (with different tests):

Method Accuracy % Reference
SVM (5xCV) 77.6 Bennett and Blue
C4.5 76.0± 0.9 Friedman, 5xCV
Semi-Naive Bayes 76.0± 0.8 Friedman, 5xCV
Naive Bayes 74.5± 0.9 Friedman, 5xCV
Default, majority 65.1  

Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163

Opper and Winther use 200 training and 332 test examples (following Ripley), with TAP MFT giving 81% on the test set, SVS 80.1% and the best NN 77.4%.


Hypothyroid.

Thyroid, From UCI repository, dataset "ann-train.data": A Thyroid database suited for training ANNs.
3772 learning and 3428 testing examples; primary hypothyroid, compensated hypothyroid, normal.
Training: 93+191+3488 or 2.47%, 5.06%, 92.47%
Test: 73+177+3178 or 2.13%, 5.16%, 92.71%
21 attributes (15 binary, 6 continuous); 3 classes

The problem is to determine whether a patient referred to the clinic is hypothyroid. Therefore three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. Because 92 percent of the patients are not hyperthyroid, a good classifier must be significantly better than 92%.
Note: These are the data Quinlan used in the case study in his article "Simplifying Decision Trees" (International Journal of Man-Machine Studies (1987) 221-234)
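
The 92% reference level is simply the accuracy of always predicting the most frequent class. A small Python sketch using the test-set class counts given above (the function name `majority_baseline` and the dictionary keys are illustrative assumptions):

```python
def majority_baseline(class_counts):
    """Return (majority label, baseline accuracy, number of errors)."""
    total = sum(class_counts.values())
    label = max(class_counts, key=class_counts.get)
    hits = class_counts[label]
    return label, hits / total, total - hits

# Test-set class counts for the hypothyroid data: 73 + 177 + 3178.
test_counts = {"primary hypothyroid": 73, "compensated hypothyroid": 177, "normal": 3178}
label, accuracy, errors = majority_baseline(test_counts)
print(label, f"{100 * accuracy:.2f}%", errors)
```

Any classifier reported for this dataset should be judged against this baseline rather than against 0%.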

Attribute names: I (W.D.) have investigated this issue and, after some mail exchange with Chris Merz, who maintains the UCI repository, here is the conclusion:

1   age: continuous 2  sex: {M, F} 3   on thyroxine: logical
4   maybe on thyroxine: logical 5  on antithyroid medication: logical 6   sick - patient reports malaise: logical
7   pregnant: logical 8   thyroid surgery: logical 9  I131 treatment: logical
10 test hypothyroid: logical 11 test hyperthyroid: logical 12 on lithium: logical
13 has goitre: logical 14 has tumor: logical 15 hypopituitary: logical
16 psychological symptoms: logical 17 TSH: continuous 18 T3: continuous
19 TT4: continuous 20 T4U: continuous 21 FTI: continuous

Results:

Method % training % test Reference
C-MLP2LN rules+ASA 99.90 99.36 Rafał/Krzysztof/Grzegorz
CART 99.80 99.36 Weiss
PVM 99.80 99.33 Weiss
SSV beam search 99.80 99.33 WD
IncNet 99.68 99.24 Norbert
SSV opt leaves or pruning 99.7 99.1 WD
MLP init+ a,b opt. 99.5 99.1 Rafał
C-MLP2LN rules  99.7 99.0 Rafał/Krzysztof
Cascade correlation 100.0  98.5 Schiffmann
Local adapt. rates  99.6 98.5 Schiffmann
BP+genetic opt.  99.4 98.4 Schiffmann
Quickprop 99.6 98.3 Schiffmann
RPROP  99.6 98.0 Schiffmann
3-NN, Euclid, with 3 features 98.7 97.9 W.D./Karol
1-NN, Euclid, with 3 features 98.4 97.7 W.D./Karol
Best backpropagation 99.1 97.6 Schiffmann
1-NN, Euclid, 8 features used -- 97.3 Karol/W.D.
SVM Gauss, C=8 s=0.1 98.3 96.1 WD
Bayesian classif.  97.0 96.1 Weiss?
SVM Gauss, C=1 s=0.1 95.4 94.7 WD
BP+conj. gradient 94.6 93.8 Schiffmann
1-NN Manhattan, std data -- 93.8 Karol G./WD
SVM lin, C=1 94.1 93.3 WD
SVM Gauss, C=8 s=5 100 92.8 WD
Default, majority (250 test errors) -- 92.7
1-NN Manhattan, raw data -- 92.2 Karol G./WD

For 99.90% accuracy on training and p=0.95 confidence level 2-tailed bounds are: [99.74%,99.96%]

Most NN results from W. Schiffmann, M. Joost, R. Werner, 1993; MLP2LN and Init+a,b ours.
k-NN, PVM and CART from S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990
SVM with linear and Gaussian kernels gives quite poor results on this data.

3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4  give 99.3% of accuracy on the test set.


Hepatobiliary disorders

Contains medical records of 536 patients admitted to a university-affiliated Tokyo-based hospital, with four types of hepatobiliary disorders: alcoholic liver damage, primary hepatoma, liver cirrhosis and cholelithiasis. The records included results of 9 biochemical tests and sex of the patient. The same 163 cases as in [Hayashi et.al]  were used as the test data.

FSM gives about 60 Gaussian or triangular membership functions, achieving accuracy of 75.5-75.8%. Rotation of these functions (i.e. introducing linear combinations of inputs to the rules) does not improve this accuracy. 10-fold crossvalidation tests on the mixed training plus test data give similar results. The best results were obtained with the K* method based on algorithmic complexity optimization, giving 78.5% on the test set, and with kNN using the Manhattan distance function, k=1 and selection of features (using the leave-one-out method on the training data, features 2, 5, 6 and 9 were removed), giving 80.4% accuracy. Simulated annealing optimization of the scaling factors for the remaining 5 features gives 81.0%, and optimizing scaling factors using all input features gives 82.8%. The scaling factors are: 0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30. Similar accuracy is obtained using the multisimplex method for optimization of the scaling factors.
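
The role of the scaling factors in the nearest-neighbor results above can be sketched as a 1-NN rule with a weighted Manhattan distance, where a factor of 0 removes a feature entirely. This is a minimal Python illustration with hypothetical names and toy data; the optimization of the factors (simulated annealing or multisimplex) is not shown.

```python
def weighted_manhattan(a, b, weights):
    """Manhattan distance with one scaling factor per feature."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))

def nearest_neighbor(train_X, train_y, query, weights):
    """Label of the training vector closest to the query (k=1)."""
    best = min(range(len(train_X)),
               key=lambda i: weighted_manhattan(train_X[i], query, weights))
    return train_y[best]

# Toy data (hypothetical): two training vectors with two features each.
train_X = [[0.0, 0.0], [10.0, 1.0]]
train_y = ["a", "b"]
query = [1.0, 5.0]

print(nearest_neighbor(train_X, train_y, query, [1.0, 1.0]))  # both features used
print(nearest_neighbor(train_X, train_y, query, [0.0, 1.0]))  # first feature switched off
```

Changing the scaling factors changes which training vector is nearest, which is why optimizing them can improve accuracy beyond plain feature selection.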

Method Training set Test set Reference
IB2-IB4 81.2-85.5 43.6-44.6 WEKA, our calculation
Naive Bayes -- 46.6 WEKA, our calculation
1R (rules) 58.4 50.3 WEKA, our calculation
T2 (rules from decision tree) 67.5 53.3 WEKA, our calculation
FOIL (inductive logic) 99 60.1 WEKA, our calculation
FSM, initial 49 crisp logical rules 83.5 63.2 FSM, our calculation
LDA (statistical) 68.4 65.0 our calculation
DLVQ (38 nodes) 100 66.0 our calculation
C4.5 decision rules 64.5 66.3 our calculation
Best fuzzy MLP model 75.5 66.3 Mitra et al.
MLP with RPROP -- 68.0 our calculation
Cascade Correlation -- 71.0 our calculation
Fuzzy neural network 100 75.5 Hayashi
C4.5 decision tree 94.4 75.5 our calculation
FSM, Gaussian functions 93 75.6 our calculation
FSM, 60 triangular functions 93 75.8 our calculation
IB1c (instance-based) -- 76.7 WEKA, our calculation
kNN, k=1, Canberra, raw 76.1 80.4 WD/SBL
K* method -- 78.5 WEKA, our calculation
1-NN, 4 features removed, Manhattan 76.9 80.4 our calculation, KG
1-NN, Canberra, raw, removed f2, 6, 8, 9 77.2 83.4 our calculation, KG

Y. Hayashi, A. Imura, K. Yoshida, Fuzzy neural expert system and its application to medical diagnosis. In: 8th International Congress on Cybernetics and Systems, New York City 1990, pp. 54-61

S. Mitra, R. De, S. Pal, Knowledge based fuzzy MLP for classification and rule generation. IEEE Transactions on Neural Networks 8, 1338-1350, 1997, a knowledge-based fuzzy MLP system gives results on the test set in the range from 33% to 66.3%, depending on the actual fuzzy model used.

W. Duch and K. Grudziński, ``Prototype Based Rules - New Way to Understand the Data,'' Int. Joint Conference on Neural Networks, Washington D.C., pp. 1858-1863, 2001. Contains best results with 1-NN, Canberra and feature selection, 83.4% on the test.


Other, non-medical data


Landsat Satellite image dataset (STATLOG version)

Training 4435 test 2000 cases, 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in neighborhood) and 6 decision classes: 1,2,3,4,5 and 7 (class 6 has been removed because of doubts about the validity of this class).

The StatLog database consists of the multi-spectral values of pixels in 3x3 neighborhoods in a satellite image, and the classification associated with the central pixel in each neighborhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.

Method | % training | % test | Time train | Time test
MLP+SCG | 96.0 | 91.0 | reg alpha=0.5, 36 hidden nodes, 1400 it | fast; WD
k-NN | -- | 90.9 | auto-k=3, Manhattan, std data | GM 2.0
k-NN | 91.1 | 90.6 | 2105, Statlog | 944; parameters?
k-NN | -- | 90.4 | auto-k=5, Euclidean, std data | GM 2.0
k-NN | -- | 90.0 | k=1, Manhattan, std data, no training | fast, GM 2.0
FSM | 95.1 | 89.7 | std data, a=0.95 | fast, GM 2.0; best NN result
LVQ | 95.2 | 89.5 | 1273 | 44
k-NN | -- | 89.4 | k=1, Euclidean, std data, no training | fast, GM 2.0
Dipol92 | 94.9 | 88.9 | 746 | 111
MLP+SCG | 94.4 | 88.5 | 5000 it; active learning+reg a=0.5, 8-12 hidden | fast; WD
SVM | 91.6 | 88.4 | std data, Gaussian kernel | fast, GM 2.0; unclassified 4.3%
Radial | 88.9 | 87.9 | 564 | 74
Alloc80 | 96.4 | 86.8 | 63840 | 28757
IndCart | 97.7 | 86.2 | 2109 | 9
CART | 92.1 | 86.2 | 330 | 14
MLP+BP | 88.8 | 86.1 | 72495 | 53
Bayesian Tree | 98.0 | 85.3 | 248 | 10
C4.5 | 96.0 | 85.0 | 434 | 1
New ID | 93.3 | 85.0 | 226 | 53
QuaDisc | 89.4 | 84.5 | 157 | 53
SSV | 90.9 | 84.3 | default par. | very fast, GM 2.0
Cascade | 88.8 | 83.7 | 7180 | 1
Log DA, Disc | 88.1 | 83.7 | 4414 | 41
LDA, Discrim | 85.1 | 82.9 | 68 | 12
Kohonen | 89.9 | 82.1 | 12627 | 129
Bayes | 69.2 | 71.3 | 75 | 17

The original database was generated from Landsat Multi-Spectral Scanner image data. The sample database was generated taking a small section (82 rows and 100 columns) from the original data. One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.

The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighborhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighborhood and a number indicating the classification label of the central pixel. In each line of data the four spectral values for the top-left pixel are given first followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17,18,19 and 20. If you like you can use only these four attributes, while ignoring the others. This avoids the problem which arises when a 3x3 neighborhood straddles a boundary.
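
The attribute layout described above (four spectral values per pixel, pixels read left-to-right and top-to-bottom) can be made explicit with a short Python sketch. The function names are illustrative, and `values` is assumed to hold the 36 attribute values of one data line without the class label.

```python
BANDS = 4   # spectral values per pixel
WIDTH = 3   # the neighborhood is 3x3 pixels

def attribute_numbers(row, col):
    """1-based attribute numbers of the pixel at (row, col), both in 0..2."""
    pixel = row * WIDTH + col          # position in reading order, 0..8
    first = pixel * BANDS + 1
    return list(range(first, first + BANDS))

def central_pixel(values):
    """The four spectral values of the central pixel of one data line."""
    return [values[n - 1] for n in attribute_numbers(1, 1)]

print(attribute_numbers(0, 0))   # top-left pixel
print(attribute_numbers(1, 1))   # central pixel
```

Using only `central_pixel(values)` corresponds to the four-attribute variant of the problem mentioned above.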

All results from Statlog book, except GM - GhostMiner calculations, W. Duch.

N Description Train Test
1 red soil 1072 (24.17%) 461 (23.05%)
2 cotton crop 479 (10.80%) 224 (11.20%)
3 grey soil 961 (21.67%) 397 (19.85%)
4 damp grey soil 415 (09.36%) 211 (10.55%)
5 veg. stubble 470 (10.60%) 237 (11.85%)
6 mixture class 0 0
7 very damp grey soil 1038 (23.40%) 470 (23.50%)

Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), the Statlog project book.


Ionosphere

351 data records, with class division 224 (63.8%) + 126 (35.9%). Usually the first 200 vectors are taken for training and the last 151 for the test, but this split is very unbalanced: in the training set 101 (50.5%) and 99 (49.5%) vectors are from class 1/2, while in the test set 123 (82%) and 27 (18%) are from class 1/2.
34 attributes, but f2=0 always and should be removed; f1 is binary, the remaining 32 attributes are continuous.
2 classes - different types of radar signals reflected from ionosphere.

Some vectors: 8, 18, 20, 22, 24, 30, 38, 52, 76, 78, 80, 82, 103, 163, 169, 171, 183, 187, 189, 191, 201, 215, 219, 221, 223, 225, 227, 229, 231, 233, 249, are either binary 0, 1 or have only 3 values -1, 0, +1.
For example, vector 169 has only one component = 1, all others are 0.

Method Accuracy % Reference
3-NN + simplex 98.7 Our own weighted kNN
VSS 2 epochs 96.7 MLP with numerical gradient
3-NN 96.7 KG, GM with or without weights
IB3 96.7 Aha, 5 errors on test
1-NN, Manhattan 96.0 GM kNN (our)
MLP+BP 96.0 Sigillito
SVM Gaussian 94.9± 2.6 GM (our), defaults, similar for C=1-100
C4.5 94.9 Hamilton
3-NN Canberra 94.7 GM kNN (our)
RIAC 94.6 Hamilton
C4 (no windowing) 94.0 Aha
C4.5 93.7 Bennett and Blue
SVM 93.2 Bennett and Blue
Non-lin perceptron 92.0 Sigillito
FSM + rotation 92.8 our
1-NN, Euclidean 92.1 Aha, GM kNN (our)
DB-CART 91.3 Shang, Breiman
Linear perceptron 90.7 Sigillito
OC1 DT 89.5 Bennett and Blue
CART 88.9 Shang, Breiman
SVM linear 87.1± 3.9 GM (our), defaults
GTO DT 86.0 Bennett and Blue

Perceptron+MLP results:
Sigillito, V. G., Wing, S. P., Hutton, L. V., & Baker, K. B. (1989)  Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest, 10, 262-266.
N. Shang, L. Breiman, ICONIP'96, p.133
David Aha: k-NN+C4+IB3, from Aha, D. W., & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 794-799). Detroit, MI: Morgan Kaufmann.
IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997

Training/test division is not too good in this case, distributions are a bit different.
Results in 10xCV:

Method Accuracy % Reference
SFM+G+G(WX) ??± 2.6 GM (our), C=1, s=2-5
kNN auto+WX+G(WX) ??.4± 3.6 GM (our)
SVM Gaussian 94.8± 3.5 GM (our), C=1, s=0.1, 10x10CV, std
SVM Gaussian 94.6± 4.3 GM (our), C=1, s=2-5
VSS-MKNN 91.5± 4.3 MK, 12 neurons (similar 8-17)
SVM lin 89.5± 3.8 GM (our), C=1, s=2-5
SSV tree 87.8± 4.5 GM (our), default
1-NN 85.8± 4.9 GM std, Euclid
3-NN 84.0± 5.4 GM std, Euclid

VSS is an MLP with search, implemented by Mirek Kordos, used with 3 epochs; neurons may be sigmoidal or step-wise (64 values).
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).


Sonar: Mines vs Rocks

208 cases, 60 continuous attributes, 2 classes, 111 metal, 97 rock.
From the CMU benchmark repository

This dataset has been used in two kinds of experiments:
1. The "aspect-angle independent" experiments use all 208 cases with 13-fold crossvalidation, averaged over 10 runs to get std.
2. The "aspect-angle dependent" experiments use training / test sets with 104 vectors each. Class distribution in training is 49 + 55, in test 62 + 42.
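A 13-fold crossvalidation repeated 10 times can be sketched as follows. This is a minimal illustration, not the procedure used for the tables: the data are synthetic stand-ins for the 208 sonar cases, 1-NN is used as the classifier, all function names are hypothetical, and the standard deviation is taken over the 10 run averages (one possible reading of "averaged over 10 runs to get std").

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_label(train_X, train_y, x):
    """1-NN prediction with squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def repeated_cv(X, y, k=13, runs=10, seed=0):
    """Mean and standard deviation of the accuracy (%) over `runs` repetitions."""
    rng = random.Random(seed)
    run_acc = []
    for _ in range(runs):
        correct = 0
        for fold in kfold_indices(len(X), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            correct += sum(nearest_label(train_X, train_y, X[i]) == y[i] for i in fold)
        run_acc.append(100.0 * correct / len(X))
    return statistics.mean(run_acc), statistics.stdev(run_acc)


# Synthetic stand-in for the 208 cases (111 + 97), two features only.
data_rng = random.Random(1)
y = [0] * 111 + [1] * 97
X = [[data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(0.0, 1.0)] for label in y]
mean_acc, std_acc = repeated_cv(X, y)
print(f"{mean_acc:.1f} +/- {std_acc:.1f}")
```

With 208 cases and 13 folds every fold holds exactly 16 cases. Taking the deviation over individual folds instead of over runs gives larger values, so it matters which of the two is reported.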

Estimation of L1O on the whole dataset (Opper and Winther) gives only 78.2%; is the test set so easy? Some of these results were obtained without standardization of the data, which is very important here!
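Standardization here means rescaling every attribute to zero mean and unit variance. A minimal sketch follows, assuming that the means and deviations are estimated on the training part only and then applied to the test part; the text does not say how this was done for the tables, and the function names are hypothetical.

```python
import statistics


def fit_standardizer(rows):
    """Compute per-feature mean and standard deviation from the training rows."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # A constant feature has zero deviation; use 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def standardize(rows, means, stds):
    """Apply (value - mean) / std to every feature of every row."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Toy data: the second feature has a 100 times larger scale than the first.
train = [[1.0, 100.0], [3.0, 300.0]]
test = [[2.0, 400.0]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))
print(standardize(test, means, stds))
```

Without such rescaling, distance-based methods like kNN are dominated by the attributes with the largest numerical range.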

The "aspect-angle dependent" experiments with training / test sets.

Method Train % Test % Reference
1-NN, 5D from MDS, Euclid, std   97.1 our, GM (WD)
1-NN, Manhattan std   97.1 our, GM (WD)
1-NN, Euclid std   96.2 our, GM (WD)
TAP MFT Bayesian -- 92.3 Opper, Winther
Naive MFT Bayesian -- 90.4 Opper, Winther
SVM -- 90.4 Opper, Winther
MLP+BP, 12 hidden, best MLP -- 90.4 Gorman, Sejnowski
1-NN, Manhattan raw   92.3 our, GM (WD)
1-NN, Euclid raw   91.3 our, GM (WD)
FSM - methodology ?    83.6 our (RA)

The "aspect-angle independent" experiments with 13-fold CV on all data.

1-NN Euclid on 5D MDS input   88.0± 7.8 our GM (WD) av 10x10CV
1-NN Euclidean, std data   87.7± 6.8 our GM (WD), 10x10CV av
1-NN Manhattan, std data   86.7± 8.6 our GM (WD) av 10x10CV
MLP+BP, 12 hidden 99.8± 0.1 84.7± 5.7 Gorman, Sejnowski
1-NN Manhattan, raw data   84.8± 8.3 our GM (WD) av 10x10CV
MLP+BP, 24 hidden 99.8± 0.1 84.5± 5.7 Gorman, Sejnowski
MLP+BP, 6 hidden 99.7± 0.2 83.5± 5.6 Gorman, Sejnowski
SVM linear, C=0.1   82.7± 8.5 our GM (WD), std data
1-NN Euclidean, raw data   82.4± 10.7 our GM (WD) av 10x10CV
SVM Gauss, C=1, s=0.1   77.4± 10.1 our GM (WD), std data
SVM linear, C=1   76.9± 11.9 our GM (WD), raw data
SVM linear, C=1   76.0± 9.8 our GM (WD), std data
------ ------ ------------
DB-CART, 10xCV    81.8 Shang, Breiman
CART, 10xCV    67.9 Shang, Breiman

M. Opper and O. Winther, Gaussian Processes and SVM: Mean Field Results and Leave-One-Out. In: Advances in Large Margin Classifiers, Eds. A. J. Smola, P. Bartlett, B. Schölkopf, D. Schuurmans, MIT Press, 311-326, 2000; same methodology as Gorman and Sejnowski.

N. Shang, L. Breiman, ICONIP'96, p.133, 10xCV

Gorman, R. P., and Sejnowski, T. J. (1988).  "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets", Neural Networks 1, pp. 75-89,  13xCV

Our results: kNN results from 10xCV and from 13xCV are quite similar, so the 10xCV results of Shang and Breiman should not differ much from 13xCV results.

WD Leave-one-out (L1O) estimations on std data:
L1O with k=1, Euclidean distance, for all data gives 87.50%; other values of k and other distance functions do not give significant improvement.
SVM linear, C=1, L1O 75.0%, for Gaussian kernel, C=1, L1O is 78.8%
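The leave-one-out estimate for 1-NN quoted above can be sketched as follows. This is only an illustration on toy data with a hypothetical function name; it is not the GhostMiner implementation.

```python
def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy (%) of 1-NN with Euclidean distance.

    Each case is classified by its nearest neighbor among all the other cases.
    """
    correct = 0
    for i, x in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, z in enumerate(X):
            if j == i:
                continue  # the held-out case may not vote for itself
            d = sum((a - b) ** 2 for a, b in zip(x, z))
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return 100.0 * correct / len(X)


# Toy one-dimensional data; the last case lies inside the other class.
X = [[0.0], [0.2], [1.0], [1.2], [0.9]]
y = [0, 0, 1, 1, 0]
loo = loo_1nn_accuracy(X, y)
print(loo)
```

On 208 cases each L1O error changes the accuracy by about 0.48%, so 87.50% corresponds to 182 correctly classified cases.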

Other L1O results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Discriminant Adaptive NN, DANN   92.3
Adaptive metric NN   90.9
kNN   87.5
SVM Gauss C=1   78.8
C4.5   76.9
SVM linear C=1   75.0


Vowel

528 training, 462 test cases, 10 continuous attributes, 11 classes
From the UCI benchmark repository.

Speaker-independent recognition of the eleven steady-state vowels of British English using a specified training set of LPC-derived log area ratios.

Results on the total set

Method Train Test Reference
CART-DB, 10xCV on total set !!!   90.0 Shang, Breiman
CART, 10xCV on total set   78.2 Shang, Breiman

Method Train Test Reference
Square node network, 88 units   54.8 UCI
Gaussian node network, 528 units   54.6 UCI
1-NN, Euclidean, raw 99.24 56.3 WD/KG
Radial Basis Function, 528 units   53.5 UCI
Gaussian node network, 88 units   53.5 UCI
FSM Gauss, 10CV on the training set 92.60 51.94 our (RA)
Square node network, 22   51.1 UCI
Multi-layer perceptron, 88 hidden   50.6 UCI
Modified Kanerva Model, 528 units   50.0 UCI
Radial Basis Function, 88 units   47.6 UCI
Single-layer perceptron, 88 hidden   33.3 UCI

N. Shang, L. Breiman, ICONIP'96, p.133, used 10xCV instead of the test set.


Telugu Vowel

871 patterns, 6 overlapping vowel classes (Indian Telugu vowel sounds), 3 features (formant frequencies).

Method Test Reference
10xCV tests below
3-NN, Manhattan 87.8± 4.0 Kosice
3-NN, Canberra 87.8± 4.2 WD/GM
FSM, 65 Gaussian nodes 87.4± 4.5 Kosice
3-NN, Euclid 87.3± 3.9 WD/GM
SSV dec. tree, 22 rules 86.0± ?? Kosice
SVM Gauss opt C~1000, s~1 85.0± 4.0 WD, Ghostminer
SVM Gauss C=1000, s=1 83.5± 4.1 WD, Ghostminer
SVM, Gauss, C=1, s=0.1 76.6± 2.5 WD, Ghostminer
2xCV tests below
3-NN, Euclidean 86.1± 0.6 Kosice
FSM, 40 Gaussian nodes 85.2± 1.2 Kosice
MLP 84.6 Pal
Fuzzy MLP 84.2 Pal
SSV dec. tree, beam search 83.3± 0.9 Kosice
SSV dec. tree, best first 83.0± 1.0 Kosice
Bayes Classifier 79.2 Pal
Fuzzy SOM 73.5 Pal

Parameters of the SVM were optimized separately in each CV partition, so different parameters were used in different folds and only an approximate value can be quoted. If they are fixed to C=1000, s=1 the results are a bit worse.
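Selecting parameters separately inside each CV partition can be sketched as a nested crossvalidation. To keep the example self-contained, the SVM is replaced here by a simple kNN and the tuned parameter is k rather than C and s; the data are synthetic and all names are hypothetical, so this only illustrates the scheme.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def folds(data, n):
    return [data[i::n] for i in range(n)]


def nested_cv(data, grid=(1, 3, 5), outer=5, inner=3, seed=0):
    """Return the parameter chosen in each outer fold and the outer test accuracies."""
    data = data[:]
    random.Random(seed).shuffle(data)
    chosen, scores = [], []
    outer_folds = folds(data, outer)
    for i, test in enumerate(outer_folds):
        train = [p for j, f in enumerate(outer_folds) if j != i for p in f]

        def inner_score(k):
            # Inner CV uses the training part only; the outer test fold stays unseen.
            parts = folds(train, inner)
            return sum(
                accuracy([p for b, f in enumerate(parts) if b != a for p in f], parts[a], k)
                for a in range(inner)
            ) / inner

        best = max(grid, key=inner_score)
        chosen.append(best)
        scores.append(accuracy(train, test, best))
    return chosen, scores


# Synthetic two-class data, 60 cases with two features.
rng = random.Random(2)
data = [([rng.gauss(2.0 * c, 1.0), rng.gauss(0.0, 1.0)], c) for c in [0, 1] * 30]
chosen, scores = nested_cv(data)
print(chosen, [round(s, 2) for s in scores])
```

Because the chosen parameter may differ from fold to fold, the averaged accuracy describes the whole selection procedure rather than a single fixed model.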

Papers using this data:


Wine data

Source: UCI, described in Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

Class distribution: 178 cases = [59, 71, 48] in Class 1-3;
13 continuous attributes: alcohol, malic-acid, ash, alkalinity, magnesium, phenols, flavanoids, nonanthocyanins, proanthocyanins, color, hue, OD280/OD315, proline.

Method Test Reference
Leave-one-out test results
RDA 100 [1]
QDA 99.4 [1]
LDA 98.9 [1]
kNN, Manhattan, k=1 98.7 GM-WD, std data
1NN 96.1 [1] z-transformed data
kNN, Euclidean, k=1 95.5 GM-WD, std data
kNN, Chebyshev, k=1 93.3 GM-WD, std data
10xCV tests below
kNN, Manhattan, auto k=1-10 98.9± 2.3 GM-WD, 2D data, after MDS/PCA
IncNet, 10CV, def, Gauss 98.9± 2.4 GM-WD, std data, up to 3 neurons
10 CV SSV, opt prune 98.3± 2.7 GM-WD, 2D data, after MDS/PCA
10 CV SSV, node count 7 98.3± 2.7 GM-WD, 2D data, after MDS/PCA
kNN, Euclidean, k=1 97.8± 2.8 GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, k=1 97.8± 2.9 GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, auto k=1-10 97.8± 3.9 GM-WD
kNN, Euclidean, k=3, weighted features 97.8± 4.7 GM-WD
IncNet, 10CV, def, bicentral 97.2± 2.9 GM-WD, std data, up to 3 neurons
kNN, Euclidean, auto k=1-10 97.2± 4.0 GM-WD
10 CV SSV, opt node 97.2± 5.4 GM-WD, 2D data, after MDS/PCA
FSM a=.99, def 96.1± 3.7 GM-WD, 2D data, after MDS/PCA
FSM 10CV, Gauss, a=.999 96.1± 4.7 GM-WD, std data, 8-11 neurons
FSM 10CV, triang, a=.99 96.1± 5.9 GM-WD, raw data
kNN, Euclidean, k=1 95.5± 4.4 GM-WD
10 CV SSV, opt node, BFS 92.8± 3.7 GM-WD
10 CV SSV, opt node, BS 91.6± 6.5 GM-WD
10 CV SSV, opt prune, BFS 90.4± 6.1 GM-WD
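The kNN rows above differ mainly in the distance function. The standard definitions of the Manhattan, Euclidean and Chebyshev distances, and of the Canberra distance used in some other tables of this document, can be written as below; this is a sketch with hypothetical function names, not the GhostMiner code.

```python
def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); terms with a = b = 0 are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if a or b)


u, v = [1.0, 2.0], [4.0, 6.0]
print(manhattan(u, v), euclidean(u, v), chebyshev(u, v), canberra(u, v))
```

All except Canberra depend on the scale of the attributes, which is why results are reported separately for raw and standardized data.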

UCI past usage:
[1] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Technometrics).
[2] S. Aeberhard, D. Coomans and O. de Vel, "The classification performance of RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Journal of Chemometrics).


Other Data


Glass identification

Shang, Breiman CART 71.4% accuracy,  DB-CART 70.6%.

Leave-one-out results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Adaptive metric NN   75.2
Discriminant Adaptive NN, DANN   72.9
kNN   72.0
C4.5   68.2


DNA-Primate splice-junction gene sequences, with associated imperfect domain theory.

StatLog data: splice junctions are points on a DNA sequence at which "superfluous" DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out).
This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)

Number of Instances: 3190. Class distribution:

Class Train Test
1 464 (23.20%) 303 (25.55%)
2 485 (24.25%) 280 (23.61%)
3 1051 (52.55%) 603 (50.84%)
All 2000 (100%) 1186 (100%)
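The percentages in this table, and the accuracy of the majority classifier that serves as the default baseline, follow directly from the class counts. A short check, with hypothetical function names:

```python
def shares(counts):
    """Percentage of cases per class, rounded to two decimals."""
    total = sum(counts.values())
    return {c: round(100.0 * n / total, 2) for c, n in counts.items()}


def majority_baseline(train_counts, eval_counts):
    """Accuracy (%) on `eval_counts` of always predicting the largest training class."""
    majority = max(train_counts, key=train_counts.get)
    return 100.0 * eval_counts[majority] / sum(eval_counts.values())


train_counts = {1: 464, 2: 485, 3: 1051}
test_counts = {1: 303, 2: 280, 3: 603}
print(shares(train_counts))
print(shares(test_counts))
print(majority_baseline(train_counts, train_counts))
print(majority_baseline(train_counts, test_counts))
```

Always predicting class 3 gives about 52.55% on the training set and 50.84% on the test set, so any useful method has to be clearly above these values.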

Number of attributes: originally 60 attributes {a,c,t,g}, usually converted to 180 binary indicator variables {(0,0,0), (0,0,1), (0,1,0), (1,0,0)}, or 240 binary variables.
Much better performance is generally observed if attributes closest to the junction are used (middle). In the StatLog version (180 variables), this means using attributes A61 to A120 only.
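The conversion of 60 nucleotides to 180 binary indicators and the selection of the middle attributes can be sketched as follows. Which triple stands for which nucleotide is an assumption made only for this example (the text lists the four triples but not the assignment), and the function names are hypothetical.

```python
# Assumed code table: the assignment of triples to nucleotides is illustrative only.
CODES = {"a": (1, 0, 0), "c": (0, 1, 0), "g": (0, 0, 1), "t": (0, 0, 0)}


def encode(sequence):
    """Turn a nucleotide string into a flat list of binary indicators, 3 per position."""
    bits = []
    for base in sequence.lower():
        bits.extend(CODES[base])
    return bits


def middle_attributes(bits, first=61, last=120):
    """Select attributes A`first`..A`last` (1-based, inclusive)."""
    return bits[first - 1:last]


sequence = "acgt" * 15          # synthetic sequence with 60 positions
bits = encode(sequence)
middle = middle_attributes(bits)
print(len(bits), len(middle))
```

With three indicators per position, attributes A61 to A120 cover nucleotides 21 to 40, that is the 20 positions surrounding the junction.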

Method Train % Test % Time train Time test
RBF, 720 nodes 98.5 95.9
k-NN GM, p(X|C), k=6, Euclid, raw 96.8 95.5 0 short
Dipol92 99.3 95.2 213 10
Alloc80 93.7 94.3 14394 --
QuaDisc 100.094.1 1581 809
LDA, Discrim 96.6 94.1 929 31
FSM, 8 Gaussians, 180 binary 95.4 94.0
Log DA, Disc 99.2 93.9 5057 76
SSV Tree, p(X|C), opt node, 4CV 94.8 93.4 short short
Naive Bayes 94.8 93.2 52 15
Castle, middle 90 binary var 93.9 92.8 397 225
IndCart, 180 binary 96.0 92.7 523516
C4.5, on 60 features 96.0 92.4 9 2
CART, middle 90 binary var 92.5 91.5 6159
MLP+BP 98.6 91.2 4094 9
Bayesian Tree 99.9 90.5 82 11
CN2 99.8 90.5 869 74
New ID 100.0 90.0 698 1
Ac2 100.0 90.0 12378 87
Smart 96.6 88.5 79676 16
Cal5 89.6 86.9 1616 8
Itrule 86.9 86.5 2212 6
k-NN 91.1 85.4 2428 882
Kohonen 89.6 66.1 - -
Default, majority 52.5 50.8

kNN GM - GhostMiner version of kNN (our group)
SSV Decision Tree - our results



Maintained by Wlodzislaw Duch.