Logical rules extracted from data

Computational Intelligence Laboratory | Department of Informatics | Nicolaus Copernicus University

Look at datasets to find more results obtained using different classifiers.



Medical: Appendicitis | Breast cancer (Wisconsin) | Cleveland heart disease | Diabetes | Hepatitis | Hypothyroid | Ljubljana cancer | Statlog Heart

Other: Ionosphere | Iris flowers | Mushrooms | Monk 1 | Monk 2 | Monk 3 | Satellite image dataset (Statlog version) | NASA Shuttle | Sonar | Vowel

Confusion matrices: column labels refer to the true class and row labels to the assigned class; for medical data, healthy cases are listed first.


Appendicitis.

106 vectors, 8 attributes, two classes (88 acute +18 other),
obtained from Shalom Weiss.
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA

Rules found using PVM
Accuracy 89.6% in leave-one-out, 91.5% overall

C1: MNEA > 6600  OR  MBAP > 11
C2: ELSE

Rules found using C-MLP2LN, no optimization
Accuracy 89.6% in leave-one-out, 91.5% overall

C1: MNEA > 6650 OR MBAP > 12
C2: ELSE

A second neuron classifies 3 more cases correctly using 2 additional rules, but we treat these cases as noise rather than as interesting rare cases.
Using L-units, another set of rules is generated with an overall accuracy of 89.6% (11 errors).

C1: WBC1 > 8400 OR MBAP >= 42
C2: ELSE

Confusion matrix | Append. | Other
Appendicitis | 84 | 10
Other | 1 | 11

C4.5 generates 3 rules with overall 91.5% accuracy. It may also generate 7 rules with 97.2% accuracy, but this is strong overfitting, with each rule classifying only 1-2 cases.

Summary of accuracy (%) and references
Method | Accuracy | Reference
PVM | 89.6 | Weiss, Kapouleas
C-MLP2LN | 89.6±? | our
RIAC rule induction | 86.9 | Hamilton et al.
CART, C4.5 (dec. trees) | 84.9 | Weiss, Kapouleas
FSM rules | ??? | our (RA)

S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Wisconsin breast cancer.

From UCI repository, 699 cases, 9 attributes (1-10 integer values),
two classes, 458 benign (65.5%) & 241 malignant (34.5%).
For 16 instances one attribute is missing.

Attributes: F0, the ID number, is removed from the original database (warning: some papers give the original feature numbers).
F1: Clump Thickness 1 - 10
F2: Uniformity of Cell Size 1 - 10
F3: Uniformity of Cell Shape 1 - 10
F4: Marginal Adhesion 1 - 10
F5: Single Epithelial Cell Size 1 - 10
F6: Bare Nuclei 1 - 10
F7: Bland Chromatin 1 - 10
F8: Normal Nucleoli 1 - 10
F9: Mitoses 1 - 10

C-MLP2LN results:

Rules S1: Single rule: IF f2 = [1,2] then benign else malignant

            Original class.
Calculated
        1      417     12
        2      41     229

Accuracy: 646 correct (92.42%), 53 errors; Sensitivity=0.9720, Specificity=0.8481
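
The figures quoted for rule S1 can be rechecked from the confusion matrix. The following is a minimal Python sketch with illustrative variable names; reading the reported sensitivity and specificity as row-wise ratios is an inference from the numbers, not something the page states.

```python
# Confusion matrix for rule S1, copied from the table above.
# Rows: calculated class (1, 2); columns: original class (1, 2).
matrix = [
    [417, 12],
    [41, 229],
]

total = sum(sum(row) for row in matrix)
correct = matrix[0][0] + matrix[1][1]
accuracy = correct / total

# Row-wise ratios: share of each calculated class that is correct.
# These reproduce the two values reported as sensitivity and specificity.
row1_ratio = matrix[0][0] / sum(matrix[0])
row2_ratio = matrix[1][1] / sum(matrix[1])

print(f"{correct} correct of {total}, accuracy {100 * accuracy:.2f}%")
print(f"row ratios: {row1_ratio:.4f}, {row2_ratio:.4f}")
```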

Rules S2: 5 rules for malignant, overall accuracy of 96%.

R1 f1<6 & f3<4 & f6<2 & f7<5 100%
R2 f1<6 & f4<4 & f6<2 & f7<5 100%
R3 f1<6 & f3<4 & f4<4 & f6<2 100%
R4 f1=[6,8] & f3<4 & f4<4 & f6<2 & f7<5 100%
R5 f1<6 & f3<4 & f4<4 & f6=[2,7] & f7<5     92.3% (36 correct, 3 errors)
ELSE benign

3 benign cases wrongly classified as malignant and 25 malignant cases wrongly classified as benign.

Rules S3: 4 malignant rules, overall accuracy of 97.7%.
Confusion matrix | Benign | Malignant
Benign | 447 | 5
Malignant | 11 | 236

R1 f3<3 & f4<4 & f6<6 & f9=1 99.5% (2 err)
R2 f1<7 & f4<4 & f6<6 & f9=1 99.8% (5 err)
R3 f1<7 & f3<3 & f6<6 & f9=1 99.5% (2 err)
R4 f1<7 & f3<3 & f4<4 & f6<6 99.5% (2 err)
ELSE benign

11 benign cases wrongly classified as malignant and 5 malignant cases wrongly classified as benign.

Rules S4: Optimized rules: 1 benign vector classified as malignant (rule 1 and rule 5, the same vector).
ELSE condition makes 6 errors, giving 99.00% overall accuracy:

R1 f1<9 & f4<4 & f6<2 & f7<5 100%
R2 f1<10 & f3<4 & f4<4 & f6<3 100%
R3 f1<7 & f3<9 & f4<3 & f6=[4,9] & f7<4    100%
R4 f1=[3,4] & f3<9 & f4<10 & f6<6 & f7<8 99.8%
R5 f1<6 & f3<3 & f7<8 99.8%
ELSE benign (6 errors)

Other solutions: 100% reliable rules rejecting 51 cases (7.3%) of all vectors.
For malignant class these rules are:

R1 f1<9 & f3<4 & f6<3 & f7<6 100%
R2 f1<5 & f4<8 & f6<5 & f7<10     100%
R3 f1<4 & f3<2 & f4<3 & f6<7 100%
R4 f1<10 & f4<10 & f6=[1,5] & f7<2 100%

For the benign cases rules are: NOT (R5 OR R6 OR R7 OR R8), where:

R5 f1<8 & f3<5 & f7<4      100%
R6 f1<9 & f4<6 & f6<9 & f7<5 100%
R7 f1<9 & f3<6 & f4<8 & f6<9 100%
R8 f1=6 & f3<10 & f4<10 & f6<2 & f7<9    100%

Summary of results (rules discovered for the whole data set).

Method | Accuracy % | Reference
C-MLP2LN | 99.0 |
FSM | 98.3 | our (RA)
C4.5 (decision tree) | 96.0 | Hamilton et al.
RIAC (prob. inductive) | 95.0 | Hamilton et al.

Duch W, Adamczak R, Grąbczewski K, Żal G, Hybrid neural-global minimization method of logical rule extraction. Journal of Advanced Computational Intelligence 3 (5): 348-356.
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.

Papers on a smaller (569 cases) Wisconsin breast cancer dataset are on the O.L. Mangasarian page.


Cancer (Ljubljana data)

From UCI repository (restricted):  286 instances, 201 no-recurrence-events (70.3%), 85 recurrence-events (29.7%);
9 attributes, between 2-13 values each, 9 missing values

Rules found using PVM: 70% for training, 30% for test
Accuracy 77.4% train, 77.1% test

C1: Involved Nodes > 0 & Degree_malig = 3
C2: ELSE

C-MLP2LN more accurate rules: 78% overall accuracy
R1: deg_malig=3 & breast=left & node_caps=yes
R2: (deg_malig=3 OR breast=left) & NOT inv_nodes=[0,2] & NOT age=[50,59]

Method | Accuracy, % test | Reference
C-MLP2LN | 77.4 | our
CART | 77.1 | Weiss, Kapouleas
PVM | 77.1 | Weiss, Kapouleas
AQ15 | 66-72 | Michalski et al.
Inductive | 65-72 | Clark, Niblett

Michalski,R.S., Mozetic,I., Hong,J., & Lavrac,N. (1986). The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains.  In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041-1045, Philadelphia, PA: Morgan Kaufmann.

Clark,P. & Niblett,T. (1987). Induction in Noisy Domains.  In: Progress in Machine Learning (from the Proceedings of the 2nd European Working Session on Learning), 11-30, Bled,  Yugoslavia: Sigma Press.

CART & PVM 77.4% train, 77.1% test; S.M. Weiss, I. Kapouleas. An empirical comparison of pattern recognition, neural nets and machine learning classification methods, in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990

Duch W, Adamczak R, Grąbczewski K (1997) Extraction of crisp logical rules using constrained backpropagation networks, International Conference on Artificial Neural Networks (ICNN'97), Houston, 9-12.6.1997, pp. 2384-2389
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Hepatitis.

From UCI repository, 155 vectors, 19 attributes, 13 binary, other integer, class is first.
Two classes, 32 die (20.6%), 123 live (79.4%)
Missing values (here F1=class): F4(1), F6(1), F7(1), F8(1), F9(10), F10(11), F11(5), F12(5), F13(5), F14(5), F15(6), F16(29), F17(4), F18(16), F19(67)

C-MLP2LN rule, overall accuracy 88.4%, using F2=age, F13=Ascites, F15=bilirubin, F20=histology,

R1: age > 52 & bilirubin > 3.5
R2: histology=yes & ascites=no & age = [30,51]

C-MLP2LN, linguistic variables from L-units, overall accuracy 96.1%; this looks good but uses F19=protime, which has missing values in almost half of the cases.

age >= 30 & sex=male & antivirals=no & protime <= 50

Confusion matrix | Live | Die
Live | 120 | 3
Die | 3 | 29

Method | Accuracy % | Reference
C-MLP2LN | ??? | Our
FSM | 90 | Our
PVM | ?? |
CART (decision tree) | 82.7 |

Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Cleveland heart disease.

From UCI repository, 303 cases, 13 attributes (4 cont, 9 nominal), many missing values.
2 (no, yes) or 5 classes (no, degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degree 1-4.
 
C-MLP2LN simplified rules 85.5% overall accuracy. Rules for healthy class:

R1:  (thal=0 OR thal=1) &  ca=0.0     (88.5%)
R2:  (thal=0 OR ca=0.0) & cp NOT 2 (85.2%)
ELSE  sick  (89.2%)
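
The two rules for the healthy class combine as a disjunction, with everything else assigned to the sick class. A minimal Python sketch, assuming the numeric codes for thal, ca and cp are exactly those used in the rules above (the page does not describe the encoding); the function name is hypothetical.

```python
def classify_cleveland(thal: float, ca: float, cp: float) -> str:
    """Apply the simplified C-MLP2LN rules listed above.

    Assumption: thal, ca and cp use the same numeric coding as the rules.
    """
    r1 = (thal == 0 or thal == 1) and ca == 0
    r2 = (thal == 0 or ca == 0) and cp != 2
    return "healthy" if (r1 or r2) else "sick"


print(classify_cleveland(thal=0, ca=0, cp=1))  # R1 fires
print(classify_cleveland(thal=2, ca=1, cp=2))  # no rule fires
```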

Method | Accuracy % | Reference
C-MLP2LN | 82.5 | RA, estimated?
FSM | 82.2 | Rafał Adamczak


Statlog Heart disease.

13 attributes (extracted from 75), no missing values.
270=150+120 observations selected from the 303 cases (Cleveland Heart).

Cost matrix:
Absence | Presence
0 | 1
5 | 0
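
A cost matrix like this is normally used to weight the cells of a confusion matrix. The sketch below shows that calculation under a stated assumption: the matrix is indexed as cost[true][predicted], so that missing a present disease costs 5. The page does not state the orientation, the function name is hypothetical, and the example counts are made up purely for illustration.

```python
# Cost matrix from the page. Assumed layout: COST[true][predicted],
# index 0 = absence, index 1 = presence.
COST = [
    [0, 1],
    [5, 0],
]


def average_cost(confusion, cost=COST):
    """Mean misclassification cost; confusion[true][predicted] holds counts."""
    total_cases = sum(sum(row) for row in confusion)
    total_cost = sum(
        confusion[t][p] * cost[t][p] for t in range(2) for p in range(2)
    )
    return total_cost / total_cases


# Made-up counts for illustration only (150 absence, 120 presence cases).
example = [
    [130, 20],
    [30, 90],
]
print(round(average_cost(example), 3))
```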

Results without risk matrix
Method | Accuracy % | Reference
K* | 76.7 | WEKA, RA
C-MLP2LN | ??? | Our
1R | 71.4 | WEKA, RA
T2 | 68.1 | WEKA, RA
FOIL | 64.0 | WEKA, RA
RBF | 60.0 | ToolDiag, RA
InductH | 58.5 | WEKA, RA

Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Diabetes.

From UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances, 500 (65.1%) healthy, 268 (34.9%) diabetes.

F2 is "Plasma glucose concentration (2 hours oral glucose tolerance) test"
F6 is "Body mass index (weight in kg/(height in m)^2)"

1 rule from SSV, overall accuracy 74.9%, Sensitivity=45.5, Spec.=90.6

IF F#2 > 144.5 then diabetes, else healthy

Rule from C-MLP2LN with L-units, overall accuracy 75%

IF ( F2<=151 AND F6<=47 ) THEN healthy, else diabetes

2 rules from SSV, overall accuracy 76.2%, Sensitivity=60.8, Spec.=84.4

IF F#2 > 144.5 OR (F#2 > 123.5 AND F#6 > 32.55) then diabetes, else healthy
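
The three rule sets above translate directly into threshold tests on F2 (plasma glucose) and F6 (body mass index). A minimal Python sketch; the function names are hypothetical and the inputs are assumed to be in the units of the original dataset.

```python
def ssv_one_rule(f2: float) -> str:
    """Single SSV rule: threshold on plasma glucose only."""
    return "diabetes" if f2 > 144.5 else "healthy"


def ssv_two_rules(f2: float, f6: float) -> str:
    """Two SSV rules joined by OR."""
    if f2 > 144.5 or (f2 > 123.5 and f6 > 32.55):
        return "diabetes"
    return "healthy"


def cmlp2ln_rule(f2: float, f6: float) -> str:
    """C-MLP2LN rule with L-units: a conjunction describing the healthy class."""
    return "healthy" if (f2 <= 151 and f6 <= 47) else "diabetes"


# A case with moderately raised glucose and high BMI is separated
# only by the second SSV rule.
print(ssv_one_rule(130), ssv_two_rules(130, 35), cmlp2ln_rule(130, 35))
```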

Estimation of accuracy (4 leaves in SSV): average of 10 runs, each 10xCV, accuracy 75.2 ±0.6

Confusion matrix | Healthy | Diabetes
Healthy | 467 | 159
Diabetes | 33 | 109

Results from crossvalidation.

Method | Accuracy % | Reference
SSV 5 nodes/BF | 75.3±4.8 | WD, Ghostminer
SSV opt nodes/3CV/BF | 74.7±3.5 | WD, Ghostminer
SSV opt prune/3CV/BS | 74.6±3.3 | WD, Ghostminer
SSV opt prune/3CV/BF | 74.0±4.1 | WD, Ghostminer
SSV opt nodes/3CV/BS | 72.9±4.3 | WD, Ghostminer
SSV 5 nodes/BF | 74.9±4.8 | WD, Ghostminer
SSV 3 nodes/BF | 74.6±5.2 | WD, Ghostminer
CART | 74.5±? | Statlog
DB-CART | 74.4±? | Shang & Breiman
ASR | 74.3±? | Ster & Dobnikar
CART | 72.8±? | Ster & Dobnikar
C4.5 | 73.0±? | Statlog
Default | 65.1±? |
C-MLP2LN, overall | 75.0±? | Our, 4/99

Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Hypothyroid.

Thyroid, from UCI repository, dataset "ann-train.data":
3772 learning and 3428 testing examples;
Training: 93+191+3488 or 2.47%, 5.06%, 92.47%
Test: 73+177+3178 or 2.13%, 5.16%, 92.71%
21 attributes (15 binary, 6 continuous); 3 classes

C-MLP2LN rules (all values of continuous features are multiplied here by 1000)

Initial rules:

primary hypothyroid:  TSH>6.1  &  FTI <65
compensated       :  TSH > 6 & TT4<149 & On_Tyroxin=FALSE & FTI>64 & surgery=False
ELSE normal

Optimized more accurate rules: 4 errors on the training set (99.89%), 22 errors on the test set (99.36%)

primary hypothyroid:  TSH>30.48  &  FTI <64.27  (97.06%)
primary hypothyroid:  TSH=[6.02,29.53]  &  FTI <64.27 & T3< 23.22 (100%)
compensated            :  TSH > 6.02 & FTI=[64.27,186.71] & TT4=[50, 150.5) & On_Tyroxin=no & surgery=no  (98.96%)
no hypothyroid         :  ELSE   (100%)
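
The optimized rules can be read as an ordered decision list. The sketch below is one hedged reading of them in Python: continuous inputs are assumed to be already multiplied by 1000 as stated above, square brackets are taken as inclusive bounds and the round bracket as exclusive, and the FTI condition of the third rule is read as membership in the interval [64.27, 186.71]. The function name and argument names are hypothetical.

```python
def classify_thyroid(tsh, fti, t3, tt4, on_thyroxine, surgery):
    """Apply the optimized rules in the order listed above.

    Assumptions: inputs are scaled by 1000; [a,b] is inclusive, [a,b) excludes b.
    """
    if tsh > 30.48 and fti < 64.27:
        return "primary hypothyroid"
    if 6.02 <= tsh <= 29.53 and fti < 64.27 and t3 < 23.22:
        return "primary hypothyroid"
    if (
        tsh > 6.02
        and 64.27 <= fti <= 186.71
        and 50 <= tt4 < 150.5
        and not on_thyroxine
        and not surgery
    ):
        return "compensated"
    return "no hypothyroid"


print(classify_thyroid(tsh=40, fti=50, t3=10, tt4=100,
                       on_thyroxine=False, surgery=False))
```

Any case not covered by one of the three conditions, including a TSH value that falls between 29.53 and 30.48 with low FTI, drops through to the ELSE class.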

 
Method | % training | % test | Reference
C-MLP2LN rules + ASA | 99.9 | 99.36 | Rafał/Krzysztof/Grzegorz
CART | 99.8 | 99.36 | Weiss
PVM | 99.8 | 99.33 | Weiss
C-MLP2LN rules | 99.7 | 99.0 | Rafał/Krzysztof

3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4  give 99.3% of accuracy on the test set.
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Other, non-medical data


Iris flowers

150 vectors, 50 in each class: setosa, virginica, versicolor
PL=x3=Petal Length;  PW=x4=Petal Width

PVM Rules: accuracy 98% in leave-one-out and overall
 
Setosa Petal Length <3
Virginica Petal length >4.9 OR Petal Width >1.6
Versicolor ELSE

C-MLP2LN rules:

7 errors, overall 95.3% accuracy
Setosa PL <2.5 100%
Virginica PL >4.8  92%
Versicolor ELSE  94%

Higher accuracy: overall 98%

Setosa PL <2.9 100%
Virginica PL>4.95 OR PW>1.65  94%
Versicolor PL=[2.9,4.95] & PW=[0.9,1.65] 100%

100% reliable rules reject 11 vectors, 8 virginica and 3 versicolor:

Setosa PL <2.9 100%
Virginica PL>5.25 OR PW>1.85 100%
Versicolor PL=[2.9,4.9] & PW<1.7 100%
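
Rules of this kind classify only the vectors they cover and reject the rest. A minimal Python sketch of the reject option, assuming petal length and width in centimetres and inclusive interval bounds; the function name is hypothetical and `None` stands for a rejected vector.

```python
from typing import Optional


def classify_iris_reliable(pl: float, pw: float) -> Optional[str]:
    """Apply the 100% reliable rules; return None when no rule covers the case."""
    if pl < 2.9:
        return "setosa"
    if pl > 5.25 or pw > 1.85:
        return "virginica"
    if 2.9 <= pl <= 4.9 and pw < 1.7:
        return "versicolor"
    return None  # rejected: falls in the region between the classes


print(classify_iris_reliable(1.4, 0.2))
print(classify_iris_reliable(5.0, 1.7))
```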

Summary:
Method Accuracy Reference
PVM 1 rule 97.3 Weiss
CART (dec. tree) 96.0 Weiss
FuNN 95.7 Kasabov
NEFCLASS 96.7 Nauck et.al.
FuNe-I 96.7 Halgamuge
PVM 2 rules 98.0 Weiss, optimal result, corresponds to about 96% in CV tests
C-MLP2LN 98.0 Duch et.al.
SSV 98.0 Duch et.al.
Grobian (rough) 100 Browne; overfitting

References:
S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990
N. Kasabov, Connectionist methods for fuzzy rules extraction, reasoning and adaptation. In: Proc. of the Int. Conf. on Fuzzy Systems, Neural Networks and Soft Computing, Iizuka, Japan, World Scientific 1996, pp. 74-77
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306
C. Browne, I. Duntsch, G. Gediga, IRIS revisited: A comparison of discriminant and enhanced rough set data analysis. In: L. Polkowski and A. Skowron, eds. Rough sets in knowledge discovery, vol. 2. Physica Verlag, Heidelberg, 1998, pp. 345-368
D. Nauck, U. Nauck and R. Kruse, Generating Classification Rules with the Neuro-Fuzzy System NEFCLASS. Proc. Biennial Conf. of the North American Fuzzy Information Processing Society (NAFIPS'96), Berkeley, 1996
S.K. Halgamuge and M. Glesner, Neural networks in designing fuzzy systems for real world applications. Fuzzy Sets and Systems 65:1-12, 1994


Mushrooms

8124 instances, 4208 (51.8%) edible and 3916 (48.2%) poisonous;
22 attributes (all symbolic): cap shape (6, e.g. bell, conical, flat...), cap surface (4), cap color (10), bruises (2), odor (9), gill attachment (4), gill spacing (3), gill size (2), gill color (12), stalk shape (2), stalk root (7, many missing values), surface above the ring (4), surface below the ring (4), color above the ring (9), color below the ring (9), veil type (2), veil color (4), ring number (3), spore print color (9), population (6), habitat (7).
Together 118 logical input values.
2480 missing values for attribute 11

C-MLP2LN rules:

Disjunctive rules for poisonous mushrooms, from most general to most specific:

No. | Rule | Accuracy
1 | odor=NOT(almond.OR.anise.OR.none) | 98.52%, 120 poisonous cases missed
2 | spore-print-color=green | 99.41%, 48 cases missed
3 | odor=none.AND.stalk-surface-below-ring=scaly.AND.(stalk-color-above-ring=NOT.brown) | 99.90%, 8 cases missed
4 | habitat=leaves.AND.cap-color=white | 100% accuracy

Alternative R4' rule:  population=clustered.AND.cap_color=white

These rules involve 6 attributes (out of 22).  Rule 1 may be replaced by:

odor = creosote.OR.fishy.OR.foul.OR.musty.OR.pungent.OR.spicy

Rules for edible mushrooms are obtained as negation of the rules given above, for example rule:

Re1: odor=(almond.OR.anise.OR.none).AND.spore-print-color=NOT.green

makes 48 errors, giving 99.41% accuracy on the whole dataset.
Several slightly more complex variations on these rules exist, involving other attributes, such as gill_size, gill_spacing, stalk_surface_above_ring, but the rules given above are the simplest found so far.
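
The four disjunctive rules can be written as a single predicate, and the quoted accuracies follow from the number of poisonous cases each cumulative rule set still misses. A minimal Python sketch, assuming each mushroom is a dictionary keyed by the attribute names used in the rules with full-word values (the UCI file itself uses single-letter codes); the function names are hypothetical.

```python
N_CASES = 8124  # total number of instances, from the description above


def is_poisonous(m: dict) -> bool:
    """Disjunction of rules 1-4 for poisonous mushrooms."""
    r1 = m["odor"] not in {"almond", "anise", "none"}
    r2 = m["spore-print-color"] == "green"
    r3 = (
        m["odor"] == "none"
        and m["stalk-surface-below-ring"] == "scaly"
        and m["stalk-color-above-ring"] != "brown"
    )
    r4 = m["habitat"] == "leaves" and m["cap-color"] == "white"
    return r1 or r2 or r3 or r4


def accuracy_if_missed(missed: int) -> float:
    """Overall accuracy (%) when `missed` cases are the only errors."""
    return 100 * (N_CASES - missed) / N_CASES


for missed in (120, 48, 8):
    print(missed, f"{accuracy_if_missed(missed):.2f}%")
```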

Other methods:

[1] BRAINNE: 300 rules, > 8000 antecedents, 91%
[2] STAGGER: asymptoted to 95% classification accuracy after reviewing 1000 instances.
[3] HILLARY algorithm, about 95%

References:

Duch W, Adamczak R, Grabczewski K (1996) Extraction of logical rules from training data using backpropagation networks, in: Proc. of the 1st Online Workshop on Soft Computing, 19-30.Aug.1996, pp. 25-30, available on-line at: http://www.bioele.nuee.nagoya-u.ac.jp/wsc1/
Duch W, Adamczak R, Grabczewski K, Ishikawa M, Ueda H, Extraction of crisp logical rules using constrained backpropagation networks - comparison of two new approaches, in: Proc. of the European Symposium on Artificial Neural Networks (ESANN'97), Bruges, Belgium 16-18.4.1997, pp. 109-114
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306
Schlimmer,J.S. (1987). Concept Acquisition Through Representational Adjustment (Technical Report 87-19), Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine.
Iba,W., Wogulis,J., & Langley,P. (1988).  Trading off Simplicity and Coverage in Incremental Concept Learning. In Proceedings of  the 5th International Conference on Machine Learning, 73-79,  Ann Arbor, Michigan: Morgan Kaufmann.


Monk 1

Original rule is: head shape = body shape OR jacket color = red

C-MLP2LN:
100% accuracy with 4 rules + 2 exceptions, 14 atomic formulae.

Other systems: see the original paper:
S. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, B. Roger, H. Vafaie, W. Van de Velde, W. Wenzel, J. Wnek, and J. Zhang.
The MONK's problems: A performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, Carnegie Mellon University, Computer Science Department, Pittsburgh, PA, 1991.


Monk 2

Original rule: exactly two of the six features have their first values
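
This target concept is a counting rule, which explains why it needs many crisp rules to express. A minimal Python sketch; the attribute names and their first values are taken from the MONK's problems documentation as an assumption, since the page does not list them, and the function name is hypothetical.

```python
# First value of each attribute (assumed from the MONK's problems documentation).
FIRST_VALUES = {
    "head_shape": "round",
    "body_shape": "round",
    "is_smiling": "yes",
    "holding": "sword",
    "jacket_color": "red",
    "has_tie": "yes",
}


def monk2(robot: dict) -> bool:
    """True when exactly two of the six attributes take their first value."""
    hits = sum(robot[name] == first for name, first in FIRST_VALUES.items())
    return hits == 2


robot = {
    "head_shape": "round",
    "body_shape": "round",
    "is_smiling": "no",
    "holding": "flag",
    "jacket_color": "blue",
    "has_tie": "no",
}
print(monk2(robot))
```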

C-MLP2LN:
100% accuracy with 16 rules and 8 exceptions, 132 atomic formulae.
Other systems: see the Thrun et al. original paper: The MONK's problems


Monk 3

Original rule:

NOT (body shape = octagon OR jacket color = blue) OR (holding = sword AND jacket color = green)
The data were corrupted by 5% noise.
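
The noise-free target concept can be written as a Boolean predicate. A minimal Python sketch, assuming each example is a dictionary with full-word attribute values; the key names and the function name are hypothetical.

```python
def monk3(robot: dict) -> bool:
    """Noise-free Monk 3 concept as given in the original rule above."""
    not_octagon_or_blue = not (
        robot["body_shape"] == "octagon" or robot["jacket_color"] == "blue"
    )
    sword_and_green = (
        robot["holding"] == "sword" and robot["jacket_color"] == "green"
    )
    return not_octagon_or_blue or sword_and_green


print(monk3({"body_shape": "octagon", "jacket_color": "green", "holding": "sword"}))
print(monk3({"body_shape": "octagon", "jacket_color": "green", "holding": "flag"}))
```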

C-MLP2LN:
100% accuracy with 33 atomic formulae.
Other systems: see the Thrun et al. original paper: The MONK's problems
Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306

Comparison of results:

Method | Monk-1 | Monk-2 | Monk-3 | Remarks
AQ17-DCI | 100 | 100 | 94.2 | Michalski
AQ17-HCI | 100 | 93.1 | 100 | Michalski
AQ17-GA | 100 | 86.8 | 100 | Michalski
Assistant Pro. | 100 | 81.5 | 100 | Monk paper
mFOIL | 100 | 69.2 | 100 | Monk paper
ID5R | 79.7 | 69.2 | 95.2 | Monk paper
IDL | 97.2 | 66.2 | -- | Monk paper
ID5R-hat | 90.3 | 65.7 | -- | Monk paper
TDIDT | 75.7 | 66.7 | -- | Monk paper
ID3 | 98.6 | 67.9 | 94.4 | Monk paper
AQR | 95.9 | 79.7 | 87.0 | Monk paper
CLASSWEB 0.10 | 71.8 | 64.8 | 80.8 | Monk paper
CLASSWEB 0.15 | 65.7 | 61.6 | 85.4 | Monk paper
CLASSWEB 0.20 | 63.0 | 57.2 | 75.2 | Monk paper
PRISM | 86.3 | 72.7 | 90.3 | Monk paper
ECOWEB | 82.7 | 71.3 | 68.0 | Monk paper
Neural methods
MLP | 100 | 100 | 93.1 | Monk paper
MLP+reg. | 100 | 100 | 97.2 | Monk paper
Cascade correlation | 100 | 100 | 97.2 | Monk paper
FSM, Gaussians | 94.5 | 79.3 | 95.5 | Duch et al.
SSV | 100 | 80.6 | 97.2 | Duch et al.
C-MLP2LN | 100 | 100 | 100 | Duch et al.
Other methods
kNN, with VDM metric | -- | -- | 98.0 | K. Grudziński


NASA Shuttle

Training set 43500, test set 14500, 9 attributes, 7 classes
Approximately 80% of the data belongs to class 1.

Rules obtained from FSM, without optimization:

Class | 15 rules, train 99.89%, test 99.81% accuracy | Correct/False
C1 | F9 [-14,0] | 15043/0
C1 | F1 [27,39] and F2 [-16,13] | 11612/0
C1 | F2 [-22,110] and F9 [-14,2] | 26014/0
C1 | F2 [-25,7] and F3 [76,83] and F7 [36,58] | 11648/0
C2 | F2 [18,110] and F4 = 0 and F5 [-188,12] | 25/0
C2 | F1 [42,59] and F2 [10,50] and F6 [0,59] and F7 [19,37] and F9 [2,24] | 10/0
C3 | F2 [-118,-22] and F7 [5,71] and F8 [73,103] and F9 [16,86] | 58/0
C3 | F2 [-318,-31] and F5 [-188,34] | 82/0
C3 | F2 [-177,-19] and F5 [36,72] and F9 [6,54] | 27/0
C3 | F2 [-42,-17] and F3 [71,78] and F6 [-14,24] and F9 [2,26] | 9/5
C4 | F1 [51,67] and F2 [-18,17] and F9 [4,70] | 6063/0
C4 | F1 [53,66] and F2 [-60,24] and F4 [-29,30] and F9 [8,266] | 5564/0
C4 | F2 [-12,18] and F3 [64,79] and F7 [4,26] and F9 [8,82] | 2634/0
C5 | F7 [-48,5] | 2458/2
C6 | F2 [-4821,-386] and F5 [-46,34] | 9/0
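
Each rule above is a conjunction of interval conditions on individual features, so a rule set can be stored as plain data and evaluated generically. A minimal Python sketch using only the C2, C5 and C6 rules from this table; treating the intervals as inclusive is an assumption, and the names `RULES`, `rule_fires` and `matching_classes` are hypothetical.

```python
# Each rule maps a feature index to an (assumed inclusive) interval.
# Only a subset of the rules listed above is reproduced here.
RULES = {
    "C2": [
        {2: (18, 110), 4: (0, 0), 5: (-188, 12)},
        {1: (42, 59), 2: (10, 50), 6: (0, 59), 7: (19, 37), 9: (2, 24)},
    ],
    "C5": [{7: (-48, 5)}],
    "C6": [{2: (-4821, -386), 5: (-46, 34)}],
}


def rule_fires(rule: dict, x: dict) -> bool:
    """True when every feature named in the rule lies inside its interval."""
    return all(low <= x[f] <= high for f, (low, high) in rule.items())


def matching_classes(x: dict) -> list:
    """Classes with at least one firing rule; an empty list means unclassified."""
    return [c for c, rules in RULES.items() if any(rule_fires(r, x) for r in rules)]


# A made-up feature vector (F1..F9) for illustration only.
x = {1: 50, 2: 30, 3: 80, 4: 0, 5: 0, 6: 10, 7: 30, 8: 40, 9: 10}
print(matching_classes(x))
```

Returning a list rather than a single label keeps both possible outcomes visible: a vector covered by no rule stays unclassified, and a vector covered by rules of several classes needs a tie-breaking step that the page does not specify.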

Rules obtained from FSM, without optimization:

Class | 19 rules, train 99.94%, test 99.87% accuracy | Correct/False
C1 | F9 [-14,0] | 15043/0
C1 | F1 [27,44] and F2 [-20,18] | 19316/0
C1 | F2 [-15,51] and F9 [-14,2] | 26003/0
C1 | F6 [-13839,-41] and F9 [-356,10] | 36/0
C1 | F1 [27,50] and F2 [-27,8] and F9 [-14,24] | 25563/1
C2 | F2 [21,110] and F4 [0,0] and F5 [-188,26] | 25/0
C2 | F1 [40,57] and F2 [14,59] and F9 [8,22] | 12/0
C3 | F2 [-102,-37] and F9 [2,28] | 46/0
C3 | F1 [27,81] and F2 [-138,-24] and F9 [22,88] | 60/0
C3 | F2 [-64,-21] and F4 [-2,1] and F6 [-37,27] and F9 [2,48] | 67/8
C4 | F1 [53,61] and F2 [-46,45] and F7 [1,40] and F9 [18,126] | 3805
C4 | F1 [53,59] and F2 [-4821,275] and F5 [-188,46] and F7 [-48,28] | 3512/2
C4 | F1 [53,63] and F2 [-19,26] and F4 [-21,50] and F9 [4,126] | 6735/0
C5 | F4 [-2044,769] and F7 [-48,2] | 690/0
C5 | F7 [-19,5] and F9 [44,196] | 1772/0
C5 | F6 [-4,4] and F8 [36,38] and F9 [30,38] | 203/0
C6 | F2 [-4821,-4475] | 3/0
C6 | F2 [-4821,-908] and F5 [8,34] | 9/0
C6 | F2 [275,1958] and F7 [1,54] | 6/2

17 optimized FSM rules make only 3 errors on the training set (99.99% accuracy), leaving 8 vectors unclassified, and no errors on the test set but leaving 9 vectors unclassified (99.94%). After Gaussian fuzzification of inputs (very small, 0.05%) only 3 errors and 5 unclassified vectors are obtained for the training set, and 3 vectors are unclassified and 1 error is made (with the probability of the correct class for this case close to 50%) for the test set.

32 rules from SSV gave even better results: 100% correct on the training set and only 1 error on the test set.


Satellite image dataset (STATLOG version)

Training 4435, test 2000 cases, 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in neighbourhood) and 6 decision classes: 1, 2, 3, 4, 5 and 7 (class 6 has been removed because of doubts about the validity of this class).

Method | % train | % test | Time train | Time test
Dipol92 | 94.9 | 88.9 | 746 | 111
Radial | 88.9 | 87.9 | 564 | 74
CART | 92.1 | 86.2 | 330 | 14
Bayesian Tree | 98.0 | 85.3 | 248 | 10
C4.5 | 96.0 | 85.0 | 434 | 1
New ID | 93.3 | 85.0 | 226 | 53

Duch W, Adamczak R, Grąbczewski K, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Transactions on Neural Networks 12 (2001) 277-306


Ionosphere

200 training, 150 test cases, 34 continuous attributes, 2 classes

Method | Accuracy % | Reference
3-NN + simplex | 98.7 | Our ???
3-NN | 96.7 | our
IB3 | 96.7 | Aha
MLP+BP | 96.0 | Sigillito
C4.5 | 94.9 | Hamilton
RIAC | 94.6 | Hamilton
C4 (no windowing) | 94.0 | Aha
Non-linear perceptron | 92.0 | Sigillito
FSM + rotation | 92.8 | our
1-NN | 92.1 | Aha
DB-CART | 91.3 | Shang, Breiman
Linear perceptron | 90.7 | Sigillito
CART | 88.9 | Shang, Breiman

N. Shang, L. Breiman, ICONIP'96, p.133
David Aha: k-NN+C4+IB3 (Aha & Kibler, IJCAI-1989), IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.


Sonar

208 cases, 60 continuous attributes, 2 classes
From the CMU benchmark repository

Method | Train % | Test % | Reference
MLP+BP, 12 hidden | 99.8±0.1 | 84.7±5.7 | Gorman, Sejnowski
MLP+BP, 24 hidden | 99.8±0.1 | 84.5±5.7 | Gorman, Sejnowski
1-NN, Manhattan | | 84.2±1.0 | our (KG)
MLP+BP, 6 hidden | 99.7±0.2 | 83.5±5.6 | Gorman, Sejnowski
FSM - methodology ? | | 83.6 | our (RA)
1-NN Euclidean | | 82.2±0.6 | our (KG)
DB-CART, 10xCV | | 81.8 | Shang, Breiman
CART, 10xCV | | 67.9 | Shang, Breiman
Our results: kNN also from 13xCV, results from 10xCV are quite similar, for example 1-NN Manhattan 84.5±0.9


Vowel

528 training, 462 test cases, 10 continuous attributes, 11 classes
From the CMU benchmark repository

Method | Train | Test | Reference
CART-DB, 10xCV on total set | | 90.0 | Shang, Breiman
CART, 10xCV on total set | | 78.2 | Shang, Breiman
FSM initialization, methodology ? | | 84.4 | our (RA)
9-NN | | 56.5 | our ?
Square node network, 88 units | | 54.8 | UCI
Gaussian node network, 528 units | | 54.6 | UCI
1-NN | | 54.1 | UCI
Radial Basis Function, 528 units | | 53.5 | UCI
Gaussian node network, 88 units | | 53.5 | UCI
Square node network, 22 | | 51.1 | UCI
Multi-layer perceptron, 88 hidden | | 50.6 | UCI
Modified Kanerva Model, 528 units | | 50.0 | UCI
Radial Basis Function, 88 units | | 47.6 | UCI
Single-layer perceptron, 88 hidden | | 33.3 | UCI

N. Shang, L. Breiman, ICONIP'96, p.133, used 10xCV instead of the test set.


Other Data

Glass: Shang, Breiman CART 28.6% error,  DB-CART 29.4%


DNA-Primate splice-junction gene sequence


Włodzisław Duch, last modification 28.11.2000