Understanding the data


SSV, Separability Split Value decision tree


SSV separability criterion: choose the feature and split value that separate the largest number of pairs of vectors from different classes.

Note: this is a heuristic criterion.
It maximizes separability; it does not directly minimize the number of errors the system makes.

Define the left set and the right set:
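One common way to write these sets, following the published SSV formulation, is sketched below. The symbols are assumed notation: D is the data set, f a feature, s the test (a threshold for a continuous feature, a subset of values for a discrete one), and LS, RS the left and right sets.

```latex
LS(s, f, D) =
\begin{cases}
  \{\, x \in D : f(x) < s \,\}   & \text{if } f \text{ is continuous} \\
  \{\, x \in D : f(x) \in s \,\} & \text{otherwise}
\end{cases}
\qquad
RS(s, f, D) = D - LS(s, f, D)
```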

The SSV criterion for the test s counts, for each class c, the elements of class c in the left set and the elements of all other classes in the right set, summing over all classes:
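A form of the criterion consistent with this description can be written as follows. The notation is assumed: LS and RS are the left and right sets for the test s on feature f, C is the set of classes, and D_c is the subset of D belonging to class c.

```latex
SSV(s) = 2 \sum_{c \in C} \bigl| LS(s, f, D) \cap D_c \bigr| \cdot \bigl| RS(s, f, D) \cap (D - D_c) \bigr|
       \;-\; \sum_{c \in C} \min\bigl( \bigl| LS(s, f, D) \cap D_c \bigr|,\; \bigl| RS(s, f, D) \cap D_c \bigr| \bigr)
```

The first sum counts pairs of vectors from different classes that the split places on opposite sides.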

The second term sums the number of cases from the same class that the split separates. The factor of 2 on the first term ensures that it dominates, so the second term matters only when the first terms are equal.
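A minimal Python sketch of the criterion for a single continuous feature follows. It assumes the test is a threshold (a vector goes left when its value is below s), that the first term counts separated pairs from different classes, and that the second term takes, per class, the smaller of the left and right counts. The function names `ssv` and `best_split`, and the use of midpoints between observed values as candidate splits, are illustrative choices rather than part of the method as stated here.

```python
from collections import Counter


def ssv(values, labels, s):
    """SSV value for the threshold test value < s on one continuous feature."""
    # Per-class counts on each side of the split.
    left = Counter(c for v, c in zip(values, labels) if v < s)
    right = Counter(c for v, c in zip(values, labels) if v >= s)
    n_right = sum(right.values())
    classes = set(labels)
    # First term: class-c cases on the left times other-class cases on the right.
    separated = sum(left[c] * (n_right - right[c]) for c in classes)
    # Second term: same-class cases split apart (smaller side for each class).
    same_class = sum(min(left[c], right[c]) for c in classes)
    return 2 * separated - same_class


def best_split(values, labels):
    """Scan midpoints between consecutive distinct values; return (split, SSV)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((s, ssv(values, labels, s)) for s in candidates),
               key=lambda pair: pair[1])


# Toy data: two classes that are perfectly separable at 3.5.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels))  # (3.5, 18)
```

On the toy data the split at 3.5 separates all 9 cross-class pairs and splits no class, giving 2 * 9 - 0 = 18. The split at 2.5 separates 6 pairs and cuts one class, giving 2 * 6 - 1 = 11.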

Applications: discretization, feature selection, rules, decision trees.

Each node of the tree is described by:

The SSV plot shows criterion values against split values for the feature selected in the list on the left. The plot lines show the following:

Remarks:

Some results from the SSV tree and rules.


Włodzisław Duch