SSV: the Separability Split Value decision tree
SSV separability criterion: choose the feature for which a split value can be found that allows one to
- separate the maximum number of pairs of vectors from different classes;
- if different splits give the same result, minimize the number of separated pairs from the same class.
Note: this is a heuristic criterion.
Separability is maximized, not the number of errors the system makes.
Define the left set and the right set for a test s on feature f:
LS(s, f, D) = { x ∈ D : f(x) < s },   RS(s, f, D) = D − LS(s, f, D)
The SSV criterion for the test s counts the number of elements from class c in the left set paired with elements of all other classes in the right set, summing over all classes:
SSV(s) = 2 · Σ_c |LS(s, f, D) ∩ D_c| · |RS(s, f, D) ∩ (D − D_c)|
         − Σ_c min(|LS(s, f, D) ∩ D_c|, |RS(s, f, D) ∩ D_c|)
where D_c is the set of training vectors from class c.
The second term counts pairs of cases from the same class that the split separates; the factor 2 ensures that the first term dominates, so the second term matters only when the first terms are equal.
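The criterion above can be sketched in a few lines of Python for a single candidate threshold on a continuous feature (an illustration of the formula only; the function name and data layout are my own, not taken from any SSV implementation):

```python
# Illustrative sketch: SSV criterion for one candidate split value s,
# where vectors with f(x) < s go to the left set.
from collections import Counter

def ssv(feature, labels, s):
    """SSV separability criterion for threshold s on one feature."""
    left = Counter(y for x, y in zip(feature, labels) if x < s)
    right = Counter(y for x, y in zip(feature, labels) if x >= s)
    classes = set(labels)
    n_right = sum(right.values())
    # First term: pairs of vectors from different classes that the split separates.
    separated = sum(left[c] * (n_right - right[c]) for c in classes)
    # Second term: pairs from the same class that end up on opposite sides.
    same_class = sum(min(left[c], right[c]) for c in classes)
    # Factor 2 makes the first term dominate; the second term only breaks ties.
    return 2 * separated - same_class
```

For example, with feature values [1, 2, 3, 4] and labels [0, 0, 1, 1], the threshold s = 2.5 separates all 4 cross-class pairs and no same-class pairs, giving 2·4 − 0 = 8, while s = 1.5 gives only 2·2 − 1 = 3.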
- Simple criterion;
- automatic - no parameters;
- gives useful linguistic variables;
- handles symbolic, discrete and continuous features;
- handles missing values, treating "?" as one more symbolic value.
Applications: discretization, feature selection, rules, decision trees.
Each node of the tree is described by:
- the split condition
- the number of vectors in the node (satisfying the condition)
- the number of those vectors with a missing value for the split feature
- the number of erroneously classified vectors.
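The four items above could be collected in a small record per node, for example (a hypothetical sketch; the class and field names are my own, not from the SSV software):

```python
# Hypothetical per-node record mirroring the node description above.
from dataclasses import dataclass

@dataclass
class SSVNode:
    split_condition: str  # e.g. "petal_length < 2.45" (example condition)
    n_vectors: int        # vectors reaching the node (satisfying the condition)
    n_missing: int        # of those, vectors missing the split feature's value
    n_errors: int         # vectors misclassified by the node's majority class
```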
The SSV plot shows criterion values against split values for the feature
selected in the list on the left. The plot lines show the following:
- red - the number of errors if we add the split to the tree
- green - the first part of SSV - the number of correctly separated pairs
- blue - the second part of SSV - the number of separated pairs from the same class
Remarks:
- the numbers above the SSV plot lines show the values of the red, green
and blue curves at the best split value for the presented feature
- the value below the plot is the best split value for the presented feature
- SSV estimates separability, so it can significantly differ from the error curve (red line)
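The three curves could be computed by scanning candidate thresholds, e.g. midpoints between consecutive sorted feature values (a sketch under that assumption; names and the midpoint choice are mine, not from the software):

```python
# Sketch: compute the red/green/blue SSV plot curves for one feature.
from collections import Counter

def ssv_curves(feature, labels):
    """For each candidate threshold (midpoints between consecutive sorted
    unique values) return a tuple (s, red, green, blue):
    red   - errors if each side predicts its majority class,
    green - separated pairs from different classes (first SSV term),
    blue  - separated pairs from the same class (second SSV term)."""
    classes = set(labels)
    vals = sorted(set(feature))
    curves = []
    for lo, hi in zip(vals, vals[1:]):
        s = (lo + hi) / 2
        left = Counter(y for x, y in zip(feature, labels) if x < s)
        right = Counter(y for x, y in zip(feature, labels) if x >= s)
        n_left, n_right = sum(left.values()), sum(right.values())
        green = sum(left[c] * (n_right - right[c]) for c in classes)
        blue = sum(min(left[c], right[c]) for c in classes)
        red = (n_left - max(left.values())) + (n_right - max(right.values()))
        curves.append((s, red, green, blue))
    return curves
```

On the toy data [1, 2, 3, 4] with labels [0, 0, 1, 1] the middle threshold 2.5 yields zero errors and the maximal green value, matching the remark that the green curve, not the red one, drives the choice of split.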
Some results from the SSV tree and rules.
Włodzisław Duch