Exploratory data analysis: visualization of data.
Data topography preserving mapping method: MDS (MultiDimensional Scaling)
Idea: place the images of the data vectors so that the distances between the original vectors are preserved as closely as possible.
The mapping error (stress) is a weighted sum over pairs of points, where wij are weights that control which distances are preserved best.
The stress is minimized using a gradient descent method (steepest descent, conjugate gradient, quasi-Newton, ...).
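The stress formula itself did not survive in this text. A commonly used weighted form, given here as an assumption rather than as the exact expression of the original slide, is the following, with Dij the distance between vectors i and j in the data space and dij the distance between their images in the map:

```latex
S = \sum_{i<j} w_{ij}\,\bigl(D_{ij} - d_{ij}\bigr)^{2}
```

Setting all wij = 1 gives plain least-squares (metric) scaling; normalizing S by the sum of the Dij squared would make values comparable across maps, but whether the reported Sconv values are normalized is not stated here.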
Our choice: steepest descent with 2nd order optimization of the step-size along the gradient
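A minimal sketch of such a minimization, assuming the weighted least-squares stress S = sum over i<j of wij (Dij - dij)^2. The step size is obtained by fitting a parabola to the stress along the negative gradient, which is one possible reading of "2nd order optimization of the step-size", not necessarily the scheme used in the original software. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D, W):
    """Assumed stress: sum over i<j of W_ij * (D_ij - d_ij)^2."""
    d = pairwise_distances(Y)
    iu = np.triu_indices(len(Y), k=1)
    return float((W[iu] * (D[iu] - d[iu]) ** 2).sum())

def stress_gradient(Y, D, W):
    """Gradient of the stress with respect to every mapped point (W symmetric)."""
    diff = Y[:, None, :] - Y[None, :, :]
    d = np.maximum(np.sqrt((diff ** 2).sum(axis=-1)), 1e-12)  # guard against 0/0
    coef = -2.0 * W * (D - d) / d
    return (coef[:, :, None] * diff).sum(axis=1)

def mds_steepest_descent(D, W=None, dim=2, n_iter=200, trial_step=1e-2,
                         seed=0, Y0=None):
    """Steepest descent with a parabola-fitted step along the gradient."""
    n = D.shape[0]
    W = np.ones_like(D) if W is None else W
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, dim)) if Y0 is None else Y0.copy()
    s = stress(Y, D, W)
    for _ in range(n_iter):
        g = stress_gradient(Y, D, W)
        slope = -float((g ** 2).sum())  # derivative of S along -g at step 0
        if slope > -1e-18:
            break  # gradient vanished
        # Fit S(a) ~ s + slope*a + curv*a^2 through a single trial evaluation.
        s_trial = stress(Y - trial_step * g, D, W)
        curv = (s_trial - s - slope * trial_step) / trial_step ** 2
        step = -slope / (2.0 * curv) if curv > 0 else trial_step
        # Safeguard: halve the step until the stress actually decreases.
        s_new = stress(Y - step * g, D, W)
        while s_new > s and step > 1e-12:
            step *= 0.5
            s_new = stress(Y - step * g, D, W)
        if s_new >= s:
            break
        Y, s = Y - step * g, s_new
    return Y, s
```

Calling `mds_steepest_descent(D)` on a symmetric distance matrix returns the final configuration and its stress; passing `Y0` makes the starting configuration explicit.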
Multiply the previous weight wij by a Gaussian-like term centered on Pc, which decreases as the mean distance Dcij = (Dci + Dcj)/2 between the end points of Dij and the point Pc increases.
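The exact form of the Gaussian-like term is not given in this text. As a minimal sketch, assume the factor exp(-Dcij^2 / (2 sigma^2)), where `sigma` is a hypothetical width parameter controlling how sharply the map focuses on Pc:

```python
import numpy as np

def focus_weights(W, D, c, sigma):
    """Down-weight pairs (i, j) that lie far from the focus point c.

    W     : previous weight matrix w_ij
    D     : distance matrix in the data space
    c     : index of the focus point Pc
    sigma : assumed width of the Gaussian-like term
    """
    Dc = D[c]                                      # Dci for every point i
    Dc_mean = 0.5 * (Dc[:, None] + Dc[None, :])    # Dcij = (Dci + Dcj) / 2
    return W * np.exp(-(Dc_mean ** 2) / (2.0 * sigma ** 2))
```

With a small `sigma`, only distances between points close to Pc keep a noticeable weight; a large `sigma` leaves the original weights almost unchanged.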
Psychometric MMPI test: patients as samples, numerical factors as attributes
Two datasets: Men / Women.
Women dataset
Metric MDS mapping of the Women database.
Focusing on data point 'p554' from class 'organika'
- Purpose: understand why this data point is assigned to class 'organika'.
- The point was classified with the IncNet neural network, for which features 2, 4 and 7 are sufficient to classify class 'organika' correctly.
- To avoid interference from noisy dimensions, only those dimensions (2, 4, 7) were used for the MDS mapping.
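The selection of the focus point's neighborhood can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that features 2, 4 and 7 are numbered from 1, so they correspond to zero-based columns 1, 3 and 6 of the data matrix.

```python
import numpy as np

def nearest_neighbors_subset(X, focus, features, k):
    """Return the focus point and its k nearest neighbors in a feature subspace.

    X        : data matrix, one row per patient
    focus    : row index of the focus point (e.g. the row holding p554)
    features : zero-based column indices kept for the mapping
    k        : number of neighbors to keep
    """
    Xs = X[:, features]                              # drop the noisy dimensions
    dist = np.linalg.norm(Xs - Xs[focus], axis=1)    # distances to the focus point
    order = [i for i in np.argsort(dist, kind="stable") if i != focus][:k]
    idx = np.array([focus] + order)                  # focus point first
    return idx, Xs[idx]
```

For example, `nearest_neighbors_subset(X, focus, [1, 3, 6], 50)` would give the data for a 50-nearest-neighbors map under these assumptions.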
200 nearest neighbors: Sconv = 0.02695 (random initialization, trial 6)
100 nearest neighbors: Sconv = 0.14635 (random initialization, trial 24)
50 nearest neighbors: Sconv = 0.02849 (random initialization, trial 2)
20 nearest neighbors: Sconv = 0.01899 (random initialization, trial 1)
Visualization of IncNet classifier's decision borders
The 50 nearest neighbors with 100 Gaussian (σ = 1) points classified
The 50 nearest neighbors with 100 Gaussian (σ = 2) points classified
1 - Generation of 100 new points from a Gaussian distribution centered at p554,
2 - Classification of the new points using IncNet classifier,
3 - Addition of the new points to the 100 nearest neighbors map using relative mapping (each point is mapped separately).
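The three steps above can be sketched as follows. This is a simplified illustration, not the original implementation: `classify` is a hypothetical callable standing in for the IncNet classifier, the Gaussian is assumed isotropic, and relative mapping is done by plain gradient descent with step halving on the unweighted stress of one new point against the fixed map.

```python
import numpy as np

def sample_around(x_focus, n_points, sigma, rng):
    """Step 1: draw points from an isotropic Gaussian centered at x_focus."""
    return x_focus + sigma * rng.normal(size=(n_points, x_focus.shape[0]))

def map_point_relative(x_new, X_ref, Y_ref, n_iter=200, step=0.1):
    """Step 3: place one new point on an existing map whose points stay fixed."""
    D = np.linalg.norm(X_ref - x_new, axis=1)      # target distances to the map points
    y = Y_ref[np.argmin(D)] + 1e-3                 # start next to the closest image

    def point_stress(p):
        d = np.linalg.norm(Y_ref - p, axis=1)
        return float(((D - d) ** 2).sum())

    s = point_stress(y)
    for _ in range(n_iter):
        diff = y - Y_ref
        d = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
        g = ((-2.0 * (D - d) / d)[:, None] * diff).sum(axis=0)
        t = step
        while t > 1e-12 and point_stress(y - t * g) >= s:
            t *= 0.5                               # halve until the stress decreases
        if t <= 1e-12:
            break                                  # no further improvement found
        y = y - t * g
        s = point_stress(y)
    return y, s

def add_classified_points(x_focus, X_ref, Y_ref, classify,
                          n_points=100, sigma=1.0, seed=0):
    """Generate, classify and map new points one at a time."""
    rng = np.random.default_rng(seed)
    X_new = sample_around(x_focus, n_points, sigma, rng)      # step 1
    labels = classify(X_new)                                  # step 2 (stand-in classifier)
    Y_new = np.array([map_point_relative(x, X_ref, Y_ref)[0]  # step 3
                      for x in X_new])
    return X_new, labels, Y_new
```

Because every new point is mapped separately against the fixed neighbors, the existing map does not move, and the class labels of the new images show where the classifier's decision borders fall on the map.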
Sensitivity to initial configuration:
Initialization of the configuration:
Our strategy: run one mapping initialized with PCA and 20 mappings with random initial configurations, then keep the best run.
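A minimal sketch of this restart strategy, assuming that "best" means the lowest final stress. Here `run_mds` is a hypothetical callable that takes an initial configuration and returns the pair (final configuration, final stress); any stress minimizer with that interface would fit.

```python
import numpy as np

def pca_initial_configuration(X, dim=2):
    """Project the centered data onto its first `dim` principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def best_of_restarts(X, run_mds, dim=2, n_random=20, seed=0):
    """Run one PCA-initialized mapping plus n_random random ones; keep the best."""
    rng = np.random.default_rng(seed)
    starts = [("PCA", pca_initial_configuration(X, dim))]
    scale = X.std()  # random starts on roughly the same scale as the data
    for t in range(1, n_random + 1):
        Y0 = scale * rng.normal(size=(X.shape[0], dim))
        starts.append((f"random trial {t}", Y0))
    results = [(label,) + tuple(run_mds(Y0)) for label, Y0 in starts]
    return min(results, key=lambda r: r[2])  # (label, configuration, stress)
```

The returned label records which initialization produced the kept map, in the same way the captions report "PCA initialization" or "random initialization, trial n".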
Three mappings of the 10 nearest neighbors of point p554:
Sconv = 0.03904 (PCA initialization)
Sconv = 0.023181 (random initialization, trial 1)
Sconv = 0.023176 (random initialization, trial 2)
Features of MDS mapping for database visualization
Features of our MDS mapping software (prototype GUI)