Unsupervised QPC - Clustering

The UQPC index:

<latex> UQPC(\vec{w})=\sum_{i=1}^n\sum_{j=1}^k \alpha_{ij} G\left( \vec{w}(\vec{x}_i - \vec{t}_j) \right) </latex>

The coefficients <latex>\alpha_{ij}</latex> depend on the distance of the vector <latex>\vec{x}_i</latex> from the prototype (after projection onto the direction <latex>\vec{w}</latex>).
Each vector is assigned the label of its nearest prototype, and the index is then computed with the standard QPC method.

<latex> \alpha_{ij}>0 \qquad \text{if} \qquad \vec{t}_j : j=\arg \min_l{|\vec{w}(\vec{x}_i-\vec{t}_l) |} </latex>

<latex> \alpha_{ij}<0 \qquad \text{if} \qquad \vec{t}_j : j\ne\arg \min_l{|\vec{w}(\vec{x}_i-\vec{t}_l) |} </latex>
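As a minimal sketch of how the index and the sign rule above could fit together, the Python snippet below evaluates the sum for a single direction. It is not the toolbox's MATLAB code: the function names, the Gaussian form G(s) = exp(-beta s^2) and the constant magnitudes +1 / -1 for <latex>\alpha_{ij}</latex> are assumptions made for illustration, since the text only fixes the sign of <latex>\alpha_{ij}</latex>.

```python
import math


def project(w, v):
    """Scalar projection of vector v onto direction w (plain dot product)."""
    return sum(wi * vi for wi, vi in zip(w, v))


def gauss(s, beta=1.0):
    """Assumed localized function G; the form used by the toolbox may differ."""
    return math.exp(-beta * s * s)


def uqpc_index(w, X, T, beta=1.0):
    """Sum alpha_ij * G(w.(x_i - t_j)) over all vectors x_i and prototypes t_j.

    alpha_ij is +1 for the prototype nearest to x_i after projection onto w
    and -1 for every other prototype (assumed magnitudes).
    """
    total = 0.0
    for x in X:
        # Signed projected offsets of x from each prototype.
        offsets = [project(w, [xi - ti for xi, ti in zip(x, t)]) for t in T]
        nearest = min(range(len(T)), key=lambda j: abs(offsets[j]))
        for j, s in enumerate(offsets):
            alpha = 1.0 if j == nearest else -1.0
            total += alpha * gauss(s, beta)
    return total


# Two well-separated points, each sitting on its own prototype.
X = [[-1.0, 0.0], [1.0, 0.0]]
T = [[-1.0, 0.0], [1.0, 0.0]]
along = uqpc_index([1.0, 0.0], X, T)   # direction that separates the prototypes
across = uqpc_index([0.0, 1.0], X, T)  # direction on which the prototypes coincide
print(along, across)
```

In this toy case the separating direction scores higher than the direction on which both prototypes project to the same point, which is the behaviour the sign rule is meant to reward.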

Another variant may take into account the positions of the prototypes in the original space <latex>R^n</latex>, e.g.:

<latex> \alpha_{ij}>0 \qquad \text{if} \qquad \vec{t}_j : j=\arg \min_l{||\vec{x}_i-\vec{t}_l ||} </latex>

<latex> \alpha_{ij}<0 \qquad \text{if} \qquad \vec{t}_j : j\ne\arg \min_l{||\vec{x}_i-\vec{t}_l ||} </latex>
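The two assignment rules can disagree for the same vector. The Python sketch below compares them on a hand-made example. The helper names and the sample points are hypothetical and chosen only to make the difference visible.

```python
import math


def nearest_projected(w, x, T):
    """Index of the prototype minimizing |w.(x - t_l)|."""
    return min(
        range(len(T)),
        key=lambda l: abs(sum(wi * (xi - ti) for wi, xi, ti in zip(w, x, T[l]))),
    )


def nearest_original(x, T):
    """Index of the prototype minimizing the Euclidean norm ||x - t_l||."""
    return min(range(len(T)), key=lambda l: math.dist(x, T[l]))


w = [1.0, 0.0]
x = [0.0, 0.0]
T = [[0.1, 5.0], [1.0, 0.0]]

j_proj = nearest_projected(w, x, T)  # prototype 0: offset 0.1 along w
j_orig = nearest_original(x, T)      # prototype 1: distance 1.0 in the full space
print(j_proj, j_orig)
```

A prototype that is far away in a dimension ignored by the projection wins under the first rule but loses under the second, so the choice of rule changes which terms of the index receive a positive coefficient.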

Tests on simple data

Config: example configuration. The only parameter varied in these tests was the number of prototypes K.

>> [w q p]=uqpc_train(data,2,'dataName','iris')

uqpc_parameters = 
                        K: 2
         uqpc_initiations: 10

qpc_parameters = 
                     beta: 0.1000
              checkPeriod: 5
                 dataName: 'dataname'
               directions: 2
                  display: 'none'
                      eps: 1.0000e-03
                 function: 'gauss'
                  indGmax: []
              initiations: 10
              initWeights: []
               killPeriod: 10
                killRatio: 0.5000
                   lambda: 0.1000
             learningRate: 0.1000
                      log: 'off'
              logFileName: []
            maxIterations: 1000
               multistart: 'no'
                  OptConf: []
                OptMethod: 'gd'
  orthogonalizationMethod: 'projection'
              ortoWeights: []
                     plot: 'none'
                      plr: 0.1000
               prototypes: []
                QPCMethod: 'uqpc1'
                     save: 'none'
                  savedir: []
            stopCriterium: 2

Gauss2 - first two projections

Artificial data containing vectors drawn from a normal distribution

Features: 3
Instances: 400
Source: artificial data

Description: two Gaussian clusters (no overlap). 200 vectors were drawn from the distribution N([-1 -1 -1];[0.4 0.4 0.4]) and another 200 from N([+1 +1 +1];[0.4 0.4 0.4]).
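A data set of this shape could be generated roughly as follows. This is a Python sketch, not the script used for the tests: the function name, the seed and the reading of 0.4 as a per-feature standard deviation (rather than a variance) are assumptions.

```python
import random


def make_gauss2(seed=0, spread=0.4, per_cluster=200):
    """Return 2 * per_cluster three-dimensional vectors in two Gaussian clusters.

    `spread` is used as the per-feature standard deviation (an assumption).
    """
    rng = random.Random(seed)
    data = []
    for center in (-1.0, 1.0):
        for _ in range(per_cluster):
            data.append([rng.gauss(center, spread) for _ in range(3)])
    return data


data = make_gauss2()
mean_first = sum(row[0] for row in data[:200]) / 200
mean_second = sum(row[0] for row in data[200:]) / 200
print(len(data), mean_first, mean_second)
```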

K=2
w =
 -0.6774   -0.1991   -0.7081
  0.1694   -0.9790    0.1133

qpc =
  0.6528
  0.7564

prototypes =
 -1.6026   -0.8391    1.0000
  1.5947    0.8877    2.0000
K=3
w =
   -0.6594   -0.0136   -0.7517
   -0.2458   -0.9410    0.2326

qpc =
    0.7164
    0.7636

prototypes =
   -1.6482   -1.4852    1.0000
   -0.9964   -0.6550    2.0000
    1.4212    1.0054    3.0000
K=4
w =
   -0.8863   -0.4130   -0.2097
    0.4008   -0.4568   -0.7942

qpc =
    0.7706
    0.7472

prototypes =
   -2.0251   -0.5340    1.0000
   -1.4904   -1.2452    2.0000
   -0.9402    0.6215    3.0000
    1.5231    1.3934    4.0000
K=5
w =
   -0.7445   -0.5505   -0.3776
    0.6299   -0.3918   -0.6706

qpc =
    0.7945
    0.7679

prototypes =
   -2.2472   -0.0997    1.0000
   -1.7423   -1.2027    2.0000
   -1.2746   -0.7011    3.0000
   -0.9240    0.5470    4.0000
    1.6828    1.0826    5.0000
K=7
w =
   -0.6915   -0.4394   -0.5734
    0.4090   -0.8924    0.1905

qpc =
    0.8374
    0.4182

prototypes =
   -2.6780   -1.1170    1.0000
   -2.2099   -1.9602    2.0000
   -1.7416   -1.9625    3.0000
   -1.3012   -1.9600    4.0000
   -0.9648   -0.6108    5.0000
   -0.6844    0.0602    6.0000
    1.7147    0.7292    7.0000
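The configuration lists orthogonalizationMethod: 'projection', so the two rows of w should be nearly orthonormal. A quick sanity check in Python, using the values printed for K=2 above and assuming that each row of w is one projection direction:

```python
import math

# Rows of w reported for Gauss2 with K=2.
w1 = [-0.6774, -0.1991, -0.7081]
w2 = [0.1694, -0.9790, 0.1133]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


norm1 = math.sqrt(dot(w1, w1))
norm2 = math.sqrt(dot(w2, w2))
overlap = dot(w1, w2)

# Both norms come out close to 1 and the dot product close to 0,
# up to the four-decimal rounding of the printed values.
print(norm1, norm2, overlap)
```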

Gauss3a - first two projections

Artificial data containing vectors drawn from a normal distribution

Features: 3
Instances: 600
Source: artificial data

Description: three Gaussian clusters (both non-overlapping and weakly overlapping). 400 vectors are identical to those in the Gauss2 data.
An additional 200 vectors were drawn from the distribution N([0 3 3];[1 1 1]).

K=2
w =
   -0.7692   -0.1555   -0.6198
    0.5675   -0.6122   -0.5506

qpc =
    0.8484
    0.8040

prototypes =
   -0.0668   -0.6088    1.0000
    0.9765    0.4789    2.0000
K=3
w =
   -0.1569   -0.6702   -0.7254
   -0.9876    0.1003    0.1210

qpc =
    0.8547
    0.7066

prototypes =
   -0.5836    0.8915    1.0000
    0.2209   -0.2944    2.0000
    1.1085    0.4466    3.0000
K=4
w =
   -0.2446   -0.7543   -0.6092
   -0.9695    0.1826    0.1632

qpc =
    0.8474
    0.7416

prototypes =
   -0.8046   -0.3162    1.0000
   -0.3981    0.4073    2.0000
    0.2201   -0.7951    3.0000
    1.1300    0.9064    4.0000
K=5
w =
   -0.2985   -0.7124   -0.6351
    0.9475   -0.1412   -0.2870

qpc =
    0.8501
    0.7457

prototypes =
   -1.0304   -0.7514    1.0000
   -0.7630    0.7096    2.0000
   -0.3741   -1.0696    3.0000
    0.2116   -0.3223    4.0000
    1.1433    0.3275    5.0000

Iris - first two projections

K=2
w =
  0.1368    0.1681   -0.9666    0.1369
 -0.3418    0.5856   -0.0504   -0.7333

q =

  0.9010
  0.8841

p =
 -0.3124   -0.4709    1.0000
  0.6586    1.0459    2.0000
K=3
w =
  0.1986    0.1252   -0.8541   -0.4640
 -0.6044    0.2767    0.2770   -0.6939

q =
  0.8516
  0.7963

p =
 -0.7688   -0.6718    1.0000
 -0.0887   -0.0270    2.0000
  1.0320    0.8203    3.0000
K=4
w =
 -0.0979   -0.0137   -0.8268   -0.5538
 -0.7768   -0.6135    0.1341   -0.0477

q =
  0.8172
  0.7590

p =
 -0.9725   -0.9724    1.0000
 -0.3340    0.4432    2.0000
  0.1701    0.9497    3.0000
  1.2426   -0.2903    4.0000

Gauss2n2 - first two projections

Artificial data containing vectors drawn from a normal distribution plus uniform noise.

Features: 4
Instances: 600
Source: artificial data

Description: two Gaussian clusters (weak overlap) and uniform noise.
Features 1 and 3 were drawn from the distributions N(-1.3,1) and N(+1.3,1).
Features 2 and 4 were drawn from a uniform distribution over the range [-4,+4].

K=2
w =
 -0.5523   -0.0285   -0.8326   -0.0303
  0.1305   -0.0569   -0.0487   -0.9886

q =
  0.7557
  0.7203

p =
 -0.5532   -0.6354    1.0000
  0.4429    0.6202    2.0000
K=3
w =
 -0.5920   -0.5247   -0.6050   -0.0910
 -0.1031    0.0999    0.1612   -0.9764

q =
  0.7224
  0.7346

p =
 -0.7178   -0.7449    1.0000
  0.0330   -0.0118    2.0000
  0.7115    0.7362    3.0000
K=4
w =
 -0.1100   -0.9853   -0.0730   -0.1085
 -0.3473    0.1587   -0.3430   -0.8582
q =
  0.7451
  0.7415
    
p =
 -0.8747   -0.9069    1.0000
 -0.3419   -0.2859    2.0000
  0.2004    0.3974    3.0000
  0.7912    0.9605    4.0000
K=5
w =
 -0.4693   -0.8418   -0.2383   -0.1195
 -0.3406    0.3571   -0.1619   -0.8545

q =
  0.7598
  0.7474

p =
 -1.1052   -0.5754    1.0000
 -0.7851   -1.0211    2.0000
 -0.2884   -0.0100    3.0000
  0.3211    0.5271    4.0000
  0.8709    1.0255    5.0000

* Problem with the prototype positions when combining projections. The prototype-based clustering method needs refinement. Does this way of determining clusters make sense? Marek Grochowski 2011/02/04 11:28