Commit 1a268f5

committed
demo kmeans
1 parent bb8de26 commit 1a268f5

2 files changed: +46 -9 lines changed

lib/PDL/Demos/Stats.pm

Lines changed: 15 additions & 0 deletions
@@ -37,6 +37,21 @@ print "m=$m\nms=$ms";
 random(100)->plot_acf( 50, { win=>$w } );
 |],

+[act => q|
+# PDL::Stats::Kmeans clusters data points into "k" (a supplied number) groups
+$data = grandom(200, 2); # two rows = two dimensions
+%k = $data->kmeans; # use default of 3 clusters
+print "$_\t$k{$_}\n" for sort keys %k;
+$w->plot(
+(map +(with=>'points', style=>$_+1, ke=>"Cluster ".($_+1),
+$data->dice_axis(0,which($k{cluster}->slice(",$_")))->dog),
+0 .. $k{cluster}->dim(1)-1),
+(map +(with=>'circles', style=>$_+1, ke=>"Centroid ".($_+1), $k{centroid}->slice($_)->dog, 0.1),
+0 .. $k{centroid}->dim(0)-1),
+{le=>'tr'},
+);
+|],
+
 [comment => q|
 This concludes the demo.

lib/PDL/Stats/Kmeans.pd

Lines changed: 31 additions & 9 deletions
@@ -360,13 +360,28 @@ pp_addpm pp_line_numbers(__LINE__, <<'EOD');

 =for ref

-Implements classic k-means cluster analysis. Given a number of
-observations with values on a set of variables, kmeans puts the
-observations into clusters that maximizes within-cluster similarity with
-respect to the variables. Tries several different random seeding and
-clustering in parallel. Stops when cluster assignment of the observations
-no longer changes. Returns the best result in terms of R2 from the
-random-seeding trials.
+Implements classic k-means cluster analysis.
+
+=for example
+
+$data = grandom(200, 2); # two rows = two dimensions
+%k = $data->kmeans; # use default of 3 clusters
+print "$_\t$k{$_}\n" for sort keys %k;
+$w->plot(
+(map +(with=>'points', style=>$_+1, ke=>"Cluster ".($_+1),
+$data->dice_axis(0,which($k{cluster}->slice(",$_")))->dog),
+0 .. $k{cluster}->dim(1)-1),
+(map +(with=>'circles', style=>$_+1, ke=>"Centroid ".($_+1), $k{centroid}->slice($_)->dog, 0.1),
+0 .. $k{centroid}->dim(0)-1),
+{le=>'tr'},
+);
+
+Given a number of observations with values on a set of variables,
+kmeans puts the observations into clusters that maximizes within-cluster
+similarity with respect to the variables. Tries several different random
+seeding and clustering in parallel. Stops when cluster assignment of the
+observations no longer changes. Returns the best result in terms of R2
+from the random-seeding trials.

 Instead of random seeding, kmeans also accepts manual seeding. This is
 done by providing a centroid to the function, in which case clustering
@@ -661,7 +676,12 @@ sub PDL::iv_cluster {

 =head2 pca_cluster

-Assign variables to components ie clusters based on pca loadings or scores. One way to seed kmeans (see Ding & He, 2004, and Su & Dy, 2004 for other ways of using pca with kmeans). Variables are assigned to their most associated component. Note that some components may not have any variable that is most associated with them, so the returned number of clusters may be smaller than NCOMP.
+Assign variables to components ie clusters based on pca loadings or
+scores. One way to seed kmeans (see Ding & He, 2004, and Su & Dy, 2004
+for other ways of using pca with kmeans). Variables are assigned to
+their most associated component. Note that some components may not have
+any variable that is most associated with them, so the returned number
+of clusters may be smaller than NCOMP.

 Default options (case insensitive):

@@ -670,6 +690,7 @@ Default options (case insensitive):
 NCOMP => undef, # max number of components to consider. determined by
 # scree plot black magic if not specified
 PLOT => 0, # pca scree plot with cutoff at NCOMP
+WIN => undef, # pass pgswin object for more plotting control

 Usage:

@@ -700,6 +721,7 @@ sub PDL::pca_cluster {
 NCOMP => undef, # max number of components to consider. determined by
 # scree plot black magic if not specified
 PLOT => 0, # pca scree plot with cutoff at NCOMP
+WIN => undef, # pass pgswin object for more plotting control
 );
 if ($opt) { $opt{uc $_} = $opt->{$_} for keys %$opt; }

@@ -714,7 +736,7 @@ sub PDL::pca_cluster {
 }
 $opt{PLOT} and do {
 require PDL::Stats::GLM;
-$var->plot_scree( {NCOMP=>$var->dim(0), CUT=>$opt{NCOMP}} );
+$var->plot_screes({NCOMP=>$var->dim(0), CUT=>$opt{NCOMP}, WIN=>$opt{WIN}});
 };

 my $c = $self->slice(':',[0,$opt{NCOMP}-1])->transpose->abs->maximum_ind;
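
The pca_cluster hunks rely on case-insensitive option handling: user-supplied keys are upper-cased and laid over a hash of defaults, which is why the new WIN entry can be passed in any case. A minimal plain-Perl sketch of that idiom follows. The name `merge_opts` and the placeholder string standing in for a window object are assumptions for illustration and are not part of PDL::Stats.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the option handling seen in PDL::pca_cluster.
# Defaults mirror the keys shown in the diff; nothing here touches PDL.
sub merge_opts {
    my ($user) = @_;
    my %opt = (NCOMP => undef, PLOT => 0, WIN => undef);
    # Upper-case each user key so that "plot", "Plot" and "PLOT" all match.
    if ($user) { $opt{uc $_} = $user->{$_} for keys %$user; }
    return %opt;
}

# 'my-window' is a placeholder string, not a real plotting window object.
my %opt = merge_opts({ plot => 1, Win => 'my-window' });
print join(', ', map { "$_=" . ($opt{$_} // 'undef') } sort keys %opt), "\n";
```

Run as written, this prints `NCOMP=undef, PLOT=1, WIN=my-window`. One consequence of the idiom is that an unrecognised key is stored rather than rejected, so a misspelt option name goes unnoticed.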

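The kmeans documentation in this commit describes assigning observations to clusters and stopping once the assignment no longer changes, with optional manual seeding. The sketch below illustrates only that loop in plain Perl for 2-D points with caller-supplied seeds. It is not the PDL::Stats implementation: it omits the parallel random-seeding trials and the R2-based selection, and `kmeans_sketch` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL::Stats: a plain-Perl k-means loop.
# $points: array ref of [x, y] pairs; $seeds: array ref of seed centroids.
sub kmeans_sketch {
    my ($points, $seeds) = @_;
    my @centroid = map { [@$_] } @$seeds;   # copy the manual seeds
    my @assign   = (-1) x @$points;         # -1 means "not yet assigned"
    while (1) {
        my $changed = 0;
        # Assignment step: each point joins its nearest centroid.
        for my $i (0 .. $#$points) {
            my ($best, $best_d);
            for my $c (0 .. $#centroid) {
                my $d = ($points->[$i][0] - $centroid[$c][0]) ** 2
                      + ($points->[$i][1] - $centroid[$c][1]) ** 2;
                ($best, $best_d) = ($c, $d) if !defined $best_d || $d < $best_d;
            }
            if ($assign[$i] != $best) { $assign[$i] = $best; $changed = 1; }
        }
        last unless $changed;   # stop when assignments no longer change
        # Update step: move each centroid to the mean of its members.
        for my $c (0 .. $#centroid) {
            my @members = grep { $assign[$_] == $c } 0 .. $#$points;
            next unless @members;   # an empty cluster keeps its old centroid
            my ($sx, $sy) = (0, 0);
            for my $i (@members) {
                $sx += $points->[$i][0];
                $sy += $points->[$i][1];
            }
            $centroid[$c] = [ $sx / @members, $sy / @members ];
        }
    }
    return (\@assign, \@centroid);
}

my @points = ([0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]);
my ($assign, $centroid) = kmeans_sketch(\@points, [[0, 0], [10, 10]]);
print "assignments: @$assign\n";
printf "centroid %d: (%.2f, %.2f)\n", $_, @{ $centroid->[$_] } for 0 .. $#$centroid;
```

With these two well-separated groups and seeds, the first three points land in cluster 0 and the last three in cluster 1. Because the seeds are fixed, the result is deterministic, unlike the random-seeding default described in the documentation.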