Commit 7209277
Initial SVN import

git-svn-id: https://python-cluster.svn.sourceforge.net/svnroot/python-cluster/trunk@1 57eab859-f816-0410-af72-e61ffa1cc713

11 files changed: +1539, -0 lines

CHANGELOG

Lines changed: 46 additions & 0 deletions
1.1.1b2
- Fixed bug #1604859 (thanks to Willi Richert for reporting it)

1.1.1b1
- Applied patch [1535137] (thanks ajaksu)
  --> Topology output supported
  --> data and raw_data are now properties.

1.1.0b1
- KMeans Clustering implemented for simple numeric tuples.
  Data in the form [(1,1), (2,1), (5,3), ...] can be clustered.

  Usage:

  >>> from cluster import KMeansClustering
  >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
  >>> clusters = cl.getclusters(2)

  The method "getclusters" takes the number of clusters you would like to
  have as a parameter.

  Only numeric values are supported in the tuples. The reason for this is
  that the "centroid" method I use essentially returns a tuple of floats,
  so you will lose any other kind of metadata. Once I figure out a way to
  recode that method, other types should be possible.

1.0.1b2
- Optimized the calculation of the hierarchical clustering by using the
  fact that the generated matrix is symmetrical.

1.0.1b1
- Implemented complete-, average-, and uclus-linkage methods. You can
  select one by specifying it in the constructor, for example:

      cl = HierarchicalClustering(data, distfunc, linkage='uclus')

  or by setting it before starting the clustering process:

      cl = HierarchicalClustering(data, distfunc)
      cl.setLinkageMethod('uclus')
      cl.cluster()

- Clustering is not executed on object creation, but on the first call of
  "getlevel". You can force the creation of the clusters by calling the
  "cluster" method as shown above.
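The numeric-tuples restriction mentioned under 1.1.0b1 stems from the centroid calculation. A stand-alone sketch (hypothetical, not the package's actual method) shows why:

```python
# Averaging each coordinate always yields a tuple of floats, so any
# non-numeric metadata carried in the input tuples cannot survive.
def centroid(points):
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

print(centroid([(1, 1), (2, 1), (5, 3)]))   # (2.666..., 1.666...)
```

Any tuple element that cannot be summed and divided, such as a label string, would make this calculation fail.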

INSTALL

Lines changed: 44 additions & 0 deletions
INSTALLATION
============

Linux
-----

RPM installation
~~~~~~~~~~~~~~~~

I'm not familiar with RPM distributions, but as far as I know it should be
something like::

   rpm -i <filename.rpm>

RPM source installation
~~~~~~~~~~~~~~~~~~~~~~~

This is something I don't know. If somebody can enlighten me, please do!

Binary/Source installation
~~~~~~~~~~~~~~~~~~~~~~~~~~

Untar the package with your favourite archive tool. On the console it will be
something along the lines of::

   tar xzf <filename.tar.gz>

Next, go to the folder just created. It will have the same name as the package
(for example "cluster-1.0.0b1"). There, run::

   python setup.py install

For this step you need root privileges.

Windows
-------

Execute the executable file and follow the instructions displayed. Default
values will be fine in most cases.

MacOS X
-------

Simply follow the same instructions as for the Linux binary/source
installation.

LICENSE

Lines changed: 505 additions & 0 deletions

MANIFEST.in

Lines changed: 2 additions & 0 deletions
include README LICENSE CHANGELOG
include *.py cluster.bmp MANIFEST.in

README

Lines changed: 42 additions & 0 deletions
DESCRIPTION
===========

python-cluster is a "simple" package that allows you to create several groups
(clusters) of objects from a list. It's meant to be flexible and able to
cluster any object. To ensure this kind of flexibility, you supply not only
the list of objects, but also a function that calculates the similarity
between two of those objects. For simple datatypes like integers this can be
as simple as a subtraction, but more complex calculations are possible. Right
now, clusters can be generated using hierarchical clustering and the popular
K-Means algorithm. For the hierarchical algorithm, different "linkage"
methods (single, complete, average and uclus) are available. I plan to
implement other algorithms as well on an "as-needed" or "as-I-have-time"
basis.

The algorithms are based on the document found at
http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/
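The similarity-function idea can be illustrated with a couple of hypothetical helpers (these are not shipped with the package; you write your own):

```python
# Hypothetical distance functions -- the package supplies none itself;
# you pass any callable that compares two of your objects.

def int_distance(x, y):
    # for plain integers, a subtraction is enough
    return abs(x - y)

def word_distance(a, b):
    # for strings, one (crude) option: compare by length difference
    return abs(len(a) - len(b))

print(int_distance(12, 34))            # 22
print(word_distance("cat", "house"))   # 2
```

Either function could then be handed to the clustering constructor as the similarity argument.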
USAGE
=====

A simple python program could look like this::

   >>> from cluster import *
   >>> data = [12,34,23,32,46,96,13]
   >>> cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
   >>> cl.getlevel(10)     # get clusters of items closer than 10
   [96, 46, [12, 13, 23, 34, 32]]
   >>> cl.getlevel(5)      # get clusters of items closer than 5
   [96, 46, [12, 13], 23, [34, 32]]

Note that the first time you retrieve a set of clusters, the clustering
process is started, which is quite complex. If you intend to create clusters
from a large dataset, consider doing that in a separate thread.
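The threading advice above can be sketched with the standard threading module. `SlowClustering` below is a hypothetical stand-in for a clustering object; only the getlevel() call pattern matters here:

```python
import threading

# Hypothetical stand-in for HierarchicalClustering; its getlevel() is a
# placeholder for the expensive first clustering pass.
class SlowClustering:
    def __init__(self, data):
        self.data = data

    def getlevel(self, threshold):
        # placeholder "work"; the real call would cluster the data
        return [x for x in self.data if x < threshold]

cl = SlowClustering([12, 34, 23, 32, 46, 96, 13])
result = []
worker = threading.Thread(target=lambda: result.append(cl.getlevel(20)))
worker.start()          # the expensive call runs off the main thread
worker.join()           # wait only when the clusters are actually needed
print(result[0])        # [12, 13]
```

The main thread stays responsive between start() and join(), which is the point of the advice for large datasets.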
For K-Means clustering it would look like this::

   >>> from cluster import KMeansClustering
   >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
   >>> clusters = cl.getclusters(2)

The parameter passed to getclusters is the number of clusters to generate.
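Conceptually, getclusters(n) performs a k-means pass over the tuples. A minimal, hypothetical sketch for 2-tuples (an illustration of the algorithm, not the package's implementation):

```python
import random

# Hypothetical k-means sketch for numeric 2-tuples: assign each point to
# its nearest centroid, recompute centroids as group means, repeat.
def kmeans(points, k, iterations=10):
    centroids = random.sample(points, k)          # k distinct start points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # nearest centroid by squared Euclidean distance
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            groups[nearest].append(p)
        # recompute each centroid as the mean of its group
        centroids = [(sum(p[0] for p in g) / len(g),
                      sum(p[1] for p in g) / len(g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return groups

groups = kmeans([(1, 1), (2, 1), (5, 3), (6, 4)], 2)
```

For well-separated data such as the four points above, the two returned groups are the two visually obvious pairs regardless of the random start.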

cluster.bmp

116 KB
Binary file not shown.
