Home
Chris De Vries edited this page Feb 3, 2014 · 26 revisions
Welcome to the lmw-tree wiki!
See http://ktree.sourceforge.net/emtree/clueweb09/ and http://ktree.sourceforge.net/emtree/clueweb12/ for examples of clusters produced by the EM-tree algorithm. The ClueWeb09 dataset contains 500 million documents and was clustered into 700,000 clusters. The ClueWeb12 dataset contains 733 million documents and was clustered into 600,000 clusters. The document-to-cluster mappings and other related files are available at http://sourceforge.net/projects/ktree/files/clueweb_clusters/.
TODO:
- C++ idioms
- Memory overhead
- Vector operations
- Fix the K-tree implementation -- it was broken somewhere along the way
- Testing
- Finish implementing indexer
- Add random projections
- Add loading of standard vector data formats, e.g. SVM-light and others
- Create a collection of examples of using the library
- Write a user's guide
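
As a starting point for the SVM-light loading item above, here is a minimal sketch of parsing one line of that format (`<label> <index>:<value> ... # comment`). It is not part of lmw-tree: `SparseExample` and `parseSvmLightLine` are hypothetical names chosen for illustration, and the sketch assumes plain numeric feature indices, so ranking-style `qid:` tokens are rejected rather than handled.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one example; not an lmw-tree type.
struct SparseExample {
    double label = 0.0;
    // (feature index as written in the file, value) pairs in file order.
    std::vector<std::pair<std::size_t, double>> features;
};

// Parses a single SVM-light style line: <label> <index>:<value> ... [# comment]
// Throws std::invalid_argument on a missing label or a malformed feature token.
SparseExample parseSvmLightLine(const std::string& line) {
    // Everything from '#' onward is a comment; find() returns npos when absent,
    // in which case substr keeps the whole line.
    const std::string body = line.substr(0, line.find('#'));
    std::istringstream in(body);

    SparseExample ex;
    if (!(in >> ex.label)) {
        throw std::invalid_argument("missing or non-numeric label");
    }

    std::string token;
    while (in >> token) {
        const std::size_t colon = token.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == token.size()) {
            throw std::invalid_argument("malformed feature: " + token);
        }
        // std::stoul / std::stod throw std::invalid_argument on non-numeric input,
        // which is how "qid:3" style tokens end up rejected in this sketch.
        const std::size_t index = std::stoul(token.substr(0, colon));
        const double value = std::stod(token.substr(colon + 1));
        ex.features.emplace_back(index, value);
    }
    return ex;
}
```

Calling `parseSvmLightLine` once per line read with `std::getline` would load a whole file. A real loader would likely also check that indices are ascending and convert the pairs into whatever vector type the library settles on.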