
Commit 32fd062 (1 parent: a76a093)

update readme

File tree: 10 files changed (+60, −28 lines)


ALS/README.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 ## Algorithm: Alternating Least Squares (ALS) Algorithm

 ## Task:
-The task is to modify the parallel implementation of ALS (alternating least squares) algorithm in Spark, so that it takes a utility matrix as the input, and output the root-mean-square deviation (RMSE) into standard output or a file after each iteration. The code for the algorithm is als.py under the <spark-2.1.0 installation directory>/examples/src/main/python.
+The task is to modify the parallel implementation of the ALS (alternating least squares) algorithm in Spark so that it takes a utility matrix as input, processes it by UV decomposition, and outputs the root-mean-square error (RMSE) to standard output or a file after each iteration. The code for the algorithm is als.py under the <spark-2.1.0 installation directory>/examples/src/main/python.

 #### Usage: bin/spark-submit ALS.py input-matrix n m f k p [output-file]
 1. n is the number of rows (users) of the matrix

Matrix_Multiplication/README.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+## This is an implementation of the Two-Phase Matrix Multiplication algorithm in Spark 2.1.1 with Python 2.7
+Matrix Multiplication: a two-phase approach to multiplying huge matrices on the Spark platform.
+
+## Algorithm: Matrix Multiplication: Two-Phase approach
+
+## Task:
+The task is to implement the two-phase matrix multiplication algorithm in Apache Spark using Python.
+Given two matrices A and B as lists of entries, the first phase joins the elements of A and B on the shared index (each a_ik with every b_kj) and emits the products keyed by output position (i, j);
+the second phase sums the products for each (i, j) key to produce the entries of the result matrix.
+
+#### Usage: bin/spark-submit TwoPhase_Matrix_Multiplication.py <mat-A/values.txt> <mat-B/values.txt> <output.txt>
+
+#### Input: Takes two folders with mat-A/values.txt and mat-B/values.txt as the input
+
+#### Output: Saves all results into one text file.
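The two phases can be sketched in plain Python, with dicts standing in for Spark's `join` and `reduceByKey`. This is a serial illustration of the approach, not the actual TwoPhase_Matrix_Multiplication.py; matrices are assumed to arrive as (row, col, value) triples.

```python
from collections import defaultdict

def two_phase_multiply(A, B):
    """Two-phase sparse matrix multiplication sketch.
    A and B are lists of (row, col, value) entries; returns a dict
    mapping (i, j) to the corresponding entry of A * B."""
    # Phase 1: group both matrices by the shared index k, then emit
    # every product a_ik * b_kj keyed by its output position (i, j).
    by_k_a = defaultdict(list)   # k -> [(i, a_ik)]
    by_k_b = defaultdict(list)   # k -> [(j, b_kj)]
    for i, k, a in A:
        by_k_a[k].append((i, a))
    for k, j, b in B:
        by_k_b[k].append((j, b))
    products = defaultdict(list)
    for k in by_k_a:
        for i, a in by_k_a[k]:
            for j, b in by_k_b.get(k, []):
                products[(i, j)].append(a * b)
    # Phase 2: sum the products for each output position (i, j),
    # the role played by reduceByKey in the Spark version.
    return {ij: sum(vals) for ij, vals in products.items()}
```

In Spark the phase-1 grouping is a `join` on k and phase 2 is a `reduceByKey` on (i, j), which is what keeps huge matrices partitioned across the cluster.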

MinHash_LSH/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+## This is an implementation of the TF-IDF algorithm with cosine similarity in Spark 2.1.1 with Python 2.7
+An implementation of TF-IDF with cosine similarity on the Spark platform, used as the distance measure for K-Means. The base implementation of k-means is provided by Spark in examples/src/main/python/ml/kmeans_example.py.
+
+## Algorithm: TF-IDF algorithm with cosine similarity
+
+## Task:
+The task is to implement the TF-IDF algorithm with cosine similarity in Apache Spark using Python.
+Given a set of vectors representing documents as input, compute the TF-IDF weights and cluster the documents by cosine similarity.
+
+#### Usage: bin/spark-submit kmeans <file> <k> <convergeDist> [outputfile.txt]
+k - the number of clusters
+convergeDist - the convergence distance/similarity at which the program stops iterating.
+
+example: bin\spark-submit .\kmeans.py .\docword.enron_s.txt 10 0.00001 kmeans_output.txt
+
+#### Input: Takes the input file from a folder as the input
+
+#### Output: Saves all results into one text file:
+kmeans_output.txt
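The TF-IDF weighting step described above can be sketched in plain Python as follows. This is a serial illustration, not the repository's kmeans.py (which runs on Spark); documents are assumed to arrive as token lists, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute a TF-IDF weight vector (term -> weight dict) per document.
    docs is a list of token lists."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # TF-IDF: normalized term frequency scaled by the log inverse
        # document frequency, so ubiquitous terms get weight ~0.
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

The resulting sparse weight dicts are the vectors that the cosine-similarity K-Means then clusters.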

TF-IDF_KMeans/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+## This is an implementation of the TF-IDF algorithm with cosine similarity in Spark 2.1.1 with Python 2.7
+An implementation of TF-IDF with cosine similarity on the Spark platform, used as the distance measure for K-Means. The base implementation of k-means is provided by Spark in examples/src/main/python/ml/kmeans_example.py.
+
+## Algorithm: TF-IDF algorithm with cosine similarity
+
+## Task:
+The task is to implement the TF-IDF algorithm with cosine similarity in Apache Spark using Python.
+Given a set of vectors representing documents as input, compute the TF-IDF weights and cluster the documents by cosine similarity.
+
+#### Usage: bin/spark-submit kmeans <file> <k> <convergeDist> [outputfile.txt]
+k - the number of clusters
+convergeDist - the convergence distance/similarity at which the program stops iterating.
+
+example: bin\spark-submit .\kmeans.py .\docword.enron_s.txt 10 0.00001 kmeans_output.txt
+
+#### Input: Takes the input file from a folder as the input
+
+#### Output: Saves all results into one text file:
+kmeans_output.txt
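Using cosine similarity as the K-Means measure, as the README describes, the assignment step might be sketched like this. This is a plain-Python illustration under the assumption that documents and centroids are sparse term-to-weight dicts; it is not the actual kmeans.py, and the function names are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (term -> weight dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_clusters(vectors, centroids):
    """One K-Means assignment step with cosine similarity as the measure:
    each document goes to the centroid it is MOST similar to (similarity
    is maximized, unlike Euclidean distance, which is minimized)."""
    return [max(range(len(centroids)),
                key=lambda c: cosine_similarity(vec, centroids[c]))
            for vec in vectors]
```

The convergeDist parameter would then be compared against how much the centroids move (or how much the total similarity changes) between iterations to decide when to stop.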

TF-IDF_KMeans/Readme.txt

Lines changed: 0 additions & 8 deletions
This file was deleted.

TF-IDF_KMeans/command.txt

Lines changed: 0 additions & 8 deletions
This file was deleted.
File renamed without changes.

TF-IDF_KMeans/hca_output.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

TF-IDF_KMeans/sample.txt

Lines changed: 0 additions & 6 deletions
This file was deleted.

TF-IDF_KMeans/test.xlsx

-12.9 KB
Binary file not shown.
