
Commit 32fd062 (1 parent: a76a093)

update readme

File tree: 10 files changed (+60, −28 lines)


ALS/README.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 ## Algorithm: Alternating Least Squares (ALS) Algorithm

 ## Task:
-The task is to modify the parallel implementation of ALS (alternating least squares) algorithm in Spark, so that it takes a utility matrix as the input, and output the root-mean-square deviation (RMSE) into standard output or a file after each iteration. The code for the algorithm is als.py under the <spark-2.1.0 installation directory>/examples/src/main/python.
+The task is to modify the parallel implementation of the ALS (alternating least squares) algorithm in Spark so that it takes a utility matrix as input, processes it by UV decomposition, and outputs the root-mean-square error (RMSE) to standard output or a file after each iteration. The code for the algorithm is als.py under the <spark-2.1.0 installation directory>/examples/src/main/python.

 #### Usage: bin/spark-submit ALS.py input-matrix n m f k p [output-file]
 1. n is the number of rows (users) of the matrix

Matrix_Multiplication/README.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+## This is an implementation of the Two-Phase Matrix Multiplication algorithm in Spark 2.1.1 with Python 2.7
+Matrix Multiplication: a two-phase approach to multiplying huge matrices on the Spark platform.
+
+## Algorithm: Matrix Multiplication: Two-Phase approach
+
+## Task:
+The task is to implement the two-phase matrix multiplication algorithm in Apache Spark using Python.
+Given two matrices A and B as lists of entries, the first phase joins the elements of A and B on the shared index (each a_ik with every b_kj) and emits the products keyed by output position (i, j);
+the second phase sums the products for each (i, j) key to produce the entries of the result matrix.
+
+#### Usage: bin/spark-submit TwoPhase_Matrix_Multiplication.py <mat-A/values.txt> <mat-B/values.txt> <output.txt>
+
+#### Input: Takes two folders with mat-A/values.txt and mat-B/values.txt as the input
+
+#### Output: Saves all results into one text file.
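The two phases can be sketched in plain Python, with dicts standing in for Spark's `join` and `reduceByKey`. This is a serial illustration of the approach, not the actual TwoPhase_Matrix_Multiplication.py; matrices are assumed to arrive as (row, col, value) triples.

```python
from collections import defaultdict

def two_phase_multiply(A, B):
    """Two-phase sparse matrix multiplication sketch.
    A and B are lists of (row, col, value) entries; returns a dict
    mapping (i, j) to the corresponding entry of A * B."""
    # Phase 1: group both matrices by the shared index k, then emit
    # every product a_ik * b_kj keyed by its output position (i, j).
    by_k_a = defaultdict(list)   # k -> [(i, a_ik)]
    by_k_b = defaultdict(list)   # k -> [(j, b_kj)]
    for i, k, a in A:
        by_k_a[k].append((i, a))
    for k, j, b in B:
        by_k_b[k].append((j, b))
    products = defaultdict(list)
    for k in by_k_a:
        for i, a in by_k_a[k]:
            for j, b in by_k_b.get(k, []):
                products[(i, j)].append(a * b)
    # Phase 2: sum the products for each output position (i, j),
    # the role played by reduceByKey in the Spark version.
    return {ij: sum(vals) for ij, vals in products.items()}
```

In Spark the phase-1 grouping is a `join` on k and phase 2 is a `reduceByKey` on (i, j), which is what keeps huge matrices partitioned across the cluster.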

MinHash_LSH/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+## This is an implementation of the TF-IDF algorithm with cosine similarity in Spark 2.1.1 with Python 2.7
+An implementation of TF-IDF with cosine similarity on the Spark platform, used as the distance measure for K-Means. The base implementation of k-means is provided by Spark in examples/src/main/python/ml/kmeans_example.py.
+
+## Algorithm: TF-IDF algorithm with cosine similarity
+
+## Task:
+The task is to implement the TF-IDF algorithm with cosine similarity in Apache Spark using Python.
+Given a set of vectors representing documents as input, compute the TF-IDF weights and cluster the documents by cosine similarity.
+
+#### Usage: bin/spark-submit kmeans <file> <k> <convergeDist> [outputfile.txt]
+k - the number of clusters
+convergeDist - the convergence distance/similarity at which the program stops iterating.
+
+example: bin\spark-submit .\kmeans.py .\docword.enron_s.txt 10 0.00001 kmeans_output.txt
+
+#### Input: Takes the input file from a folder as the input
+
+#### Output: Saves all results into one text file:
+kmeans_output.txt
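The TF-IDF weighting step described above can be sketched in plain Python as follows. This is a serial illustration, not the repository's kmeans.py (which runs on Spark); documents are assumed to arrive as token lists, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute a TF-IDF weight vector (term -> weight dict) per document.
    docs is a list of token lists."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # TF-IDF: normalized term frequency scaled by the log inverse
        # document frequency, so ubiquitous terms get weight ~0.
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

The resulting sparse weight dicts are the vectors that the cosine-similarity K-Means then clusters.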

TF-IDF_KMeans/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+## This is an implementation of the TF-IDF algorithm with cosine similarity in Spark 2.1.1 with Python 2.7
+An implementation of TF-IDF with cosine similarity on the Spark platform, used as the distance measure for K-Means. The base implementation of k-means is provided by Spark in examples/src/main/python/ml/kmeans_example.py.
+
+## Algorithm: TF-IDF algorithm with cosine similarity
+
+## Task:
+The task is to implement the TF-IDF algorithm with cosine similarity in Apache Spark using Python.
+Given a set of vectors representing documents as input, compute the TF-IDF weights and cluster the documents by cosine similarity.
+
+#### Usage: bin/spark-submit kmeans <file> <k> <convergeDist> [outputfile.txt]
+k - the number of clusters
+convergeDist - the convergence distance/similarity at which the program stops iterating.
+
+example: bin\spark-submit .\kmeans.py .\docword.enron_s.txt 10 0.00001 kmeans_output.txt
+
+#### Input: Takes the input file from a folder as the input
+
+#### Output: Saves all results into one text file:
+kmeans_output.txt
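Using cosine similarity as the K-Means measure, as the README describes, the assignment step might be sketched like this. This is a plain-Python illustration under the assumption that documents and centroids are sparse term-to-weight dicts; it is not the actual kmeans.py, and the function names are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (term -> weight dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_clusters(vectors, centroids):
    """One K-Means assignment step with cosine similarity as the measure:
    each document goes to the centroid it is MOST similar to (similarity
    is maximized, unlike Euclidean distance, which is minimized)."""
    return [max(range(len(centroids)),
                key=lambda c: cosine_similarity(vec, centroids[c]))
            for vec in vectors]
```

The convergeDist parameter would then be compared against how much the centroids move (or how much the total similarity changes) between iterations to decide when to stop.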

TF-IDF_KMeans/Readme.txt

Lines changed: 0 additions & 8 deletions
This file was deleted.

TF-IDF_KMeans/command.txt

Lines changed: 0 additions & 8 deletions
This file was deleted.
File renamed without changes.

TF-IDF_KMeans/hca_output.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

TF-IDF_KMeans/sample.txt

Lines changed: 0 additions & 6 deletions
This file was deleted.

TF-IDF_KMeans/test.xlsx

-12.9 KB
Binary file not shown.
