1+ ## This is an implementation of Savasere, Omiecinski, and Navathe (SON) algorithm in Spark 2.1.1 with Python 2.7
2+ Finding frequent itemsets with the SON algorithm, using the A-Priori algorithm in stage 1.
3+
4+ ## Algorithm: Savasere, Omiecinski, and Navathe (SON) Algorithm, A-Priori algorithm
5+
6+ ## Task:
7+ The task is to implement the SON algorithm in Apache Spark using Python.
8+ Given a set of baskets, the SON algorithm divides them into chunks (partitions) and then proceeds in two stages.
9+ First, the itemsets that are frequent within each chunk are collected; together they form the candidates.
10+ Next, a second pass over the data determines which candidates are globally frequent.
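
The two stages described above can be sketched in plain Python, without Spark, to make the data flow easier to follow. This is a minimal sketch under stated assumptions, not the code in A-Priori_SON.py: the names `min_count`, `apriori`, `son`, and `num_chunks` are hypothetical, chunks are plain list slices rather than Spark partitions, and the local threshold is assumed to be the support ratio scaled to the chunk size.

```python
from collections import defaultdict
from itertools import combinations
import math


def min_count(ratio, n):
    """Smallest basket count meeting `ratio` of n baskets (epsilon guards float error)."""
    return max(1, int(math.ceil(ratio * n - 1e-9)))


def apriori(baskets, threshold):
    """Return every itemset (as a sorted tuple) found in at least `threshold` baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = defaultdict(int)
    for basket in baskets:
        for item in basket:
            counts[(item,)] += 1
    current = set(s for s, n in counts.items() if n >= threshold)
    frequent = set(current)
    k = 2
    while current:
        # Size-k candidates: every (k-1)-subset must already be frequent.
        items = sorted(set(i for s in current for i in s))
        candidates = [cand for cand in combinations(items, k)
                      if all(sub in current for sub in combinations(cand, k - 1))]
        counts = defaultdict(int)
        for basket in baskets:
            for cand in candidates:
                if basket.issuperset(cand):
                    counts[cand] += 1
        current = set(s for s, n in counts.items() if n >= threshold)
        frequent |= current
        k += 1
    return frequent


def son(baskets, support_ratio, num_chunks):
    """Two-stage SON: local A-Priori per chunk, then one global counting pass."""
    total = len(baskets)
    size = int(math.ceil(total / float(num_chunks)))
    chunks = [baskets[i:i + size] for i in range(0, total, size)]

    # Stage 1: itemsets frequent in any chunk become candidates.
    candidates = set()
    for chunk in chunks:
        candidates |= apriori(chunk, min_count(support_ratio, len(chunk)))

    # Stage 2: count each candidate over all baskets and keep the globally frequent ones.
    counts = defaultdict(int)
    for basket in baskets:
        basket = frozenset(basket)
        for cand in candidates:
            if basket.issuperset(cand):
                counts[cand] += 1
    threshold = min_count(support_ratio, total)
    return sorted((s for s, n in counts.items() if n >= threshold),
                  key=lambda s: (len(s), s))


example_baskets = [
    [1, 2, 3], [1, 2, 5], [1, 3, 4], [2, 3, 4], [1, 2, 3, 4],
    [2, 3, 5], [1, 2, 4], [1, 2], [1, 2, 3], [1, 2, 3, 4, 5],
]
result = son(example_baskets, 0.3, num_chunks=2)
```

Because an itemset that is globally frequent must reach the scaled threshold in at least one chunk, stage 1 produces no false negatives, and stage 2 only has to discard false positives. In Spark, stage 1 would typically run once per partition (for example through `RDD.mapPartitions`), though how the actual script does this is not shown here.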
11+
12+ #### Usage: bin/spark-submit A-Priori_SON.py <baskets.txt> <.3> <output.txt>
13+
14+ 1 . baskets.txt is a text file that contains one basket (a list of comma-separated item numbers) per line.
15+ For example:
16+ 1,2,3
17+ 1,2,5
18+ 1,3,4
19+ 2,3,4
20+ 1,2,3,4
21+ 2,3,5
22+ 1,2,4
23+ 1,2
24+ 1,2,3
25+ 1,2,3,4,5
26+
27+ 2 . <.3> is the minimum support ratio: an itemset is frequent only if it appears in at least 30% of the baskets.
28+
29+ 3 . output.txt is the output result file.
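
As an illustration of the three arguments above, the following sketch reads them, parses the basket lines, and converts the support ratio into an absolute basket count. The helper names `parse_args`, `parse_baskets`, and `support_threshold` are hypothetical and are not taken from A-Priori_SON.py; the sample text is the ten-basket example from item 1.

```python
import math


def parse_args(argv):
    """Split argv into (input_path, support_ratio, output_path)."""
    if len(argv) != 4:
        raise ValueError("usage: A-Priori_SON.py <baskets.txt> <ratio> <output.txt>")
    return argv[1], float(argv[2]), argv[3]


def parse_baskets(lines):
    """Turn lines such as '1,2,3' into sets of integer item numbers, skipping blanks."""
    baskets = []
    for line in lines:
        line = line.strip()
        if line:
            baskets.append(frozenset(int(tok) for tok in line.split(",")))
    return baskets


def support_threshold(ratio, num_baskets):
    """Smallest number of baskets that satisfies the ratio (epsilon guards float error)."""
    return int(math.ceil(ratio * num_baskets - 1e-9))


SAMPLE = """1,2,3
1,2,5
1,3,4
2,3,4
1,2,3,4
2,3,5
1,2,4
1,2
1,2,3
1,2,3,4,5"""

in_path, ratio, out_path = parse_args(
    ["A-Priori_SON.py", "baskets.txt", ".3", "output.txt"])
baskets = parse_baskets(SAMPLE.splitlines())
threshold = support_threshold(ratio, len(baskets))  # 30% of 10 baskets
```

With ten baskets and a ratio of .3, an itemset has to appear in at least three baskets to count as frequent.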
30+
31+
32+ #### Input: Take a utility matrix (mat.dat) as the input
33+
34+ #### Output: Output the root-mean-square error (RMSE) to standard output or a file after each iteration
35+ After each iteration, output the RMSE with four decimal places.
36+ The expression "%.4f" % RMSE is used to format the RMSE value, which is saved to the file as follows.
37+
38+ 1.0019
39+
40+ 0.9794
41+
42+ 0.8464
43+
44+ ...
45+
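
The formatting step described above can be sketched as follows. The `rmse` and `format_rmse` helpers are hypothetical, and the RMSE is assumed to be computed over paired actual and predicted values; only the `"%.4f" %` expression comes from the text.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error over paired values of equal length."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / float(len(squared)))


def format_rmse(value):
    """Format an RMSE value with four decimal places, one value per output line."""
    return "%.4f" % value


line = format_rmse(rmse([0.0, 0.0], [3.0, 4.0]))
```

Formatting with a fixed number of decimals keeps the per-iteration lines easy to compare, at the cost of hiding differences smaller than 0.0001.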