Commit 2556575

Create README.md
1 parent d3e8f48 commit 2556575

1 file changed: +45 -0 lines changed

A-Priori_SON/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
## An implementation of the Savasere, Omiecinski, and Navathe (SON) algorithm in Spark 2.1.1 with Python 2.7
Finding frequent itemsets: the SON algorithm, with the A-Priori algorithm used in stage 1.

## Algorithm: Savasere, Omiecinski, and Navathe (SON) algorithm, A-Priori algorithm

## Task:
The task is to implement the SON algorithm in Apache Spark using Python.
Given a set of baskets, the SON algorithm divides them into chunks/partitions and then proceeds in two stages.
First, the itemsets that are frequent within each chunk (the local frequent itemsets) are collected; together they form the candidates.
Next, a second pass through the data determines which candidates are globally frequent.
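
The two stages can be sketched in plain Python, without Spark, to show the logic. This is a minimal illustration and not the code in `A-Priori_SON.py`: the names `apriori`, `son`, `num_chunks`, and `EXAMPLE_BASKETS` are hypothetical, and slicing the list into chunks stands in for Spark partitions. Each chunk uses the support ratio scaled to its own size, so an itemset that is globally frequent is a candidate in at least one chunk.

```python
from itertools import combinations

# The ten sample baskets used in this README's usage section.
EXAMPLE_BASKETS = [
    {1, 2, 3}, {1, 2, 5}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4},
    {2, 3, 5}, {1, 2, 4}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4, 5},
]


def apriori(baskets, min_count):
    """Return every itemset (as a frozenset) found in at least min_count baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, n in counts.items() if n >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # A k-itemset can only be frequent if all of its (k-1)-subsets are.
        items = sorted({item for s in current for item in s})
        candidates = {
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Pass k: count the surviving candidates.
        current = {
            c for c in candidates
            if sum(1 for b in baskets if c <= b) >= min_count
        }
        frequent |= current
        k += 1
    return frequent


def son(baskets, support_ratio, num_chunks=2):
    """Two-stage SON: local A-Priori per chunk, then a global recount."""
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]
    # Stage 1: union of the locally frequent itemsets forms the candidates.
    candidates = set()
    for chunk in chunks:
        if chunk:
            candidates |= apriori(chunk, support_ratio * len(chunk))
    # Stage 2: count each candidate over all baskets and keep the frequent ones.
    all_baskets = [frozenset(b) for b in baskets]
    global_min = support_ratio * len(all_baskets)
    return {
        c for c in candidates
        if sum(1 for b in all_baskets if c <= b) >= global_min
    }


if __name__ == "__main__":
    for itemset in sorted(son(EXAMPLE_BASKETS, 0.3), key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset))
```

In a Spark version, stage 1 would typically run once per partition (for example through `RDD.mapPartitions`) and stage 2 would be a second counting pass over the same data; how the actual script does this is not shown here.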

#### Usage: `bin/spark-submit A-Priori_SON.py <baskets.txt> <.3> <output.txt>`

1. `baskets.txt` is a text file that contains one basket (a comma-separated list of item numbers) per line. For example:

       1,2,3
       1,2,5
       1,3,4
       2,3,4
       1,2,3,4
       2,3,5
       1,2,4
       1,2
       1,2,3
       1,2,3,4,5

2. `.3` is the minimum support ratio: with this value, an itemset is frequent only if it appears in at least 30% of the baskets.

3. `output.txt` is the file that the results are written to.
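
The three arguments can be read along the following lines. This is a sketch under assumptions, not the script's actual code: `parse_args`, `parse_baskets`, and `is_frequent` are hypothetical helper names, and writing the result file is left out because its format is not described here. With the ten sample baskets and a ratio of `.3`, an itemset needs to appear in at least 3 baskets.

```python
def parse_args(argv):
    """Split argv the way the usage line lays it out (hypothetical helper)."""
    baskets_path, support_ratio, output_path = argv[1], float(argv[2]), argv[3]
    return baskets_path, support_ratio, output_path


def parse_baskets(lines):
    """Turn lines such as '1,2,3' into a list of sets of integer item numbers."""
    baskets = []
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            baskets.append({int(item) for item in line.split(",")})
    return baskets


def is_frequent(count, support_ratio, num_baskets):
    """An itemset is frequent if it appears in at least ratio * N baskets."""
    return count >= support_ratio * num_baskets


if __name__ == "__main__":
    import sys

    path, ratio, _output_path = parse_args(sys.argv)
    with open(path) as handle:
        baskets = parse_baskets(handle)
    print("%d baskets, support ratio %s" % (len(baskets), ratio))
```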

#### Input: a utility matrix (mat.dat)

#### Output: the root-mean-square error (RMSE), written to standard output or a file after each iteration
After each iteration, the RMSE is output with 4 decimal places.
The expression `"%.4f" % RMSE` is used to format the RMSE value, which is saved to the file as follows.

1.0019

0.9794

0.8464

...
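
As a small illustration of the formatting step, the sketch below computes an RMSE from two lists and formats it with `"%.4f"`. The `rmse` and `format_rmse` helpers and their list arguments are hypothetical; how predicted and actual values are obtained from `mat.dat` is not described here.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equally long lists of numbers."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def format_rmse(value):
    """Format a value with four digits after the decimal point."""
    return "%.4f" % value


if __name__ == "__main__":
    # Errors are 0 and 2, so the mean squared error is 2 and the RMSE is sqrt(2).
    print(format_rmse(rmse([1.0, 2.0], [1.0, 4.0])))  # prints 1.4142
```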