Commit 3652e71
In apriori, ensure that X is stored in column-major format
X = df.values, where df is a pandas DataFrame of boolean type.
Since all columns have the same shape and type, df.values is a 2-d
NumPy array that shares its memory with df.
However, it can be stored in either row-major or column-major order. It
seems that if df was initialized from a dictionary of 1-d arrays, the
order is 'F' (column-major), whereas it is 'C' (row-major) when df is
initialized from a 2-d array.
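The layout observation above can be checked directly from the array flags. The following is a minimal sketch, not the code from this commit: the data, the DataFrame names, and the helper `to_column_major` are hypothetical, and the flags printed for each construction may differ between pandas versions (the message itself only says "it seems"). The only part relied on is `np.asfortranarray`, which returns a column-major array and avoids a copy when the input is already column-major.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.random((1000, 20)) > 0.5  # hypothetical boolean 2-d array

# Two ways of building the same boolean DataFrame.
df_from_2d = pd.DataFrame(data)
df_from_dict = pd.DataFrame({i: data[:, i] for i in range(data.shape[1])})

# Inspect the memory layout of df.values for each construction.
for name, df in [("2-d array", df_from_2d), ("dict of 1-d arrays", df_from_dict)]:
    X = df.values
    print(name, "C:", X.flags["C_CONTIGUOUS"], "F:", X.flags["F_CONTIGUOUS"])


def to_column_major(X):
    """Return X in column-major order; no copy is made if it already is."""
    return np.asfortranarray(X)


X_f = to_column_major(df_from_2d.values)
```

Converting unconditionally keeps the caller independent of how the DataFrame happened to be built, at the cost of one copy when the input is row-major.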
Profiling shows that in most situations, 80% of apriori's run time is
spent in a single statement:
    _bools = np.all(X[:, combin], axis=2)
For a dense matrix, this statement runs much faster when X is stored in
column-major format. Sparse matrices use CSC storage, so this was already
the case.
1 parent 4349b20 commit 3652e71
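To see the effect of the layout on the statement quoted in the commit message, the sketch below runs it on a row-major and a column-major copy of the same matrix. The matrix size, the contents of `combin`, and the wrapper `support_mask` are assumptions made for illustration; the timings depend on the machine and array sizes, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
X_c = np.ascontiguousarray(rng.random((5000, 50)) > 0.5)  # row-major ('C')
X_f = np.asfortranarray(X_c)                              # column-major ('F') copy

# Hypothetical candidate itemsets: each row holds the column indices
# of one 3-item combination.
combin = rng.integers(0, X_c.shape[1], size=(100, 3))


def support_mask(X, combin):
    # X[:, combin] has shape (n_rows, n_combinations, k); reducing over
    # axis 2 tells, per row, whether every item of a combination is present.
    return np.all(X[:, combin], axis=2)


mask_c = support_mask(X_c, combin)
mask_f = support_mask(X_f, combin)

# Time the same statement on both layouts.
t_c = timeit.timeit(lambda: support_mask(X_c, combin), number=5)
t_f = timeit.timeit(lambda: support_mask(X_f, combin), number=5)
print(f"row-major: {t_c:.3f}s  column-major: {t_f:.3f}s")
```

Both layouts produce the same mask; only the memory access pattern of the column gather `X[:, combin]` changes.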
1 file changed: +1 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 264 | 264 | |
| 265 | 265 | |
| 266 | 266 | |
| 267 | | - |
| | 267 | + |
| 268 | 268 | |
| 269 | 269 | |
| 270 | 270 | |