Commit 3652e71

In apriori, ensure that X is stored in column-major format
X = df.values, where df is a pandas DataFrame of boolean type. Since all columns have the same shape and dtype, df.values is a 2-d NumPy array that points to the same memory as df. However, it can be stored in either row-major or column-major order. It seems that if df was initialized from a dictionary of 1-d arrays, the order is 'F' (column-major), whereas it is 'C' (row-major) when df is initialized from a 2-d array.

Profiling shows that in most situations, 80% of apriori time is spent in the single statement
    _bools = np.all(X[:, combin], axis=2)
and for a dense matrix, this statement runs much faster when X is stored in column-major order. Sparse matrices use CSC storage, so this was already the case for them.
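To check the layout described in the commit message on a given setup, a minimal sketch along these lines may help. The helper `describe_order` and the variable names are hypothetical and not part of mlxtend, and the order reported for each DataFrame may depend on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.random((1000, 20)) > 0.5  # boolean 2-d array (row-major by default)

# Two DataFrames with identical content, built in different ways.
df_from_2d = pd.DataFrame(data)
df_from_dict = pd.DataFrame({i: data[:, i] for i in range(data.shape[1])})


def describe_order(arr):
    """Hypothetical helper: report the memory order of a NumPy array."""
    c = arr.flags["C_CONTIGUOUS"]
    f = arr.flags["F_CONTIGUOUS"]
    if c and f:
        return "C and F"
    if c:
        return "C"
    if f:
        return "F"
    return "neither"


for name, df in [("from 2-d array", df_from_2d), ("from dict", df_from_dict)]:
    print(name, "->", describe_order(df.values))

# np.asfortranarray returns a column-major array with the same contents,
# copying only if the input is not already column-major.
X_f = np.asfortranarray(df_from_2d.values)
print("after asfortranarray ->", describe_order(X_f))
```

Whatever order `df.values` happens to have, the converted array is column-major and holds the same values, which is the property the change relies on.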
1 parent 4349b20 commit 3652e71

1 file changed: +1 -1 lines changed


mlxtend/frequent_patterns/apriori.py

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ def _support(_x, _n_rows, _is_sparse):
         is_sparse = True
     else:
         # dense DataFrame
-        X = df.values
+        X = np.asfortranarray(df.values)
         is_sparse = False
     support = _support(X, X.shape[0], is_sparse)
     ary_col_idx = np.arange(X.shape[1])
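The effect of the one-line change on the statement named in the commit message can be explored with a rough micro-benchmark like the one below. This is only a sketch under assumptions: the array sizes, the pairwise `combin` array, and the wrapper `hot_statement` are made up for illustration and are not taken from mlxtend. Timings vary with hardware and NumPy version, so none are quoted here.

```python
import itertools
import timeit

import numpy as np

rng = np.random.default_rng(0)

# The same boolean matrix in row-major (C) and column-major (F) order.
X_c = np.ascontiguousarray(rng.random((5000, 30)) > 0.5)
X_f = np.asfortranarray(X_c)

# Assumed stand-in for apriori's candidate itemsets: all pairs of columns.
combin = np.array(list(itertools.combinations(range(X_c.shape[1]), 2)))


def hot_statement(X):
    """The statement the commit message identifies as the bottleneck."""
    # X[:, combin] has shape (n_rows, n_combinations, itemset_size).
    return np.all(X[:, combin], axis=2)


bools_c = hot_statement(X_c)
bools_f = hot_statement(X_f)

# Memory order must not change the result, only the speed.
assert np.array_equal(bools_c, bools_f)

for label, X in [("row-major (C)", X_c), ("column-major (F)", X_f)]:
    seconds = timeit.timeit(lambda: hot_statement(X), number=3)
    print(f"{label}: {seconds:.3f} s for 3 runs")
```

The commit reports that the column-major case is the faster one for dense input; the sketch lets that be checked locally rather than taken on faith.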
