myFM is an implementation of Bayesian [Factorization Machines](https://ieeexplore.ieee.org/abstract/document/5694074/) based on Gibbs sampling, which I believe is a wheel worth reinventing.
Currently this supports most options of the libFM MCMC engine, such as grouping and relation blocks. There are also functionalities not present in libFM, such as ordinal regression and variational inference.

A tutorial and reference documentation are provided at https://myfm.readthedocs.io/en/latest/.
# Requirements
Python >= 3.6 and a recent version of gcc/clang with C++11 support.
# Installation
The package is pip-installable.
```
pip install myfm
```
If you are working with a less common OS/architecture, pip will attempt to build myFM from source (you will need a decent C++ compiler!). In that case, in addition to installing the Python dependencies (`numpy`, `scipy`, `pandas`, ...), the above command will automatically download Eigen (version 3.4.0) into its build directory and use it during the build.
# Examples
## A Toy example
This example is taken from [pyfm](https://github.com/coreylynch/pyFM) with some modifications.
```python
import myfm
from sklearn.feature_extraction import DictVectorizer
import numpy as np

# Toy training data in the style of pyFM's README example
train = [
    {"user": "1", "item": "5", "age": 19},
    {"user": "2", "item": "43", "age": 33},
    {"user": "3", "item": "20", "age": 55},
    {"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()
X = v.fit_transform(train)
y = np.asarray([0, 1, 1, 0])
fm = myfm.MyFMClassifier(rank=4)
fm.fit(X, y)
fm.predict(v.transform([{"user": "1", "item": "10", "age": 24}]))
```

This example will require `pandas` and `scikit-learn`.
You will be able to obtain a result comparable to SOTA algorithms like GC-MC. See `examples/ml-100k.ipynb` for the detailed version.
```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn import metrics

import myfm
# ... (the rest of the script was elided here; see examples/ml-100k.ipynb)
```

Below is a toy movielens-like example which utilizes the relational data format.
This example, however, is too simplistic to exhibit the computational advantage of this data format. For an example with drastically reduced computational complexity, see `examples/ml-100k-extended.ipynb`.
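The saving behind the relational format can be sketched without myFM at all: rather than materializing one (largely duplicated) row of user features per rating, you keep each user's feature block once plus an index array mapping ratings to users. Below is a minimal illustration with `scipy.sparse`; the sizes, density, and variable names are made up for this sketch and are not myFM's actual API.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical sizes, chosen only for illustration.
n_users, n_ratings, n_user_features = 1000, 100_000, 50
rng = np.random.default_rng(0)

# One sparse feature row per *user*.
user_features = sps.random(
    n_users, n_user_features, density=0.1, random_state=0, format="csr"
)
# Which user produced each rating.
rating_to_user = rng.integers(0, n_users, size=n_ratings)

# Naive design matrix: one (mostly duplicated) row per rating.
naive = user_features[rating_to_user]

# Relational format: store the user block once, plus one index per rating.
relational_cost = user_features.nnz + n_ratings

print(naive.nnz, relational_cost)
assert relational_cost < naive.nnz
```

myFM's real relational-block interface differs from this sketch; the notebook above shows the actual usage.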
**myFM** is an unofficial implementation of Bayesian Factorization Machines in Python/C++.
Notable features include:
* Implementation of most functionalities of the `libFM <http://libfm.org/>`_ MCMC engine (including grouping & relation blocks)
* A simpler and faster implementation with `Pybind11 <https://github.com/pybind/pybind11>`_ and `Eigen <http://eigen.tuxfamily.org/index.php?title=Main_Page>`_
* Gibbs sampling for **ordinal regression** with probit link function. See :ref:`the tutorial <OrdinalRegression>` for its usage.
* Variational inference, which converges faster and requires less memory (but is usually less accurate than Gibbs sampling).
In most cases, you can install the library from PyPI: ::
    pip install myfm
It has an interface similar to sklearn's, and you can use it for a wide variety of prediction tasks.
For example,
.. testcode::

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn import metrics

    import myfm

    # The remainder of this example was elided from this excerpt; the sketch
    # below assumes myFM's sklearn-like classifier API.
    dataset = load_breast_cancer()
    X = StandardScaler().fit_transform(dataset["data"])
    X_train, X_test, y_train, y_test = train_test_split(
        X, dataset["target"], random_state=42
    )
    fm = myfm.MyFMClassifier(rank=2)
    fm.fit(X_train, y_train)
    print(metrics.roc_auc_score(y_test, fm.predict_proba(X_test)))
.. testoutput::
    :hide:
    :options: +ELLIPSIS

    0.99...
Try out the following :ref:`examples <MovielensIndex>` to see how Bayesian approaches to explicit collaborative filtering
are still very competitive (almost unbeaten)!
.. toctree::
    :caption: Basic Usage
doc/source/movielens.rst
So you can efficiently use an encoder like sklearn's `OneHotEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>`_ to prepare the input matrix.
.. testcode ::

    import numpy as np
    from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder
    from sklearn import metrics
    import myfm

    # ... (data loading, one-hot encoding, and model fitting were elided here)
    mae = np.abs(y_test - prediction).mean()
    print(f'rmse={rmse}, mae={mae}')
.. testoutput::
    :hide:
    :options: +ELLIPSIS

    rmse=..., mae=...
The above script should give you RMSE=0.8944, MAE=0.7031, which is already impressive compared with other recent methods.
However, we haven't provided any information about which columns are users' and items'.
80
86
81
-
You can tell :py:class:`myfm.MyFMRegressor` these information (i.e., which parameters share a common mean and variance) by ``group_shapes`` option: ::
87
+
You can tell :py:class:`myfm.MyFMRegressor` this information (i.e., which parameters share a common mean and variance) with the ``group_shapes`` option:
.. testcode ::

    fm_grouped = myfm.MyFMRegressor(
        rank=FM_RANK, random_seed=42,
        # ... (the group_shapes argument and the fit/predict calls were elided here)
    mae = np.abs(y_test - prediction_grouped).mean()
    print(f'rmse={rmse}, mae={mae}')
.. testoutput::
    :hide:
    :options: +ELLIPSIS

    rmse=..., mae=...
This will slightly improve the performance to RMSE=0.8925, MAE=0.7001.
Adding Side information
-----------------------
It is straightforward to include user/item side information.
First we retrieve the side information from ``Movielens100kDataManager``:
Note that the way movie genre information is represented in the ``movie_info`` DataFrame is a bit tricky (it is already binary encoded).
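As an aside, a multi-hot genre encoding of this kind can be produced with ``MultiLabelBinarizer`` (imported above). The pipe-separated genre strings below are hypothetical examples, not values taken from the actual dataset:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical pipe-separated genre strings (not the actual ml-100k values).
genre_strings = ["Action|Comedy", "Comedy", "Action|Thriller"]
genre_lists = [s.split("|") for s in genre_strings]

mlb = MultiLabelBinarizer()
genre_matrix = mlb.fit_transform(genre_lists)

print(mlb.classes_)   # ['Action' 'Comedy' 'Thriller']
print(genre_matrix)   # one binary column per genre, one row per movie
```

Binary columns like these can then be stacked onto the user/movie one-hot blocks with ``scipy.sparse.hstack``, as done below.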
We can then augment ``X_train`` / ``X_test`` with auxiliary information. The `hstack <https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html>`_ function of ``scipy.sparse`` is very convenient for this purpose:
.. testcode ::

    import scipy.sparse as sps
    X_train_extended = sps.hstack([
        # ... (the matrices to stack were elided here)
    ])