2 | 2 |
3 | 3 | For easy experimentation, Cornac offers access to a number of popular recommendation benchmark datasets. These are listed below along with their basic characteristics, followed by a usage example. In addition to preference feedback, some of these datasets come with item and/or user auxiliary information, which is grouped into three main categories: |
4 | 4 | - **Text** refers to textual information associated with items or users. The usual format of this data is `(item_id, text)`, or `(user_id, text)`. Concrete examples of such information are item textual descriptions, product reviews, movie plots, and user reviews, just to name a few. |
5 | | -- **Graph**, for items, corresponds to a network where nodes (or vertices) are items, and links (or edges) represent relations among items. This information is typically represented by an adjacency matrix in the sparse triplet format: `(item_id, item_id, weight)`, or simply `(item_id, item_id)` in the case of unweighted edges. Relations between users (e.g., social network) are represented similarly. |
| 5 | +- **Graph**, for items, corresponds to a network where nodes (or vertices) are items, and links (or edges) represent relations among items. This information is typically represented by an adjacency matrix in the sparse triplet format: `(item_id, item_id, weight)`, or simply `(item_id, item_id)` in the case of unweighted edges. Relations between users (e.g., social network) are represented similarly. |
6 | 6 | - **Image** consists of visual information paired with either users or items. The common format for this type of auxiliary data is `(object_id, ndarray)`, where `object_id` is either a `user_id` or an `item_id`, and the `ndarray` contains either the raw images (pixel intensities) or visual feature vectors extracted from them, e.g., using deep neural networks. For instance, the Amazon clothing dataset includes CNN visual features of its products. |
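The sparse triplet format used for graph data can be illustrated with a short plain-Python sketch. It is not Cornac's own graph loader: the function name `triplets_to_adjacency` and the item ids are hypothetical. It accepts both weighted `(item_id, item_id, weight)` and unweighted `(item_id, item_id)` edges, assuming a default weight of 1.0 for the unweighted case.

```python
from collections import defaultdict


def triplets_to_adjacency(edges, default_weight=1.0):
    """Convert (src_id, dst_id[, weight]) tuples into a nested-dict adjacency.

    Hypothetical helper: unweighted pairs (src_id, dst_id) are assumed
    to carry ``default_weight``. Only observed edges are stored, mirroring
    the sparse triplet format.
    """
    adjacency = defaultdict(dict)
    for edge in edges:
        if len(edge) == 3:
            src, dst, weight = edge
        elif len(edge) == 2:
            src, dst = edge
            weight = default_weight
        else:
            raise ValueError(f"expected a 2- or 3-tuple, got {edge!r}")
        adjacency[src][dst] = float(weight)
    return dict(adjacency)


# Hypothetical item-item relations mixing weighted and unweighted edges.
item_graph = [("i1", "i2", 0.8), ("i1", "i3"), ("i2", "i3", 0.5)]
adj = triplets_to_adjacency(item_graph)
print(adj)  # {'i1': {'i2': 0.8, 'i3': 1.0}, 'i2': {'i3': 0.5}}
```

A nested dict keeps memory proportional to the number of edges, which is the point of the sparse format; a dense matrix would need one cell per item pair.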
7 | 7 |
8 | | -**How to cite.** If you are using one of the datasets listed below in your research, please follow the citation guidelines by the authors (the "source" link below) of each respective dataset. |
| 8 | +**How to cite.** If you are using one of the datasets listed below in your research, please follow the citation guidelines provided by the authors of each dataset (see the corresponding "source" link below). |
9 | 9 | <table> |
10 | 10 | <tr> |
11 | 11 | <th rowspan="2">Dataset</th> |
@@ -191,7 +191,7 @@ trust = filmtrust.load_trust() |
191 | 191 |
192 | 192 | The rating values are in the range `[0.5, 4]`, and the trust network is undirected. Here are samples from our dataset: |
193 | 193 | ``` |
194 | | -Samples from ratings: [('1', '1', 2.0), ('1', '2', 4.0), ('1', '3', 3.5)] |
| 194 | +Samples from ratings: [('1', '1', 2.0), ('1', '2', 4.0), ('1', '3', 3.5)] |
195 | 195 | Samples from trust: [('2', '966', 1.0), ('2', '104', 1.0), ('5', '1509', 1.0)] |
196 | 196 | ``` |
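As a quick sanity check on the samples above, the sketch below verifies that ratings fall within `[0.5, 4]` and builds undirected neighbor sets from the trust triplets. It operates only on the printed sample lists; the helper names are hypothetical, and whether the loaded trust data already stores both directions of each edge is not stated here, so the symmetrization step is an assumption.

```python
from collections import defaultdict

# Sample triplets copied from the output above.
ratings = [("1", "1", 2.0), ("1", "2", 4.0), ("1", "3", 3.5)]
trust = [("2", "966", 1.0), ("2", "104", 1.0), ("5", "1509", 1.0)]

MIN_RATING, MAX_RATING = 0.5, 4.0


def out_of_range(triplets, low=MIN_RATING, high=MAX_RATING):
    """Return (user, item, rating) triplets whose rating lies outside [low, high]."""
    return [t for t in triplets if not (low <= t[2] <= high)]


def undirected_neighbors(edges):
    """Treat each (u, v, w) trust edge as undirected by adding u->v and v->u."""
    neighbors = defaultdict(set)
    for u, v, _w in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


print(out_of_range(ratings))  # []
nbrs = undirected_neighbors(trust)
print(sorted(nbrs["2"]))      # ['104', '966']
print(sorted(nbrs["966"]))    # ['2']
```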
197 | 197 | Our dataset is now ready to use for model training and evaluation. A concrete example is [sorec_filmtrust](../../examples/sorec_filmtrust.py), which illustrates how to perform an experiment with the [SoRec](../models/sorec/) model on FilmTrust. More details regarding the other datasets are available in the [documentation](https://cornac.readthedocs.io/en/latest/datasets.html). |
@@ -220,4 +220,51 @@ Our dataset is now ready to use for model training and evaluation. A concrete ex |
220 | 220 | <td align="right">817,741</td> |
221 | 221 | <td align="center">price, quantity</td> |
222 | 222 | </tr> |
223 | | -</table> |
| 223 | +</table> |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## Next-Item Datasets |
| 228 | + |
| 229 | +<table> |
| 230 | + <tr> |
| 231 | + <th>Dataset</th> |
| 232 | + <th>#Users</th> |
| 233 | + <th>#Items</th> |
| 234 | + <th>#Sessions</th> |
| 235 | + <th>#Interactions</th> |
| 236 | + <th>Extra Info.</th> |
| 237 | + </tr> |
| 238 | + <tr> |
| 239 | + <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.gowalla">Gowalla</a><br>(<a href="https://snap.stanford.edu/data/loc-gowalla.html">source</a>)</td> |
| 240 | + <td align="center">107,092</td> |
| 241 | + <td align="right">1,280,969</td> |
| 242 | + <td align="right">2,710,119</td> |
| 243 | + <td align="right">6,442,892</td> |
| 244 | + <td align="center">Check-in locations (longitude, latitude)</td> |
| 245 | + </tr> |
| 246 | + <tr> |
| 247 | + <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (buy)</a><br>(<a href="https://2015.recsyschallenge.com/">source</a>)</td> |
| 248 | + <td align="center">N/A</td> |
| 249 | + <td align="right">19,949</td> |
| 250 | + <td align="right">509,696</td> |
| 251 | + <td align="right">1,150,753</td> |
| 252 | + <td align="center">N/A</td> |
| 253 | + </tr> |
| 254 | + <tr> |
| 255 | + <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (click)</a></td> |
| 256 | + <td align="center">N/A</td> |
| 257 | + <td align="right">52,739</td> |
| 258 | + <td align="right">9,249,729</td> |
| 259 | + <td align="right">33,003,944</td> |
| 260 | + <td align="center">N/A</td> |
| 261 | + </tr> |
| 262 | + <tr> |
| 263 | + <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (test)</a></td> |
| 264 | + <td align="center">N/A</td> |
| 265 | + <td align="right">42,155</td> |
| 266 | + <td align="right">2,312,432</td> |
| 267 | + <td align="right">8,251,791</td> |
| 268 | + <td align="center">N/A</td> |
| 269 | + </tr> |
| 270 | +</table> |
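To make the columns of the table above concrete, here is a minimal plain-Python sketch. It assumes a hypothetical interaction log of `(user_id, session_id, item_id, timestamp)` tuples; this is not Cornac's loader or its internal data format. It counts users, items, sessions, and interactions, then orders each session by timestamp and turns it into `(prefix, next_item)` pairs, one common way of framing next-item prediction.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, session_id, item_id, timestamp).
# user_id may be None when no user identifiers are available
# (the table lists N/A users for YooChoose).
interactions = [
    ("u1", "s1", "a", 3),
    ("u1", "s1", "b", 5),
    ("u1", "s1", "c", 9),
    ("u2", "s2", "b", 1),
    ("u2", "s2", "a", 2),
    (None, "s3", "c", 4),
]


def dataset_stats(log):
    """Count distinct users, items, and sessions, plus total interactions."""
    return {
        "users": len({u for u, _, _, _ in log if u is not None}),
        "items": len({i for _, _, i, _ in log}),
        "sessions": len({s for _, s, _, _ in log}),
        "interactions": len(log),
    }


def next_item_pairs(log):
    """Order each session by timestamp and emit (prefix, next_item) pairs."""
    sessions = defaultdict(list)
    for _user, session, item, ts in log:
        sessions[session].append((ts, item))
    pairs = []
    for events in sessions.values():
        items = [item for _ts, item in sorted(events)]
        for k in range(1, len(items)):
            pairs.append((tuple(items[:k]), items[k]))
    return pairs


print(dataset_stats(interactions))
# {'users': 2, 'items': 3, 'sessions': 3, 'interactions': 6}
print(next_item_pairs(interactions))
# [(('a',), 'b'), (('a', 'b'), 'c'), (('b',), 'a')]
```

Single-item sessions (such as `s3`) yield no pairs, since there is no prefix to predict from.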