Commit aa75197

fix: epinions dataset (#682)
1 parent 828eac8 commit aa75197

2 files changed: +61 -8 lines changed


cornac/datasets/README.md

Lines changed: 51 additions & 4 deletions
@@ -2,10 +2,10 @@

 For easy experimentation, Cornac offers access to a number of popular recommendation benchmark datasets. These are listed below along with their basic characteristics, followed by a usage example. In addition to preference feedback, some of these datasets come with item and/or user auxiliary information, which are grouped into three main categories:
 - **Text** refers to textual information associated with items or users. The usual format of this data is `(item_id, text)`, or `(user_id, text)`. Concrete examples of such information are item textual descriptions, product reviews, movie plots, and user reviews, just to name a few.
-- **Graph**, for items, corresponds to a network where nodes (or vertices) are items, and links (or edges) represent relations among items. This information is typically represented by an adjacency matrix in the sparse triplet format: `(item_id, item_id, weight)`, or simply `(item_id, item_id)` in the case of unweighted edges. Relations between users (e.g., social network) are represented similarly.
+- **Graph**, for items, corresponds to a network where nodes (or vertices) are items, and links (or edges) represent relations among items. This information is typically represented by an adjacency matrix in the sparse triplet format: `(item_id, item_id, weight)`, or simply `(item_id, item_id)` in the case of unweighted edges. Relations between users (e.g., social network) are represented similarly.
 - **Image** consists of visual information paired with either users or items. The common format for this type of auxiliary data is `(object_id, ndarray)`, where `object_id` could be one of `user_id` or `item_id`, the `ndarray` may contain the raw images (pixel intensities), or some visual feature vectors extracted from the images, e.g., using deep neural nets. For instance, the Amazon clothing dataset includes product CNN visual features.

-**How to cite.** If you are using one of the datasets listed below in your research, please follow the citation guidelines by the authors (the "source" link below) of each respective dataset.
+**How to cite.** If you are using one of the datasets listed below in your research, please follow the citation guidelines by the authors (the "source" link below) of each respective dataset.
 <table>
   <tr>
     <th rowspan="2">Dataset</th>
@@ -191,7 +191,7 @@ trust = filmtrust.load_trust()

 The rating values are in the range `[0.5,4]`, and the trust network is undirected. Here are samples from our dataset,
 ```
-Samples from ratings: [('1', '1', 2.0), ('1', '2', 4.0), ('1', '3', 3.5)]
+Samples from ratings: [('1', '1', 2.0), ('1', '2', 4.0), ('1', '3', 3.5)]
 Samples from trust: [('2', '966', 1.0), ('2', '104', 1.0), ('5', '1509', 1.0)]
 ```
 Our dataset is now ready to use for model training and evaluation. A concrete example is [sorec_filmtrust](../../examples/sorec_filmtrust.py), which illustrates how to perform an experiment with the [SoRec](../models/sorec/) model on FilmTrust. More details regarding the other datasets are available in the [documentation](https://cornac.readthedocs.io/en/latest/datasets.html).
@@ -220,4 +220,51 @@ Our dataset is now ready to use for model training and evaluation. A concrete ex
     <td align="right">817,741</td>
     <td align="center">price, quantity</td>
   </tr>
-</table>
+</table>
+
+---
+
+## Next-Item Datasets
+
+<table>
+  <tr>
+    <th>Dataset</th>
+    <th>Users</th>
+    <th>#Items</th>
+    <th>#Sessions</th>
+    <th>#Interactions</th>
+    <th>Extra Info.</th>
+  </tr>
+  <tr>
+    <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.gowalla">Gowalla</a><br>(<a href="https://snap.stanford.edu/data/loc-gowalla.html">source</a>)</td>
+    <td align="center">107,092</td>
+    <td align="right">1,280,969</td>
+    <td align="right">2,710,119</td>
+    <td align="right">6,442,892</td>
+    <td align="center">Check-ins location (longitude, latitude)</td>
+  </tr>
+  <tr>
+    <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (buy)</a><br>(<a href="https://2015.recsyschallenge.com/">source</a>)</td>
+    <td align="center">N/A</td>
+    <td align="right">19,949</td>
+    <td align="right">509,696</td>
+    <td align="right">1,150,753</td>
+    <td align="center">N/A</td>
+  </tr>
+  <tr>
+    <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (click)</a></td>
+    <td align="center">N/A</td>
+    <td align="right">52,739</td>
+    <td align="right">9,249,729</td>
+    <td align="right">33,003,944</td>
+    <td align="center">N/A</td>
+  </tr>
+  <tr>
+    <td><a href="https://cornac.readthedocs.io/en/latest/datasets.html#module-cornac.datasets.yoochoose">YooChoose (test)</a></td>
+    <td align="center">N/A</td>
+    <td align="right">42,155</td>
+    <td align="right">2,312,432</td>
+    <td align="right">8,251,791</td>
+    <td align="center">N/A</td>
+  </tr>
+</table>
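
The README text in this diff describes graph side information as sparse triplets `(item_id, item_id, weight)` and notes that the FilmTrust trust network is undirected. As a rough illustration of what that format implies, the sketch below turns the three sample trust triplets from the README into an adjacency map, mirroring each edge because the network is undirected. `build_adjacency` is a hypothetical helper written for this note, not a Cornac function.

```python
from collections import defaultdict

# Sample triplets copied from the README's FilmTrust trust output.
trust = [("2", "966", 1.0), ("2", "104", 1.0), ("5", "1509", 1.0)]


def build_adjacency(triplets, undirected=True):
    """Turn (source, target, weight) triplets into a dict-of-dicts adjacency map.

    Hypothetical helper for illustration only; it is not part of Cornac.
    """
    adj = defaultdict(dict)
    for source, target, weight in triplets:
        adj[source][target] = weight
        if undirected:
            # Mirror the edge so lookups work from either endpoint.
            adj[target][source] = weight
    return dict(adj)


adjacency = build_adjacency(trust)
print(sorted(adjacency["2"]))  # neighbours of user '2'
```

Passing `undirected=False` keeps only the edges as listed, which would suit a directed relation.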

cornac/datasets/epinions.py

Lines changed: 10 additions & 4 deletions
@@ -43,8 +43,11 @@ def load_feedback(reader: Reader = None) -> List:
         Data in the form of a list of tuples (user, item, rating).

     """
-    fpath = cache(url='http://www.trustlet.org/datasets/downloaded_epinions/ratings_data.txt.bz2',
-                  unzip=True, relative_path='ratings_data.txt', cache_dir=_get_cache_dir())
+    fpath = cache(
+        url="https://static.preferred.ai/cornac/datasets/epinions/ratings_data.zip",
+        unzip=True,
+        relative_path="epinions/ratings_data.txt",
+    )
     reader = Reader() if reader is None else reader
     return reader.read(fpath, sep=' ')
5053

@@ -63,7 +66,10 @@ def load_trust(reader: Reader = None) -> List:
         Data in the form of a list of tuples (source_user, target_item, trust_value).

     """
-    fpath = cache(url='http://www.trustlet.org/datasets/downloaded_epinions/trust_data.txt.bz2',
-                  unzip=True, relative_path='trust_data.txt', cache_dir=_get_cache_dir())
+    fpath = cache(
+        url="https://static.preferred.ai/cornac/datasets/epinions/trust_data.zip",
+        unzip=True,
+        relative_path="epinions/trust_data.txt",
+    )
     reader = Reader() if reader is None else reader
     return reader.read(fpath, sep=' ')
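
Both loaders end with `reader.read(fpath, sep=' ')`, and their docstrings promise a list of tuples such as `(user, item, rating)`. The sketch below shows, in plain Python, what parsing a space-separated file into such tuples might look like. It is only an approximation: `parse_triplets` is a hypothetical stand-in rather than Cornac's `Reader`, the sample lines are made up for illustration, and skipping short lines is an assumption of this sketch.

```python
from typing import Iterable, List, Tuple


def parse_triplets(lines: Iterable[str], sep: str = " ") -> List[Tuple[str, str, float]]:
    """Parse 'user item value' lines into (user, item, value) tuples.

    Hypothetical stand-in for the kind of output the loaders' docstrings
    describe; lines with fewer than three fields are skipped (an assumption).
    """
    triplets = []
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) < 3:
            continue  # blank or malformed line
        user, item, value = fields[0], fields[1], float(fields[2])
        triplets.append((user, item, value))
    return triplets


# Made-up sample lines, not taken from the Epinions files.
sample = ["1 100 4", "1 101 5", "", "2 100 3"]
ratings = parse_triplets(sample)
print(ratings[:2])
```

IDs stay as strings and only the third field is converted to a float, which matches the shape of the FilmTrust samples shown in the README hunk.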
