Commit b73d82c

Add key figures (x5, hillstrom) (#146)
* modified: sklift/datasets/descr/hillstrom.rst
* modified: sklift/datasets/descr/hillstrom.rst
* modified: sklift/datasets/descr/x5.rst
* modified: sklift/tests/test_metrics.py
* 📝 the same style
* 📝 revert unnecessary changes

Co-authored-by: Maksim Shevchenko
1 parent 37f1592 commit b73d82c

5 files changed, +38 -12 lines changed

sklift/datasets/descr/criteo.rst

Lines changed: 5 additions & 6 deletions
@@ -25,8 +25,11 @@ Key figures
 * Format: CSV
 * Size: 297M (compressed) 3,2GB (uncompressed)
 * Rows: 13,979,592
-* Average Visit Rate: .046992
-* Average Conversion Rate: .00292
+* Response Ratio:
+
+* Average `Visit` Rate: .046992
+* Average `Conversion` Rate: .00292
+
 * Treatment Ratio: .85
 
 
@@ -35,7 +38,3 @@ This dataset is released along with the paper:
 “*A Large Scale Benchmark for Uplift Modeling*"
 Eustache Diemert, Artem Betlei, Christophe Renaudin; (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP)
 This work was published in: `AdKDD 2018 <https://adkdd-targetad.wixsite.com/2018/>`_ Workshop, in conjunction with KDD 2018.
-
-
-
-
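The Criteo figures above report one average rate per target column plus a treatment ratio; for 0/1 columns both are plain means. A minimal sketch, assuming the target and treatment columns are already loaded as equal-length sequences of 0/1 flags. `key_figures` is a hypothetical helper for illustration, not part of the sklift API:

```python
from typing import Sequence


def key_figures(target: Sequence[int], treatment: Sequence[int]) -> dict:
    """Summarise a binary uplift dataset (hypothetical helper, not sklift API).

    Assumes `target` and `treatment` are equal-length sequences of 0/1 flags.
    """
    if len(target) != len(treatment):
        raise ValueError("target and treatment must have the same length")
    if len(target) == 0:
        raise ValueError("cannot summarise an empty dataset")
    n = len(target)
    return {
        "rows": n,
        # Mean of the 0/1 target: share of rows with a positive response.
        "response_ratio": sum(target) / n,
        # Mean of the 0/1 treatment flag: share of rows that were treated.
        "treatment_ratio": sum(treatment) / n,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from any dataset above.
    target = [1, 0, 0, 1, 0, 0, 0, 0]
    treatment = [1, 1, 1, 0, 1, 1, 1, 0]
    print(key_figures(target, treatment))
    # prints {'rows': 8, 'response_ratio': 0.25, 'treatment_ratio': 0.75}
```

For a dataset with two targets, such as Criteo's `visit` and `conversion`, the helper would be called once per target column with the same treatment column.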

sklift/datasets/descr/hillstrom.rst

Lines changed: 15 additions & 1 deletion
@@ -42,4 +42,18 @@ Finally, we have a series of variables describing activity in the two weeks foll
 
 * Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
 * Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
-* Spend: Actual dollars spent in the following two weeks.
+* Spend: Actual dollars spent in the following two weeks.
+
+Key figures
+################
+
+* Format: CSV
+* Size: 433KB (compressed) 4,935KB (uncompressed)
+* Rows: 64,000
+* Response Ratio:
+
+* Average `visit` Rate: .15,
+* Average `conversion` Rate: .009,
+* the values in the `spend` column are unevenly distributed from 0.0 to 499.0
+
+* Treatment Ratio: The parts are distributed evenly between the *three* classes
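Hillstrom's treatment is not binary, so its Treatment Ratio is a share per class rather than a single number, and the `spend` line reports a range rather than a rate. A minimal sketch of both, assuming one treatment label per row. `class_shares` and `value_range` are hypothetical helpers, and the segment labels in the demo (`Mens E-Mail`, `Womens E-Mail`, `No E-Mail`) are an assumption about the raw file, not taken from this diff:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def class_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Share of rows per treatment class (hypothetical helper)."""
    n = len(labels)
    counts = Counter(labels)
    return {label: count / n for label, count in counts.items()}


def value_range(values: Sequence[float]) -> Tuple[float, float]:
    """(min, max) of a numeric column such as `spend` (hypothetical helper)."""
    return min(values), max(values)


if __name__ == "__main__":
    # Toy rows; the three labels are assumed Hillstrom segment names.
    segment = ["Mens E-Mail", "Womens E-Mail", "No E-Mail"] * 2
    spend = [0.0, 0.0, 29.99, 0.0, 499.0, 0.0]
    print(class_shares(segment))  # each class gets 2 of 6 rows
    print(value_range(spend))     # prints (0.0, 499.0)
```

"Distributed evenly" then means every value returned by `class_shares` is close to 1/3.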

sklift/datasets/descr/lenta.rst

Lines changed: 3 additions & 3 deletions
@@ -110,7 +110,7 @@ Key figures
 
 * Format: CSV
 * Size: 153M (compressed) 567M (uncompressed)
-* Rows: 687 029
-* Response Ratio: 0.1
-* Treatment Ratio: 0.75
+* Rows: 687,029
+* Response Ratio: .1
+* Treatment Ratio: .75
 
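The Lenta edit normalises two number styles: comma thousands separators for row counts (`687 029` becomes `687,029`) and ratios without a leading zero (`0.75` becomes `.75`). A sketch of two hypothetical helpers, `format_count` and `format_ratio`, that produce that style with Python's format mini-language; the rounding precision is an assumed parameter:

```python
def format_count(n: int) -> str:
    """Row count with comma thousands separators (hypothetical helper)."""
    return f"{n:,}"


def format_ratio(x: float, digits: int = 2) -> str:
    """Ratio in [0, 1) without a leading zero (hypothetical helper)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Round, then drop trailing zeros: 0.10 -> "0.1", 0.00 -> "0".
    text = f"{x:.{digits}f}".rstrip("0").rstrip(".")
    # Drop the leading zero: "0.1" -> ".1".
    return text[1:] if text.startswith("0.") else text


if __name__ == "__main__":
    print(format_count(687029), format_count(600000))  # prints 687,029 600,000
    print(format_ratio(0.1), format_ratio(0.75))        # prints .1 .75
```

With `digits=6`, `format_ratio(0.046992, 6)` gives `.046992`, the same style as the Criteo visit rate.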

sklift/datasets/descr/megafon.rst

Lines changed: 2 additions & 2 deletions
@@ -23,8 +23,8 @@ Key figures
 ################
 * Format: CSV
 * Size: 554M
-* Rows: 600 000
-* Average Conversion Rate: .2
+* Rows: 600,000
+* Response Ratio: .2
 * Treatment Ratio: .5
 
 
sklift/datasets/descr/x5.rst

Lines changed: 13 additions & 0 deletions
@@ -23,4 +23,17 @@ Fields
 * treatment_flg (binary): information on performed communication
 * target (binary): customer purchasing
 
+Key figures
+################
+
+* Format: CSV
+* Size: 647M (compressed) 4.17GB (uncompressed)
+* Rows:
+
+* in 'clients.csv': 400,162
+* in 'purchases.csv': 45,786,568
+* in 'uplift_train.csv': 200,039
+
+* Response Ratio: .62
+* Treatment Ratio: .5
 
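The X5 entry lists row counts per file. A minimal sketch of how such counts could be reproduced, assuming each file has a single header line and all files sit in one local directory; `count_rows` and `rows_per_file` are hypothetical helpers. Using `csv.reader` counts records rather than physical lines, so quoted fields containing newlines are handled, and the file is streamed, which matters for a 45-million-row `purchases.csv`:

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def count_rows(path: Path) -> int:
    """Count CSV records, excluding one header line (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if the file has one
        return sum(1 for _ in reader)  # streams; never loads the whole file


def rows_per_file(directory: Path, names: Iterable[str]) -> Dict[str, int]:
    """Map each file name in `directory` to its record count."""
    return {name: count_rows(directory / name) for name in names}


if __name__ == "__main__":
    # Toy demo on a temporary file. For the real data, call
    # rows_per_file(data_dir, ["clients.csv", "purchases.csv", "uplift_train.csv"])
    # with data_dir pointing at the unpacked X5 files (assumed layout).
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "uplift_train.csv"
        demo.write_text("client_id,treatment_flg,target\na,1,1\nb,0,0\n", encoding="utf-8")
        print(rows_per_file(Path(tmp), ["uplift_train.csv"]))  # prints {'uplift_train.csv': 2}
```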
