Commit c5967c1

Merge branch 'dev' of https://github.com/maks-sh/scikit-uplift into dev
2 parents 490c7c7 + 161c1b7 commit c5967c1

9 files changed

+192
-135
lines changed

docs/api/datasets/fetch_criteo.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _Criteo:
+
 **************************************
 `sklift.datasets <./>`_.fetch_criteo
 **************************************
Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,9 @@
+.. _Hillstrom:
+
 ****************************************
 `sklift.datasets <./>`_.fetch_hillstrom
 ****************************************
 
 .. autofunction:: sklift.datasets.datasets.fetch_hillstrom
 
-.. include:: ../../../sklift/datasets/descr/lenta.rst
+.. include:: ../../../sklift/datasets/descr/hillstrom.rst

docs/api/datasets/fetch_lenta.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _Lenta:
+
 ***********************************
 `sklift.datasets <./>`_.fetch_lenta
 ***********************************

docs/api/datasets/fetch_x5.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _X5:
+
 ***********************************
 `sklift.datasets <./>`_.fetch_x5
 ***********************************

requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 scikit-learn>=0.21.0
 numpy>=1.16
 pandas
-matplotlib
+matplotlib
+requests

sklift/datasets/datasets.py

Lines changed: 158 additions & 112 deletions
Large diffs are not rendered by default.

sklift/datasets/descr/hillstrom.rst

Lines changed: 12 additions & 8 deletions
@@ -4,17 +4,20 @@ Kevin Hillstrom Dataset: MineThatData
 Data description
 ################
 
-This is a copy of `MineThatData E-Mail Analytics And Data Mining Challenge dataset <https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html/>`_.
+This is a copy of `MineThatData E-Mail Analytics And Data Mining Challenge dataset <https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html>`_.
 
-date: March 20, 2008
-
-This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
+This dataset contains 64,000 customers who last purchased within twelve months.
+The customers were involved in an e-mail test.
 
 * 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
 * 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
 * 1/3 were randomly chosen to not receive an e-mail campaign.
 
-During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.
+During a period of two weeks following the e-mail campaign, results were tracked.
+Your job is to tell the world if the Mens or Womens e-mail campaign was successful.
+
+Fields
+################
 
 Historical customer attributes at your disposal include:
 
@@ -30,9 +33,10 @@ Historical customer attributes at your disposal include:
 Another variable describes the e-mail campaign the customer received:
 
 * Segment
-* Mens E-Mail
-* Womens E-Mail
-* No E-Mail
+
+* Mens E-Mail
+* Womens E-Mail
+* No E-Mail
 
 Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:

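The Hillstrom description above frames the task as judging whether the Mens or Womens campaign was successful relative to the no-e-mail group. A minimal sketch of that comparison follows, using only the standard library. The segment labels come from the description; the toy records and the binary ``visited`` outcome are assumptions made for illustration, not data from the real dataset.

```python
from collections import defaultdict

# Hypothetical toy records: (segment, visited) pairs. The segment labels come
# from the dataset description; the binary "visited" outcome is an assumption.
records = [
    ("Mens E-Mail", 1), ("Mens E-Mail", 0), ("Mens E-Mail", 1), ("Mens E-Mail", 0),
    ("Womens E-Mail", 1), ("Womens E-Mail", 0), ("Womens E-Mail", 0), ("Womens E-Mail", 0),
    ("No E-Mail", 0), ("No E-Mail", 0), ("No E-Mail", 0), ("No E-Mail", 1),
]


def response_rates(rows):
    """Return the mean outcome per segment."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for segment, outcome in rows:
        totals[segment] += outcome
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in counts}


def uplift_vs_control(rates, control="No E-Mail"):
    """Return each treated segment's rate minus the control segment's rate."""
    return {s: r - rates[control] for s, r in rates.items() if s != control}


rates = response_rates(records)
uplift = uplift_vs_control(rates)
print(uplift)
```

A positive difference suggests the campaign raised the response rate over the control group; on real data the difference would also need a significance check before drawing conclusions.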
sklift/datasets/descr/lenta.rst

Lines changed: 8 additions & 10 deletions
@@ -8,16 +8,6 @@ An uplift modeling dataset containing data about Lenta's customers grociery shop
 
 Source: **BigTarget Hackathon** hosted by Lenta and Microsoft in summer 2020.
 
-
-Key figures
-################
-* Format: CSV
-* Size: 153M (compressed) 567M (uncompressed)
-* Rows: 687 029
-* Response Ratio: 0.1
-* Treatment Ratio: 0.75
-
-
 Fields
 ################
 
@@ -115,4 +105,12 @@ Major features:
 * - stdev_discount_depth_[15d,1m]
 - discount sum coefficient of variation for 15 days, 1 month
 
+Key figures
+################
+
+* Format: CSV
+* Size: 153M (compressed) 567M (uncompressed)
+* Rows: 687 029
+* Response Ratio: 0.1
+* Treatment Ratio: 0.75

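The key figures listed for the Lenta dataset can be turned into approximate group sizes. The sketch below is plain arithmetic on the numbers quoted above; because the ratios are rounded in the description, the resulting counts are estimates, and the variable names are illustrative.

```python
# Figures quoted in the Lenta "Key figures" list.
rows = 687_029
treatment_ratio = 0.75  # rounded in the description, so counts are estimates
response_ratio = 0.1    # rounded in the description, so counts are estimates

# Approximate number of treated and control customers.
treated = round(rows * treatment_ratio)
control = rows - treated

# Approximate number of customers with a positive response.
responders = round(rows * response_ratio)

print(f"treated ~ {treated}, control ~ {control}, responders ~ {responders}")
```

This gives roughly 515 thousand treated rows against roughly 172 thousand control rows, which is worth keeping in mind when splitting the data, since the control group is the smaller of the two.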
sklift/datasets/descr/x5.rst

Lines changed: 3 additions & 3 deletions
@@ -3,9 +3,10 @@ X5 RetailHero Uplift Modeling Dataset
 
 The dataset is provided by X5 Retail Group at the RetailHero hackaton hosted in winter 2019.
 
-The dataset contains raw retail customer purchaces, raw information about products and general info about customers.
+The dataset contains raw retail customer purchases, raw information about products and general info about customers.
 
-`Hackaton website <https://ods.ai/competitions/x5-retailhero-uplift-modeling/data/>`_.
+
+`Machine learning competition website <https://ods.ai/competitions/x5-retailhero-uplift-modeling/data/>`_.
 
 Data description
 ################
@@ -14,7 +15,6 @@ Data contains several parts:
 
 * train.csv: a subset of clients for training. The column *treatment_flg* indicates if there was a communication. The column *target* shows if there was a purchase afterward;
 * clients.csv: general info about clients;
-* products.csv: general info about stock items;
 * purchases.csv: clients’ purchase history prior to communication.
 
 Fields
