[WIP] fix bug in examples #119

TwsThomas · 2020-05-14T11:26:31Z

Change the download source for employee_salaries from the official link to openML.

GaelVaroquaux · 2020-05-14T11:42:23Z

Darn, still failing!

Is it the download that is failing? If so, we should change where we download from. I thought we had uploaded this data to openML, but I am not sure.

TwsThomas · 2020-05-14T11:44:40Z

Yes, it does seem to be the download.
I will work on that with openML.

GaelVaroquaux · 2020-05-14T11:48:55Z

> I will work on that with openML.

Did we already upload it there? I have in mind that we had started uploading the datasets. For the download, you can use https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html
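
As a rough sketch of what that could look like (not the code from this PR): the helper below takes the fetcher as an argument so it can be exercised offline. The helper name and the dataset name `employee_salaries` are assumptions; only `fetch_openml` and its `name` and `as_frame` parameters come from scikit-learn.

```python
# Hypothetical OpenML dataset name; the thread does not state the real one.
DATASET_NAME = "employee_salaries"


def load_employee_salaries(fetch, name=DATASET_NAME):
    """Fetch the dataset through an OpenML-style fetcher and return its table.

    `fetch` is expected to behave like sklearn.datasets.fetch_openml:
    called with name=... and as_frame=True, it returns an object whose
    `data` attribute holds the feature table.
    """
    bunch = fetch(name=name, as_frame=True)
    return bunch.data
```

With scikit-learn installed, `load_employee_salaries(fetch_openml)` after `from sklearn.datasets import fetch_openml` would perform the actual download, provided the dataset exists on OpenML under that name.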

TwsThomas · 2020-05-14T11:54:22Z

> Did we already upload it there? I have in mind that we had started
> uploading the datasets.

Most of them are already in openML (cf the PR skrub-data/datasets#4)

> For the download, you can use
> https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html

OK, I will use that.

GaelVaroquaux · 2020-05-14T11:56:15Z

> Most of them are already in openML (cf the PR skrub-data/datasets#4)

Well, that work is paying off today!!! That's good news.

GaelVaroquaux

Looks great!

I believe employee_salaries is used in other examples as well, isn't it?

GaelVaroquaux · 2020-05-14T13:48:40Z

dirty_cat/datasets/fetching.py

    """

-    return fetch_dataset(EMPLOYEE_SALARIES_CONFIG, show_progress=False)
+    from sklearn.datasets import fetch_openml


I think that I would prefer if this import was moved to the top of the module, with the other imports.

GaelVaroquaux · 2020-05-14T13:49:12Z

examples/02_fit_predict_plot_employee_salaries.py

 # Then we load it:
 import pandas as pd
-df = pd.read_csv(employee_salaries['path']).astype(str)
+df = data = employee_salaries['data']


Suggested change

-df = data = employee_salaries['data']
+df = employee_salaries['data']

GaelVaroquaux · 2020-05-14T13:49:40Z

examples/02_fit_predict_plot_employee_salaries.py

 # Test if load was unsuccesful
 if '"code" : "authentication_required"' in str(df.iloc[0]):
-    print('Error while loading the data') #raise IOError
+    raise IOError


Should we remove the if clause? We shouldn't need it anymore, right?
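
For context, the guard under discussion can be read as the standalone check sketched below. The helper and constant names are hypothetical; the marker string is the one from the diff above. It shows the "raise instead of print" behaviour, which only matters if the old download endpoint is still in use.

```python
# Marker taken from the diff: the old endpoint answered with this JSON
# fragment when the download was refused.
ERROR_MARKER = '"code" : "authentication_required"'


def check_download(first_row_text):
    """Raise IOError if the loaded content looks like an API error payload."""
    if ERROR_MARKER in first_row_text:
        # Failing loudly stops the example instead of continuing on bad data.
        raise IOError("Error while loading the data")
```

In the example script this would be called with `str(df.iloc[0])`. If the data comes from openML, that error payload should not show up, which is the reason the clause can probably be dropped.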

GaelVaroquaux · 2020-05-14T13:52:27Z

CI is still failing because other examples need to be updated. But this is looking good!

Can you also add an entry in CHANGES.rst that documents the API changes in the fetcher?

codecov · 2020-05-14T15:09:42Z

Codecov Report

Merging #119 into master will decrease coverage by 0.11%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
- Coverage   64.29%   64.17%   -0.12%     
==========================================
  Files          11       11              
  Lines         801      804       +3     
  Branches      153      153              
==========================================
+ Hits          515      516       +1     
- Misses        246      248       +2     
  Partials       40       40

Impacted Files	Coverage Δ
dirty_cat/datasets/fetching.py	`81.57% <25.00%> (-1.31%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3695d51...e6ea1ed.
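
As a quick sanity check on the figures in the report, overall coverage is hits divided by lines. The sketch below recomputes it from the table; the function name is made up for illustration, and how Codecov rounds the last digit is not something this thread states.

```python
def coverage_percent(hits, lines):
    """Return line coverage as a percentage."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff table (master vs. #119).
before = coverage_percent(515, 801)
after = coverage_percent(516, 804)
delta = after - before

print(f"master: {before:.4f}%  #119: {after:.4f}%  delta: {delta:+.4f}")
```

The raw ratios agree with the reported 64.29% and 64.17% up to rounding of the last digit, which would also explain why the summary says 0.11% while the table says -0.12%.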

GaelVaroquaux · 2020-05-14T15:51:36Z

CHANGES.rst

@@ -1,3 +1,7 @@
+Release 0.0.7
+=============
+* **datasets.fetch_employee_salaries**: change the origin of download for employee_salaries.


Please also mention that this changes the function's return value: it now returns a Bunch with a dataframe under the field "data", rather than the path to the CSV file. Also, the field "description" has been renamed to "DESCR".

Mentioning this is important because these changes can break user code, and users need to be able to spot them quickly in the changelog.
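
To illustrate why this matters for user code, here is a minimal sketch of a helper that tolerates both return shapes. It assumes the old result was a dict-like object with `path` and `description` keys and the new one a Bunch (a dict subclass) with `data` and `DESCR`, as described in this comment; the helper names are hypothetical.

```python
def get_description(result):
    """Return the dataset description from an old- or new-style result."""
    # New API: the field is called "DESCR".
    if "DESCR" in result:
        return result["DESCR"]
    # Old API: the field was called "description".
    return result["description"]


def returns_dataframe(result):
    """Tell whether the result carries the table itself rather than a CSV path."""
    return "data" in result and "path" not in result
```

Code written against the old API, such as `pd.read_csv(result['path'])`, would fail with a `KeyError` on the new result, which is the kind of breakage a changelog entry helps users diagnose.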

CHANGES.rst

Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>

GaelVaroquaux · 2020-05-15T07:21:17Z

Hurray, merging!

WIP fix examples

959e9cb

test CI

bb8484c

TwsThomas added 2 commits May 14, 2020 14:26

test

f6dccbf

replace fetch_employees salary

3a930d8

GaelVaroquaux reviewed May 14, 2020

TwsThomas added 2 commits May 14, 2020 17:00

minor import refactor

520dc8c

refactor code with df name from openML

b387c3c

TwsThomas added 2 commits May 14, 2020 17:13

refactor target in fetching.py

b2aea04

minor

041060f

GaelVaroquaux reviewed May 14, 2020

update change.rst

b3c8d25

GaelVaroquaux reviewed May 15, 2020

Update CHANGES.rst indent

e6ea1ed

Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>

GaelVaroquaux merged commit 57b0ab3 into skrub-data:master May 15, 2020
