[WIP] fix bug in examples #119
|
Darn, still failing! Is it the download that is failing? If so, we should change where we download from. I had in mind that we had uploaded this data on openML, but I am not sure. |
|
It seems to me it is the download indeed. |
|
I will work on that with openML.
Did we already upload it there? I have in mind that we had started
uploading the datasets.
For the download, you can use
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html
|
Most of them are already in openML (cf the PR skrub-data/datasets#4)
OK, I will use that for the download. |
|
Most of them are already in openML (cf the PR skrub-data/datasets#4)
Well, that work is paying off today! That's good news.
|
GaelVaroquaux
left a comment
Looks great!
I recall that the employee salaries dataset is used in other examples, no?
dirty_cat/datasets/fetching.py
Outdated
| """ | ||
|
|
||
| return fetch_dataset(EMPLOYEE_SALARIES_CONFIG, show_progress=False) | ||
| from sklearn.datasets import fetch_openml |
I think that I would prefer if this import was moved to the top of the module, with the other imports.
| # Then we load it: | ||
| import pandas as pd | ||
| df = pd.read_csv(employee_salaries['path']).astype(str) | ||
| df = data = employee_salaries['data'] |
| df = data = employee_salaries['data'] | |
| df = employee_salaries['data'] |
| # Test if load was unsuccesful | ||
| if '"code" : "authentication_required"' in str(df.iloc[0]): | ||
| print('Error while loading the data') #raise IOError | ||
| raise IOError |
Should we remove the if clause? We shouldn't need it anymore, right?
|
CI is still failing because other examples need to be updated. But this is looking good! Can you also add an entry in CHANGES.rst that documents the API changes in the fetcher? |
Codecov Report
@@ Coverage Diff @@
## master #119 +/- ##
==========================================
- Coverage 64.29% 64.17% -0.12%
==========================================
Files 11 11
Lines 801 804 +3
Branches 153 153
==========================================
+ Hits 515 516 +1
- Misses 246 248 +2
Partials 40 40
|
| @@ -1,3 +1,7 @@ | |||
| Release 0.0.7 | |||
| ============= | |||
| * **datasets.fetch_employee_salaries**: change the origin of download for employee_salaries. | |||
Please also mention that this changes the function signature: the function now returns a Bunch with a dataframe under the field "data", and not the path to the csv file. Also, the field "description" has been renamed to "DESCR".
Mentioning this is important, because these changes can break user code, and they need to quickly identify them from the changelog.
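To illustrate the kind of breakage meant here, a hedged sketch of a compatibility helper that user code could adopt. The function name `employee_salaries_frame` is hypothetical and not part of dirty_cat. It only assumes what is described above: the old result is dict-like with "path" and "description", the new one is a Bunch with "data" and "DESCR".

```python
import pandas as pd


def employee_salaries_frame(result):
    """Return (dataframe, description) for either return format.

    `result` is whatever fetch_employee_salaries() returned.
    """
    if "data" in result:
        # New format: the dataframe is already loaded in the Bunch.
        return result["data"], result["DESCR"]
    # Old format: only the path to the csv file was returned.
    return pd.read_csv(result["path"]), result["description"]
```

Code that read `employee_salaries['path']` directly would fail with a `KeyError` after this change, which is why the changelog entry should state it explicitly.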
Co-authored-by: Gael Varoquaux <gael.varoquaux@normalesup.org>
|
Hurray, merging! |
Change the download source for employees_salaries from the official link to openML.