Load dataset from Hugging Face and pass to UrbanMapper #45
Excellent! ☀️ This will be very useful throughout the UrbanMapper workflow and in any subsequent libraries coming out of OSCUR / VIDA. Thanks for this one!
I have proposed some modifications. Could you also open a GitHub issue with a feature request proposing a from_oscur(.) / from_huggingface_oscur(.) method in the Loader module to speed up this process? The feature request may or may not ever see the light of day, but at least it will be written down and can be debated by today's team and tomorrow's.
One last suggestion: could you please rework the commit message to follow the Karma Git commit convention? It is described at (1) https://karma-runner.github.io/6.4/dev/git-commit-msg.html, or you can look at the other commits we have made so far at (2) https://github.com/VIDA-NYU/UrbanMapper/commits/main/ for examples. It is also explained here, and as you know it will soon be released in proper documentation per #43, as soon as I'm done with it ✅
Finally, one question: should we include https://pypi.org/project/datasets/ as a dependency of UrbanMapper? I would say yes and no. Yes, because one of the examples uses it anyway. No, because none of UrbanMapper's modules use it. What is your call here? If you believe it should be included, you'll need to run uv add datasets and commit the resulting changes to pyproject.toml.
Congrats on this PR! First contributor 🎉
50da15d to 9b4ef7e
Hey @soniacq, do not forget to rebase with Make sure you have everything committed on your current work-based branch, including some dummy commits with names like To recap in action: Tada 🎉 Cheers!
20f4cdd to 9f8e4f2
0409ad6 to 535542b
…r using from_dataframe().
535542b to 1b3235b
… integration
- Updated the introduction to include details about the OSCUR Hugging Face Dataset source, explaining its purpose and how it can be used seamlessly in all examples without requiring local downloads.
- Adjusted the conclusion to reflect the inclusion of four supported formats: CSV, Parquet, Shapefile, and Hugging Face datasets.
@simonprovost I have addressed all your comments. Thank you for the valuable insights. I believe this PR is now ready to be merged into the master branch. Please have a look when you get a chance. Thanks!
Loading Data from Hugging Face
This example loads the "oscur/pluto" dataset from Hugging Face, selects the training split, and converts the first 1,000 rows into a pandas DataFrame for efficient analysis and exploration. The resulting DataFrame can then be loaded into UrbanMapper using from_dataframe().
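The steps described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the helper names `head_as_dataframe` and `load_pluto_sample` are hypothetical, and it assumes the Hugging Face `datasets` package (plus pandas) is installed and that "oscur/pluto" exposes a `train` split, as the description states.

```python
def head_as_dataframe(split, n_rows=1000):
    """Return the first n_rows of a Hugging Face Dataset split as a pandas DataFrame.

    Hypothetical helper for illustration: `split` is expected to behave like
    datasets.Dataset, i.e. support len(), .select(indices) and .to_pandas().
    """
    # Never ask for more rows than the split actually contains.
    n = min(n_rows, len(split))
    # select() keeps only the requested row indices; to_pandas() materialises them.
    return split.select(range(n)).to_pandas()


def load_pluto_sample(n_rows=1000):
    """Download "oscur/pluto" and return its first n_rows as a DataFrame.

    Hypothetical wrapper; requires network access on first use.
    """
    # Imported lazily so the helper above stays usable without `datasets` installed.
    from datasets import load_dataset

    # split="train" returns a single Dataset rather than a DatasetDict.
    train_split = load_dataset("oscur/pluto", split="train")
    return head_as_dataframe(train_split, n_rows)
```

Calling `load_pluto_sample()` would return a DataFrame of at most 1,000 rows, which is the object to hand to UrbanMapper's from_dataframe(). The exact loader call chain (for example, how the coordinate columns are named) depends on the installed UrbanMapper version and is deliberately left out here.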