Load dataset from Hugging Face and pass to UrbanMapper #45

Merged: simonprovost merged 2 commits into main from load_data_huggingface on May 6, 2025

@soniacq (Contributor) commented Apr 25, 2025

Loading Data from Hugging Face

This example loads the "oscur/pluto" dataset from Hugging Face, selects the training split, and converts the first 1,000 rows into a pandas DataFrame for efficient analysis and exploration. The resulting DataFrame can then be loaded into UrbanMapper using from_dataframe().

[Screenshot, 2025-04-25 4:02 PM]

@simonprovost (Member) left a comment

Excellent! ☀️ This will be highly useful throughout the UrbanMapper workflow and in any subsequent libraries coming out of OSCUR / VIDA. Thanks for this one!

I proposed some modifications. Could you also open a GitHub feature-request issue proposing a from_oscur(.) / from_huggingface_oscur(.) method in the Loader module to speed up this process? Whether or not the feature ever sees the light of day, at least it will be written down and can be debated by today's team and tomorrow's.

Last suggestion: could you please rework the commit message to follow the Karma Git commit convention (1) https://karma-runner.github.io/6.4/dev/git-commit-msg.html, or use the other commits we've made so far as a model (2) https://github.com/VIDA-NYU/UrbanMapper/commits/main/. The convention is also explained here, and, as you know, it will soon be covered in the proper documentation per #43 as soon as I'm done with it ✅
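To check a subject line against the Karma format before committing, a small shell function can help. This is a hedged sketch: is_karma_subject is a hypothetical name, the type list (feat, fix, docs, style, refactor, test, chore) and the 70-character limit follow the Karma page linked above as we read it, and the allowed scope characters are our own simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check ONE commit subject line against the Karma format
#   <type>(<scope>): <subject>
# Type list and 70-character limit as given on the Karma commit-message page;
# the scope charset (lowercase letters, digits, . _ -) is an assumption.
is_karma_subject() {
  local subject=$1
  # The first line cannot be longer than 70 characters.
  [ "${#subject}" -le 70 ] || return 1
  # Lowercase type, optional "(scope)", then ": " and a non-blank subject.
  printf '%s\n' "$subject" | LC_ALL=C grep -Eq \
    '^(feat|fix|docs|style|refactor|test|chore)(\([a-z0-9._-]+\))?: [^ ].*$'
}
```

For instance, is_karma_subject "docs(examples): load OSCUR data from Hugging Face" succeeds, while a title-style line such as "Load dataset from Hugging Face" fails because it has no type prefix.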

Finally, one question: should we include https://pypi.org/project/datasets/ as an UrbanMapper dependency? I would say yes and no. Yes, because one of the examples uses it anyway; no, because none of UrbanMapper's modules use it. What is your call here? If you believe it should be included, you'll need to run uv add datasets and commit the resulting changes to pyproject.toml.

Congrats on this PR! First contributor 🎉

simonprovost force-pushed the main branch 8 times, most recently from 50da15d to 9b4ef7e (April 28, 2025 01:21)
@simonprovost (Member)

Hey @soniacq,

Don't forget to rebase onto main now that #43 and #46 have been merged. You may already know how to do it, but just in case:

Make sure everything on your current working branch is committed, either with temporary commits named something like save or via a stash workflow. Then switch to main and pull the latest changes. Check out your branch again and run git rebase main so that it is up to date with main. Conflicts may arise because the docs PR (#43) we completed with @ctsilva was quite large; if they do, decide what to keep based on the incoming changes and your own. Feel free to message me if you are unsure, but I trust you'll be fine; it's no big deal ✅

To recap, in commands:

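```shell
# Assuming you have nothing to commit
git checkout main   # switch to main
git pull            # fetch and integrate the latest main
git checkout -      # return to the branch you were on
git rebase main     # replay your commits on top of the updated main
```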

Tada 🎉
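The recap can be wrapped in a small shell function that also covers the stash workflow and the conflict case. This is a sketch, not something from the UrbanMapper repository: the name rebase_onto_main is hypothetical, it assumes the base branch is called main, it stashes uncommitted work (including untracked files) rather than making dummy save commits, and it only pulls when main tracks a remote branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UrbanMapper): rebase the current branch onto main.
# Assumes the base branch is named "main".
rebase_onto_main() {
  local stashed=0

  # Refuse to run from main itself; "git checkout -" would then leave main.
  if [ "$(git rev-parse --abbrev-ref HEAD)" = "main" ]; then
    echo "Check out your feature branch first." >&2
    return 1
  fi

  # Stash uncommitted and untracked work instead of making a dummy "save" commit.
  if [ -n "$(git status --porcelain)" ]; then
    git stash push -u -m "rebase_onto_main autosave" || return 1
    stashed=1
  fi

  # Update main; pull only when main has an upstream (remote-tracking) branch.
  git checkout main || return 1
  if git rev-parse --abbrev-ref --symbolic-full-name '@{u}' >/dev/null 2>&1; then
    git pull --ff-only || return 1
  fi
  git checkout - || return 1

  # Replay this branch's commits on top of main; stop and explain on conflicts.
  if ! git rebase main; then
    echo "Rebase stopped on conflicts. Fix the files listed by 'git status', then:" >&2
    echo "  git add <file> && git rebase --continue    (or: git rebase --abort)" >&2
    if [ "$stashed" -eq 1 ]; then
      echo "Uncommitted work is saved in 'git stash list'; run 'git stash pop' afterwards." >&2
    fi
    return 1
  fi

  # Restore the work stashed at the start, if any.
  if [ "$stashed" -eq 1 ]; then
    git stash pop
  fi
}
```

Run it from the feature branch after sourcing the file. If it stops on conflicts, resolve them, git add the fixed files, run git rebase --continue, and then git stash pop if work was stashed.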

Cheers!

simonprovost force-pushed the load_data_huggingface branch from 535542b to 1b3235b (May 2, 2025 16:18)
… integration

- Updated the introduction to include details about the OSCUR Hugging Face Dataset source, explaining its purpose and how it can be used seamlessly in all examples without requiring local downloads.
- Adjusted the conclusion to reflect the inclusion of four supported formats: CSV, Parquet, Shapefile, and Hugging Face datasets.

@soniacq (Contributor, Author) commented May 6, 2025

@simonprovost I have addressed all your comments; thank you for the valuable insights. I believe this PR is now ready to be merged into the main branch. Please have a look when you get a chance. Thanks!

@simonprovost (Member)

Thanks to @soniacq for this contrib! We needed this very much! Can't wait for #49 🎉

simonprovost merged commit 647f5b3 into main on May 6, 2025 (7 checks passed).
simonprovost deleted the load_data_huggingface branch on May 6, 2025 21:13.