What does the long-term future look like in the Python/PyData landscape (say, 2025)? What would the ideal "dream" dataframe library be? In other words, which issues do we need to tackle?
For instance, vaex solves most of the 2017 issues mentioned by Wes: https://wesmckinney.com/blog/apache-arrow-pandas-internals/
Also think about:
- Sizes of datasets (e.g., row and/or column counts), relative to current hardware and Moore's Law.
- Kinds of data: more unstructured data?
- Expectations of the hardware: more cores, more GPUs?
- Distributed vs. cloud vs. single machine
- API design (e.g., expose laziness or not?)
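To make the laziness question concrete, here is a minimal toy sketch (not the API of vaex or any existing library) of a lazy expression that records an operation graph and only evaluates when explicitly asked. The names `LazyExpr`, `column`, and `compute` are hypothetical:

```python
import numpy as np

class LazyExpr:
    """Toy lazy expression: building an expression records an
    operation graph; nothing is evaluated until .compute()."""

    def __init__(self, fn, inputs):
        self.fn = fn          # operation to apply
        self.inputs = inputs  # child expressions or constants

    def _value(self):
        # Recursively evaluate children, then apply this node's op.
        args = [i._value() if isinstance(i, LazyExpr) else i
                for i in self.inputs]
        return self.fn(*args)

    def compute(self):
        return self._value()

    def __add__(self, other):
        return LazyExpr(np.add, [self, other])

    def __mul__(self, other):
        return LazyExpr(np.multiply, [self, other])

def column(data):
    """Wrap raw data as a leaf node of the expression graph."""
    arr = np.asarray(data, dtype=float)
    return LazyExpr(lambda: arr, [])

x = column([1.0, 2.0, 3.0])
y = column([4.0, 5.0, 6.0])

expr = x + y * x         # no computation yet: just builds a graph
result = expr.compute()  # evaluates the whole graph at once
print(result)            # [ 5. 12. 21.]
```

A design like this lets the library see the whole expression before executing it (enabling out-of-core chunking, fusion, or GPU dispatch), at the cost of a less immediate, eager-feeling API.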
Are we heading in the right direction, also taking into account the convergence/divergence of dataframe libraries?