
Towards Modin 1.0

Devin Petersohn edited this page Feb 11, 2022 · 13 revisions

This page details the criteria we will use to decide when Modin is stable enough to declare a 1.0 release.

Feature requirements

  • Backend-agnostic execution API
    • Supported backends: Ray, Dask, Python multiprocessing
  • Data-manipulation front-end API for DSLs
    • Supported DSLs: pandas, SQL
    • Mostly complete pandas API support
  • Dataframe algebra translation layer
  • Query execution in both eager and lazy modes
  • Data partitioning and placement API
    • Partition protocol and a high-level API for partitioning and placement
  • Fault tolerance and error handling
    • Logging
    • Fault tolerance for node or worker failures
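To make the eager/lazy distinction in the feature list concrete, here is a minimal pure-Python sketch. It is not Modin's API: `LazyFrame`, `eager_filter`, and `eager_project` are hypothetical names, and a "frame" is assumed to be a plain list of row dicts. Eager mode runs each operator immediately and materializes a result; lazy mode only records operators in a plan and runs them when `collect()` is called.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical row representation for this sketch only (not Modin's).
Row = Dict[str, Any]


def eager_filter(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    # Eager mode: the work happens now and a new list is materialized.
    return [r for r in rows if pred(r)]


def eager_project(rows: List[Row], cols: List[str]) -> List[Row]:
    # Eager mode: keep only the requested columns, immediately.
    return [{c: r[c] for c in cols} for r in rows]


@dataclass
class LazyFrame:
    """Hypothetical lazy frame: records operators, runs nothing until collect()."""

    source: List[Row]
    plan: List[Callable[[List[Row]], List[Row]]] = field(default_factory=list)

    def filter(self, pred: Callable[[Row], bool]) -> "LazyFrame":
        # Return a new frame whose plan has one more recorded step.
        return LazyFrame(self.source, self.plan + [lambda rows: eager_filter(rows, pred)])

    def project(self, cols: List[str]) -> "LazyFrame":
        return LazyFrame(self.source, self.plan + [lambda rows: eager_project(rows, cols)])

    def collect(self) -> List[Row]:
        # Execution is deferred to this point: apply each recorded step in order.
        rows = self.source
        for op in self.plan:
            rows = op(rows)
        return rows


if __name__ == "__main__":
    data = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
    eager = eager_project(eager_filter(data, lambda r: r["a"] > 1), ["b"])
    lazy = LazyFrame(data).filter(lambda r: r["a"] > 1).project(["b"])
    print(len(lazy.plan))   # 2 steps recorded, none executed yet
    print(lazy.collect())   # [{'b': 20}, {'b': 30}]
    print(eager == lazy.collect())  # True
```

Because the lazy frame holds the whole plan before anything runs, an engine in lazy mode has the chance to inspect and rewrite it (for example, fusing a filter and a projection into a single pass) before execution; the eager path gives up that opportunity in exchange for immediate results.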

Performance

  • Reasonable performance on common operators
  • Not slower than pandas on medium-sized datasets (500 MB)
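One way to check a "not slower than pandas" criterion is a small timing harness. The sketch below is an assumption, not Modin's benchmark suite: `median_runtime` and `not_slower_than` are hypothetical helpers, and the demo uses pure-Python workloads as stand-ins for the same operation run through Modin and through pandas on a dataset of around 500 MB.

```python
import statistics
import time
from typing import Callable, Tuple


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def not_slower_than(
    candidate: Callable[[], object],
    baseline: Callable[[], object],
    tolerance: float = 1.0,
    repeats: int = 5,
) -> Tuple[bool, float]:
    """Return (passed, ratio), where ratio = candidate time / baseline time.

    passed is True when the candidate's median time is at most
    `tolerance` times the baseline's median time.
    """
    cand = median_runtime(candidate, repeats)
    base = median_runtime(baseline, repeats)
    # Guard against a baseline too fast for the timer to register.
    ratio = cand / base if base > 0 else float("inf")
    return cand <= tolerance * base, ratio


def candidate_op() -> int:
    # Hypothetical stand-in for the Modin version of an operation.
    return sum(range(1_000_000))


def baseline_op() -> int:
    # Hypothetical stand-in for the pandas version of the same operation.
    return sum(i for i in range(1_000_000))


if __name__ == "__main__":
    passed, ratio = not_slower_than(candidate_op, baseline_op)
    print(f"passed={passed} ratio={ratio:.2f}")
```

Taking the median of several runs rather than a single timing makes the comparison less sensitive to warm-up effects and scheduling noise, and `tolerance` allows a looser target to be expressed (for example, `tolerance=1.1` for "within 10% of pandas").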

Non-functional requirements

  • Microbenchmarks, end-to-end benchmarks, and dashboard
  • Full documentation both for programmers and users
  • Code that complies with the adopted style guide and docstring format
  • Dataframe exchange protocol