Google Summer of Code
numpy/scipy ideas page to borrow from: https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas
This is the GSoC'16 ideas page for pandas. pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It has become a centerpiece of the PyData stack.
This page lists a number of ideas for Google Summer of Code projects for pandas, and gives some pointers for prospective GSoC students on how to get started contributing and how to put together their application.
pandas participates in GSoC 2016 under the umbrella of the Python Software Foundation / NumFOCUS.
PSF student guidelines: http://wiki.python.org/moin/SummerOfCode/Expectations
Advice on writing a proposal (written with the Mailman project in mind, but generally applicable)
We expect students to be at least comfortable with Python (intermediate level). Some projects may also require Cython or C/C++ skills. Knowing how to use Git is also important, although this can be learned before the official start of GSoC if needed.
Potential candidates should take a look at the guidelines on how to contribute to pandas (see the documentation here). The PSF requires that you make a small enhancement, bugfix, documentation fix, etc. to pandas before applying for GSoC. The contribution does not need to be related to your proposal. It can also give you an idea of how things would work during GSoC.
Start on your proposal early, post a draft to the mailing list and iterate based on the feedback you receive. This will not only improve the quality of your proposal, but also help you find a suitable mentor.
- implement some missing features in sparse, see here
- intermediate level project
- see issue here
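For orientation, here is a small illustration of the storage model behind sparse data. It uses pd.arrays.SparseArray from current pandas releases, so it shows the present-day API rather than the 2016 code. Only values that differ from fill_value are stored, along with their positions.

```python
import numpy as np
import pandas as pd

# A mostly-zero array: only the non-fill values (1 and 2) are physically stored.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)

print(arr.sp_values)    # stored (non-fill) values: [1 2]
print(arr.density)      # fraction of positions actually stored: 0.4
print(np.asarray(arr))  # densified back to a regular array: [0 0 1 0 2]
```

Missing features in this area generally come down to making an operation work directly on sp_values and the positions, without densifying first.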
- basically adding a weights argument for methods like .mean, etc.
- can be a beginner level project; requires Python and some Cython knowledge
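A minimal sketch of what a weighted .mean could compute follows. Series.mean has no weights parameter, so weighted_mean is a hypothetical stand-alone helper. The NaN handling shown (skip a position if either the value or its weight is missing) is an assumption about the intended semantics.

```python
import numpy as np
import pandas as pd


def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    """Hypothetical sketch of what Series.mean(weights=...) might return."""
    # Align on the index first, as pandas arithmetic would.
    values, weights = values.align(weights, join="inner")
    # Skip positions where either the value or its weight is missing.
    mask = values.notna() & weights.notna()
    values, weights = values[mask], weights[mask]
    total_weight = weights.sum()
    if total_weight == 0:
        return float("nan")
    return float((values * weights).sum() / total_weight)


# (1*1 + 2*3) / (1 + 3) = 1.75; the NaN value and its weight are dropped.
result = weighted_mean(pd.Series([1.0, 2.0, np.nan]), pd.Series([1.0, 3.0, 10.0]))
```

For data without missing values this matches np.average(values, weights=weights). Doing the same inside pandas would also need fast Cython paths and groupby/rolling variants.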
Add input/output connectors and support for to_* and from_* methods for these binary formats. Existing libraries would do the actual reading and writing; this item is about integration and shipping within pandas. Requires some knowledge of the outside library and a bit of pandas internals.
- avro support
- parquet support
- BSON?
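Much of the pandas-side work is translating dtypes and missing values into what the external library expects. The sketch below uses Avro as the example. The dtype-to-type mapping covers only a few Avro primitive types (long, double, boolean, string) and is an assumption. The helpers avro_schema_from_frame and frame_to_records are hypothetical. Writing the file would be left to an existing Avro library, which is not called here.

```python
import pandas as pd

# Assumed mapping from numpy dtype kinds to Avro primitive type names.
# Unsigned ints, datetimes, categoricals, etc. are deliberately left out.
AVRO_PRIMITIVES = {"i": "long", "f": "double", "b": "boolean", "O": "string"}


def avro_schema_from_frame(df: pd.DataFrame, name: str = "Row") -> dict:
    """Hypothetical helper: derive an Avro record schema from column dtypes."""
    fields = []
    for col, dtype in df.dtypes.items():
        avro_type = AVRO_PRIMITIVES.get(dtype.kind)
        if avro_type is None:
            raise TypeError(f"no Avro mapping for column {col!r} ({dtype})")
        # A ["null", T] union lets missing values be written as Avro nulls.
        fields.append({"name": str(col), "type": ["null", avro_type]})
    return {"type": "record", "name": name, "fields": fields}


def frame_to_records(df: pd.DataFrame) -> list:
    """Hypothetical helper: rows as plain dicts, with missing values as None."""
    records = []
    for row in df.itertuples(index=False, name=None):
        record = {}
        for col, value in zip(df.columns, row):
            if pd.isna(value):
                value = None
            elif hasattr(value, "item"):
                value = value.item()  # numpy scalar -> Python scalar
            record[str(col)] = value
        records.append(record)
    return records


df = pd.DataFrame({"id": [1, 2], "score": [0.5, None], "name": ["a", "b"]})
schema = avro_schema_from_frame(df)
records = frame_to_records(df)
```

A from_* counterpart would run the same mapping in reverse to pick dtypes for the columns it builds.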
- construction of a general interface (possibly via a numba extension) to allow automatic dispatch of code to numba via .apply
- use ahead-of-time (AOT) compilation to generate code for groupby (and other algos), rather than direct templating
- requires some knowledge of pandas internals as well as familiarity with numba
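Both ideas are shown in the sketch below. The first is a dispatcher that sends a function to numba when it can and otherwise falls back to Python. The second is a groupby kernel written once instead of once per dtype. numba is treated as an optional dependency, and maybe_jit and group_sum are hypothetical names. The sketch uses numba's lazy JIT (njit) rather than the ahead-of-time compilation mentioned above. The point is the same in both cases: one kernel source that numba specializes for each input dtype.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    njit = None


def maybe_jit(func):
    """Hypothetical dispatcher: run func through numba when possible, else in Python."""
    if njit is None:
        return func
    compiled = njit(func)  # compilation is lazy: it happens on the first call
    state = {"use_numba": True}

    def wrapper(*args):
        if state["use_numba"]:
            try:
                return compiled(*args)
            except Exception:
                # Typing/compilation failure (or any other error): stop trying
                # numba and let the plain-Python version run (and raise, if it fails too).
                state["use_numba"] = False
        return func(*args)

    return wrapper


def group_sum(codes, values, ngroups):
    """One generic group-sum kernel; numba specializes it per input dtype."""
    out = np.zeros(ngroups, dtype=np.float64)
    for i in range(len(codes)):
        c = codes[i]
        if c >= 0:  # -1 marks rows whose group key is missing
            out[c] += values[i]
    return out


fast_group_sum = maybe_jit(group_sum)
codes = np.array([0, 1, 0, 2, -1], dtype=np.int64)
result = fast_group_sum(codes, np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 3)
# result holds the per-group sums [4.0, 2.0, 4.0]
```

The -1 convention for missing keys matches the codes produced by pandas.factorize. An .apply integration could route a user function through a wrapper like maybe_jit in the same way.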
- implement as a sub-class of IntervalIndex
- make a first-class extension dtype
- deep knowledge of pandas internals is needed
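The core operation an interval index must support is mapping a point to the interval that contains it. The pure-Python sketch below covers only the simplest case: contiguous, right-closed bins (b[i], b[i+1]]. Lookup is a binary search over the sorted breaks. SimpleIntervalIndex is a hypothetical name, and this is not the design of the proposed pandas class, which would also need overlapping intervals, different closed sides, and a proper dtype.

```python
import bisect
from typing import List, Optional


class SimpleIntervalIndex:
    """Hypothetical minimal interval index over contiguous bins (b[i], b[i+1]]."""

    def __init__(self, breaks: List[float]):
        if any(lo >= hi for lo, hi in zip(breaks, breaks[1:])):
            raise ValueError("breaks must be strictly increasing")
        self.breaks = list(breaks)

    def __len__(self) -> int:
        return len(self.breaks) - 1

    def intervals(self):
        return list(zip(self.breaks, self.breaks[1:]))

    def get_loc(self, key: float) -> Optional[int]:
        """Position of the interval containing key, or None if none contains it."""
        # bisect_left finds the first break >= key; for (lo, hi] bins that break is hi.
        pos = bisect.bisect_left(self.breaks, key)
        if pos == 0 or pos == len(self.breaks):
            return None  # key <= first break, or key > last break
        return pos - 1


idx = SimpleIntervalIndex([0, 1, 2, 3])  # bins (0, 1], (1, 2], (2, 3]
```

Each lookup is O(log n). Right-closed bins put a key equal to a break (e.g. 1) into the bin that ends at it.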
- support for non-ns (non-nanosecond) datetime/timedelta dtypes
- deep knowledge of pandas internals is needed. It is helpful to understand numpy dtype mechanisms.
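A short calculation shows why the resolution matters. pandas stores datetime64[ns] values as int64 nanoseconds since 1970-01-01, so the representable range is fixed by the int64 limits. The stdlib-only sketch below computes those bounds (to microsecond precision, since Python's datetime stops there). It also computes the approximate one-sided span each unit would allow. The unit table is an illustrative assumption covering only a few numpy units.

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)


def ns_bounds():
    """Earliest/latest datetimes expressible as int64 nanoseconds since the epoch."""
    lo = EPOCH + timedelta(microseconds=INT64_MIN // 1000)
    hi = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
    return lo, hi


# Seconds per int64 tick for a few units (an illustrative subset).
SECONDS_PER_TICK = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
SECONDS_PER_YEAR = 365.2425 * 86400


def span_years(unit: str) -> float:
    """Approximate number of years on each side of the epoch for the given unit."""
    return INT64_MAX * SECONDS_PER_TICK[unit] / SECONDS_PER_YEAR


lo, hi = ns_bounds()  # roughly 1677-09-21 and 2262-04-11
```

With nanoseconds the span is only about 292 years each side of 1970. A column therefore cannot hold dates such as 1500-01-01, while coarser units like "s" cover far more than any calendar needs.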
- construct a new dtype to support string operations (moved from object)
- understanding of Categorical is required
- deep knowledge of pandas internals is needed. It is helpful to understand numpy dtype mechanisms.
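One reason Categorical knowledge is relevant is that both can store strings as integer codes into a table of distinct values. Building on that, a string operation can run once per distinct value instead of once per row. The pure-Python sketch below shows that idea. factorize and str_map are hypothetical helpers. The -1 code for missing values follows the convention used by pandas.factorize and Categorical codes.

```python
from typing import Callable, Dict, List, Optional, Tuple


def factorize(values: List[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Map each string to an integer code; None gets -1 (missing)."""
    categories: List[str] = []
    lookup: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v is None:
            codes.append(-1)
            continue
        if v not in lookup:
            lookup[v] = len(categories)
            categories.append(v)
        codes.append(lookup[v])
    return codes, categories


def str_map(codes: List[int], categories: List[str],
            func: Callable[[str], str]) -> List[Optional[str]]:
    """Apply a string operation once per distinct value, then broadcast via codes."""
    mapped = [func(c) for c in categories]
    return [None if code == -1 else mapped[code] for code in codes]


codes, categories = factorize(["a", "b", "a", None, "b"])
upper = str_map(codes, categories, str.upper)  # ["A", "B", "A", None, "B"]
```

The savings grow with the number of repeated values. A real dtype would also have to decide how to handle data with few repeats, where the codes add overhead.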
- allow more metadata to be attached to pandas objects and propagate it in the common cases
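pandas already has a hook for this in subclasses. Attribute names listed in _metadata are copied to results by __finalize__, which pandas calls in many (not all) operations. _constructor makes results keep the subclass. The sketch uses that documented subclassing pattern with a hypothetical "units" attribute. Making propagation reliable across the common operations is the kind of gap this item targets. Newer pandas versions also offer an experimental DataFrame.attrs dictionary.

```python
import pandas as pd


class UnitsFrame(pd.DataFrame):
    """Sketch of a DataFrame subclass carrying a hypothetical 'units' attribute."""

    # Names listed here are copied from the source object by __finalize__.
    _metadata = ["units"]

    @property
    def _constructor(self):
        # Build results of operations as UnitsFrame rather than plain DataFrame.
        return UnitsFrame


df = UnitsFrame({"length": [1.0, 2.5], "width": [0.5, 4.0]})
df.units = "m"
subset = df[["length"]]  # column selection keeps the subclass and its metadata
```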
- flesh out additional features that are needed to fully support Panel operations and implement them in xarray
- see whether some selected operations can be ported from xarray back to pandas
- deep knowledge of pandas internals is needed. It is helpful to understand numpy dtype mechanisms.
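For comparison with the 3-D Panel container, the same data can be held in a 2-D DataFrame with a two-level MultiIndex. Panel-style slices then become .xs and .unstack calls. The labels and values below are hypothetical, and this is only one common representation, not a replacement design. When xarray is installed, DataFrame.to_xarray() converts such a frame into an xarray Dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 items x 3 dates x 2 fields, as a Panel would hold it.
items = ["AAPL", "MSFT"]
dates = pd.date_range("2016-01-01", periods=3)
fields = ["open", "close"]
cube = np.arange(12, dtype=float).reshape(len(items), len(dates), len(fields))

# Flatten the first two axes into an (item, date) MultiIndex; fields become columns.
index = pd.MultiIndex.from_product([items, dates], names=["item", "date"])
frame = pd.DataFrame(cube.reshape(-1, len(fields)), index=index, columns=fields)

# Panel-style selections expressed on the 2-D frame:
msft = frame.xs("MSFT", level="item")   # one item: dates x fields
close = frame["close"].unstack("item")  # one field: dates x items
```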
Internal refactor of pandas, preserving the user-facing API and the developer API (numpy). Requires deep knowledge of C++ and pandas internals.