Commit 5209bb6

remove all Roadmap points pending a PDEP
1 parent 55fc3e4 commit 5209bb6

1 file changed

+0
-124
lines changed

web/pandas/about/roadmap.md

Lines changed: 0 additions & 124 deletions
@@ -35,127 +35,3 @@ For more information about PDEPs, and how to submit one, please refer to
3535

3636
{% endfor %}
3737

38-
## Roadmap points pending a PDEP
39-
40-
<div class="alert alert-warning" role="alert">
41-
pandas is in the process of moving roadmap points to PDEPs (implemented in
42-
August 2022). During the transition, some roadmap points will exist as PDEPs,
43-
while others will exist as sections below.
44-
</div>
45-
46-
### Apache Arrow interoperability
47-
48-
[Apache Arrow](https://arrow.apache.org) is a cross-language development
49-
platform for in-memory data. The Arrow logical types are closely aligned
50-
with typical pandas use cases.
51-
52-
We'd like to provide better-integrated support for Arrow memory and
53-
data types within pandas. This will let us take advantage of its I/O
54-
capabilities and provide for better interoperability with other
55-
languages and libraries using Arrow.
56-
57-
### Decoupling of indexing and internals
58-
59-
The code for getting and setting values in pandas' data structures
60-
needs refactoring. In particular, we must clearly separate code that
61-
converts keys (e.g., the argument to `DataFrame.loc`) to positions from
62-
code that uses these positions to get or set values. This is related to
63-
the proposed BlockManager rewrite. Currently, the BlockManager sometimes
64-
uses label-based, rather than position-based, indexing. We propose that
65-
it should only work with positional indexing, and the translation of
66-
keys to positions should be entirely done at a higher level.
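The separation proposed above can be illustrated with a toy, pure-Python sketch. This is not pandas code: `LabelAxis`, `PositionalStore`, and `loc_get` are hypothetical names invented for illustration, and the storage layer stands in only loosely for the BlockManager. The point is that labels are translated to positions in exactly one place, and the storage layer only ever sees integers.

```python
from typing import Hashable, Sequence


class LabelAxis:
    """Hypothetical axis: owns the labels and the label-to-position mapping."""

    def __init__(self, labels: Sequence[Hashable]) -> None:
        self._positions = {label: i for i, label in enumerate(labels)}

    def to_positions(self, keys: Sequence[Hashable]) -> list:
        # The only place where labels are looked up; a missing label
        # raises KeyError here and nowhere else.
        return [self._positions[key] for key in keys]


class PositionalStore:
    """Hypothetical storage layer: knows nothing about labels."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    def take(self, positions: Sequence[int]) -> list:
        return [self._values[i] for i in positions]

    def put(self, positions: Sequence[int], new_values: Sequence[float]) -> None:
        for i, value in zip(positions, new_values):
            self._values[i] = value


def loc_get(axis: LabelAxis, store: PositionalStore, keys: Sequence[Hashable]) -> list:
    positions = axis.to_positions(keys)  # step 1: labels -> positions
    return store.take(positions)         # step 2: purely positional access


axis = LabelAxis(["a", "b", "c"])
store = PositionalStore([10.0, 20.0, 30.0])
result = loc_get(axis, store, ["c", "a"])
```

Because `PositionalStore` never receives a label, it could be swapped for a different storage layout without touching the key-conversion code.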
67-
68-
Indexing is a complicated API with many subtleties. This refactor will require care
69-
and attention. The following principles should inspire refactoring of indexing code and
70-
should result in cleaner, simpler, and more performant code.
71-
72-
1. Label indexing must never involve looking in an axis twice for the same label(s).
73-
This implies that any validation step must either:
74-
75-
* limit validation to general features (e.g. dtype/structure of the key/index), or
76-
* reuse the result for the actual indexing.
77-
78-
2. Indexers must never rely on an explicit call to other indexers.
79-
For instance, it is OK to have some internal method of `.loc` call some
80-
internal method of `__getitem__` (or of their common base class),
81-
but never in the code flow of `.loc` should `the_obj[something]` appear.
82-
83-
3. Execution of positional indexing must never involve labels (as currently, sadly, happens).
84-
That is, the code flow of a getter call (or a setter call in which the right hand side is non-indexed)
85-
to `.iloc` should never involve the axes of the object in any way.
86-
87-
4. Indexing must never involve accessing/modifying values (i.e., act on `._data` or `.values`) more than once.
88-
The following steps must hence be clearly decoupled:
89-
90-
* find positions we need to access/modify on each axis
91-
* (if we are accessing) derive the type of object we need to return (dimensionality)
92-
* actually access/modify the values
93-
* (if we are accessing) construct the return object
94-
95-
5. As a corollary to the decoupling between 4.i and 4.iii, any code which deals with how data is stored
96-
(including any combination of handling multiple dtypes, and sparse storage, categoricals, third-party types)
97-
must be independent from code that deals with identifying affected rows/columns,
98-
and take place only once step 4.i is completed.
99-
100-
* In particular, such code should most probably not live in `pandas/core/indexing.py`
101-
* ... and must not depend in any way on the type(s) of axes (e.g. no `MultiIndex` special cases)
102-
103-
6. As a corollary to point 1.i, `Index` (sub)classes must provide separate methods for any desired validity check of label(s) which does not involve actual lookup,
104-
on the one side, and for any required conversion/adaptation/lookup of label(s), on the other.
105-
106-
7. Use of trial and error should be limited, and anyway restricted to catch only exceptions
107-
which are actually expected (typically `KeyError`).
108-
109-
* In particular, code should never (intentionally) raise new exceptions in the `except` portion of a `try... except` block
110-
111-
8. Any code portion which is not specific to setters and getters must be shared,
112-
and when small differences in behavior are expected (e.g. getting with `.loc` raises for
113-
missing labels, setting still doesn't), they can be managed with a specific parameter.
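Principles 7 and 8 can be sketched together in plain Python. The function below is a hypothetical illustration, not pandas code: `resolve_positions` and its `missing` parameter are invented names, and the index is modeled as a simple `dict` from label to position. Getters and setters share one code path, only the expected `KeyError` is caught, and no new exception is raised inside the `except` block (the original one is re-raised unchanged).

```python
def resolve_positions(index, labels, *, missing="raise"):
    """Translate labels to positions, looking up each label only once.

    ``missing`` is a hypothetical switch: "raise" mimics a getter,
    "append" mimics a setter that enlarges the axis.
    """
    if missing not in ("raise", "append"):
        raise ValueError("missing must be 'raise' or 'append'")

    positions = []
    for label in labels:
        try:
            positions.append(index[label])
        except KeyError:  # the only exception we actually expect
            if missing == "raise":
                raise  # re-raise the same KeyError, do not create a new one
            # Setter behavior: register the new label at the end of the axis.
            index[label] = len(index)
            positions.append(index[label])
    return positions


index = {"a": 0, "b": 1}
get_positions = resolve_positions(index, ["b", "a"])
set_positions = resolve_positions(index, ["a", "z"], missing="append")
```

Here the getter and setter differ only in the value passed for `missing`; everything else, including the single lookup per label, is shared.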
114-
115-
### Numba-accelerated operations
116-
117-
[Numba](https://numba.pydata.org) is a JIT compiler for Python code.
118-
We'd like to provide ways for users to apply their own Numba-jitted
119-
functions where pandas accepts user-defined functions (for example,
120-
`Series.apply`,
121-
`DataFrame.apply`,
122-
`DataFrame.applymap`, and in groupby and
123-
window contexts). This will improve the performance of
124-
user-defined-functions in these operations by staying within compiled
125-
code.
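As a point of comparison for this goal, one pattern that works without any dedicated pandas support is to hand the underlying NumPy array to a jitted function and wrap the result back into a `Series`. The sketch below assumes NumPy and pandas are installed; Numba is treated as optional, and the fallback `njit` decorator is a stand-in defined only so the example still runs (uncompiled) without it. `running_max` is a hypothetical user-defined function.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback: return the function unchanged so the sketch still runs.
        return func


@njit
def running_max(values):
    # Explicit loop over a 1-D, non-empty float array; Numba compiles
    # this kind of loop to machine code when it is available.
    out = np.empty_like(values)
    current = values[0]
    for i in range(values.shape[0]):
        if values[i] > current:
            current = values[i]
        out[i] = current
    return out


s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])
# Pass the raw array in, then re-attach the original index.
result = pd.Series(running_max(s.to_numpy()), index=s.index)
```

The roadmap item describes letting pandas call such functions directly in `apply`, groupby, and window contexts, so that the round trip through `to_numpy()` shown here would not be needed.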
126-
127-
### Documentation improvements
128-
129-
We'd like to improve the content, structure, and presentation of the
130-
pandas documentation. Some specific goals include
131-
132-
- Overhaul the HTML theme with a modern, responsive design
133-
(`15556`)
134-
- Improve the "Getting Started" documentation, designing and writing
135-
learning paths for users with different backgrounds (e.g. brand new to
136-
programming, familiar with other languages like R, already familiar
137-
with Python).
138-
- Improve the overall organization of the documentation and specific
139-
subsections of the documentation to make navigation and finding
140-
content easier.
141-
142-
### Performance monitoring
143-
144-
pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
145-
to monitor for performance regressions. ASV itself is a fabulous tool,
146-
but requires some additional work to be integrated into an open source
147-
project's workflow.
148-
149-
The [asv-runner](https://github.com/asv-runner) organization, currently
150-
made up of pandas maintainers, provides tools built on top of ASV. We
151-
have a physical machine for running a number of projects' benchmarks,
152-
and tools managing the benchmark runs and reporting on results.
153-
154-
We'd like to fund improvements and maintenance of these tools to
155-
156-
- Be more stable. Currently, they're maintained on the nights and
157-
weekends when a maintainer has free time.
158-
- Tune the system for benchmarks to improve stability, following
159-
<https://pyperf.readthedocs.io/en/latest/system.html>
160-
- Build a GitHub bot to request ASV runs *before* a PR is merged.
161-
Currently, the benchmarks are only run nightly.