
ENH: Persistent and Interoperable Accessors for NDFrame and Index (sliding window POC) #62064

@dangreb

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Please consider the following.

I took some free time this month for a routine architecture assessment of the pandas Python sources; this is the result.
For health-related reasons, I use LLMs to draft my public communications and keep them from becoming excessively verbose. I'll mark personal remarks in italics.
Feel free to engage.

Summary

This issue proposes an enhancement to the internal architecture of accessors in pandas, aiming to support persistent, interoperable accessors that can coexist with and respond to their host NDFrame or Index objects in a lifecycle-aware and low-copy fashion.

Motivation

Accessors in pandas today (e.g., .str, .dt, .cat) are ephemeral, created on demand for every access. While this is lightweight and efficient for simple use cases, it limits their utility in richer modeling contexts, especially for:

  • Persisting contextual or intermediate state;
  • Synchronizing behavior after host transformations (e.g., .copy(), slicing, chaining);
  • Acting as first-class modeling layers inside pandas pipelines.
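A small sketch makes the limitation concrete. It uses the public registration API `pandas.api.extensions.register_dataframe_accessor`; the accessor name `stateful_demo` and the `StatefulDemoAccessor` class are hypothetical and exist only for illustration. State kept on an accessor instance does not follow the host: a copy or a column selection produces a new object whose accessor starts from scratch.

```python
import pandas as pd


# Hypothetical accessor used only for illustration.
@pd.api.extensions.register_dataframe_accessor("stateful_demo")
class StatefulDemoAccessor:
    """Counts calls; the counter lives on the accessor instance, not on the host."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        self.calls = 0

    def touch(self):
        self.calls += 1
        return self.calls


df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Hold one handle so the demo does not depend on whether pandas caches accessors.
acc = df.stateful_demo
acc.touch()
acc.touch()

calls_on_handle = acc.calls                     # 2
calls_on_copy = df.copy().stateful_demo.calls   # new host, fresh accessor: 0
calls_on_slice = df[["a"]].stateful_demo.calls  # new host, fresh accessor: 0
```

Any context the accessor accumulated (here, a trivial counter) is silently lost after `.copy()` or indexing, which is exactly the synchronization gap listed above.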

These limitations are especially relevant in data engineering and ML pipelines, where pandas often loses its centrality once feature computation becomes non-trivial — forcing users to fall back to NumPy, custom classes, or external orchestration layers.

_Also, beware of overreach when implementing promising concepts around this topic. Obstructive tendencies drive users away, often followed by antithetical responses: the well-known crux of designing enhancement frameworks. The guiding principle should be structure that is expressive and interoperable, that supports continuity, comprehension, and reuse, and that grants broad reach through a small set of familiar instruments._

Feature Description

Proposal

Introduce a model for persistent accessors, which:

  • Are instantiated once per host instance and survive as long as the host exists;
    - Their lifetime is bound either to the high-level pandas object, to the underlying data supply, or to a composite of both
  • Can respond to critical transformations on the host (e.g., copy, assignment, indexing);
  • Are managed via hooks, weak references, or catalog mechanisms;
  • Remain interoperable with existing pandas workflows;
  • Allow extension via a structured API for third parties.
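One possible shape for the structured, opt-in API is sketched below in plain Python. Everything in it is hypothetical: the `PersistentAccessor` protocol, its `on_copy`/`on_take`/`on_release` hooks, the `register_persistent_accessor` decorator, the `REGISTRY` dict and the `propagate` helper are illustrative names, not pandas APIs. A plain list stands in for the host NDFrame so the sketch runs without pandas.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class PersistentAccessor(Protocol):
    """Hypothetical opt-in hook surface for a persistent accessor."""

    def on_copy(self, new_host: Any) -> "PersistentAccessor": ...

    def on_take(self, new_host: Any, key: Any) -> "PersistentAccessor": ...

    def on_release(self) -> None: ...


REGISTRY: Dict[str, type] = {}  # hypothetical third-party registry: name -> class


def register_persistent_accessor(name: str):
    """Opt-in registration: only classes exposing the lifecycle hooks are accepted."""

    def decorator(cls: type) -> type:
        if not issubclass(cls, PersistentAccessor):
            raise TypeError(f"{cls.__name__} does not implement the lifecycle hooks")
        REGISTRY[name] = cls
        return cls

    return decorator


def propagate(event: str, accessors: Dict[str, Any], new_host: Any = None, key: Any = None):
    """Derive the accessors of `new_host` from those attached to its source host."""
    if event == "copy":
        return {name: acc.on_copy(new_host) for name, acc in accessors.items()}
    if event == "take":  # indexing / slicing
        return {name: acc.on_take(new_host, key) for name, acc in accessors.items()}
    if event == "release":  # host is going away
        for acc in accessors.values():
            acc.on_release()
        return {}
    raise ValueError(f"unknown event: {event!r}")


@register_persistent_accessor("window")
class WindowAccessor:
    def __init__(self, host: Any, size: int = 3) -> None:
        self.host = host
        self.size = size
        self.released = False

    def on_copy(self, new_host: Any) -> "WindowAccessor":
        return WindowAccessor(new_host, self.size)  # keep configuration

    def on_take(self, new_host: Any, key: Any) -> "WindowAccessor":
        return WindowAccessor(new_host, self.size)  # a real hook could inspect `key`

    def on_release(self) -> None:
        self.released = True


class NoHooks:  # lacks the lifecycle methods, so registration should fail
    pass


try:
    register_persistent_accessor("broken")(NoHooks)
    rejected = False
except TypeError:
    rejected = True

source = {"window": WindowAccessor([1, 2, 3, 4], size=5)}
copied = propagate("copy", source, new_host=[1, 2, 3, 4])
taken = propagate("take", source, new_host=[2, 3], key=slice(1, 3))
propagate("release", source)
```

Using a `typing.Protocol` keeps third-party accessors free of a mandatory base class. Note that `issubclass` against a runtime-checkable protocol only checks that the hook methods exist, not their signatures.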

I suggest a three-phase roadmap: first, ship persistence with minimal, lifetime-only hooks; second, study the internals to place hooks carefully at the "Goldilocks" points needed for fully managed event propagation; finally, design and build the provisioning infrastructure that gives accessors well-defined read access to the source data entities.

This does not require a full architectural rewrite, and can initially be scoped to enable a formal accessor lifecycle with opt-in semantics and soft hooks.

Alternative Solutions

Proof of Concept (POC)

I'm currently prototyping an accessor extension for a sliding-window use case. The idea is to offer vertical rolling windows of fixed size, supporting:

  • Multiple successive rolls along axis 0 (depth stacking);
  • Minimal memory footprint (zero-copy across views);
  • "Scoped local views" for stateless metric computation, visual flattening, and full-dimensional window analysis;
  • A lightweight dispatcher for apply-like routines, with hyperparameter support and batch-level threading (GIL-free environments only).
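The zero-copy rolling and the batch dispatcher can be approximated with NumPy alone. This is a sketch, not the POC's code: `roll`, `dispatch` and `last_zscore` are hypothetical helpers, and it assumes NumPy 1.20 or later for `numpy.lib.stride_tricks.sliding_window_view`, which returns a read-only strided view rather than a copy. Applying it twice along axis 0 gives the "depth stacking" of successive rolls. For a DataFrame host, the input would come from something like `df.to_numpy()`, which may itself copy depending on the column dtypes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def roll(values, window):
    """Fixed-size vertical windows along axis 0, as a read-only view (no copy)."""
    return sliding_window_view(values, window, axis=0)


def dispatch(windows, func, batch_size=64, max_workers=4, **hyper):
    """Hypothetical apply-like dispatcher: run func(batch, **hyper) per batch on threads."""
    batches = [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves batch order, so concatenation restores row order.
        results = list(pool.map(lambda b: func(b, **hyper), batches))
    return np.concatenate(results, axis=0)


def last_zscore(batch, ddof=0):
    """Stateless metric: z-score of each window's last value within its window."""
    mean = batch.mean(axis=-1)
    std = batch.std(axis=-1, ddof=ddof)
    return (batch[..., -1] - mean) / std


values = np.arange(20.0).reshape(10, 2)  # 10 rows, 2 columns
w = roll(values, 3)    # shape (8, 2, 3): window axis is appended last
w2 = roll(w, 2)        # second roll along axis 0 ("depth stacking"): (7, 2, 3, 2)
means = w.mean(axis=-1)  # per-window mean, shape (8, 2)
zscores = dispatch(w, last_zscore, batch_size=3, ddof=1)  # ddof is a "hyperparameter"
```

Because `sliding_window_view` appends the window axis last, per-window metrics reduce over `axis=-1`. On a standard build, the thread pool only overlaps work in sections that release the GIL (many NumPy kernels do); Python-level work in `func` runs in parallel only on a free-threaded build, consistent with the GIL-free caveat above.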

Key implementation details include:

  • Accessors are indexed by a catalog based on weakref.WeakKeyDictionary, using attrs as the sole strong reference;
  • A root hook object (UUID-backed immutable set subclass) is stored in .attrs, which carries the accessor identity;
  • Copy operations trigger deep hooks via custom __deepcopy__, enabling propagation of the accessor along with host duplication;
  • When the host NDFrame is garbage collected, the accessor is finalized via weakref.finalize;
  • NDArray-level hooks are being tested to track mathematical and logical transformations on the underlying data;
  • Implementation is being tested in a dev environment with no-GIL support, built using scientific nightly wheels via Mamba.
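The catalog mechanics listed above can be sketched with the standard library only. Assumptions: `Host` is a stand-in for an NDFrame that deep-copies its `.attrs` on copy (the behavior the `__deepcopy__` hook relies on); `AccessorHook`, `AccessorState`, `accessor_state`, `CATALOG`, `RELEASED` and the `_accessor_hook` key are hypothetical names. The hook is a `frozenset` subclass holding a single UUID, so it is immutable, hashable and weak-referenceable, which lets it serve as the `WeakKeyDictionary` key. A pandas object could not be the key directly, since `DataFrame` instances are unhashable.

```python
import copy
import gc
import uuid
import weakref

HOOK_KEY = "_accessor_hook"            # hypothetical attrs key
CATALOG = weakref.WeakKeyDictionary()  # AccessorHook -> AccessorState
RELEASED = []                          # UUIDs of accessors whose host was collected


class AccessorState:
    """Per-host accessor state (e.g. a window size and a roll history)."""

    def __init__(self, window=3):
        self.window = window
        self.history = []
        self.finalizer = None  # registered on first access from a host

    def clone(self):
        # Copy the state but not the finalizer: the new host gets its own.
        new = AccessorState(self.window)
        new.history = list(self.history)
        return new


class AccessorHook(frozenset):
    """Immutable, UUID-backed identity token stored in host.attrs."""

    def __new__(cls):
        return super().__new__(cls, (uuid.uuid4(),))

    @property
    def token(self):
        return next(iter(self))

    def __deepcopy__(self, memo):
        # Host copy -> attrs deep copy -> new hook carrying a cloned accessor state.
        new_hook = AccessorHook()
        if self in CATALOG:
            CATALOG[new_hook] = CATALOG[self].clone()
        memo[id(self)] = new_hook
        return new_hook


class Host:
    """Stand-in for an NDFrame: just data plus an attrs dict."""

    def __init__(self, data, attrs=None):
        self.data = data
        self.attrs = {} if attrs is None else attrs

    def copy(self):
        # Assumption: copying the host deep-copies attrs.
        return Host(list(self.data), copy.deepcopy(self.attrs))


def accessor_state(host):
    """Return the persistent accessor state for `host`, creating it once."""
    hook = host.attrs.get(HOOK_KEY)
    if hook is None:
        hook = AccessorHook()
        host.attrs[HOOK_KEY] = hook  # attrs holds the only strong reference
        CATALOG[hook] = AccessorState()
    state = CATALOG[hook]
    if state.finalizer is None:
        # Pass only the UUID, so the finalizer holds no reference to the hook
        # and attrs stays its sole strong owner.
        state.finalizer = weakref.finalize(host, RELEASED.append, hook.token)
    return state


original = Host([1.0, 2.0, 3.0])
accessor_state(original).history.append("roll(3)")
duplicate = original.copy()            # hook __deepcopy__ clones the state
dup_state = accessor_state(duplicate)
released_token = original.attrs[HOOK_KEY].token
del original                           # attrs was the hook's only owner
gc.collect()                           # collection is immediate on CPython anyway
```

Dropping the original host frees its `.attrs`; the hook loses its last strong reference and its `WeakKeyDictionary` entry disappears on its own, while the finalizer records that the accessor was released. The copy keeps an independent hook and a cloned state.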

This approach is entirely backward-compatible and demonstrates how a structured accessor lifecycle could empower pandas to participate in more advanced modeling scenarios — without becoming a full modeling framework itself.

Additional Context

Use Case Implications

Persistent, interoperable accessors would allow:

  • Declarative pipelines inside pandas (e.g., .fe, .validate, .track, etc.);
  • Advanced feature engineering without leaving NDFrame;
  • Safer data transformation propagation (accessor-aware copies and views);
  • Reuse and encapsulation of logic across modeling contexts.
    _- Architectural equidistance with _

It could also help pandas reclaim some conceptual territory currently dominated by _what are, honestly, impressive interface-wise look-alikes_ (e.g., Polars, Modin, cuDF), by moving beyond isomorphic syntax toward architectural expression.

Metadata


Assignees

No one assigned

    Labels

    Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
