
Improve DataFrame docs flow #1526

Open
vandit98 wants to merge 1 commit into apache:main from vandit98:vandit98/1397-dataframe-docs-flow


@vandit98 vandit98 commented May 8, 2026

Closes #1397.

Rationale for this change

The DataFrame guide currently mixes the main user flow with lower-level Arrow streaming details, display behavior, and metrics guidance. This makes the page harder to scan for new users who are trying to understand the basic DataFrame lifecycle.

What changes are included in this PR?

  • Adds a short roadmap near the top of the DataFrame guide so readers know where the main overview ends and where specialized topics live.
  • Moves the detailed Arrow streaming / __arrow_c_stream__ content into a dedicated arrow-interface page under the DataFrame section.
  • Replaces the long in-page Arrow section with a compact related-topics section linking to Arrow interface, rendering, and execution metrics pages.
  • Adds the new Arrow interface page to the DataFrame section toctree.

Are there any user-facing changes?

Yes, documentation-only changes. The DataFrame docs should be easier to scan and the Arrow streaming content now has a dedicated page.

Verification performed by Vandit:

  • git diff --check
  • Python sanity check that docs/source/user-guide/dataframe/index.rst links arrow-interface and that docs/source/user-guide/dataframe/arrow-interface.rst exists with the expected heading
  • Attempted Sphinx build in a temporary venv with docs dependencies; it stopped on the existing top-level docs/source/index.rst IPython example because the compiled datafusion package was not installed in that temp venv (ModuleNotFoundError: No module named 'datafusion').
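The script used for the Python sanity check is not shown in this PR. A minimal sketch of what such a check might look like follows. It assumes the docs root is passed in as a path and that the new page's heading is `Arrow Interface`; the heading text and the function name `check_arrow_interface_docs` are hypothetical, not taken from the PR.

```python
from pathlib import Path


def check_arrow_interface_docs(docs_root="docs", expected_heading="Arrow Interface"):
    """Return a list of problems; an empty list means the check passed.

    `expected_heading` is an assumed value, adjust it to the real page title.
    """
    dataframe_dir = Path(docs_root) / "source" / "user-guide" / "dataframe"
    index_path = dataframe_dir / "index.rst"
    page_path = dataframe_dir / "arrow-interface.rst"
    problems = []

    # The index should mention the new page (for example in its toctree).
    if not index_path.is_file():
        problems.append(f"missing {index_path}")
    elif "arrow-interface" not in index_path.read_text(encoding="utf-8"):
        problems.append("index.rst does not reference arrow-interface")

    # The new page should exist and contain the heading with an RST underline.
    if not page_path.is_file():
        problems.append(f"missing {page_path}")
    else:
        lines = page_path.read_text(encoding="utf-8").splitlines()
        found = False
        for title, underline in zip(lines, lines[1:]):
            underline = underline.strip()
            if (
                title.strip() == expected_heading
                and underline
                and len(set(underline)) == 1
                and not underline[0].isalnum()
                and len(underline) >= len(expected_heading)
            ):
                found = True
                break
        if not found:
            problems.append(f"heading {expected_heading!r} not found in {page_path.name}")

    return problems
```

Running `check_arrow_interface_docs("docs")` from the repository root would return an empty list when both conditions hold. This only checks file contents; it does not replace a full Sphinx build.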

@timsaucer timsaucer mentioned this pull request May 27, 2026
11 tasks
@timsaucer
Member

This is a good start. You can get around the docs build problem by building the repo in the same venv, so that the compiled datafusion package is importable.

This looks mostly like moving text around. I asked my agent to take a look, and it picked up on a couple of things that I agree with. The last point it makes is the one I care most about: I was hoping for a fresh look at what an ideal flow for the site would be.

--

Flow problems

  1. Two overlapping link lists on the same page. The overview adds "More specialized topics live on their own pages" (common-operations, arrow-interface, rendering, execution-metrics). Then, near the bottom, a second "Related Topics" section repeats arrow-interface, rendering, and execution-metrics, so the reader sees the same three links at the top and again at the end. The lists are also inconsistent: the top one includes common-operations and the bottom one omits it. Suggestion: keep the top roadmap brief and make the bottom "Related Topics" the detailed list (or vice versa), but not both.

  2. The new arrow-interface.rst overlaps the existing io/arrow.rst. Both document __arrow_c_stream__, and the new page even links to io/arrow "for additional details", so two pages cover the same protocol. This is a pre-existing condition (the old index already coexisted with io/arrow), so it is not a regression, but the reorganization was the moment to deduplicate and the PR does not do so. Worth a comment.
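The duplicated links from point 1 could be caught mechanically. Here is a rough sketch, assuming the page links to its sibling pages with Sphinx `:doc:` roles (the PR diff is not shown here, so that is an assumption, and `repeated_doc_links` is a hypothetical helper name):

```python
import re
from collections import Counter

# Matches :doc:`target` and :doc:`Title <target>`, capturing the target.
DOC_ROLE = re.compile(r":doc:`(?:[^`<]*<)?([^`>]+)>?`")


def repeated_doc_links(rst_text):
    """Return the sorted :doc: targets that appear more than once in the text."""
    targets = [match.group(1).strip() for match in DOC_ROLE.finditer(rst_text)]
    counts = Counter(targets)
    return sorted(target for target, count in counts.items() if count > 1)
```

Applied to the text of index.rst, a non-empty result would flag pages that are linked from both the top roadmap and the bottom "Related Topics" section. Links written as plain toctree entries would not be picked up by this pattern.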

Does it close #1397?

Partially, but reasonably. The issue says the page "jumps around from common operations to Arrow C interface to rendering" and needs a home for execution metrics. The PR pulls the Arrow deep-dive into its own page, adds a lifecycle roadmap, and links the metrics page. The main complaint (the Arrow deep-dive breaking the flow) is fixed.

Gap: the page still ends with a reference dump ("Core Classes", "Expression Classes", "Built-in Functions") that is untouched. That is part of the "all over the place" problem the issue named. Treating it as out of scope is defensible, but #1397 asked for a "fresh look at organization," so a reviewer could push for more. Not a blocker.



Development

Successfully merging this pull request may close these issues.

Improve online documentation page for DataFrame

2 participants