Skip to content

Conversation

@jhamman
Copy link
Member

@jhamman jhamman commented Dec 29, 2024

This PR updates Zarr's documentation to include a new multi-page user-guide, in-place of the prior single page tutorial.

Notes for reviewers:

  • This was split off from Doc updates for 3.0 release #2568
  • Most content remains unchanged from the tutorial, despite the large diff line count. Some new sections were added in 2568. The only net new section added was config.rst.
  • Examples now use the Ipython sphinx directive

closes #1767

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Liking the new sections, makes it much easier to read than the old tutorial 👍

A couple of major questions:

The "copying/migrating data" tutorial section wasn't migrated from the old tutorial, is this intentional?

For IPython, what was the motivation for switching to this from doctests? As it stands it has a few major drawbacks:

  1. Trying to copy/paste a block of code doesn't work, because it selects the Out[...] text too:
    Screenshot 2024-12-30 at 10 07 54
  2. The expected output isn't embedded any more. This gave us another layer of regression tests when we ran the tutorial as a doc test, and on top of making sure the code runs, made sure the output was as expected.
  3. I find having a line break between each line of code makes it much less readable - old vs new:
    Screenshot 2024-12-30 at 10 52 17
    Screenshot 2024-12-30 at 10 51 24

Is it possible to fix these with new IPython plugin?

Finally, what's the plan for all the TODOs (especially the numcodecs and Sharding bits, and configuring blosc)? To do them in another PR before 3.0? If so would be good to open issues for these so they're not lost and released as TODOs in 3.0

@jhamman
Copy link
Member Author

jhamman commented Jan 1, 2025

Thanks for the review @dstansby!

Liking the new sections, makes it much easier to read than the old tutorial 👍

A couple of major questions:

The "copying/migrating data" tutorial section wasn't migrated from the old tutorial, is this intentional?

We have not implemented copy, copy_all, and copy_store yet. I'll add these to the list of TODOs in the migration guide if they are not there already.

For IPython, what was the motivation for switching to this from doctests?

For one, we weren't using the doc tests! Beyond that, this isn't something I feel super strong about. We've had a lot of success using the Ipython directive in the Xarray project.

As it stands it has a few major drawbacks:

  1. Trying to copy/paste a block of code doesn't work, because it selects the Out[...] text too:

Not true if you use the copy button, example below:

z1[:] = 42

z1[0, :] = np.arange(10000)

z1[:, 0] = np.arange(10000)

I think there is probably a way to further configure the style.

  1. The expected output isn't embedded any more. This gave us another layer of regression tests when we ran the tutorial as a doc test, and on top of making sure the code runs, made sure the output was as expected.

We can insert assertions in hidden cells as desired. This would achieve the same thing.

  1. I find having a line break between each line of code makes it much less readable - old vs new:

Is it possible to fix these with new IPython plugin?

Likely yes. Can we iterate on the styling after this first PR goes in?

Finally, what's the plan for all the TODOs (especially the numcodecs and Sharding bits, and configuring blosc)? To do them in another PR before 3.0? If so would be good to open issues for these so they're not lost and released as TODOs in 3.0

I'm waiting #2463. I have a branch combining these but I'll need to start it over now that I've split up the docs refactor. So yes, let's do them in another PR.

@dstansby
Copy link
Contributor

dstansby commented Jan 1, 2025

Trying to copy/paste a block of code doesn't work, because it selects the Out[...] text too:

Not true if you use the copy button

This isn't a solution if you want to select two lines of code from a longer block.

This gave us another layer of regression tests when we ran the tutorial as a doc test, and on top of making sure the code runs, made sure the output was as expected.

We can insert assertions in hidden cells as desired. This would achieve the same thing.

But with a much higher maintenance burden I think? We would have to manually concoct a hidden assertion for every code block in this case, whereas with the doctests we can just generate all the output automatically.

So given there don't seem to be any benefits to using the IPython plugin, but there still seem to be drawbacks, I think we should keep existing code block style for now.

re. the TODOs, I'm a bit tired to do it now but tomorrow I can open issues to keep track of them (if you haven't already).

@jhamman
Copy link
Member Author

jhamman commented Jan 1, 2025

My view is that the most important thing is to have documentation that actually runs! I'm happy to hand this off to someone with sufficient experience with doctests (maybe @dstansby) but that isn't me. If no one is volunteering to do it right now, I'm going to press forward with this solution and we can revisit when someone else has time/energy for another doctest effort. I'll also note that these aren't mutually exclusive concepts (reading up on the ipython extension is probably worthwhile).

@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

In order to keep this moving, lets drop the switch to using the IPython plugin, since that shouldn't block a v3 release.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as request changes to revert the IPython change. @jhamman I'd be happy to change the code snippets in code back, but don't want to push to your PR without permission - let me know either way.

@jhamman
Copy link
Member Author

jhamman commented Jan 2, 2025

If you can update the code snippets today, feel free to push to this branch open a PR (this branch is in the zarr-python repo). I'm all checked in. Like I said above, I'm not planning on reverting the change myself though.

* docs: add new top level about page

* fixup

* fixup

* fixup
@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

👍 I can take a look. I think it's a reasonable ask to revert the change though, and not merge before it's reverted.

@dstansby dstansby mentioned this pull request Jan 2, 2025
* docs: add docs on extending zarr 3

* Apply suggestions from code review

Co-authored-by: David Stansby <[email protected]>

* move note up

* remove test.py (#2612)

* Note that whole directories can be deleted in LocalStore (#2606)

* fix: run-coverage command now tracks src directory (#2615)

* fix doc build

* Update docs/user-guide/extending.rst

---------

Co-authored-by: Norman Rzepka <[email protected]>
Co-authored-by: David Stansby <[email protected]>
Co-authored-by: Davis Bennett <[email protected]>
@normanrz
Copy link
Member

normanrz commented Jan 2, 2025

I think it's a reasonable ask to revert the change though, and not merge before it's reverted.

To be honest, I think it is more important to merge this PR quickly because of the dependent PRs. I feel this is blocking our progress right now. We can change away from IPython at any later point in time.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 2, 2025

I think it's a reasonable ask to revert the change though, and not merge before it's reverted.

To be honest, I think it is more important to merge this PR quickly because of the dependent PRs. I feel this is blocking our progress right now. We can change away from IPython at any later point in time.

+1 to this, lets get this PR in and sort out doctest / ipython later

@dcherian
Copy link
Contributor

dcherian commented Jan 2, 2025

I concur. Working doc examples are very valuable!

@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

I'm sorry, but I really want to insist we don't merge this with IPython in it, because it's a change that has demonstratable downsides. On a point of principle, just opening a large PR shouldn't come with an assumption that it's going to get merged without engaging with review. On a practical point, I spent about an hour sorting this out at #2623 today. So 🙏 can folks review and merge #2623 into this PR first.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 2, 2025

the status quo which is doctests, to avoid holding up v3.

admittedly I'm late to this discussion, but my impression was that the status quo was that the doctests were not actually getting run, resulting in stale example code in the docs. I don't think anyone wants that.

So the next question is, how can we ensure that the docs contain real examples? Either we make the doctests work, or we use another strategy, e.g. running code when we build the docs (the ipython directive). Given the current context, we should do whichever of these two is most expedient, in the interest of moving things forward -- the most important thing is that the generated docs are correct. This allows everyone to hack on the docs productively. IMO, the only thing that should block that part of this PR is a functionally equivalent solution to the problem solved with the ipython directive. Does that make sense?

@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

admittedly I'm late to this discussion, but my impression was that the status quo was that the doctests were not actually getting run, resulting in stale example code in the docs. I don't think anyone wants that.

👍 , which is why I fixed the output in #2623...

@jhamman
Copy link
Member Author

jhamman commented Jan 2, 2025

Let's deescalate this a bit.

If we can't get the doctests pr passing with ci today, we can merge this as is (with ipython) and go back to doctests as soon as it's ready.

I'm hearing general agreement that this PR is directionally aligned with where we want to go. Let's continue to iterate but in a non blocking way.

@dstansby - I know this isn't your ideal but I think it's what is best for the project overall.

@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

Okay, quick update, I'm trialling enabling doctests at #2626, which I think I have working. If it works, that PR should fail, and then I'll close and merge into #2623. Sorry it's a bit complex, but avoids me messing around with GH actions config in #2623 to get the tests running.

* Use doctests for arrays.rst

* Use doctests for attributes.rst

* Use doctests for config.rst

* Use doctests for consolidated metadata

* Use doctests for groups.rst

* Use doctests for preformance.rst

* Use doctests for storage.rst

* Remove ipython config for docs

* Fix performance doctest output

* Enable doctests
@dstansby dstansby dismissed their stale review January 2, 2025 23:19

Changes addressed

@dstansby
Copy link
Contributor

dstansby commented Jan 2, 2025

Quick update, I've:

  • added a new doctests CI run (was much simpler than running doctests in all the existing test runs)
  • added a new hatch environment that can be used to run the doctests (hatch run doctest:run) or automatically fix the output if it changes (hatch run doctest:fix)

Some of the doctests need fixing because of #2463 being merged, but hopefully the new CI job and hatch env make that a lot easier for someone else to do.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctests need fixing - otherwise I think this is 👍

@normanrz
Copy link
Member

normanrz commented Jan 3, 2025

Doctests need fixing - otherwise I think this is 👍

Working on it.

@normanrz normanrz requested a review from dstansby January 3, 2025 14:15
Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢 - a couple of optional comments. I haven't reviewed any of the actual content this time.


@pytest.mark.parametrize("root_name", [None, "root"])
def test_tree(root_name: Any) -> None:
os.environ["OVERRIDE_COLOR_SYSTEM"] = "truecolor"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of interest, why's this needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_tree.py expects the tree to be rendered with the control sequences for text formatting (e.g. bold font). However, by default rich doesn't include that when running within pytest, because it detects that it is not running in a real terminal as the output is captured by pytest. This overrides the behavior in rich. This issue only surfaced now, because I added rich to the test dependencies and therefore test_tree.py wasn't skipped anymore.
https://github.com/zarr-developers/zarr-python/blob/main/tests/test_tree.py#L31-L54

@dstansby dstansby merged commit d6384f5 into main Jan 3, 2025
33 checks passed
@dstansby dstansby deleted the docs/3.0-user-guide branch January 3, 2025 14:31
@normanrz normanrz mentioned this pull request Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements to the documentation

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[v3] Documentation: update tutorial docs

5 participants