Skip to content

Conversation

@jhamman
Copy link
Member

@jhamman jhamman commented Dec 29, 2024

This PR adds a page describing the migration process for users upgrading to zarr 3.0.

Notes for reviewers:

Closes #1798

@jhamman jhamman added the documentation Improvements to the documentation label Dec 29, 2024
@jhamman jhamman added this to the 3.0.0 milestone Dec 30, 2024
Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but I think needs some major improvements and additions. In particular:

  • Removing the "getting ready" section, since there won't be a decent period of time between these docs existing and the final release for folks to prepare for v3.
  • Being more explicit about what's change and how to adapt existing code (I left inline comments for this)
  • Listing where keyword only arguments have been introduced (e.g., I noticed these in zarr.create), as this is a major breaking change where users will have to update code

I also compared the v2 and v3 API, and there's some major additions that need making to this guide so users can migrate:

  • Explaining how to adapt code that previously used zarr.hierarchy
  • Explaining how to migrate code that uses functions in zarr.storage that no longer exist in v3
  • Explaining what the alternative to zarr.n5 is
  • Explaining how to update code that imports from zarr.codecs
  • Explaining how to update code that uses zarr.attrs
  • Explaining how to update code that uses zarr.sync

Comment on lines 176 to 177
* :func:`zarr.copy`
* :func:`zarr.copy_all`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to #2407 (comment), copy and copy_all aren't going to be ported at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we haven't taken any action there yet though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to eventually implement copy and copy_all, or not? Currently this PR implies yes, but #2407 (comment) implies no.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 1, 2025

In addition to a migration guide, what do people think about a "zarr v2 vs zarr v3" section that just describes how the semantics of v2 and v3 arrays are different? Not sure if that's in scope for this PR, or if it should be a separate effort, but it seems like the content is highly overlapping. I think the migration guide could link to the "v2 vs v3 concepts" section as needed when people have questions like "what's the logic behind with dimension separator handling zarr-python 3.xx"

@jhamman
Copy link
Member Author

jhamman commented Jan 2, 2025

I'm now reasonably happy with this as a first cut and suggest we do what we can to get this online ASAP. There are gaps that can be filled in as we go but this is much better than the current alternative (no migration guide).


to @d-v-b's #2596 (comment), let's add a zarr v2 vs v3 page elsewhere. I will not try to squeeze that into this PR.

to @dstansby's comment about specific migration paths for stuff in zarr.hierarchy, zarr.attrs, etc. I believe the statement in the docs reflects our position:

The following internal modules are being removed or significantly changed. If your application relies on imports from any of the below modules, you will need to either a) modify your application to no longer rely on these imports or b) vendor the parts of the specific modules that you need.

I'm open to specific suggestions for how we could improve this wording.

Configuration
~~~~~~~~~~~~~

There is a new configuration system based on `donfig <https://github.com/pytroll/donfig>`_,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't realise - in that case, this should be expanded to explain how users migrate from the old zarr.config system to the new system.

Comment on lines +202 to +205
* :func:`zarr.copy` (:issue:`2407`)
* :func:`zarr.copy_all` (:issue:`2407`)
* :func:`zarr.copy_store` (:issue:`2407`)
* :func:`zarr.Group.move` (:issue:`2108`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing that these say they're not ported, but are linked to functions in the API docs (which would imply they are implemented). Do they need removing from the API docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these exist as stubs in the library but currently raise a NotImplementedError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove them from the API docs then, so the docs don't give a misleading impression they exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion on whether to keep them in the API docs.

Comment on lines 176 to 177
* :func:`zarr.copy`
* :func:`zarr.copy_all`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to eventually implement copy and copy_all, or not? Currently this PR implies yes, but #2407 (comment) implies no.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌 this is looking great! I left another round of review comments, and a couple of more general things:

  • Near the top, language should be changed from (e.g.) "will be" to "has been", since these are changes that have now happened.
  • Since zarr.sync has gone with no replacement, there should be a short section on how to do process and thread syncrhonization in v3 (if it is possible - if not, just a short bit explaining it's not possible should be added)

Configuration
~~~~~~~~~~~~~

There is a new configuration system based on `donfig <https://github.com/pytroll/donfig>`_,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an example of code that someone would write in v2, which they now have to use the configuration system for instead? I think should be the measure of if something should go in the migration guide or not.

Comment on lines +202 to +205
* :func:`zarr.copy` (:issue:`2407`)
* :func:`zarr.copy_all` (:issue:`2407`)
* :func:`zarr.copy_store` (:issue:`2407`)
* :func:`zarr.Group.move` (:issue:`2108`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove them from the API docs then, so the docs don't give a misleading impression they exist?

Base automatically changed from docs/3.0-user-guide to main January 3, 2025 14:31
@jhamman
Copy link
Member Author

jhamman commented Jan 3, 2025

I'm going to wrap this up today. Fixing conflicts and resolving the final comments.

@jhamman jhamman requested a review from dstansby January 3, 2025 17:26
Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +23 to +27
The goals described above necessitated some breaking changes to the API (hence the
major version update), but where possible we have maintained backwards compatibility
in the most widely used parts of the API. This in the :class:`zarr.Array` and
:class:`zarr.Group` classes and the "top-level API" (e.g. :func:`zarr.open_array` and
:func:`zarr.open_group`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The goals described above necessitated some breaking changes to the API (hence the
major version update), but where possible we have maintained backwards compatibility
in the most widely used parts of the API. This in the :class:`zarr.Array` and
:class:`zarr.Group` classes and the "top-level API" (e.g. :func:`zarr.open_array` and
:func:`zarr.open_group`).
The goals described above necessitated some breaking changes to the API (hence the
major version update), but where possible we have maintained backwards compatibility
with functions in the top-level ``zarr`` namespace (e.g. :func:`zarr.open_array` and :func:`zarr.open_group`).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have also attempted to maintain backward compatibility for the Group/Array classes. Neither the top level API or the Array/Group classes hit 100% backward compatibility.

@jhamman jhamman requested a review from dstansby January 3, 2025 20:42
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Didn't verify the correctness, just reviewed the text.

@jhamman jhamman dismissed dstansby’s stale review January 3, 2025 21:27

responded to all outstanding comments

@jhamman jhamman enabled auto-merge (squash) January 3, 2025 21:28
@jhamman jhamman merged commit eef18f3 into main Jan 3, 2025
33 checks passed
@jhamman jhamman deleted the docs/3.0-migration branch January 4, 2025 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements to the documentation

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[V3] v2 -> v3 data migration

4 participants