Skip to content

Conversation

rabernat
Copy link
Contributor

While working on #2031 I became familiar with the new V3 Codec API and its peculiarities. And I saw that we don't yet have actual unit tests for the codecs. We have some tests in tests/v3/test_codecs/, but I'd call these more end-to-end tests, since they are creating Arrays.

I think it's important for us to unit-test al of the important internal interfaces separately from end-to-end tests. This is particularly important for codecs, so we can guard against data corruption issues.

This PR is a step in that direction.

TODO:

  • Add decode_partial and encode_partial tests.
  • Parametrize more variation of input data
  • Look for opportunities to make these tests simpler / faster (right now there is a combinatorial explosion of possibilities)

@rabernat
Copy link
Contributor Author

rabernat commented Jul 14, 2024

One area of feedback on the Codec API: it makes very little sense to me that the Codec API is async. Almost by definition Codecs are blocking, CPU-intensive code. They are not doing I/O. Why should their core methods be async?

It should be the Pipeline's job to dispatch blocking Codecs calls to threads. Not the Codec itself.

@d-v-b
Copy link
Contributor

d-v-b commented Jul 20, 2024

One area of feedback on the Codec API: it makes very little sense to me that the Codec API is async. Almost by definition Codecs are blocking, CPU-intensive code. They are not doing I/O. Why should their core methods be async?

It should be the Pipeline's job to dispatch blocking Codecs calls to threads. Not the Codec itself.

The reason why all codecs need to be async is because sharding is a codec, and the encode / decode operation of the sharding codec requires doing IO.

I would love to see some formal separation between "codecs that read from storage" (i.e., just sharding) and "codecs that transform bytes in memory" (all the other codecs), but I'm not sure what this would look like.

@jhamman jhamman added the V3 label Aug 9, 2024
@jhamman jhamman added this to the After 3.0.0 milestone Oct 1, 2024
Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love tests! thanks for this @rabernat

@d-v-b
Copy link
Contributor

d-v-b commented Oct 10, 2024

oops, I approved without noting that this is a draft. sorry for the noise. consider it a draft approval.

@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:57
@dstansby dstansby removed the V3 label Dec 12, 2024
@dstansby dstansby added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 9, 2025
@dstansby dstansby removed this from the After 3.0.0 milestone May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

4 participants