Allow use of filters that expand chunks by a large factor #5939
Conversation
The failure looks unrelated to this change and may be a transient issue.

The macOS failure is a known issue that happens from time to time.

test/dsets.c (Outdated)
} /* end filter_expand2() */

/*-------------------------------------------------------------------------
 * Function: test_chunk_expand2
It looks like the main case - a filter that expands chunks by a large amount that wasn't previously allowed - isn't tested.
I don't follow - that's what this does. I verified it failed before I implemented the changes in the library. test_chunk_expand is a preexisting test that tests cases where it's expected to fail, but I didn't want to copy that directly because it would have resulted in files that were too big.
In that case, I think it's just the comment here that's wrong - it says this test is evaluating error handling on expected failure, but the test itself expects every operation to succeed.
Oh sorry, I'll fix that.
You could replace the similar macros:
H5D_BT2_COMPUTE_CHUNK_SIZE_LEN
H5D_EARRAY_COMPUTE_CHUNK_SIZE_LEN
H5D_FARRAY_COMPUTE_CHUNK_SIZE_LEN
with a single macro:
#define H5D_COMPUTE_CHUNK_SIZE_LEN(chunk_size_len, idx_info) \
    do { \
        if ((idx_info)->pline->nused > 0) { \
            if ((idx_info)->layout->version > H5O_LAYOUT_VERSION_4) \
                (chunk_size_len) = H5F_SIZEOF_SIZE((idx_info)->f); \
            else { \
                (chunk_size_len) = 1 + ((H5VM_log2_gen((uint64_t)(idx_info)->layout->u.chunk.size) + 8) / 8); \
                if ((chunk_size_len) > 8) \
                    (chunk_size_len) = 8; \
            } \
        } \
        else \
            (chunk_size_len) = 0; \
    } while(0)
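A call site would then collapse to a single invocation, e.g. (hypothetical surrounding code; the chunk_size_len and idx_info names just follow the macro's parameters):

unsigned chunk_size_len; /* # of bytes used to encode a chunk's size */

H5D_COMPUTE_CHUNK_SIZE_LEN(chunk_size_len, idx_info);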
The earray and farray ones are necessarily split into two. I can try to unify them on Monday though.
I'm actually inclined to leave it how it is, since this only applies to earray, farray, and btree2; not btree1, single, or none. It would then be inelegant to create a shared macro that only applies to some indices. There is a spot in H5Dchunk.c that does the calculation independently that would ideally be unified with the calculations in the index codes, but the proper way to do that would be to add an index callback to retrieve the chunk size encoding length. We don't really have time for that right now though, and it only applies to obsolete file formats so it's probably not worth it.
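For illustration, such a callback might look something like the sketch below. This is hypothetical only; the operations-table name and the callback signature are assumptions based on the comment above, not HDF5 source.

/* Hypothetical sketch of the index callback floated above -- not part of
 * this PR. The index operations table would gain an entry that reports
 * the chunk-size encoding width, so H5Dchunk.c could query the index
 * instead of duplicating the calculation. */
typedef struct H5D_chk_idx_info_t H5D_chk_idx_info_t; /* opaque here */

typedef struct {
    /* ... existing index callbacks (create, insert, lookup, ...) ... */

    /* Return the number of bytes this index uses to encode a filtered
     * chunk's size (0 when no filter pipeline is applied). */
    unsigned (*get_chunk_size_len)(const H5D_chk_idx_info_t *idx_info);
} H5D_chunk_ops_sketch_t;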
The 2.0+ file format docs should be updated to replace appropriate mentions of "Chunk Size (variable size; at most 8 bytes)". This will require updating a couple of byte-layout tables.
Yes, I'm planning to do that, but since IIRC docs changes can go in after the code freeze, I'm prioritizing other things right now.
When using file format 2.0 or above, always encode the size of a filtered chunk using a 64-bit (size of lengths) integer. Chunks are still limited to 2^32 - 1 bytes, but this restriction could be lifted in the future without API or file format changes.
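A minimal, self-contained sketch (not HDF5 source) of the encoding-width rule this implements, mirroring the version check and log2-based fallback from the macro discussed in the review thread; H5VM_log2_gen is re-implemented locally so the example stands alone, and the assumption that layout versions above 4 correspond to the 2.0+ format follows H5O_LAYOUT_VERSION_4 in that macro:

#include <stdint.h>
#include <stdio.h>

/* floor(log2(n)) for n >= 1, standing in for H5VM_log2_gen */
static unsigned log2_gen(uint64_t n)
{
    unsigned r = 0;
    while (n >>= 1)
        r++;
    return r;
}

/* Bytes used to encode a filtered chunk's size. Assumption: layout
 * versions above 4 correspond to the 2.0+ file format. */
static unsigned chunk_size_encoding_len(unsigned layout_version, uint64_t chunk_size)
{
    if (layout_version > 4)
        return 8; /* 2.0+: always the file's size-of-lengths (8 by default) */

    /* Older formats: width derived from the chunk size, capped at 8 */
    unsigned len = 1 + (log2_gen(chunk_size) + 8) / 8;
    return len > 8 ? 8 : len;
}

int main(void)
{
    printf("v4 layout, 1 MiB chunk: %u bytes\n", chunk_size_encoding_len(4, 1 << 20));
    printf("v5 layout, 1 MiB chunk: %u bytes\n", chunk_size_encoding_len(5, 1 << 20));
    return 0;
}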
Most of the changes are due to changing H5D_chk_idx_info_t to contain an H5O_layout_t * instead of an H5O_layout_chunk_t * and an H5O_storage_chunk_t *. This was done so the index clients could access the version of the layout message.

Fixes #108
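A sketch of that struct change, using opaque stand-ins for the HDF5-internal types; the "before" member set is inferred from the description and the macro above (which reads idx_info->f, idx_info->pline, and idx_info->layout), not copied from the source:

typedef struct H5F_t               H5F_t;
typedef struct H5O_pline_t         H5O_pline_t;
typedef struct H5O_layout_chunk_t  H5O_layout_chunk_t;
typedef struct H5O_storage_chunk_t H5O_storage_chunk_t;
typedef struct H5O_layout_t        H5O_layout_t; /* carries the message version */

/* Before: index clients only saw the chunk-specific pieces */
typedef struct {
    H5F_t               *f;
    const H5O_pline_t   *pline;
    H5O_layout_chunk_t  *layout;  /* no message version in here */
    H5O_storage_chunk_t *storage;
} H5D_chk_idx_info_before_t;

/* After: the full layout message is passed, so an index can check
 * layout->version (the chunk and storage info presumably remain
 * reachable through the layout message) */
typedef struct {
    H5F_t             *f;
    const H5O_pline_t *pline;
    H5O_layout_t      *layout;
} H5D_chk_idx_info_after_t;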
Important
Update chunked dataset file format to use 64-bit encoding for filtered chunk sizes in file format 2.0+, with refactoring and new tests.
- Always encodes the size of a filtered chunk as a 64-bit (size of lengths) integer in file format 2.0+; chunks remain limited to 2^32 - 1 bytes.
- Updates H5Dbtree.c, H5Dbtree2.c, H5Dearray.c, H5Dfarray.c, H5Dchunk.c, and H5Dint.c.
- Refactors H5D_chk_idx_info_t to use H5O_layout_t instead of separate layout and storage.
- Adds test_chunk_expand2() in dsets.c to test handling of filters that expand chunks too much.
- Adds H5Z_FILTER_EXPAND2 for testing purposes.