Description
This error occurs intermittently: test_multiprocess.py can run in a loop 30 to 40 times before it appears. It is not specific to multiprocessing, since it also happens when multiProcess is set to False. I have observed the issue with both the filedriver and the s3driver.
It is most likely a race condition in which multiple threads either read the cached data at the same time or corrupt it.
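To make the suspected failure mode concrete, here is a minimal sketch of a cache whose check-and-fill step is serialized with a lock. It is not H5Coro's actual cache: the class name `LockedBlockCache`, the `read_block` callable, and the block-index keying are all assumptions made for illustration.

```python
import threading


class LockedBlockCache:
    """Hypothetical block cache (illustration only, not H5Coro's code)."""

    def __init__(self, read_block):
        # read_block is an assumed callable: block index -> bytes-like object
        self._read_block = read_block
        self._blocks = {}
        self._lock = threading.Lock()

    def get(self, index):
        # Hold the lock across the lookup and the fill, so no thread can
        # see a missing entry while another thread is still populating it.
        with self._lock:
            if index not in self._blocks:
                # Store an immutable copy so later writes to the source
                # buffer cannot change what readers see.
                self._blocks[index] = bytes(self._read_block(index))
            return self._blocks[index]
```

If the real cache fills entries without this kind of serialization, or hands out mutable buffers, two threads could interleave and leave bad bytes behind. That would be consistent with the parse errors below, but it is only a hypothesis.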
Reading with filedriver, multiProcess=False
1738763781.621584 WARNING [h5promise.py: 43] H5Coro encountered error reading gt1l/heights/dist_ph_across: invalid header continuation signature: 0xc1b57ce5
Reading with filedriver, multiProcess=True
1738763621.298881 WARNING [h5promise.py: 43] H5Coro encountered error reading gt1l/heights/ph_id_pulse: invalid reserved fields: 134,56049
Reading with retries (retry code was added for testing) shows that every retried read failed with the same value, which points to corrupt cached data.
Reading with filedriver, multiProcess=True
1738772036.277421 WARNING [h5promise.py: 46] Retrying dataset read for gt1l/heights/signal_conf_ph (attempt 1): invalid reserved fields: 123,20894
1738772036.278032 WARNING [h5promise.py: 46] Retrying dataset read for gt1l/heights/signal_conf_ph (attempt 2): invalid reserved fields: 123,20894
1738772036.278143 WARNING [h5promise.py: 48] H5Coro encountered error reading gt1l/heights/signal_conf_ph: invalid reserved fields: 123,20894
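The reasoning behind the retry test can be sketched as follows. This is not the retry code that was added to h5promise.py; `read_with_retries` and `looks_deterministic` are hypothetical helpers, and `read` stands for any zero-argument callable that performs the dataset read.

```python
def read_with_retries(read, attempts=3):
    """Call read() up to `attempts` times; return (value, error_messages)."""
    errors = []
    for _ in range(attempts):
        try:
            return read(), errors
        except Exception as exc:  # broad on purpose: this is a diagnostic aid
            errors.append(str(exc))
    return None, errors


def looks_deterministic(errors):
    """True when at least two attempts failed with the identical message."""
    return len(errors) > 1 and len(set(errors)) == 1
```

A read that hits a transient problem would be expected to succeed on retry or fail with a different message. Identical messages on every attempt, as in the log above, suggest that each attempt is parsing the same bad bytes.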