Fix: dictionary changed size during iteration in GCSObjectMetadataClient#468
Fix: dictionary changed size during iteration in GCSObjectMetadataClient#468kitagry merged 5 commits intom3dev:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a crash in _adjust_gcs_metadata_limit_size by avoiding in-place dict modification during iteration and adds a test to verify metadata trimming.
- Accumulates keys to remove before deleting them to prevent “dictionary changed size during iteration” errors
- Adds a unit test to confirm total metadata size remains within GCS limits
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| gokart/gcs_obj_metadata_client.py | Change loop to collect and then remove keys |
| test/test_gcs_obj_metadata_client.py | New test ensuring large metadata is truncated |
Comments suppressed due to low confidence (3)
test/test_gcs_obj_metadata_client.py:143
- [nitpick] The test name suggests an expected runtime error but actually checks size truncation. Rename it to something like
test_adjust_gcs_metadata_limit_size_truncates_labelsfor clarity.
def test_adjust_gcs_metadata_limit_size_runtime_error(self):
test/test_gcs_obj_metadata_client.py:151
- Consider adding assertions to verify that specific keys beyond the size limit were removed, not just the total size, for a more robust test.
self.assertLessEqual(total_size, 8 * 1024)
gokart/gcs_obj_metadata_client.py:159
- This method mutates the
labelsdict in-place and returns it. To avoid unexpected side effects, consider returning a new dict or documenting the in-place behavior in the docstring.
def _get_label_size(label_name: str, label_value: str) -> int:
| result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels) | ||
|
|
||
| total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items()) | ||
| self.assertLessEqual(total_size, 8 * 1024) |
There was a problem hiding this comment.
Can you define a constant value in gokart/gcs_obj_metadata_client.py
MAX_GCS_METADATA_SIZE: Final[int] = 8 * 1024and refer here and L157 of gokart/gcs_obj_metadata_client.py
There was a problem hiding this comment.
Thank you for your review. I've fixed it!
|
|
||
| result = GCSObjectMetadataClient._adjust_gcs_metadata_limit_size(large_labels) | ||
|
|
||
| total_size = sum(len(k.encode('utf-8')) + len(v.encode('utf-8')) for k, v in result.items()) |
There was a problem hiding this comment.
When working with non-ASCII characters (such as Japanese etc.), it is necessary because the number of characters and the number of bytes do not match.
There was a problem hiding this comment.
I think so. But, in this test case, is it need?
There was a problem hiding this comment.
I apologize. I misunderstood it as production code. Since this test only needs to verify that no error occurs, it doesn't seem necessary.
|
Thank you! |
The issue was in the _adjust_gcs_metadata_limit_size method where labels
dictionary was being modified while iterating through it. Fixed by:
This prevents the "dictionary changed size during iteration" error that
occurred when large metadata needed to be adjusted to fit GCS size limits.