Commit 8bf5bb8

Update base for Update on "[XNNPACK][Weights Cache] Use sha256 hash of bytes instead of tensor name"
In production use cases, I've become increasingly concerned about the Weights Cache managing weights across multiple models and the potential for name collisions. Names like "encoder.layer.weight1" are common in encoder models, and the same name may be reused across many different models. In practice, tensors that share such a name in different models will hold different data.

One way to alleviate these collision concerns is to key each weight on a strong hash of the tensor's bytes. Namely, if we use the sha256 hash of the tensor bytes as the named key, we get much stronger guarantees against collisions between weights.

This can also provide stronger weight deduplication guarantees. For now the named key is the only method for deduplicating weights, so if the underlying bytes are the same but the keys are different, we can't deduplicate. Using a hash of the underlying bytes as the key would help with this (though how often this happens remains to be seen). Regardless, I think hashing the bytes will be much safer in the long term.

The drawback is that this adds a guaranteed 64 bytes per weight, which may add up on smaller models. I'm open to discussing whether other hashing algorithms, such as md5_hash, might provide tolerable collision guarantees.

Differential Revision: [D71212509](https://our.internmc.facebook.com/intern/diff/D71212509/) [ghstack-poisoned]
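As a rough illustration of the idea in the message above, here is a minimal Python sketch of a content-addressed cache keyed on the sha256 of a weight's bytes. `ToyWeightsCache` and `weight_key` are hypothetical names for this example only; this is not the XNNPACK Weights Cache code touched by the commit. It assumes the key is stored as a hex-encoded digest, which is 64 characters long and would be consistent with the 64 bytes per weight mentioned above.

```python
import hashlib
from typing import Dict


def weight_key(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a weight's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class ToyWeightsCache:
    """Hypothetical content-addressed cache; not the XNNPACK implementation."""

    def __init__(self) -> None:
        self._weights: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store the bytes under their hash and return the key.

        Identical bytes map to the same key, so they are stored only once.
        """
        key = weight_key(data)
        self._weights.setdefault(key, data)
        return key

    def __len__(self) -> int:
        return len(self._weights)


cache = ToyWeightsCache()
# Model A and model B both call this tensor "encoder.layer.weight1",
# but the data differs, so the keys differ.
key_a = cache.add(b"\x00\x01\x02\x03")
key_b = cache.add(b"\x04\x05\x06\x07")
# Model C uses a different tensor name but the same bytes as model A,
# so it resolves to the same key and is deduplicated.
key_c = cache.add(b"\x00\x01\x02\x03")
```

Because the key depends only on the bytes, the tensor name never enters the lookup: same-named tensors with different data cannot collide, and differently named tensors with identical data share one entry.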
2 parents 2522789 + 5a5fab7 commit 8bf5bb8

58 files changed, +949 -179 lines changed

.ci/scripts/gather_benchmark_configs.py

Lines changed: 2 additions & 1 deletion
@@ -263,7 +263,8 @@ def is_valid_huggingface_model_id(model_name: str) -> bool:
 def get_benchmark_configs() -> Dict[str, Dict]:  # noqa: C901
     """
     Gather benchmark configurations for a given set of models on the target operating system and devices.
-
+    CHANGE IF this function's return changed:
+        extract_model_info() in executorch/.github/scripts/extract_benchmark_results.py IF YOU CHANGE THE RESULT OF THIS FUNCTION.
     Args:
         None
