Commit b35f588

rhdedgar authored and iamemilio committed
feat: update qdrant hash function from SHA-1 to SHA-256 (llamastack#3477)
# What does this PR do?

Updates the qdrant provider's convert_id function to use a FIPS-validated cryptographic hashing function, so that llama-stack is considered to be `Designed for FIPS`. The standard library `uuid.uuid5()` function uses SHA-1 under the hood, which is not FIPS-validated. This commit uses an approach similar to the one merged in llamastack#3423.

Closes llamastack#3476.

## Test Plan

The unit tests in scripts/unit-tests.sh were run and pass.

A small test script can display the data flow:

```python
import hashlib
import uuid

# Input
_id = "chunk_abc123"
print(_id)

# Step 1: Format and encode
hash_input = f"qdrant_id:{_id}".encode()
print(hash_input)
# Result: b'qdrant_id:chunk_abc123'

# Step 2: SHA-256 hash
sha256_hash = hashlib.sha256(hash_input).hexdigest()
print(sha256_hash)
# Result: "184893a6eafeaac487cb9166351e8625b994d50f3456d8bc6cea32a014a27151"

# Step 3: Create UUID from first 32 chars
uuid_string = str(uuid.UUID(sha256_hash[:32]))
print(uuid_string)
# sha256_hash[:32] = "184893a6eafeaac487cb9166351e8625"
# Final result: "184893a6-eafe-aac4-87cb-9166351e8625"
```

Signed-off-by: Doug Edgar <[email protected]>
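The statement that `uuid.uuid5()` relies on SHA-1 can be checked with a minimal sketch that rebuilds a version-5 UUID by hand. It assumes CPython's standard `uuid` and `hashlib` modules; the sample name `chunk_abc123` is borrowed from the test plan, and the variable names are illustrative only.

```python
import hashlib
import uuid

name = "chunk_abc123"

# A version-5 UUID is the SHA-1 digest of (namespace bytes + name bytes),
# truncated to 16 bytes, with the version and variant bits overwritten.
digest = hashlib.sha1(uuid.NAMESPACE_DNS.bytes + name.encode("utf-8")).digest()
manual = uuid.UUID(bytes=digest[:16], version=5)

# The standard library call that the old convert_id used.
stdlib = uuid.uuid5(uuid.NAMESPACE_DNS, name)

print(manual == stdlib)
```

If the two values compare equal, the SHA-1 dependency is confirmed for the interpreter in use.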
1 parent c899938 commit b35f588

File tree

1 file changed: +6 -2 lines changed
  • llama_stack/providers/remote/vector_io/qdrant

llama_stack/providers/remote/vector_io/qdrant/qdrant.py

Lines changed: 6 additions & 2 deletions
@@ -5,6 +5,7 @@
 # the root directory of this source tree.
 
 import asyncio
+import hashlib
 import uuid
 from typing import Any
 
@@ -49,10 +50,13 @@ def convert_id(_id: str) -> str:
     Converts any string into a UUID string based on a seed.
 
     Qdrant accepts UUID strings and unsigned integers as point ID.
-    We use a seed to convert each string into a UUID string deterministically.
+    We use a SHA-256 hash to convert each string into a UUID string deterministically.
     This allows us to overwrite the same point with the original ID.
     """
-    return str(uuid.uuid5(uuid.NAMESPACE_DNS, _id))
+    hash_input = f"qdrant_id:{_id}".encode()
+    sha256_hash = hashlib.sha256(hash_input).hexdigest()
+    # Use the first 32 characters to create a valid UUID
+    return str(uuid.UUID(sha256_hash[:32]))
 
 
 class QdrantIndex(EmbeddingIndex):
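
The behavioral effect of the change can be sketched by running the old and new conversions side by side. This is a standalone illustration, not part of the commit: `convert_id` is copied from the diff above, while `legacy_convert_id` is a hypothetical name given here to the removed `uuid.uuid5` one-liner.

```python
import hashlib
import uuid


def convert_id(_id: str) -> str:
    """New behavior from the diff: SHA-256, first 32 hex chars as a UUID."""
    hash_input = f"qdrant_id:{_id}".encode()
    sha256_hash = hashlib.sha256(hash_input).hexdigest()
    return str(uuid.UUID(sha256_hash[:32]))


def legacy_convert_id(_id: str) -> str:
    """Old behavior (hypothetical name): SHA-1 based version-5 UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, _id))


new_id = convert_id("chunk_abc123")
old_id = legacy_convert_id("chunk_abc123")

# Same input, same output: the mapping stays deterministic.
print(new_id == convert_id("chunk_abc123"))

# The result round-trips through uuid.UUID, so it is a well-formed UUID string.
print(str(uuid.UUID(new_id)) == new_id)

# The two schemes map the same chunk ID to different point IDs.
print(new_id != old_id)
```

Two things follow from this sketch. `uuid.UUID(hex)` does not set version or variant bits, so the new IDs are well-formed UUID strings but are not tagged as any particular UUID version. And because the old and new functions produce different IDs for the same input, points stored under the old scheme would presumably not be overwritten by writes that use the new one.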