## Proposal: Unification of Unique Identifier Generation
CC: @sanderegg @matusdrobuliak66 @giancarloromeo @GitHK
We currently have multiple methods across the codebase for generating unique identifiers. To improve consistency and flexibility, I propose unifying this functionality under `servicelib.identifiers_utils`. The main objectives are:
- Generate Context-Specific Unique Name Identifiers:
  - The ability to generate unique identifiers based on different contexts/scopes, such as globally unique, unique within a project, process, hostname, or cluster.
  - The context can be passed as discriminators that will define the scope of uniqueness.
- Standardized Identifier Formats:
  - Provide support for generating both standard UUIDs and human-readable identifiers with optional prefixes.
  - UUIDs should follow a standard format like `uuid4` for general uniqueness or `uuid5` (namespace-based) for deterministic IDs based on specific discriminators.
  - Human-readable identifiers should support optional prefixes (e.g., `pay_123456124` for payment identifiers).
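The difference between the two UUID flavors can be shown with the standard library alone. This is a minimal sketch: the `NAMESPACE` value and the discriminator strings are made up for illustration and are not part of the proposal.

```python
import uuid

# Hypothetical fixed namespace for this illustration; any stable UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")

# uuid4 is random: every call yields a new value.
random_a = uuid.uuid4()
random_b = uuid.uuid4()

# uuid5 is deterministic: same namespace + same name -> same UUID.
same_1 = uuid.uuid5(NAMESPACE, "project-1/node-7")
same_2 = uuid.uuid5(NAMESPACE, "project-1/node-7")

# Changing a discriminator changes the resulting UUID.
other = uuid.uuid5(NAMESPACE, "project-2/node-7")

print(random_a != random_b, same_1 == same_2, same_1 != other)
```

Determinism therefore depends on keeping the namespace stable across calls and processes.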
Example Implementation:
```python
import hashlib
import socket
import time
import uuid

from models_library.basic_types import IdStr


def short_sha256(input_string: str, length: int = 8) -> IdStr:
    """Generates a truncated SHA-256 hash of the input string."""
    sha_signature = hashlib.sha256(input_string.encode()).hexdigest()
    return IdStr(sha_signature[:length])


def generate_name_identifier(
    *discriminators, prefix: str | None = None, length: int = 8
) -> IdStr:
    """
    Generates an identifier based on the provided discriminators (e.g., project name, hostname).
    Optionally includes a human-readable prefix. `length` applies to the hash part only.
    """
    idr = short_sha256("/".join(map(str, discriminators)), length=length)
    if prefix:
        # Wrap again so the return value matches the annotated type
        idr = IdStr(f"{prefix}_{idr}")
    return idr


def generate_uuid(*discriminators, base_uuid: uuid.UUID | None = None) -> uuid.UUID:
    """
    Generates a uuid5 (namespace-based) UUID from the provided discriminators.
    The result is deterministic only when a fixed `base_uuid` is passed; if it is
    omitted, a random uuid4 namespace is used and every call returns a new UUID.
    """
    if base_uuid is None:
        base_uuid = uuid.uuid4()
    return uuid.uuid5(base_uuid, "/".join(map(str, discriminators)))


# Example usage
def get_rabbitmq_client_unique_name(prefix: str) -> IdStr:
    """
    Generates a unique RabbitMQ client name based on the hostname and current time,
    with a prefix.
    """
    hostname = socket.gethostname()
    return generate_name_identifier(time.time(), hostname, prefix=prefix, length=8)
```

Key Points and Improvements:
- Contextual Uniqueness: The `generate_name_identifier` function allows you to pass any relevant context (e.g., hostname, project, or process) as discriminators, so that identifiers are unique within the intended scope.
- Prefix Support: Human-readable prefixes can be added to identifiers for better clarity and debugging, such as `pay_` for payment identifiers or `user_` for user-related identifiers.
- Shortened SHA-256 Identifiers: For identifiers that require truncation, we use a shortened SHA-256 hash, which can be configured via the `length` parameter to balance uniqueness against brevity. Consider longer truncations if collisions are a concern in large systems.
- UUID Generation: For cases requiring globally unique or deterministic identifiers, the `generate_uuid` function uses `uuid5` to generate namespace-based UUIDs (preferred over `uuid3` because it hashes with SHA-1 rather than MD5). Pass a fixed `base_uuid` to get deterministic results; without it, a random namespace is used and each call returns a new UUID.
- Flexibility: Both `generate_name_identifier` and `generate_uuid` are flexible, allowing users to define how discriminators affect uniqueness within their system.
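To make the trade-off between brevity and collision risk concrete, the birthday-bound approximation gives a rough estimate of how likely two identifiers are to collide at a given truncation length. The helper below is a hypothetical sketch, not part of the proposed module, and it assumes the truncated hex digits are uniformly distributed.

```python
import math


def collision_probability(n_ids: int, hex_length: int) -> float:
    """Approximate probability that at least two of `n_ids` identifiers collide.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N = 16**hex_length is the number of possible truncated hashes.
    """
    space = 16**hex_length
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


for n in (1_000, 10_000, 100_000):
    print(n, collision_probability(n, hex_length=8), collision_probability(n, hex_length=16))
```

Under this approximation, the default `length=8` gives roughly a 1% chance of at least one collision among 10,000 identifiers and well over 50% among 100,000, while 16 hex characters keep the probability negligible at those sizes.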
Next Steps:
- We can further extend this utility by allowing specific discriminators, such as user ID or session ID, for more fine-grained uniqueness when necessary.
- Additional formats (e.g., base62 encoding for compact identifiers) can be considered if we find a need to reduce the length of identifiers without sacrificing uniqueness.
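As an illustration of the base62 idea, the sketch below encodes the first bytes of a SHA-256 digest with a 62-character alphabet. The names `BASE62_ALPHABET`, `base62_encode`, and `short_base62`, as well as the choice of 8 bytes, are assumptions made for this example only.

```python
import hashlib
import string

# Digits, then uppercase, then lowercase: 62 symbols in total.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(number: int) -> str:
    """Encodes a non-negative integer using the base62 alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def short_base62(input_string: str, n_bytes: int = 8) -> str:
    """Hashes the input with SHA-256 and base62-encodes the first `n_bytes` bytes."""
    digest = hashlib.sha256(input_string.encode()).digest()[:n_bytes]
    return base62_encode(int.from_bytes(digest, "big"))


print(short_base62("project-1/node-7"))
```

With 8 bytes (64 bits) of the digest, the base62 form is at most 11 characters, compared with 16 characters for the same bits in hexadecimal. The output length varies with the value, so left-padding would be needed if a fixed width is required.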
Originally posted by @pcrespov in #6365 (comment)