Proposal: Unification of Unique Identifier Generation #6376

@GitHK

## Proposal: Unification of Unique Identifier Generation

CC: @sanderegg @matusdrobuliak66 @giancarloromeo @GitHK

We currently have multiple methods across the codebase for generating unique identifiers. To improve consistency and flexibility, I propose unifying this functionality under servicelib.identifiers_utils. The main objectives are:

  1. Generate Context-Specific Unique Name Identifiers:

    • The ability to generate unique identifiers based on different contexts/scopes, such as globally unique, unique within a project, process, hostname, or cluster.
    • The context can be passed as discriminators that will define the scope of uniqueness.
  2. Standardized Identifier Formats:

    • Provide support for generating both standard UUIDs and human-readable identifiers with optional prefixes.
    • UUIDs should follow a standard format like uuid4 for general uniqueness or uuid5 (namespace-based) for deterministic IDs based on specific discriminators.
    • Human-readable identifiers should support optional prefixes (e.g., pay_123456124 for payment identifiers).
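
To make the two UUID flavours concrete, here is a minimal standard-library sketch. `PROJECT_NAMESPACE`, `random_id` and `deterministic_id` are hypothetical names used only for illustration and are not part of the proposal; the namespace value is an arbitrary constant.

```python
import uuid

# Hypothetical fixed namespace (assumption): any constant UUID would do.
PROJECT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def random_id() -> uuid.UUID:
    """uuid4: random, different on every call."""
    return uuid.uuid4()


def deterministic_id(*discriminators: object) -> uuid.UUID:
    """uuid5: same namespace + same discriminators -> same UUID."""
    return uuid.uuid5(PROJECT_NAMESPACE, "/".join(map(str, discriminators)))


# A human-readable identifier with a prefix, built from the deterministic UUID.
prefixed = f"pay_{deterministic_id('invoice', 42).hex[:8]}"
```

With a fixed namespace, `deterministic_id("invoice", 42)` can be recomputed anywhere, whereas `random_id()` has to be stored if it is needed again.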

Example Implementation:

import hashlib
import time
import uuid
import socket
from models_library.basic_types import IdStr

def short_sha256(input_string: str, length: int = 8) -> IdStr:
    """Generates a truncated SHA-256 hash of the input string."""
    sha_signature = hashlib.sha256(input_string.encode()).hexdigest()
    return IdStr(sha_signature[:length])


def generate_name_identifier(*discriminators, prefix: str | None = None, length: int = 8) -> IdStr:
    """
    Generates an identifier from the provided discriminators (e.g., project name, hostname).
    The identifier is a SHA-256 hash truncated to `length` characters, so identical
    discriminators always give the same result. If given, the human-readable prefix
    is prepended as "<prefix>_" and is not counted in `length`.
    """
    idr = short_sha256("/".join(map(str, discriminators)), length=length)
    if prefix:
        # Wrap again so the return value matches the annotated IdStr type
        return IdStr(f"{prefix}_{idr}")
    return idr


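def generate_uuid(*discriminators, base_uuid: uuid.UUID | None = None) -> uuid.UUID:
    """
    Generates a UUID based on the provided discriminators.
    Uses uuid5 for namespace-based determinism: the same base_uuid and
    discriminators always yield the same UUID. If base_uuid is omitted, a random
    uuid4 namespace is used, so the result is unique but not reproducible.
    """
    if base_uuid is None:
        base_uuid = uuid.uuid4()
    return uuid.uuid5(base_uuid, "/".join(map(str, discriminators)))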


# Example usage
def get_rabbitmq_client_unique_name(prefix: str) -> IdStr:
    """
    Generates a unique RabbitMQ client name based on the hostname and current time,
    with the given prefix.
    """
    hostname = socket.gethostname()
    return generate_name_identifier(time.time(), hostname, prefix=prefix, length=8)
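
One detail worth checking in the example above is how the discriminators are combined: joining with "/" maps different discriminator tuples to the same string whenever a discriminator itself contains "/". The sketch below reproduces that with the standard library only. `joined` mirrors the proposal's joining step, while `framed` and `framed_digest` are a hypothetical length-prefixed alternative and not part of the proposal.

```python
import hashlib


def joined(*discriminators: object) -> str:
    """Mirrors the proposal: discriminators joined with "/"."""
    return "/".join(map(str, discriminators))


def framed(*discriminators: object) -> str:
    """Hypothetical variant: prefix each part with its length so boundaries stay unambiguous."""
    return "".join(f"{len(part)}:{part}" for part in map(str, discriminators))


def framed_digest(*discriminators: object, length: int = 8) -> str:
    """Truncated SHA-256 hex digest of the length-prefixed discriminators."""
    return hashlib.sha256(framed(*discriminators).encode()).hexdigest()[:length]


# Two different discriminator tuples collapse to the same joined string ...
assert joined("a/b", "c") == joined("a", "b/c") == "a/b/c"
# ... but stay distinct once each part carries its length.
assert framed("a/b", "c") == "3:a/b1:c"
assert framed("a", "b/c") == "1:a3:b/c"
```

Whether this matters depends on the discriminators in use: hostnames and timestamps never contain "/", but project names or paths might.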

Key Points and Improvements:

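  1. Contextual Uniqueness: The generate_name_identifier function accepts any relevant context (e.g., hostname, project, or process) as discriminators. Because the identifier is a hash of those discriminators, it is unique within the intended scope only as long as the combination of discriminators is itself unique there.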

  2. Prefix Support: Human-readable prefixes can be added to identifiers for better clarity and debugging, such as pay_ for payment identifiers or user_ for user-related identifiers.

  3. Shortened SHA-256 Identifiers: For identifiers that require truncation, we use a shortened SHA-256 hash whose size is set via the length parameter to balance uniqueness against brevity. The default of 8 hex characters gives 32 bits, so by the birthday bound a collision becomes more likely than not at roughly 77,000 identifiers in the same scope; use a longer truncation in large systems.

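  4. UUID Generation: For cases requiring globally unique or deterministic identifiers, the generate_uuid function uses uuid5 to generate namespace-based UUIDs (preferred over uuid3 because it hashes with SHA-1 rather than MD5). The result is deterministic only when a fixed base_uuid is passed; when base_uuid is omitted, a random uuid4 is used as the namespace, so each call returns a different UUID.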

  5. Flexibility: Both generate_name_identifier and generate_uuid functions are flexible, allowing users to define how discriminators affect uniqueness within their system.
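
To put numbers on the trade-off between length and collisions mentioned in point 3, the birthday-bound approximation can be evaluated directly. This is a rough sketch that assumes the truncated hash behaves like a uniformly random value; `collision_probability` and `ids_for_probability` are hypothetical helper names.

```python
import math


def collision_probability(n_ids: int, hex_length: int) -> float:
    """Approximate chance of at least one collision among n_ids identifiers.

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / (2 * 16**hex_length)).
    """
    space = 16**hex_length
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


def ids_for_probability(p: float, hex_length: int) -> int:
    """Approximate number of identifiers at which the collision chance reaches p."""
    space = 16**hex_length
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


if __name__ == "__main__":
    for hex_length in (8, 12, 16):
        print(hex_length, ids_for_probability(0.5, hex_length))
```

For the default length of 8 the 50% mark sits at roughly 77,000 identifiers, and each extra hex character multiplies that figure by four.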

Next Steps:

  • We can further extend this utility by allowing specific discriminators, such as user ID or session ID, for more fine-grained uniqueness when necessary.
  • Additional formats (e.g., base62 encoding for compact identifiers) can be considered if we find a need to reduce the length of identifiers without sacrificing uniqueness.
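
As a rough idea of what the base62 option could look like, the sketch below encodes the leading bytes of the SHA-256 digest with a 62-character alphabet. The alphabet order, the `n_bytes` default and both function names are assumptions made for illustration.

```python
import hashlib
import string

# Assumed alphabet order: digits, then uppercase, then lowercase (62 symbols).
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(number: int) -> str:
    """Encodes a non-negative integer using BASE62_ALPHABET."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def short_base62(input_string: str, n_bytes: int = 6) -> str:
    """Base62 form of the first n_bytes of the SHA-256 digest (hypothetical helper)."""
    digest = hashlib.sha256(input_string.encode()).digest()[:n_bytes]
    return base62_encode(int.from_bytes(digest, "big"))
```

With `n_bytes=6` this carries 48 bits in at most 9 characters, where hex needs 12. The output length varies with the value, so a fixed-width format would need padding.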

Originally posted by @pcrespov in #6365 (comment)

Labels: t:enhancement (Improvement or request on an existing feature)
