-
Notifications
You must be signed in to change notification settings - Fork 525
Description
Issue Description
The codebase uses UUID-based primary keys for 18 database models, but there is an inconsistency in how UUIDs are converted to strings. This creates two different representations that are incompatible with each other.
UUID String Format Inconsistency
Format 1: Hyphenated UUID (str(uuid.uuid4()))
Storage: 36 characters including hyphens (e.g., "550e8400-e29b-41d4-a716-446655440000")
Classes using this format:
Role(db.py:224)UserRole(db.py:266)PendingUserApproval(db.py:1276)RegisteredOAuthClient(db.py:2829)EmailApiToken(db.py:2902)SSOAuthSession(db.py:3207)
Format 2: Hexadecimal UUID (uuid.uuid4().hex)
Storage: 32 characters, no hyphens (e.g., "550e8400e29b41d4a716446655440000")
Classes using this format:
EmailTeam(db.py:834)EmailTeamMember(db.py:943)EmailTeamMemberHistory(db.py:1009)EmailTeamInvitation(db.py:1078)EmailTeamJoinRequest(db.py:1181)Tool(db.py:1565)Server(db.py:2275)Gateway(db.py:2431)A2AAgent(db.py:2555)GrpcService(db.py:2689)OAuthToken(db.py:2780)OAuthState(db.py:2805)
Note: EmailApiToken also has a jti field (db.py:2906) using hyphenated format.
Impact
1. Data Inconsistency
- Two different string representations of the same UUID concept
- Foreign key relationships may expect one format but receive another
- API clients must handle different formats depending on the entity type
2. Storage Overhead
- Hyphenated format: 36 bytes per ID
- Hex format: 32 bytes per ID
- Both formats are significantly larger than a 4-byte int or 8-byte bigint
3. Potential Performance Concerns
While not currently causing issues, using 32-36 byte strings as primary keys may introduce database performance problems:
- Index Size: String-based indexes are much larger than integer indexes
- Int (4 bytes) vs UUID hex (32 bytes) = 8x size increase
- Int (4 bytes) vs UUID hyphenated (36 bytes) = 9x size increase
- Comparison Speed: String comparisons are slower than integer comparisons
- Memory Usage: Larger primary keys increase memory footprint for indexes and joins
- Foreign Key Joins: Each foreign key relationship carries the storage overhead
- B-tree Performance: Larger keys reduce B-tree fan-out, requiring more disk I/O
4. Cross-Database Compatibility
- PostgreSQL has native UUID type (16 bytes) that would be more efficient
- SQLite stores strings as-is, making the overhead more pronounced
- Migration to native UUID type would be complex with two string formats
Recommendations
Option 1: Standardize on Hex Format (Quick Fix)
- Convert all
str(uuid.uuid4())touuid.uuid4().hex - Saves 4 bytes per ID in hyphenated classes
- Maintains consistency across the codebase
- Still retains UUID string overhead
Option 2: Use Native UUID Type (Best for PostgreSQL)
- Use SQLAlchemy's
UUIDcolumn type for PostgreSQL - Falls back to
CHAR(32)orCHAR(36)for SQLite - Reduces storage to 16 bytes in PostgreSQL
- Requires migration script
Option 3: Use Binary UUID Storage (Most Efficient)
- Store UUIDs as 16-byte binary values
- Convert to string only at API boundary
- Most efficient storage and performance
- Requires significant refactoring
Option 4: Migrate to Integer IDs (Maximum Performance)
- Use auto-incrementing BigInt (8 bytes) for high-volume tables
- Reserve UUIDs for distributed/federated scenarios
- Best database performance
- May complicate federation features
Additional Notes
Two classes still use integer auto-increment IDs and are unaffected by this issue:
Resource(db.py:1842):id: Mapped[int]Prompt(db.py:2083):id: Mapped[int]EmailAuthEvent(db.py:696):id: Mapped[int]
Suggested Action
At minimum, standardize the UUID string format across all models to ensure consistency. Consider the performance implications for high-volume tables and evaluate whether native UUID types or integer IDs would be more appropriate for the project's scale and federation requirements.