Unsafe method: SecureRandom.urlsafe_base64

Due to a double encoding error in `SecureRandom.urlsafe_base64` that method is unsafe. The fix is easy and posted by @cheerfulstoic in PR #13.

Description

This issue is more pernicious than a simple inefficiency—it is a major bug that effectively breaks the implicit contract of random ID generation. `Base.encode64/1` outputs a Base64 string from the source entropy bytes. That's fine. But that output is piped into `Base.url_encode64/1` that also "expects" raw random bytes (in this context), and that is not what it receives. As a result, the generated IDs are not only longer than expected, but their characters are no longer evenly distributed. This has serious implications for both the Shannon entropy of the IDs and the implicit assumption that the IDs themselves are uniformly distributed.

Shannon entropy is maximized when each possible character in a sequence is equally probable. However, passing the already Base64-encoded output of `Base.encode64/1` into `Base.url_encode64/1` skews those probabilities. For example, among the 64 possible Base64 characters (65 if you count the padding character =, which we can ignore here), some characters such as 'f', '7', '-', and '_' never appear, while others occur with vastly different frequencies (e.g., '0' occurs about 29 times more often than '8'). This happens because the output of `Base.encode64/1` is not truly random—its bit patterns reflect the encoding scheme. Consequently, `Base.url_encode64/1` never encounters certain bit patterns and over-encounters others, producing highly uneven character distributions. The result is that calculating the actual entropy of the generated IDs is tedious at best (I didn't attempt it), but it is certainly less than optimal.

Character distribution issues compound the problem. As noted, four characters never appear at all, and others occur with varying frequency. But it gets worse: suppose a consumer of this method expected an even distribution—for example, partitioning data by the first character of the ID. Because of the double-encoding bug, IDs can only ever start with 19 of the 64 possible characters. That means roughly 70% of the partitions would remain empty.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unsafe method: SecureRandom.urlsafe_base64 #16

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Unsafe method: SecureRandom.urlsafe_base64 #16

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions