62-cell-id/cell-id.md
18 additions & 8 deletions
@@ -69,9 +69,9 @@ Relaxing the field to *optional* would lead to undesirable behavior. An optional
69
69
70
70
#### Reason for Character Restrictions (pattern, min/max length)
71
71
72
-
[RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax)](https://www.ietf.org/rfc/rfc3986.txt) defines the unreserved characters allowed for URI generation. Since ids should be usable as referenceable points in web requests, we want to restrict them to at most these characters. Of the non-alphanumeric unreserved characters (`-`, `.`, `_`, and `~`), three have semantic meaning or are restricted in URL generation, leaving only alphanumerics and `-` as legal characters we want to support. This extra restriction also helps with storage of ids in databases, where non-ASCII characters in identifiers can often lead to query, storage, or application bugs when not handled correctly. Since we don't have a strong pre-existing need for such characters (`.`, `_`, and `~`) in our `id` field, we propose not introducing the additional complexity of allowing them here.
72
+
[RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax)](https://www.ietf.org/rfc/rfc3986.txt) defines the unreserved characters allowed for URI generation. Since ids should be usable as referenceable points in web requests, we want to restrict them to at most these characters. Of the non-alphanumeric unreserved characters (`-`, `.`, `_`, and `~`), one has semantic meaning that doesn't impact our use case (`_`) and two are restricted in URL generation, leaving only alphanumerics, `-`, and `_` as legal characters we want to support. This extra restriction also helps with storage of ids in databases, where non-ASCII characters in identifiers can often lead to query, storage, or application bugs when not handled correctly. Since we don't have a strong pre-existing need for such characters (`.` and `~`) in our `id` field, we propose not introducing the additional complexity of allowing them here.
73
73
74
-
The length restrictions are there for a few reasons. First, you don't want empty strings in your ids, so enforce some natural minimum. We could use 1 or 2 to accept basically any id pattern, or be more restrictive with a higher minimum to reserve a wider combination of min-length ids (`63^k` combinations). Second, many database solutions want a fixed max length for indexable string identifiers, for both performance and ease of implementation. These ids will certainly be used in recall mechanisms, so ease of database use should be a strong criterion. Third, a UUID string takes 36 characters to represent (with the `-` characters), and we likely want to support this as an identity pattern for applications that want it.
74
+
The length restrictions are there for a few reasons. First, you don't want empty strings in your ids, so enforce some natural minimum. We could use 1 or 2 to accept basically any id pattern, or be more restrictive with a higher minimum to reserve a wider combination of min-length ids (`63^k` combinations). Second, many database solutions want a fixed max length for indexable string identifiers, for both performance and ease of implementation. These ids will certainly be used in recall mechanisms, so ease of database use should be a strong criterion. Third, a UUID string takes 36 characters to represent (with the `-` characters), and we likely want to support this as an identity pattern for applications that want it. Thus we choose a 1-64 character range to provide flexibility and some measure of consistency.
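
As a minimal sketch of what these constraints mean in practice, the check below validates a candidate id against the pattern and length bounds from the schema excerpt in this proposal (`^[a-zA-Z0-9-]+$`, 1 to 64 characters). The helper name `is_valid_cell_id` is hypothetical and not part of any existing library. If `_` is admitted by the final pattern, as the paragraph on character restrictions suggests, the character class would need to be extended accordingly.

```python
import re
import uuid

# Pattern and bounds copied from the schema excerpt in this proposal.
# Assumption: these values are illustrative of the proposal, not a released schema.
CELL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9-]+$")
MIN_LENGTH = 1
MAX_LENGTH = 64


def is_valid_cell_id(cell_id):
    """Hypothetical helper: True if cell_id meets the pattern and length bounds."""
    if not isinstance(cell_id, str):
        return False
    if not MIN_LENGTH <= len(cell_id) <= MAX_LENGTH:
        return False
    # fullmatch rejects a trailing newline, which a bare `$` would tolerate.
    return CELL_ID_PATTERN.fullmatch(cell_id) is not None


# A full UUID string (36 characters, including `-`) fits within the limits.
print(is_valid_cell_id(str(uuid.uuid4())))
# Empty strings and ids containing `.` are rejected.
print(is_valid_cell_id(""), is_valid_cell_id("a.b"))
```

Under these assumptions a 36-character UUID string and a word-based id such as `danger-noodle` both pass, while an empty string, an id containing `.` or `~`, or anything longer than 64 characters does not.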
75
75
76
76
### Updating older formats
77
77
@@ -118,8 +118,8 @@ index e3dedf2..4f192e6 100644
118
118
+ "description": "A string field representing the identifier of this particular cell.",
119
119
+ "type": "string",
120
120
+ "pattern": "^[a-zA-Z0-9-]+$",
121
-
+ "minLength": 2,
122
-
+ "maxLength": 36
121
+
+ "minLength": 1,
122
+
+ "maxLength": 64
123
123
+ },
124
124
+
125
125
"cell": {
@@ -218,11 +218,21 @@ If bookkeeping of current cell ids is not desirable, a 64-bit random id (11 char
218
218
219
219
```python
220
220
def get_cell_id(id_length=8):
221
-
    # OK, technically this isn't exactly a 64-bit k-length string... but it's close and easy to implement
222
-
    return str(uuid.uuid4())[:id_length]
221
+
    n_bytes = max(id_length * 3 // 4, 1)
222
+
    # Standard base64 uses + and /, which the proposed regex excludes, so we need to use urlsafe_b64encode
#### Option C: Join human-readable strings from a corpus randomly
226
+
#### Option C: uuid-subset
227
+
228
+
Basically the same as Option B, just a different flavor of random generation.
229
+
230
+
```python
231
+
def get_cell_id(id_length=8):
232
+
    return uuid.uuid4().hex[:id_length]
233
+
```
234
+
235
+
#### Option D: Join human-readable strings from a corpus randomly
226
236
227
237
One frequently used pattern for generating human-recognizable ids is to combine common words together instead of arbitrary random bits. Something like `danger-noodle` is a lot easier for a person to remember or reference than `ZGFuZ2VyLW5vb2RsZQ==`. Below is how this could be achieved, though it requires a set of names to use in id generation. There are Python packages, as well as corpus CSV files, that make this convenient, but they would have to be added to the install dependencies.
228
238
@@ -233,7 +243,7 @@ def get_cell_id(num_words=2):
233
243
234
244
#### Preference
235
245
236
-
Use Option B. Option C is also viable but adds a corpus requirement to the id generation step.
246
+
Use Option D for the most human-readable ids, though it adds a corpus requirement to the id generation step. If a corpus is not desired, use Option B or C.