@@ -52,11 +52,11 @@ Async:
5252
5353.. code-block:: python
5454
55- from anyio import sleep # AsyncSonyFlake supports both asyncio and trio
55+ import anyio
5656 from sonyflake_turbo import AsyncSonyFlake, SonyFlake
5757
5858 sf = SonyFlake(0x1337, 0xCAFE, start_time=1749081600)
59- asf = AsyncSonyFlake(sf, sleep)
59+ asf = AsyncSonyFlake(sf, sleep=anyio.sleep)  # defaults to asyncio.sleep
6060
6161 print("one", await asf)
6262 print("n", await asf(5))
@@ -68,23 +68,182 @@ Async:
6868 Important Notes
6969===============
7070
71- SonyFlake algorithm produces IDs at rate 256 IDs per 10msec per 1 Machine ID.
72- One obvious way to increase the throughput is to use multiple generators with
73- different Machine IDs. This library provides a way to do exactly that by
74- passing multiple Machine IDs to the constructor of the `SonyFlake` class.
75- Generated IDs are non-repeating and are always increasing. But be careful! You
76- should be conscious about assigning Machine IDs to different processes and/or
77- machines to avoid collisions. This library does not come with any Machine ID
78- management features, so it's up to you to figure this out.
79-
80- This library has limited free-threaded mode support. It won't crash, but
81- you won't get much performance gain from multithreaded usage. Consider
82- creating generators per thread instead of sharing them across multiple
83- threads.
84-
85- This library also contains pure-Python implementation as a fallback in case of
86- C extension unavailability (e.g. with PyPy or when installed with
87- ``--no-binary`` flag).
71+ Vanilla SonyFlake Difference
72+ ----------------------------
73+
74+ In vanilla SonyFlake, whenever the counter overflows, the generator simply
75+ waits for the next 10 ms window, which severely limits throughput: a single
76+ generator produces at most 256 IDs per 10 ms.
77+
78+ The Turbo version is basically the same as vanilla SonyFlake, except that it
79+ accepts more than one Machine ID in its constructor. On counter overflow, it
80+ advances to the next "unexhausted" Machine ID and resumes generation. It waits
81+ for the next 10 ms window only when all of the Machine IDs are exhausted.
82+
83+ This behavior is not much different from running multiple vanilla ID generators
84+ in parallel, but keeping them inside one generator ensures that produced IDs are
85+ always monotonically increasing (per generator instance) and avoids potential
86+ concurrency issues (by not doing concurrency).
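
The overflow behavior described above can be sketched in pure Python. This is a toy model, not the library's implementation: the class name ``ToyTurboFlake``, the injectable ``clock``/``sleep`` parameters, the ascending visiting order of Machine IDs, and the time/Machine ID/counter bit layout are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable


class ToyTurboFlake:
    """Illustrative sketch only; NOT sonyflake_turbo's actual implementation."""

    def __init__(
        self,
        machine_ids: Iterable[int],
        start_time: int = 1409529600,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Assumption: Machine IDs are visited in ascending order, so IDs
        # produced within one 10 ms window keep increasing.
        self._machine_ids = sorted(set(machine_ids))
        if not self._machine_ids:
            raise ValueError("at least one Machine ID is required")
        self._start_time = start_time
        self._clock = clock
        self._sleep = sleep
        self._tick = -1    # last 10 ms window used
        self._index = 0    # position in self._machine_ids
        self._counter = 0  # 8-bit counter of the current Machine ID

    def next_id(self) -> int:
        now = int((self._clock() - self._start_time) * 100)
        wait = 0.0
        if now > self._tick:
            # New window: start over with the first Machine ID.
            self._tick, self._index, self._counter = now, 0, 0
        elif self._counter < 0xFF:
            self._counter += 1
        elif self._index + 1 < len(self._machine_ids):
            # Counter overflow: move on to the next unexhausted Machine ID.
            self._index, self._counter = self._index + 1, 0
        else:
            # Every Machine ID is exhausted: claim the next window, then wait.
            self._tick, self._index, self._counter = self._tick + 1, 0, 0
            wait = self._start_time + self._tick / 100 - self._clock()
        if wait > 0:
            self._sleep(wait)  # sleep only after the state has been updated
        machine_id = self._machine_ids[self._index]
        return (self._tick << 24) | (machine_id << 8) | self._counter


class FakeClock:
    """Deterministic clock, so the demo below never really sleeps."""

    def __init__(self, now: float) -> None:
        self.now = now
        self.sleeps = []

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        self.now += seconds


fake = FakeClock(1_700_000_000.0)
gen = ToyTurboFlake([0xCAFE, 0x1337], clock=fake.time, sleep=fake.sleep)
# Two Machine IDs give 512 IDs per window; the 513th forces a single wait.
ids = [gen.next_id() for _ in range(2 * 256 + 1)]
```

With the fake clock frozen inside one window, the first 256 IDs use the lower Machine ID, the next 256 use the higher one, and only the last ID needs a sleep.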
87+
88+ A few other features, compared with other SonyFlake implementations found in
89+ the wild:
90+
91+ * Optional C extension module, for extra performance in CPython.
92+ * Async-framework-agnostic wrapper.
93+ * Thread-safe. Also has free-threading/nogil support.
94+
95+ .. note::
96+
97+ Safe for concurrent use; internal locking ensures correctness. Sleeps are
98+ always done after internal state updates.
99+
100+ .. _Locks: https://docs.python.org/3/library/threading.html#lock-objects
101+
102+ Machine IDs
103+ -----------
104+
105+ A Machine ID is a 16-bit integer in the range ``0x0000`` to ``0xFFFF``. Machine
106+ IDs are encoded as part of the SonyFlake ID:
107+
108+ +----+-----------------+------------+---------+
109+ |    | Time            | Machine ID | Counter |
110+ +====+=================+============+=========+
111+ | 0x | 0874AD4993 [#]_ | CAFE       | 04      |
112+ +----+-----------------+------------+---------+
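
The layout in the table can be unpacked with plain bit operations. The sketch below assumes the field widths implied by the example (39-bit time, 16-bit Machine ID, 8-bit counter, in that order); ``split_id`` and ``id_datetime`` are hypothetical helper names, not part of the library.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

DEFAULT_START_TIME = 1409529600  # 2014-09-01T00:00:00Z


class Parts(NamedTuple):
    ticks: int       # number of 10 ms units since start_time
    machine_id: int  # 16 bits
    counter: int     # 8 bits


def split_id(sonyflake_id: int) -> Parts:
    """Split an ID into its time, Machine ID and counter fields."""
    return Parts(
        ticks=sonyflake_id >> 24,
        machine_id=(sonyflake_id >> 8) & 0xFFFF,
        counter=sonyflake_id & 0xFF,
    )


def id_datetime(sonyflake_id: int, start_time: int = DEFAULT_START_TIME) -> datetime:
    """Return the UTC moment encoded in the time field."""
    epoch = datetime.fromtimestamp(start_time, tz=timezone.utc)
    # Integer milliseconds keep the arithmetic exact.
    return epoch + timedelta(milliseconds=split_id(sonyflake_id).ticks * 10)


parts = split_id(0x0874AD4993CAFE04)
when = id_datetime(0x0874AD4993CAFE04)
print(parts, when.isoformat())
```

For the example ID this yields Machine ID ``0xCAFE``, counter ``4`` and the same timestamp as the footnote.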
113+
114+ SonyFlake IDs are, in spirit, UUIDv6_ compressed down to 64 bits. Unfortunately,
115+ we do not have the luxury of 48 bits for encoding the node ID (the UUID
116+ equivalent of SonyFlake's Machine ID). The UUID standard proposes using a
117+ pseudo-random value for this field, which is suboptimal in our case due to the
118+ high risk of collisions.
119+
120+ Vanilla SonyFlake, on the other hand, uses the lower 16 bits of the private IP
121+ address. This sort of works, but has two major drawbacks:
122+
123+ 1. It assumes you have *exactly one* ID generator per machine in your network.
124+ 2. You're leaking some of your infrastructure info.
125+
126+ In the modern world (k8s, "lambdas", etc.), both of these fall apart:
127+
128+ 1. A single machine often runs multiple processes and/or threads. More often
129+ than not, they're too isolated from each other to coordinate ID
130+ generation.
131+ 2. Security aside, container IPs within a cluster network are not globally
132+ unique, especially when trimmed down to 16 bits.
133+
134+ Solving this issue is up to you, as a developer. This particular library does
135+ not include Machine ID management logic, so you are responsible for
136+ coordinating Machine IDs in your deployment.
137+
138+ The task is not trivial, but it is not impossible either. Here are a few ideas:
139+
140+ * Coordinate ID assignment via something like etcd_ or ZooKeeper_ using the
141+ lease_ pattern. Optimal, but a bit bothersome to implement.
142+ * Reinvent Twitter's SnowFlake_ by having a centralized service/sidecar. This
143+ adds the extra round-trips SonyFlake was intended to avoid.
144+ * Assign Machine IDs manually. The DevOps team will hate you.
145+ * Use random Machine IDs. ``If I ignore it, maybe it will go away.jpg``
146+
147+ Nevertheless, the library does include one helper class: ``MachineIDLCG``. It is
148+ a primitive LCG_-based 16-bit PRNG, intended for use in tests or in situations
149+ where concurrency is not a problem (e.g. desktop or CLI apps). You can also
150+ reuse it to generate IDs for a lease, to avoid congestion when going the
151+ etcd/ZooKeeper route.
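
To illustrate what a 16-bit LCG looks like, here is a generic sketch. It is not ``MachineIDLCG`` itself: the function name ``lcg16`` and the constants ``a = 25173``, ``c = 13849`` are assumptions chosen only because they satisfy the Hull-Dobell conditions for a full period modulo 2^16 (``c`` is odd and ``a - 1`` is divisible by 4).

```python
from itertools import islice
from typing import Iterator


def lcg16(seed: int, a: int = 25173, c: int = 13849) -> Iterator[int]:
    """Yield 16-bit values from a linear congruential generator.

    Hypothetical parameters, not those of sonyflake_turbo's MachineIDLCG.
    """
    state = seed & 0xFFFF
    while True:
        state = (a * state + c) & 0xFFFF  # modulo 2**16
        yield state


# Draw four candidate Machine IDs for one generator.
candidates = list(islice(lcg16(seed=42), 4))

# A full-period LCG visits every 16-bit value once before repeating.
full_cycle = set(islice(lcg16(seed=42), 1 << 16))
print(candidates, len(full_cycle))
```

A full-period sequence never repeats a value within one cycle, so consecutive draws from a single stream are always distinct. Two processes seeded independently can still collide, which is why the text limits this approach to tests and non-concurrent use.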
152+
153+ How many Machine IDs to allocate per generator is something you should figure
154+ out on your own. Here are some numbers to start with (time to generate
155+ 1 million SonyFlake IDs):
156+
157+ +--------+-------------+
158+ | Time   | Machine IDs |
159+ +========+=============+
160+ | 1.22s  | 32          |
161+ +--------+-------------+
162+ | 2.44s  | 16          |
163+ +--------+-------------+
164+ | 4.88s  | 8           |
165+ +--------+-------------+
166+ | 9.76s  | 4           |
167+ +--------+-------------+
168+ | 19.53s | 2           |
169+ +--------+-------------+
170+ | 39.06s | 1           |
171+ +--------+-------------+
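
For sizing, the ceiling implied by the algorithm (256 IDs per 10 ms per Machine ID) can be computed directly. The helper below is a back-of-the-envelope sketch with a hypothetical name, ``min_seconds``. It assumes the generator is limited only by waiting for the next window and ignores all other overhead. The figures in the table agree with this bound to within about 0.01 s.

```python
IDS_PER_WINDOW = 256   # 8-bit counter per Machine ID
WINDOW_SECONDS = 0.01  # 10 ms time resolution


def min_seconds(total_ids: int, machine_ids: int) -> float:
    """Lower bound on the time needed to generate total_ids IDs."""
    windows = total_ids / (IDS_PER_WINDOW * machine_ids)
    return windows * WINDOW_SECONDS


for n in (32, 16, 8, 4, 2, 1):
    print(f"{n:>2} Machine IDs: {min_seconds(1_000_000, n):.2f}s")
```

The bound is inversely proportional to the number of Machine IDs, so each doubling halves the minimum time.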
172+
173+ .. [#] 1409529600 + 0x874AD4993 / 100 = 2026-03-05T09:15:19.87Z
174+ .. _UUIDv6: https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-6
175+ .. _etcd: https://etcd.io/
176+ .. _ZooKeeper: https://zookeeper.apache.org/
177+ .. _SnowFlake: https://en.wikipedia.org/wiki/Snowflake_ID
178+ .. _lease: https://martinfowler.com/articles/patterns-of-distributed-systems/lease.html
179+ .. _LCG: https://en.wikipedia.org/wiki/Linear_congruential_generator
180+
181+ Clock Rollback
182+ --------------
183+
184+ There is no logic to handle clock rollbacks or drift at the moment. If the clock
185+ moves backward, the generator will ``sleep()`` (``await sleep()`` in the async
186+ wrapper) until time catches up to the last timestamp.
187+
188+ Start Time
189+ ----------
190+
191+ A SonyFlake ID dedicates 39 bits to the time component, with a resolution of
192+ 10 ms. The time is stored relative to ``start_time``. By default it is
193+ 1409529600 (``2014-09-01T00:00:00Z``), but you may want to define your own
194+ "epoch".
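
When choosing a custom epoch, it helps to know how long the 39-bit time field lasts. The sketch below works that out from the numbers above. The helper name ``exhaustion_datetime`` is hypothetical and not part of the library.

```python
from datetime import datetime, timedelta, timezone

TIME_BITS = 39
TICK_MS = 10  # one time unit is 10 ms


def exhaustion_datetime(start_time: int = 1409529600) -> datetime:
    """Return the UTC moment at which the time field would overflow."""
    epoch = datetime.fromtimestamp(start_time, tz=timezone.utc)
    return epoch + timedelta(milliseconds=(1 << TIME_BITS) * TICK_MS)


# Total lifespan in (Julian) years, roughly 174.
lifespan_years = (1 << TIME_BITS) * TICK_MS / 1000 / (365.25 * 86400)

print(f"lifespan: {lifespan_years:.1f} years")
print("default epoch runs out:", exhaustion_datetime().isoformat())
print("custom epoch runs out: ", exhaustion_datetime(1749081600).isoformat())
```

A later ``start_time`` pushes the overflow date out by the same amount, which is the main practical reason to define your own epoch.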
195+
196+ Motivation
197+ ----------
198+
199+ Sometimes you have to bear the consequences of decisions you made a long time
200+ ago. On a project I was leading, I decided to use SonyFlake. Everything was
201+ fine until we needed to ingest a lot of data, very quickly.
202+
203+ A flame graph showed we were sleeping way too much. The culprit was the
204+ SonyFlake library we were using at that time. Some RTFM later, it was revealed
205+ that the problem was somewhere between the chair and the keyboard.
206+
207+ A solution was found rather quickly: just instantiate more generators and cycle
208+ through them about every 256 IDs. Nothing could go wrong, right? Aside from the
209+ fact that the hack was of questionable quality, it did work.
210+
211+ Except we got hit by `Hyrum's Law`_. An unintentional side effect of the hack
212+ above was that IDs lost their "monotonically increasing" property [#]_. Ofc, some
213+ of our code and other teams' code depended on this SonyFlake feature. Duh.
214+
215+ Adding even more workarounds (pre-generate IDs, sort them, then ingest) was a
216+ compelling idea, but it did not feel right. Hence, this library was born.
217+
218+ .. [#] E.g. if you cycle through generators with Machine IDs 0xCAFE and 0x1337,
219+ you may get the following IDs: ``0x0874b2a7a0cafe00``,
220+ ``0x0874b2a7a0133700``. Even though there are no collisions, sorting
221+ them results in a different order than the one they were generated in.
222+ .. _Hyrum's Law: https://www.hyrumslaw.com/
223+
224+ Why should I use it?
225+ --------------------
226+
227+ If you're starting a new project, please use UUIDv7_. It is superior to
228+ SonyFlake in almost every way. It is an internet standard (RFC 9562), it is
229+ already available in various languages' standard libraries and is supported by
230+ popular databases (PostgreSQL, MariaDB, etc.).
231+
232+ Otherwise, you might want to use it for one of the following reasons:
233+
234+ * You already use it and have run into problems similar to those described in
235+ the `Motivation`_ section.
236+ * You want to avoid extra round-trips to fetch IDs.
237+ * Using UUIDs is not feasible (legacy codebase, DB indexes limited to 64-bit
238+ integers, etc.) but you still want to benefit from index
239+ locality/strict global ordering.
240+ * As a cheap way to reduce the predictability of IDs exploited in IDOR_ attacks.
241+ * Architecture lunacy is still strong within you and you want your code to
242+ be DDD-like (e.g. being able to reference an entity before it is stored in
243+ the DB).
244+
245+ .. _UUIDv7: https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-7
246+ .. _IDOR: https://cheatsheetseries.owasp.org/cheatsheets/Insecure_Direct_Object_Reference_Prevention_Cheat_Sheet.html
88247
89248Development
90249===========
@@ -102,16 +261,16 @@ Run tests:
102261
103262.. code-block:: sh
104263
105- py.test
264+ pytest
106265
107- Building wheels:
266+ Build wheels:
108267
109268.. code-block:: sh
110269
111270 pip install cibuildwheel
112271 cibuildwheel
113272
114- Building ``py3-none-any`` wheel (without C extension):
273+ Build a ``py3-none-any`` wheel (without the C extension):
115274
116275.. code-block:: sh
117276