Commit 4c4c5d6

Merge pull request #67 from healeycodes/post/go-semaphore

Add semaphore post

2 parents a51ae3e + 3cf37d1

2 files changed: +316 −1 lines

pages/index.tsx (1 addition, 1 deletion)
```diff
@@ -66,7 +66,7 @@ export default function Home({ allPostsData, description, words }) {
   I wrote <Link href="/maybe-the-fastest-disk-usage-program-on-macos">one of the fastest disk-usage programs on macOS</Link> by
   using macOS-specific system calls, and then
   made it faster by <Link href="/optimizing-my-disk-usage-program">reducing thread scheduling overhead and lock contention</Link>. I
-  also showed how to beat the performance of <code>grep</code> by just <Link href="/beating-grep-with-go">using goroutines</Link>.
+  also showed how to beat the performance of <code>grep</code> by just <Link href="/beating-grep-with-go">using goroutines</Link>. I like learning by building things from scratch, like <Link href="/a-fair-cancelable-semaphore-in-go">a fair, cancelable semaphore in Go</Link>.
   </p>
   <p>
   My <Link href="/installing-npm-packages-very-quickly">experimental package manager</Link> uses simple concurrency patterns to be faster than every package manager aside from Bun (mine is 11% slower) when cold-installing from a lockfile.
```
Lines changed: 315 additions & 0 deletions
@@ -0,0 +1,315 @@
---
title: "A Fair, Cancelable Semaphore in Go"
date: "2025-12-21"
tags: ["go"]
description: "Building a fair, cancelable semaphore in Go and the subtle concurrency issues involved."
---

They say that you don't fully understand something unless you can build it from scratch. So here's my challenge to the more technical readers of this blog: can you build a semaphore from scratch in your favorite programming language? Bonus points for also handling context cancellation.

I attempted this in Go and it was about 5x harder than I expected, largely due to concurrency/locking bugs. I assume you'll have an easier time in, say, JavaScript.

A brief reminder: semaphores limit how many tasks can run at the same time by controlling access to shared resources.

Here's a quick example of their use case. Your operating system limits the number of file descriptors a process can have open, but you didn't know this when you wrote the following program:
```go
g, ctx := errgroup.WithContext(context.Background())

for _, path := range files {
	path := path // capture the loop variable (not needed in Go 1.22+)
	g.Go(func() error {
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		return processFile(f)
	})
}

if err := g.Wait(); err != nil {
	return err
}
```
With a large number of files, you'll hit an error like:

```text
panic: open /my/file: too many open files!
```
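On Unix-like systems you can check this limit yourself (the exact default varies by OS and shell configuration):

```shell
# Per-process limit on open file descriptors (a shell builtin)
ulimit -n
```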
## Channels

In Go, channels are a built-in concurrency primitive for communicating between goroutines, which is exactly what we need here: the goroutine that's finished using the resource needs to tell one of the waiting goroutines that it can start.
```go
g, ctx := errgroup.WithContext(context.Background())

// Initialize a buffered channel with room for 10 empty structs
sema := make(chan struct{}, 10)

for _, path := range files {
	path := path // capture the loop variable (not needed in Go 1.22+)
	g.Go(func() error {
		// Acquire a semaphore slot (blocks if the buffer is full)
		sema <- struct{}{}
		defer func() {
			<-sema // Release the semaphore slot
		}()

		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		return processFile(f)
	})
}
```
This works great as a simple limiter, but it's missing two features that I often need in the semaphores I use:

- First In First Out (FIFO) ordering. Requests are served in arrival order, which makes behavior easier to reason about and debug.
- Context cancellation. Waiting or in-progress operations can be aborted when they're no longer needed, preventing wasted work and resource leaks.

Why is the above snippet _not_ FIFO? Multiple goroutines sending to the channel compete with each other. The scheduler decides which send proceeds first, so ordering isn't guaranteed. There's no explicit queue.

Just using a `chan` isn't going to cut it.
## Adding a Queue

Go's standard library has a doubly linked list (`container/list`) that we can use as the queue. It will hold the channels that are used to wake up blocked calls to acquire the semaphore.

Trying to acquire a semaphore has one of two immediate outcomes:

- The fast path: there's an available permit and the call returns right away.
- The slow path: there are no permits, so we enqueue a channel and wait.

```text
When there are no available permits, G2 blocks on the Acquire() call
until an earlier goroutine, G1, calls Release().

Time →
────────────────────────────────────────

G2: Acquire() ──── blocks ─────▶ resumes

G1:           Release() ───────┘
```

We need four bits of state:

- The maximum number of permits
- The available number of permits
- A queue structure that stores channels
- A lock to protect access to all of the above
```go
import (
	"container/list"
	"sync"
)

type Semaphore struct {
	mu      sync.Mutex
	free    int64     // available permits
	max     int64     // maximum permits
	waiters list.List // queue of chan struct{}, closed to wake
}

// NewSemaphore creates a semaphore with n permits.
func NewSemaphore(n int64) *Semaphore {
	return &Semaphore{
		free: n,
		max:  n,
	}
}

// Acquire blocks until a permit is available, then takes it.
func (s *Semaphore) Acquire() {
	s.mu.Lock()

	// Fast path: permit available
	if s.free > 0 {
		s.free--
		s.mu.Unlock()
		return
	}

	// Slow path: enqueue ourselves and wait
	waiter := make(chan struct{})
	s.waiters.PushBack(waiter)
	s.mu.Unlock()

	<-waiter // blocks until Release closes the channel
}

// Release returns a permit. Panics if over-released.
func (s *Semaphore) Release() {
	s.mu.Lock()

	if s.free+1 > s.max {
		s.mu.Unlock()
		panic("semaphore: released more than acquired")
	}
	s.free++

	// Wake the first waiter if any
	if front := s.waiters.Front(); front != nil {
		s.waiters.Remove(front)
		s.free-- // reserve permit for waiter
		s.mu.Unlock()
		close(front.Value.(chan struct{})) // wake waiter (non-blocking)
		return
	}

	s.mu.Unlock()
}
```
I'm pretty happy with this. The only way I can think of to use fewer lines of code is to drop the panic on calling `Release()` too many times (then we wouldn't need to track `max`).

The code would be easier to read if the `Acquire()` call reserved its own permit in all cases, but I couldn't figure out a way to do that while keeping the FIFO constraint. Some semaphores do allow permit-stealing behavior (sometimes called "barging") to increase throughput at the cost of fairness.
## Context Cancellation

Good programs don't keep doing work after it no longer matters. Adding context cancellation lets a blocked operation stop waiting when the surrounding task is canceled or times out, which prevents wasted effort and makes systems easier to reason about and shut down cleanly.

Inside `Acquire()`, while waiting on the channel for the signal from a `Release()` call, we need to race that signal against the context being cancelled.

When the context is cancelled, there are two possible outcomes:

- The `Acquire()` call is still queued, so it needs to clean up its state (by removing itself from the queue) and then return a context error.
- The `Acquire()` call has already been granted a permit and owns it, so it needs to release that permit before returning a context error.

To tell these cases apart, we need a new bit of data: a `granted` flag that tracks whether a permit has been granted, which I've wrapped inside this `waiter` struct alongside the existing channel:
```go
type waiter struct {
	ch      chan struct{}
	granted bool
}
```

`Acquire()` checks `granted` under the lock on cancellation. If we were granted a permit but are canceling anyway, we must release it:
```go
func (s *Semaphore) Acquire(ctx context.Context) error {
	s.mu.Lock()

	// Fast path
	if s.free > 0 {
		s.free--
		s.mu.Unlock()
		return nil
	}

	w := &waiter{ch: make(chan struct{})}
	elem := s.waiters.PushBack(w)
	s.mu.Unlock()

	// Race the release signal and the context
	select {
	case <-w.ch:
		return nil

	case <-ctx.Done():
		s.mu.Lock()

		if w.granted {
			// A permit was reserved for us, but we're canceling.
			// We must release the permit we own.
			s.mu.Unlock()
			s.Release()
			return ctx.Err()
		}

		// Not yet granted, remove ourselves from the queue
		s.waiters.Remove(elem)
		s.mu.Unlock()
		return ctx.Err()
	}
}
```
And `Release()` sets `granted = true` under the lock before waking the waiter:

```go
func (s *Semaphore) Release() {
	s.mu.Lock()

	if s.free+1 > s.max {
		s.mu.Unlock()
		panic("semaphore: released more than acquired")
	}
	s.free++

	// Wake the first waiter if any
	if front := s.waiters.Front(); front != nil {
		w := front.Value.(*waiter)
		s.waiters.Remove(front)
		s.free--         // reserve permit for waiter
		w.granted = true // mark granted under the lock
		s.mu.Unlock()
		close(w.ch) // wake waiter
		return
	}

	s.mu.Unlock()
}
```
To put it all together, here's the semaphore being used in my original example from the top:

```go
g, ctx := errgroup.WithContext(context.Background())
sema := NewSemaphore(10)

for _, path := range files {
	path := path // capture the loop variable (not needed in Go 1.22+)
	g.Go(func() error {
		// Acquire a permit (waiting if needed)
		if err := sema.Acquire(ctx); err != nil {
			return err
		}
		defer sema.Release() // release the permit when we return

		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		return processFile(f)
	})
}

if err := g.Wait(); err != nil {
	return err
}
```
In the end, a semaphore is "just" a counter plus a way to park goroutines until the counter says they can proceed. The surprising part is everything around that core: what order you unblock waiters in, what happens when work becomes irrelevant, and what invariants you need to keep to avoid deadlocks and leaks.

A plain buffered channel is a good-enough concurrency limiter, but it doesn't give you FIFO semantics when many goroutines contend at once, and it doesn't naturally compose with cancellation.
## Bugs I Ran Into

While iterating on this semaphore, I ran into two particularly tricky bugs.

The first was a deadlock caused by `Release()` sending a message on an unbuffered channel with no receiver:

1. `Release()` removes the waiter from the queue and is about to send the wake-up message
2. The waiter's `select` chooses `ctx.Done()` first and returns without receiving
3. `Release()` blocks forever on the send because nobody is receiving anymore!

I fixed this by closing the channel in `Release()` instead of sending an empty struct.
The second was a permit leak caused by trying to detect "was I granted a permit?" by checking whether the channel was closed. There was a race between `Release()` reserving the permit and the waiter observing that fact:

1. `Release()` reserves a permit for a waiter (`s.free--`)
2. The waiter's context is cancelled, and the waiter re-locks
3. The waiter tries to infer "granted" from the channel state and gets the wrong answer
4. The waiter returns `ctx.Err()` without releasing the permit that was reserved for it

That permit is gone forever. I fixed this with the `granted` flag — it's set under the lock, so the waiter can reliably check whether it owns a permit.

After I had everything working, I looked up the source code of the semaphore I would typically use, [x/sync/semaphore](https://pkg.go.dev/golang.org/x/sync/semaphore) from Go's extended library. I found that it uses the same patterns: closing the channel to avoid the deadlock, and keeping all waiter state under the mutex to avoid the permit leak. The channel is just the notification mechanism, and the mutex-protected state is the source of truth.
