
Commit 0c00671

add post
1 parent 6c3d060 commit 0c00671

File tree

4 files changed: +298 −1 lines changed


components/layout.tsx

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ export default function Layout({ children, title, description }) {
         :root {
           --text: #1d1d27;
           --input-background: #fff;
-          --link: #0265d5;
+          --link: #1772ea;
           --link-hover: #496495;
           --light-text: #73738b;
           --border: #b6b6c2;

data/posts.ts

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ export const popularPosts = [
 
 // Starred posts (not in any specific order)
 export const postStars = [
+  "installing-npm-packages-very-quickly",
   "compiling-lisp-to-bytecode-and-running-it",
   "making-python-less-random",
   "lisp-compiler-optimizations",

data/projects.ts

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ export default [
     desc: "A server-side web framework that deploys to Vercel.",
     to: "/my-own-python-web-framework",
   },
+  {
+    name: "caladan",
+    link: "https://github.com/healeycodes/caladan",
+    desc: "Experimental npm package manager. Installs from lockfile and runs bin scripts.",
+    to: "/installing-npm-packages-very-quickly",
+  },
   {
     name: "untrusted-python",
     link: "https://github.com/healeycodes/untrusted-python",
Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
---
title: "Installing NPM Packages Very Quickly"
date: "2025-03-25"
tags: ["go"]
description: "Building a package manager with a fast install step."
---

Some package managers are faster than others. The early JavaScript package managers, [npm](https://docs.npmjs.com/cli/v11) and [yarn](https://yarnpkg.com/), are commonly replaced by faster alternatives like [bun](https://bun.sh/) and [pnpm](https://pnpm.io/). I've also seen benchmarks between package managers where the performance gap is rather large – but I'm not sure why one package manager would ever be significantly faster than another.

To understand more about package manager performance, I traced some call paths through bun's [Zig codebase](https://github.com/oven-sh/bun) and pnpm's [TypeScript codebase](https://github.com/pnpm/pnpm), but I was still missing some details about the performance challenges these projects were taking on.

So I built my own toy package manager called [caladan](https://github.com/healeycodes/caladan). For now, it just does two things: install npm packages from a valid `package-lock.json` file and run [bin scripts](https://docs.npmjs.com/cli/v11/configuring-npm/package-json#bin).

I wanted to get close to the cold install performance of `bun`, and I'm pretty happy with the results. Benchmarks are usually incorrect, so there's a good chance I'm being unfair to `bun` here. Here are the results nonetheless:

```bash
# ran on m1 mac w/ 600mbps network, bun v1.2.5
# both have an equivalent lockfile with 16 packages (311mb on disk)
# cache is cleared before each run with `bun pm cache rm && rm -rf node_modules`

./benchmark.sh
Benchmark 1: ./caladan install-lockfile fixtures/1
  Time (mean ± σ):     1.767 s ±  0.052 s    [User: 2.168 s, System: 2.236 s]
  Range (min … max):   1.729 s …  1.857 s    5 runs

Benchmark 2: bun install --force --ignore-scripts --no-cache --network-concurrency 64
  Time (mean ± σ):     1.587 s ±  0.097 s    [User: 0.496 s, System: 1.293 s]
  Range (min … max):   1.486 s …  1.693 s    5 runs

Summary
  bun install --force --ignore-scripts --no-cache --network-concurrency 64 ran
    1.11 ± 0.08 times faster than ./caladan install-lockfile fixtures/1
```
The much lower [user time](https://en.wikipedia.org/wiki/CPU_time#User_and_System_time) of `bun` points to its efficient Zig codebase. The similar-ish system times and overall wall clock times suggest that both tools hit the same fundamental limits (whether network, disk I/O, or system call overhead). On a faster and more capable machine, `bun` would be able to make better use of the available resources.

To verify that my package manager is doing the same work, I checked that the sizes of the directories inside `node_modules` were comparable, and that the bin scripts ran without any errors (e.g. `nanoid`, `next`, and `image-size`).

```bash
./caladan run fixtures/1 nanoid
Running nanoid with args: []
Working directory: fixtures/1
guxvWmbNcvIuAowqzrnEu
```

The benchmark script is [open source](https://github.com/healeycodes/caladan) and hopefully you'll correct me if I've set it up unfairly.

I'll outline my efforts to get close to `bun`'s cold install performance in the following sections.
## Installing a Package

`package-lock.json` is automatically generated by the *previous install* to lock the exact versions of all dependencies (and their dependencies) in a Node.js project. It ensures consistent installations across different environments by recording the precise dependency tree.

It's mostly made up of dependency entries like this:

```json
"dependencies": {
  // ..
  "date-fns": {
    "version": "2.29.3",
    "resolved": "https://registry.npmjs.org/date-fns/-/date-fns-2.29.3.tgz",
    "integrity": "sha512-dDCnyH2WnnKusqvZZ6+jA1O51Ibt8ZMRNkDZdyAyK4YfbDwa/cEmuztzG5pk6hqlp9aSBPYcjOlktquahGwGeA=="
  },
```

Our job, as a minimal package manager, is to install all of these dependencies.
1. Parse `package-lock.json`
2. Download the compressed files from `resolved`
3. Verify their `integrity` by calculating the hash of these files
4. Extract them to `node_modules`
5. Parse `node_modules/$package/package.json` and check for a `bin` property
6. (If so, create a symlink inside `node_modules/.bin/$package`)

*Not listed here are other features like pre- and post-install scripts that I haven't implemented. I think I'm also missing some validation steps (e.g. checking if `package.json` differs from the lockfile).*
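As a rough sketch of step 1, the relevant lockfile fields can be modeled as Go structs and decoded with `encoding/json`. This is an illustrative snippet rather than caladan's actual types; the field names simply mirror the JSON entry above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Dependency mirrors one entry in the lockfile's "dependencies" map.
type Dependency struct {
	Version   string `json:"version"`
	Resolved  string `json:"resolved"`  // tarball URL
	Integrity string `json:"integrity"` // e.g. "sha512-<base64 digest>"
}

// Lockfile holds only the part of package-lock.json we need for installs.
type Lockfile struct {
	Dependencies map[string]Dependency `json:"dependencies"`
}

func main() {
	data, err := os.ReadFile("package-lock.json")
	if err != nil {
		panic(err)
	}
	var lock Lockfile
	if err := json.Unmarshal(data, &lock); err != nil {
		panic(err)
	}
	for name, dep := range lock.Dependencies {
		fmt.Printf("%s@%s -> %s\n", name, dep.Version, dep.Resolved)
	}
}
```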
To get everything working, I started by implementing these steps to run sequentially. It was very slow and took ~30 seconds to install all the packages for my small project.

I got a 2x improvement by skipping packages I didn't need (i.e. by filtering by OS). On my MacBook, I don't need to install `node_modules/@next/swc-darwin-x64` but I *do* need to install `node_modules/@next/swc-darwin-arm64`.
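A minimal sketch of that filter, assuming the package metadata exposes optional `os` and `cpu` lists the way npm's platform-specific packages do (the function and field names here are illustrative):

```go
package main

import (
	"fmt"
	"runtime"
)

// shouldInstall reports whether a package applies to the current platform.
// osList and cpuList would come from the package's metadata; empty means "any".
func shouldInstall(osList, cpuList []string) bool {
	matches := func(allowed []string, current string) bool {
		if len(allowed) == 0 {
			return true
		}
		for _, a := range allowed {
			if a == current {
				return true
			}
		}
		return false
	}
	return matches(osList, runtime.GOOS) && matches(cpuList, runtime.GOARCH)
}

func main() {
	// e.g. @next/swc-darwin-arm64 declares os: ["darwin"], cpu: ["arm64"]
	fmt.Println(shouldInstall([]string{"darwin"}, []string{"arm64"}))
}
```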
The next big improvement was to run things in parallel. I put each package's download-and-extract step in its own goroutine and stuck them in an [errgroup](https://pkg.go.dev/golang.org/x/sync/errgroup).

```go
g := errgroup.Group{}

// Process each package in parallel
for pkgName, pkgInfo := range packages {
	g.Go(func() error {

		// Skip OS-specific packages that don't match current OS
		// ..
		// Create package directory
		// ..
		// Normalize package path
		// ..

		// Download the package tarball and extract it
		return DownloadAndExtractPackage(
			ctx,
			httpSemaphore,
			tarSemaphore,
			client,
			pkgInfo.Resolved,
			pkgInfo.Integrity,
			pkgPath,
		)
	})
}

// Wait for all packages to complete
err := g.Wait()
// ..
```
This was much faster than doing everything sequentially. However, without limits on parallelism, there was resource contention in two areas: HTTP requests and unzipping files.

## Comparing CPU Profiles

From reading their codebases, I knew that bun and pnpm used different levels of concurrency for HTTP requests and unzipping files.

When I added separate semaphores around these steps, the performance of my install step improved by ~20% for the small project I've been testing. I knew intuitively that these semaphores helped with resource contention, but I thought it would be interesting to prove this using profiling tools.
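For context, here's a sketch of how the two weighted semaphores could be created with [golang.org/x/sync/semaphore](https://pkg.go.dev/golang.org/x/sync/semaphore). The exact limits caladan uses aren't shown in this post, so the numbers below are placeholders; the extraction limit follows the 1.5x-cores idea discussed later.

```go
package main

import (
	"runtime"

	"golang.org/x/sync/semaphore"
)

func main() {
	// Cap in-flight HTTP requests (placeholder limit).
	httpSemaphore := semaphore.NewWeighted(64)

	// Cap concurrent tar.gz extractions at ~1.5x the CPU count so
	// decompression doesn't thrash the CPU, memory, and disk.
	tarSemaphore := semaphore.NewWeighted(int64(float64(runtime.NumCPU()) * 1.5))

	_, _ = httpSemaphore, tarSemaphore // passed into the install path
}
```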
I've chosen to highlight the effect of adding the semaphore for unzipping files, as the performance improvement is more significant there.

In my program, I have an env var that allows me to output CPU profiles:

```go
// pprof here is the runtime/pprof package
if cpuProfilePath := os.Getenv("CPU_PROFILE"); cpuProfilePath != "" {
	f, err := os.Create(cpuProfilePath)
	if err != nil {
		fmt.Printf("Error creating CPU profile file: %v\n", err)
		os.Exit(1)
	}
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()
	fmt.Printf("CPU profiling enabled, writing to: %s\n", cpuProfilePath)
}
```
I used pprof's `-text` output to compare two different profiles (with unzip sema and without it) side-by-side in my code editor:

```bash
go tool pprof -text cpu_without_sema.prof > cpu_without_sema.txt
go tool pprof -text cpu_with_sema.prof > cpu_with_sema.txt
```
### Decompression Performance Improvement

With the semaphore, the core decompression functions accounted for a smaller share of overall program time and were also quicker to run. Below is the profile data for `huffmanBlock` (decoding a single Huffman block) and `huffSym` (reading the next Huffman-encoded symbol).

```text
# with semaphore
      flat  flat%   sum%        cum   cum%
     0.11s  1.94% 89.22%      0.42s  7.42%  compress/flate.(*decompressor).huffmanBlock
     0.11s  1.94% 87.28%      0.19s  3.36%  compress/flate.(*decompressor).huffSym

# without semaphore
      flat  flat%   sum%        cum   cum%
     0.11s  1.88% 88.57%      0.51s  8.70%  compress/flate.(*decompressor).huffmanBlock
     0.19s  3.24% 82.08%      0.29s  4.95%  compress/flate.(*decompressor).huffSym
```

There was also a ~5% decrease in the time spent waiting on system calls (`syscall.syscall`) and I/O (`os.(*File).Write` and `os.(*File).ReadFrom`).
### More Detail on Why

The semaphore limits the number of concurrent extraction operations, preventing CPU, memory, and I/O contention. By matching the extraction concurrency to the available CPU resources (using 1.5x cores), the system avoids thrashing and excessive context switching.

Notably, there was an increase in "scheduling time", which may seem counterintuitive, but here it's desirable: it means synchronization is more orderly and predictable, and there's less chaotic contention for system resources:

```text
runtime.schedule      +2.70%
runtime.park_m        +1.23%
runtime.gopreempt_m   +0.42%
runtime.goschedImpl   +0.42%
runtime.notewakeup    +0.21%
runtime.lock          +1.31%
runtime.lockWithRank  +1.31%
runtime.lock2         +1.31%
```

We traded a small amount of scheduling time for faster I/O and faster decompression (CPU).
## Keeping Things in Memory

One of the ways you can be fast is to avoid disk operations altogether. This was the final optimization I added. Initially, I downloaded each package to a temporary file and then extracted it into `node_modules`.

I realized I could do everything at the same time using the HTTP response stream:

- Download the bytes of the archive
- Extract directly to the final location
- Calculate the hash as we go so we can verify each package's [integrity](https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json#dependencies)
```go
// DownloadAndExtractPackage downloads a package tarball and extracts it
func DownloadAndExtractPackage(ctx context.Context, httpSemaphore, tarSemaphore *semaphore.Weighted, client *http.Client, url, integrity, destPath string) error {
	if err := httpSemaphore.Acquire(ctx, 1); err != nil {
		return err
	}
	defer httpSemaphore.Release(1)

	// Request the tarball
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("error downloading package: %v", err)
	}
	defer resp.Body.Close()

	// Setup hash verification
	var hash interface {
		io.Writer
		Sum() []byte
	}
	// ..

	// Use a TeeReader to compute hash while reading
	teeReader := io.TeeReader(resp.Body, hash)
	reader := teeReader

	if err := tarSemaphore.Acquire(ctx, 1); err != nil {
		return err
	}
	defer tarSemaphore.Release(1)

	// Extract directly from the download stream
	err = extractTarGz(reader, destPath)
	if err != nil {
		return fmt.Errorf("error extracting package: %v", err)
	}

	// Compare hashes
	// ..

	return nil
}
```
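The hash comparison itself is elided above. npm's `integrity` values are SRI strings (an algorithm prefix like `sha512-` followed by a base64 digest), so verification could look roughly like the sketch below; `verifyIntegrity` is an illustrative helper, not caladan's code, and in the streaming version the digest would come from the `TeeReader`'s hash rather than a byte slice.

```go
package main

import (
	"crypto/sha512"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

// verifyIntegrity checks an npm-style SRI string ("sha512-<base64>")
// against the raw tarball bytes.
func verifyIntegrity(integrity string, data []byte) error {
	encoded, ok := strings.CutPrefix(integrity, "sha512-")
	if !ok {
		return fmt.Errorf("unsupported integrity algorithm: %s", integrity)
	}
	want, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return fmt.Errorf("bad integrity encoding: %v", err)
	}
	got := sha512.Sum512(data)
	if subtle.ConstantTimeCompare(want, got[:]) != 1 {
		return fmt.Errorf("integrity mismatch")
	}
	return nil
}

func main() {
	fmt.Println(verifyIntegrity("sha512-...", []byte("tarball bytes")))
}
```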
In a way, everything gets blocked on the semaphore that wraps the extraction step. But seeing as extraction is an order of magnitude faster than downloading bytes over the network, it feels like a good design.
## Running Scripts

The final part of my package manager configures the symlinks for any [bin scripts](https://docs.npmjs.com/cli/v11/configuring-npm/package-json#bin) that the packages might have. It also runs them when invoked with `caladan run <directory> <script> <args>`.

After a package is downloaded to `node_modules/$package/`, it has a `package.json` file which may have a `bin` property.

For example, `nanoid` has:

```json
"bin": "./bin/nanoid.cjs",
```

Which means there's a file at `node_modules/nanoid/bin/nanoid.cjs` that we need to create an executable symlink for at `node_modules/.bin/nanoid`.
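Creating that link is only a few lines of Go. Here's a sketch under the assumption that `bin` is a single string (it can also be a map of command names to paths, which this ignores); `linkBin` is an illustrative helper, not caladan's actual function:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkBin symlinks node_modules/<pkg>/<binPath> to node_modules/.bin/<name>
// and marks the target file as executable.
func linkBin(nodeModules, pkg, name, binPath string) error {
	target := filepath.Join(nodeModules, pkg, binPath)
	if err := os.Chmod(target, 0o755); err != nil {
		return err
	}
	binDir := filepath.Join(nodeModules, ".bin")
	if err := os.MkdirAll(binDir, 0o755); err != nil {
		return err
	}
	// A relative symlink keeps node_modules relocatable: ../<pkg>/<binPath>
	return os.Symlink(filepath.Join("..", pkg, binPath), filepath.Join(binDir, name))
}

func main() {
	fmt.Println(linkBin("node_modules", "nanoid", "nanoid", "bin/nanoid.cjs"))
}
```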
The hardest part is getting the relative file paths right and ensuring that args are passed through correctly. Running the script isn't too hard. It's effectively just `exec.Command`.
```go
func Run(directory string, args []string) {
	// First arg is the script name; the rest are passed through to it
	scriptName, scriptArgs := args[0], args[1:]

	// Set up command to run script using project-relative path
	binScriptName := filepath.Join("./node_modules/.bin", scriptName)
	cmd := exec.Command("sh", "-c", binScriptName+" "+strings.Join(scriptArgs, " "))

	// Set working directory to the specified directory (project root)
	cmd.Dir = directory
	fmt.Printf("Working directory: %s\n", directory)

	// Connect standard IO
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.Stdin = os.Stdin

	// Run the command and wait for it to finish
	err := cmd.Run()
	// ..
```
## To Conclude

All this to have a package manager that implements 2% of the spec that users expect, hah.

It's ~700 lines of Go ([open source](https://github.com/healeycodes/caladan)) and it was fun to write. I now have a better understanding of the upper end of performance that's possible when it comes to installing packages.

I'd like to be able to handle a cold install of a `package.json` (creating and updating the lockfile) at similar speeds. I hope to put a follow-up post together when I'm able to get my dependency resolution and hoisting to match how `npm` does it.

I'd also like to look into the cache optimizations that `bun` does for repeat package installs, which, in some cases, take tens of milliseconds.

After getting up close to the basics of package manager-ing over the past week, I feel like JavaScript doesn't cut it as far as the required performance is concerned. I used to think that package managers were network-bound but now I've changed my mind.

The raw performance (and concurrency primitives) of a systems-y language like Go gives you so much more power.

To end on a [Jarred Sumner post](https://x.com/jarredsumner/status/1868090574378840523):

> A lot of performance optimizations come from looking closely at things people assume is "just the network" or "just I/O bound"
