Commit ff1c063

corylanou and claude authored
feat: add agent skill for cross-platform LLM agent support (#1064)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 4800f27 commit ff1c063

8 files changed: +1592 -0 lines changed

skills/litestream/SKILL.md

Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
---
name: litestream
description: >-
  Expert knowledge for contributing to Litestream, a standalone disaster recovery
  tool for SQLite. Provides architectural understanding, code patterns, critical
  rules, and debugging procedures for WAL monitoring, LTX replication format,
  storage backend implementation, multi-level compaction, and SQLite page
  management. Use when working with Litestream source code, writing storage
  backends, debugging replication issues, implementing compaction logic, or
  handling SQLite WAL operations.
license: Apache-2.0
metadata:
  author: benbjohnson
  version: "1.0"
  repository: https://github.com/benbjohnson/litestream
---

# Litestream Agent Skill

Litestream is a standalone disaster recovery tool for SQLite. It runs as a
background process, monitors the SQLite WAL (Write-Ahead Log), converts changes
to immutable LTX files, and replicates them to cloud storage. It uses
`modernc.org/sqlite` (pure Go, no CGO required).

## Quick Start

```bash
# Build
go build -o bin/litestream ./cmd/litestream

# Test (always use race detector)
go test -race -v ./...

# Code quality
pre-commit run --all-files
```

## Critical Rules

These invariants must never be violated:

### 1. Lock Page at 1 GB

SQLite reserves the page containing byte offset 0x40000000 (1 GB) for file
locking. Always skip it during replication and compaction. The page number
varies by page size:

| Page Size | Lock Page Number |
|-----------|------------------|
| 4 KB      | 262145           |
| 8 KB      | 131073           |
| 16 KB     | 65537            |
| 32 KB     | 32769            |

```go
lockPgno := ltx.LockPgno(pageSize)
if pgno == lockPgno {
	continue
}
```
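The table values follow from dividing the 1 GB offset by the page size and adding one, because SQLite page numbers start at 1. A minimal sketch, assuming a local helper `lockPgno` that stands in for `ltx.LockPgno` (the helper and constant names are illustrative, not copied from the library):

```go
package main

import "fmt"

// pendingByte is the byte offset (1 GB) of the page SQLite reserves for locking.
const pendingByte = 0x40000000

// lockPgno is a hypothetical stand-in for ltx.LockPgno. It returns the
// 1-based number of the page that contains pendingByte.
func lockPgno(pageSize uint32) uint32 {
	return uint32(pendingByte/int64(pageSize)) + 1
}

func main() {
	for _, size := range []uint32{4096, 8192, 16384, 32768} {
		fmt.Printf("page size %5d -> lock page %d\n", size, lockPgno(size))
	}
}
```

A database smaller than 1 GB never reaches this page, which is why lock-page bugs tend to show up only with large databases.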

### 2. LTX Files Are Immutable

Once an LTX file is written, it must never be modified. New changes create new
files. This guarantees point-in-time recovery integrity.

### 3. Single Replica per Database

Each database replicates to exactly one destination. The Replica component
manages replication mechanics; database state belongs in the DB layer.

### 4. Read Local Before Remote During Compaction

Cloud storage may be eventually consistent, so a file that was just uploaded
may not be readable from the remote yet. Always read from local disk first:

```go
f, err := os.Open(db.LTXPath(info.Level, info.MinTXID, info.MaxTXID))
if err == nil {
	return f, nil // Use local copy
}
return replica.Client.OpenLTXFile(...) // Fall back to remote
```
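The same local-first pattern can be tried outside the Litestream codebase. This is a sketch under stated assumptions: `openLocalFirst`, `readFrom`, and the remote opener callback are hypothetical names, and the remote side is stubbed with an in-memory reader rather than a real `ReplicaClient`.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// openLocalFirst returns the local file when it can be opened and falls back
// to the (hypothetical) remote opener otherwise, mirroring the snippet above.
func openLocalFirst(localPath string, openRemote func() (io.ReadCloser, error)) (io.ReadCloser, error) {
	if f, err := os.Open(localPath); err == nil {
		return f, nil // use local copy
	}
	rc, err := openRemote()
	if err != nil {
		return nil, fmt.Errorf("open remote ltx file: %w", err)
	}
	return rc, nil
}

// readFrom is a demo helper: it drains whichever source was chosen, using an
// in-memory string as the stubbed remote content.
func readFrom(localPath, remoteBody string) (string, error) {
	rc, err := openLocalFirst(localPath, func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(remoteBody)), nil
	})
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	// The local path does not exist here, so the remote stub is used.
	s, err := readFrom("/nonexistent/0000000000000001-0000000000000001.ltx", "remote bytes")
	fmt.Println(s, err)
}
```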

### 5. Preserve Timestamps During Compaction

Set the compacted file's `CreatedAt` to the earliest source file timestamp to
maintain temporal granularity for point-in-time restoration.

```go
info.CreatedAt = oldestSourceFile.CreatedAt
```

### 6. Use Lock() Not RLock() for Writes

`RLock()` admits multiple concurrent holders, so writing shared state under it
is a data race.

```go
// CORRECT
r.mu.Lock()
defer r.mu.Unlock()
r.pos = pos

// WRONG - race condition
r.mu.RLock()
defer r.mu.RUnlock()
r.pos = pos
```
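As a self-contained illustration, the sketch below pairs a read-locked getter with a write-locked setter. The `tracker` and `position` types are hypothetical stand-ins, not the actual `Replica` fields. Running it with `go run -race` should report no races; swapping `Lock` for `RLock` in `SetPos` should make the race detector complain.

```go
package main

import (
	"fmt"
	"sync"
)

// position is a hypothetical stand-in for the replication position.
type position struct {
	TXID uint64
}

// tracker guards pos with an RWMutex.
type tracker struct {
	mu  sync.RWMutex
	pos position
}

// Pos only reads shared state, so a read lock is sufficient.
func (t *tracker) Pos() position {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.pos
}

// SetPos mutates shared state, so it takes the exclusive lock.
func (t *tracker) SetPos(pos position) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pos = pos
}

func main() {
	var t tracker
	var wg sync.WaitGroup
	for i := uint64(1); i <= 100; i++ {
		wg.Add(1)
		go func(txid uint64) {
			defer wg.Done()
			t.SetPos(position{TXID: txid})
			_ = t.Pos()
		}(i)
	}
	wg.Wait()
	fmt.Println("final TXID:", t.Pos().TXID)
}
```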

### 7. Atomic File Operations

Always write to a temp file then rename. Never write directly to the final path.

```go
tmpFile, err := os.CreateTemp(dir, ".tmp-*")
if err != nil {
	return err
}
// ... write data, sync, close ...
return os.Rename(tmpFile.Name(), finalPath)
```

## Architecture

### System Layers

| Layer   | File(s)                  | Responsibility                            |
|---------|--------------------------|-------------------------------------------|
| App     | `cmd/litestream/`        | CLI commands, YAML/env config             |
| Store   | `store.go`               | Multi-DB coordination, compaction         |
| DB      | `db.go`                  | Single DB management, WAL monitoring      |
| Replica | `replica.go`             | Replication to one destination            |
| Storage | `*/replica_client.go`    | Backend implementations (S3, GCS, etc.)   |

Database state logic belongs in the DB layer, not the Replica layer.

### ReplicaClient Interface

All storage backends implement this interface from `replica_client.go`:

```go
type ReplicaClient interface {
	Type() string
	Init(ctx context.Context) error
	LTXFiles(ctx context.Context, level int, seek ltx.TXID, useMetadata bool) (ltx.FileIterator, error)
	OpenLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, offset, size int64) (io.ReadCloser, error)
	WriteLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, r io.Reader) (*ltx.FileInfo, error)
	DeleteLTXFiles(ctx context.Context, a []*ltx.FileInfo) error
	DeleteAll(ctx context.Context) error
}
```

Key contract details:
- `OpenLTXFile` must return `os.ErrNotExist` when the file is missing
- `WriteLTXFile` must set `CreatedAt` from backend metadata or upload time
- `LTXFiles` with `useMetadata=true` fetches accurate timestamps (for point-in-time restore)
- `LTXFiles` with `useMetadata=false` uses fast timestamps (normal operations)

### Lock Ordering

Always acquire locks in this order to prevent deadlocks:

1. `Store.mu`
2. `DB.mu`
3. `DB.chkMu`
4. `Replica.mu`

### Core Components

**DB** (`db.go`): Manages the SQLite connection, WAL monitoring, checkpointing,
and a long-running read transaction for consistency. Key fields: `path`, `db`,
`rtx` (read transaction), `pageSize`, `notify` channel.

**Replica** (`replica.go`): Tracks replication position (`ltx.Pos` with TXID,
PageNo, Checksum). One replica per database.

**Store** (`store.go`): Coordinates multiple databases and schedules compaction
across levels.
173+
## LTX File Format
174+
175+
LTX (Log Transaction) files are immutable, checksummed archives of database
176+
changes. Structure:
177+
178+
```
179+
+------------------+
180+
| Header | 100 bytes (magic "LTX1", page size, TXID range, timestamp)
181+
+------------------+
182+
| Page Frames | 4-byte pgno + pageSize bytes data, per page
183+
+------------------+
184+
| Page Index | Binary search index for page lookup
185+
+------------------+
186+
| Trailer | 16 bytes (post-apply checksum, file checksum)
187+
+------------------+
188+
```
189+
190+
### Naming Convention
191+
192+
```
193+
Format: MMMMMMMMMMMMMMMM-NNNNNNNNNNNNNNNN.ltx
194+
Example: 0000000000000001-0000000000000064.ltx (TXID 1-100)
195+
```
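The example name encodes TXID 100 as hexadecimal `64`. A minimal sketch of formatting and parsing such names, assuming two hypothetical helpers (`formatLTXFilename`, `parseLTXFilename`) written for illustration rather than taken from the `ltx` package:

```go
package main

import (
	"fmt"
	"strconv"
)

// formatLTXFilename renders a TXID range as two 16-digit hex fields.
func formatLTXFilename(minTXID, maxTXID uint64) string {
	return fmt.Sprintf("%016x-%016x.ltx", minTXID, maxTXID)
}

// parseLTXFilename reverses formatLTXFilename and rejects malformed names.
func parseLTXFilename(name string) (minTXID, maxTXID uint64, err error) {
	// 16 hex digits + '-' + 16 hex digits + ".ltx" = 37 bytes.
	if len(name) != 37 || name[16] != '-' || name[33:] != ".ltx" {
		return 0, 0, fmt.Errorf("invalid ltx filename: %q", name)
	}
	if minTXID, err = strconv.ParseUint(name[:16], 16, 64); err != nil {
		return 0, 0, fmt.Errorf("parse min txid: %w", err)
	}
	if maxTXID, err = strconv.ParseUint(name[17:33], 16, 64); err != nil {
		return 0, 0, fmt.Errorf("parse max txid: %w", err)
	}
	return minTXID, maxTXID, nil
}

func main() {
	name := formatLTXFilename(1, 100)
	minTXID, maxTXID, err := parseLTXFilename(name)
	fmt.Println(name, minTXID, maxTXID, err)
}
```

Fixed-width hex fields mean a lexicographic listing of a level directory is also in TXID order.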

### Compaction Levels

```
Level 0: /ltx/0000/  Raw LTX files (no compaction)
Level 1: /ltx/0001/  Compacted periodically
Level 2: /ltx/0002/  Compacted less frequently
```

Default compaction levels: L0 (raw), L1 (30s), L2 (5min), L3 (1h), plus daily
snapshots. Compaction merges files by deduplicating pages (latest version wins)
and always skips the lock page.
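
The merge rule can be shown with in-memory data. This is a simplified sketch, not the real compactor: `txFile` is a hypothetical structure that holds pages in a map, `compact` expects its inputs ordered oldest to newest, and `lockPgno` is a local stand-in for `ltx.LockPgno`.

```go
package main

import "fmt"

// lockPgno returns the page number that contains SQLite's lock byte at 1 GB.
func lockPgno(pageSize uint32) uint32 {
	return uint32(0x40000000/int64(pageSize)) + 1
}

// txFile is a hypothetical in-memory stand-in for one LTX file.
type txFile struct {
	MinTXID, MaxTXID uint64
	Pages            map[uint32][]byte // page number -> page data
}

// compact merges files ordered oldest to newest. For each page the latest
// version wins, and the lock page is never copied to the output.
func compact(pageSize uint32, files []txFile) txFile {
	out := txFile{Pages: make(map[uint32][]byte)}
	if len(files) == 0 {
		return out
	}
	out.MinTXID = files[0].MinTXID
	out.MaxTXID = files[len(files)-1].MaxTXID

	lock := lockPgno(pageSize)
	for _, f := range files {
		for pgno, data := range f.Pages {
			if pgno == lock {
				continue // always skip the lock page
			}
			out.Pages[pgno] = data // newer files overwrite older pages
		}
	}
	return out
}

func main() {
	files := []txFile{
		{MinTXID: 1, MaxTXID: 1, Pages: map[uint32][]byte{1: []byte("old"), 2: []byte("two")}},
		{MinTXID: 2, MaxTXID: 2, Pages: map[uint32][]byte{1: []byte("new")}},
	}
	out := compact(4096, files)
	fmt.Println(out.MinTXID, out.MaxTXID, len(out.Pages), string(out.Pages[1]))
}
```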

## Code Patterns

### DO

- Return errors immediately; let callers decide handling
- Use `fmt.Errorf("context: %w", err)` for error wrapping
- Handle database state in the DB layer, not Replica
- Use `db.verify()` to trigger snapshots (don't reimplement)
- Test with race detector: `go test -race`
- Use lazy iterators for `LTXFiles` (paginate, don't load all at once)

### DON'T

- Write data at the 1 GB lock page boundary
- Modify LTX files after creation
- Put database state logic in the Replica layer
- Use `RLock()` when writing shared state
- Write directly to final file paths (use temp + rename)
- Ignore context cancellation in long operations
- Return generic errors instead of `os.ErrNotExist` for missing files

## Specialized Knowledge Areas

Load reference files on demand based on the task:

| Task                              | Reference File                          |
|-----------------------------------|-----------------------------------------|
| Understanding system design       | `references/ARCHITECTURE.md`            |
| Writing or reviewing code         | `references/PATTERNS.md`                |
| Working with LTX files            | `references/LTX_FORMAT.md`              |
| WAL monitoring or page operations | `references/SQLITE_INTERNALS.md`        |
| Implementing storage backends     | `references/REPLICA_CLIENT_GUIDE.md`    |
| Writing or debugging tests        | `references/TESTING_GUIDE.md`           |

## Common Debugging Procedures

### Replication Not Working

1. Verify WAL mode: `PRAGMA journal_mode` must return `wal`
2. Check monitor interval and that the monitor goroutine is running
3. Confirm `db.notify` channel is being signaled on WAL changes
4. Check replica position: `replica.Pos()` should advance with writes
5. Look for `os.ErrNotExist` from `OpenLTXFile` (file not replicated yet)

### Large Database Issues (>1 GB)

1. Verify lock page is being skipped: check `ltx.LockPgno(pageSize)`
2. Test with multiple page sizes (4K, 8K, 16K, 32K)
3. Run with databases both smaller and larger than 1 GB
4. Ensure page iteration loops include the `continue` guard for lock page

### Compaction Problems

1. Confirm local L0 files exist before compaction reads them
2. Check that `CreatedAt` timestamps are preserved (earliest source)
3. Verify compaction level intervals in `Store.levels`
4. Look for eventual consistency issues if reading from remote storage

### Storage Backend Issues

1. Return `os.ErrNotExist` for missing files (not generic errors)
2. Support partial reads via `offset`/`size` in `OpenLTXFile`
3. Handle context cancellation in all methods
4. Test concurrent operations with `-race` flag
5. For eventually consistent backends, add retry logic with backoff

## Contribution Guidelines

### What's Accepted

- Bug fixes and patches (welcome)
- Documentation improvements
- Small code improvements and performance optimizations
- Security vulnerability reports (report privately)

### Discuss First

- Feature requests: open an issue before implementing
- Large changes: discuss approach in an issue first

### Pre-Submit Checklist

- [ ] Read relevant docs from the reference table above
- [ ] Follow patterns in `references/PATTERNS.md`
- [ ] Run `go test -race -v ./...`
- [ ] Run `pre-commit run --all-files`
- [ ] For page iteration: test with >1 GB databases
- [ ] Show investigation evidence in PR (see CONTRIBUTING.md)

## Testing

```bash
# Full test suite with race detection
go test -race -v ./...

# Specific areas
go test -race -v -run TestReplica_Sync ./...
go test -race -v -run TestDB_Sync ./...
go test -race -v -run TestStore_CompactDB ./...

# Coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```

Key testing areas:
- Lock page handling with >1 GB databases and multiple page sizes
- Race conditions in position updates, WAL monitoring, and checkpointing
- Eventual consistency in storage backend operations
- Atomic file operations and cleanup on error paths

## Environment Validation

Run `scripts/validate-setup.sh` to verify your development environment is
correctly configured for Litestream development.
