Commit 953a46a

Add docs/architecture.md file
1 parent e0d8351 commit 953a46a

1 file changed: +230 -0 lines changed

docs/architecture.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
# Architecture & Design

This document describes the design decisions, key abstractions, and implementation strategies used in the `seeyou-cupx` library.

## Overview

The library provides Rust APIs for reading and writing CUPX files, which are used in aviation/gliding applications to store waypoint data along with associated pictures. The core challenge is that CUPX files contain two concatenated ZIP archives in a single file, requiring careful parsing to separate and access each archive independently.

### Design Goals

- **Zero-copy where possible**: Avoid unnecessary data copying when reading archives
- **Streaming-friendly**: Support both file and in-memory I/O through generic `Read + Seek` traits
- **Type safety**: Leverage Rust's type system to prevent invalid file construction
- **Minimal dependencies**: Keep the dependency tree small and focused
- **Correct error handling**: Distinguish between parsing warnings and fatal errors

## Module Organization

```
seeyou-cupx/
├── src/
│   ├── lib.rs              # Public API surface
│   ├── reader.rs           # CupxFile: Parsing and reading CUPX files
│   ├── writer.rs           # CupxWriter: Creating CUPX files
│   ├── limited_reader.rs   # LimitedReader: Byte range restriction wrapper
│   └── error.rs            # Error and Warning types
```

### Module Responsibilities

- **`reader.rs`**: Contains the `CupxFile` struct and all parsing logic, including the EOCD search algorithm
- **`writer.rs`**: Contains the `CupxWriter` builder for constructing CUPX files with pictures
- **`limited_reader.rs`**: Provides `LimitedReader<R, B>`, a critical abstraction for working with concatenated archives
- **`error.rs`**: Defines the `Error` (fatal) and `Warning` (non-fatal) types

## Key Abstractions

### LimitedReader

`LimitedReader<R, B>` is a wrapper that restricts a `Read + Seek` implementation to a specific byte range. This is essential for parsing CUPX files because:

1. CUPX files contain two ZIP archives concatenated together
2. ZIP parsers expect to read an entire archive from start to end
3. Without a byte range limit, the ZIP parser would read past the first archive into the second

**How it works**:
- Wraps any `R: Read + Seek` together with a `RangeBounds<u64>`
- Translates all read/seek operations so they stay within the specified range
- Returns EOF when a read reaches the range boundary
- Makes the underlying reader appear as if only the byte range exists

**Example use**:
```rust
// Read only bytes 0..1000 from a file
let limited = LimitedReader::new(file, 0..1000)?;
let archive = ZipArchive::new(limited)?; // ZIP parser only sees those bytes
```

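The range translation described above can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: `RangeLimited` is a hypothetical name, it is fixed to `Range<u64>` rather than generic over `RangeBounds<u64>`, and its error messages are made up for the example.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical, simplified stand-in for `LimitedReader`, fixed to `Range<u64>`.
pub struct RangeLimited<R> {
    inner: R,
    range: Range<u64>,
    pos: u64, // current position, relative to `range.start`
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, msg)
}

impl<R: Read + Seek> RangeLimited<R> {
    pub fn new(mut inner: R, range: Range<u64>) -> io::Result<Self> {
        if range.start > range.end {
            return Err(invalid("range start is past range end"));
        }
        inner.seek(SeekFrom::Start(range.start))?;
        Ok(Self { inner, range, pos: 0 })
    }

    fn len(&self) -> u64 {
        self.range.end - self.range.start
    }
}

impl<R: Read + Seek> Read for RangeLimited<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.len().saturating_sub(self.pos);
        if remaining == 0 {
            return Ok(0); // report EOF at the range boundary
        }
        // Never hand the inner reader a buffer that extends past the range.
        let max = remaining.min(buf.len() as u64) as usize;
        let n = self.inner.read(&mut buf[..max])?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for RangeLimited<R> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Resolve the request to an offset relative to the range start.
        let target = match from {
            SeekFrom::Start(offset) => Some(offset),
            SeekFrom::End(delta) => self.len().checked_add_signed(delta),
            SeekFrom::Current(delta) => self.pos.checked_add_signed(delta),
        };
        let target = target.ok_or_else(|| invalid("seek before start of range"))?;
        let absolute = self
            .range
            .start
            .checked_add(target)
            .ok_or_else(|| invalid("seek offset overflows"))?;
        self.inner.seek(SeekFrom::Start(absolute))?;
        self.pos = target;
        Ok(target)
    }
}
```

Because `SeekFrom::End` is resolved against the range length rather than the real file length, a ZIP parser that seeks to the end to look for its EOCD record lands at the end of the wrapped archive only.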
### Two-Phase Parsing Strategy

The reader uses a two-phase approach:

**Phase 1: Archive Boundary Detection**
1. Search the file backwards for End of Central Directory (EOCD) signatures
2. Find the boundary between the two ZIP archives
3. Determine whether a pictures archive exists

**Phase 2: Archive Reading**
1. Create a `LimitedReader` for the points archive (second ZIP)
2. Parse the `POINTS.CUP` file and extract waypoint/task data
3. Create a `LimitedReader` for the pics archive (first ZIP) if it exists
4. Keep the pics archive accessible for picture reading

This separation ensures the file is scanned only once for boundaries, then accessed on demand.

## ZIP File Format & EOCD Search

### Key ZIP Concept

ZIP files end with an End of Central Directory (EOCD) record containing the signature `PK\x05\x06` (at offset 0) and a 2-byte little-endian comment length field (at offset 20). The EOCD appears exactly once per ZIP archive. CUPX files contain two concatenated ZIPs, so two EOCD signatures exist.

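As a small illustration of the record layout above, the sketch below computes the total length of an EOCD record from a byte slice that starts at the signature. The function and constant names are hypothetical; the 22-byte fixed size and the comment length at bytes 20-21 follow the ZIP format as described in this document.

```rust
/// ZIP End of Central Directory signature, `PK\x05\x06`.
const EOCD_SIGNATURE: [u8; 4] = *b"PK\x05\x06";
/// Size of the fixed part of an EOCD record (everything before the comment).
const EOCD_FIXED_LEN: u64 = 22;

/// Hypothetical helper: given bytes that start at an EOCD signature, return the
/// full record length (fixed part plus trailing comment), or `None` if the
/// slice is too short or does not start with the signature.
fn eocd_record_len(eocd: &[u8]) -> Option<u64> {
    if eocd.len() < EOCD_FIXED_LEN as usize || eocd[..4] != EOCD_SIGNATURE {
        return None;
    }
    // The comment length is a little-endian u16 at bytes 20..22.
    let comment_len = u16::from_le_bytes([eocd[20], eocd[21]]);
    Some(EOCD_FIXED_LEN + u64::from(comment_len))
}
```

Adding this length to the offset of an archive's EOCD gives the offset of the first byte after that archive.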
### Boundary Detection Algorithm

The parser finds the boundary between archives by searching backwards for EOCD signatures:

1. **Chunked backward search**: Read 64 KB chunks from the end of the file, searching for `PK\x05\x06` using `memchr::memmem`
2. **Track positions**: Record the two EOCD positions closest to the end of the file
3. **Calculate boundary**: `second_eocd_offset + 22 + comment_length`, where `second_eocd_offset` is the EOCD found second in the backward search (the one that ends the pics archive), 22 is the fixed EOCD size, and the comment length is read from EOCD bytes 20-21

**Archive ranges**:
- Two EOCDs found: Pics `[0..boundary)`, Points `[boundary..end)`
- One EOCD found: No pics (warning), Points `[0..end)`
- Zero EOCDs: Error

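The three cases above map naturally onto a `match`. The sketch below is only an illustration: `EocdScan`, `Layout`, and `archive_layout` are hypothetical names, and the string errors stand in for whatever error type the library actually uses.

```rust
use std::ops::Range;

/// Hypothetical summary of the backward EOCD search.
enum EocdScan {
    None,
    One,
    /// `boundary` is the first byte after the pics archive.
    Two { boundary: u64 },
}

/// Hypothetical description of where each archive lives in the file.
#[derive(Debug, PartialEq)]
enum Layout {
    PicsAndPoints { pics: Range<u64>, points: Range<u64> },
    /// No pics archive; a caller would also record a warning here.
    PointsOnly { points: Range<u64> },
}

fn archive_layout(scan: EocdScan, file_len: u64) -> Result<Layout, &'static str> {
    match scan {
        EocdScan::Two { boundary } if boundary <= file_len => Ok(Layout::PicsAndPoints {
            pics: 0..boundary,
            points: boundary..file_len,
        }),
        EocdScan::Two { .. } => Err("archive boundary lies past the end of the file"),
        EocdScan::One => Ok(Layout::PointsOnly { points: 0..file_len }),
        EocdScan::None => Err("no EOCD signature found"),
    }
}
```

Using half-open ranges for both archives means the two ranges tile the file exactly, with no byte belonging to both.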
Chunked search limits the search buffer to 64 KB regardless of file size.

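A chunked backward search can be sketched as follows. This is not the library's code: the document says the real search uses `memchr::memmem`, while this dependency-free sketch substitutes a plain `windows` scan from the standard library. The function name and its `chunk_size` and `max_hits` parameters are hypothetical. Consecutive chunks overlap by three bytes so that a signature straddling a chunk border is still found, and found exactly once.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const EOCD_SIGNATURE: &[u8; 4] = b"PK\x05\x06";

/// Hypothetical helper: scan `reader` backwards in `chunk_size` steps and return
/// the offsets of up to `max_hits` signatures, ordered from the end of the
/// stream towards its start. Only one chunk is held in memory at a time.
fn find_eocds_backwards<R: Read + Seek>(
    reader: &mut R,
    chunk_size: usize,
    max_hits: usize,
) -> io::Result<Vec<u64>> {
    assert!(chunk_size >= EOCD_SIGNATURE.len(), "chunk must hold a full signature");
    let overlap = (EOCD_SIGNATURE.len() - 1) as u64;
    let mut hits = Vec::new();
    if max_hits == 0 {
        return Ok(hits);
    }
    let mut buf = vec![0u8; chunk_size];
    // Exclusive end of the region that still has to be scanned.
    let mut end = reader.seek(SeekFrom::End(0))?;

    loop {
        let start = end.saturating_sub(chunk_size as u64);
        let chunk = &mut buf[..(end - start) as usize];
        reader.seek(SeekFrom::Start(start))?;
        reader.read_exact(chunk)?;

        // Walk this chunk from its end towards its start.
        let mut search_end = chunk.len();
        while let Some(i) = chunk[..search_end]
            .windows(EOCD_SIGNATURE.len())
            .rposition(|w| w == EOCD_SIGNATURE)
        {
            hits.push(start + i as u64);
            if hits.len() == max_hits {
                return Ok(hits);
            }
            // Continue with matches that start before index `i`.
            search_end = i + EOCD_SIGNATURE.len() - 1;
        }

        if start == 0 {
            return Ok(hits);
        }
        // Re-read the first three bytes of this chunk as the tail of the next.
        end = start + overlap;
    }
}
```

Note that the four signature bytes can in principle also occur inside compressed data, so a robust implementation would likely validate each candidate record; this sketch does not.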
## Reading Flow

```
User calls CupxFile::from_path()

Open file as Read + Seek

Search backwards for EOCD signatures (chunked)

Calculate boundary between archives

Create LimitedReader for points archive (from boundary to EOF)

Parse POINTS.CUP using ZipArchive + seeyou-cup parser

Create LimitedReader for pics archive (from 0 to boundary) if exists

Return CupxFile with pics archive accessible

User calls read_picture() or picture_names()

Access pics archive on-demand via LimitedReader
```

## Writing Flow

```
User creates CupxWriter::new(&cup_file)

User adds pictures via add_picture()

Pictures stored as HashMap<filename, PictureSource>

User calls write() or write_to_path()

Validate all filenames (no empty, no path separators)

Write pics archive:
  ├── Create ZipWriter
  ├── For each picture: add to ZIP as "pics/{filename}"
  └── Finish pics ZIP

Write points archive:
  ├── Create in-memory ZipWriter
  ├── Add POINTS.CUP from CupFile
  ├── Finish points ZIP to buffer
  └── Append buffer to output

Result: Valid CUPX file (pics.zip + points.zip concatenated)
```

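The filename validation step in the flow above might look roughly like the sketch below. Both function names and the error messages are hypothetical, and the set of rejected separators (`/` and `\`) is an assumption; the flow only states that empty names and path separators are rejected and that entries are stored as `pics/{filename}`.

```rust
/// Hypothetical check mirroring the "no empty, no path separators" rule.
fn validate_picture_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("picture filename must not be empty".to_string());
    }
    // Assumption: both Unix and Windows separators are rejected.
    if name.contains(|c: char| c == '/' || c == '\\') {
        return Err(format!("picture filename {name:?} must not contain path separators"));
    }
    Ok(())
}

/// Hypothetical helper: the entry name used inside the pics archive.
fn pics_entry_name(name: &str) -> Result<String, String> {
    validate_picture_name(name)?;
    Ok(format!("pics/{name}"))
}
```

Validating before any archive bytes are written means a bad filename cannot leave a half-written output behind.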
### Writer Design Notes

**In-memory points buffer**: The points archive is built entirely in memory before writing. This is acceptable because:
- CUP files are typically small (text-based waypoint data)
- Building in memory simplifies the API (no need for two-pass writing)
- Memory usage is predictable and bounded

**Pictures from paths vs bytes**: The `PictureSource` enum supports both:
- `PictureSource::Path`: Read from the filesystem during write (avoids loading into memory up front)
- `PictureSource::Bytes`: Already in memory (useful for generated/modified images)

**Duplicate handling**: Using `HashMap` means adding a picture with the same filename twice replaces the first. This matches intuitive builder pattern behavior.

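The replace-on-duplicate behavior follows directly from `HashMap::insert`. In the sketch below, the variant names `Path` and `Bytes` come from this document, while their payload types and the `PictureSet` wrapper are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Variant names follow the document; the payload types are assumed.
#[derive(Debug, PartialEq)]
enum PictureSource {
    Path(PathBuf),
    Bytes(Vec<u8>),
}

/// Hypothetical container standing in for the writer's picture map.
#[derive(Default)]
struct PictureSet {
    pictures: HashMap<String, PictureSource>,
}

impl PictureSet {
    /// Adds a picture and returns the source it replaced, if any.
    fn add(&mut self, filename: &str, source: PictureSource) -> Option<PictureSource> {
        self.pictures.insert(filename.to_string(), source)
    }
}
```

Adding `"a.jpg"` twice leaves a single entry holding the second source, and the first source is handed back to the caller by `insert`.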
## Generic Design Patterns

### Generic over Read + Seek

Both `CupxFile<R>` and `LimitedReader<R, B>` are generic over the reader type:

```rust
pub struct CupxFile<R> {
    cup_file: CupFile,
    pics_archive: Option<ZipArchive<LimitedReader<R, Range<u64>>>>,
}
```

**Benefits**:
- Works with `File`, `Cursor<Vec<u8>>`, `BufReader`, or any custom reader
- Enables testing with in-memory data
- Allows future async implementations without API breakage

**Convenience methods**: `CupxFile<File>` gets special methods like `from_path()` to reduce boilerplate for common cases.

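The combination of a generic type with file-specific convenience constructors can be sketched as below. `Container`, `from_reader`, and `read_all` are hypothetical stand-ins; only `from_path()` is a name taken from this document, and its signature here is assumed.

```rust
use std::fs::File;
use std::io::{self, Read, Seek};
use std::path::Path;

/// Hypothetical stand-in for a reader-generic type such as `CupxFile<R>`.
struct Container<R> {
    reader: R,
}

/// Methods available for every reader type.
impl<R: Read + Seek> Container<R> {
    fn from_reader(reader: R) -> Self {
        Self { reader }
    }

    fn read_all(&mut self) -> io::Result<Vec<u8>> {
        self.reader.rewind()?;
        let mut buf = Vec::new();
        self.reader.read_to_end(&mut buf)?;
        Ok(buf)
    }
}

/// Convenience constructor that only exists when the reader is a `File`.
impl Container<File> {
    fn from_path(path: impl AsRef<Path>) -> io::Result<Self> {
        Ok(Self::from_reader(File::open(path)?))
    }
}
```

Tests can then construct the same type over `std::io::Cursor<Vec<u8>>` through `from_reader`, without touching the filesystem.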
### Builder Pattern for Writing

`CupxWriter` uses the builder pattern with method chaining:

```rust
CupxWriter::new(&cup_file)
    .add_picture("a.jpg", path_a)
    .add_picture("b.jpg", path_b)
    .write_to_path("output.cupx")?;
```

This provides:
- Fluent, readable API
- Flexibility in picture sources
- Compile-time enforcement of required data (CupFile must be provided)

## Error Handling Philosophy

The library distinguishes between **errors** (fatal) and **warnings** (non-fatal):

### Errors (`Error` enum)
- I/O failures
- Malformed ZIP archives
- Invalid CUPX structure (missing EOCD signatures)
- Invalid filenames in writer
- CUP parsing errors

All operations return `Result<T, Error>` for propagation.

### Warnings (`Warning` enum)
- No pictures archive found (still valid CUPX)
- CUP parse warnings (logged but recoverable)

Warnings are collected and returned alongside the result: `Result<(CupxFile, Vec<Warning>), Error>`.

**Rationale**: Many CUPX files in the wild have minor issues but are still usable. Warnings allow users to:
- Log issues without failing
- Decide whether to treat warnings as errors in their context
- Provide better user feedback than silent success or hard failure

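The "value plus warnings" return shape can be illustrated with toy types. Everything in the sketch below is hypothetical (`Parsed`, the enum variants, `parse`, and `parse_strict`); it only mirrors the `Result<(T, Vec<Warning>), Error>` pattern described above and shows how a caller might escalate warnings to errors.

```rust
/// Hypothetical non-fatal condition.
#[derive(Debug, PartialEq)]
enum Warning {
    NoPicturesArchive,
}

/// Hypothetical fatal condition.
#[derive(Debug, PartialEq)]
enum Error {
    MissingEocd,
}

/// Hypothetical parse result.
struct Parsed {
    has_pictures: bool,
}

/// Fatal problems abort with `Err`; recoverable ones are collected and returned.
fn parse(eocd_count: usize) -> Result<(Parsed, Vec<Warning>), Error> {
    let mut warnings = Vec::new();
    match eocd_count {
        0 => return Err(Error::MissingEocd),
        1 => warnings.push(Warning::NoPicturesArchive),
        _ => {}
    }
    Ok((Parsed { has_pictures: eocd_count >= 2 }, warnings))
}

/// A strict caller can choose to treat any warning as a failure.
fn parse_strict(eocd_count: usize) -> Result<Parsed, String> {
    let (parsed, warnings) = parse(eocd_count).map_err(|e| format!("{e:?}"))?;
    if warnings.is_empty() {
        Ok(parsed)
    } else {
        Err(format!("warnings treated as errors: {warnings:?}"))
    }
}
```

The lenient and strict policies share one parser; the decision about how to react to warnings stays with the caller.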
## Dependencies

The library has minimal runtime dependencies:

- **`zip`**: ZIP archive reading/writing (with only the `deflate` feature enabled)
- **`seeyou-cup`**: CUP file format parsing/writing
- **`thiserror`**: Ergonomic error type derivation
- **`memchr`**: Fast EOCD signature search using SIMD when available

Dev dependencies include `criterion` (benchmarking) and `insta` (snapshot testing).
