| 1 | +# Architecture & Design |
| 2 | + |
| 3 | +This document describes the design decisions, key abstractions, and implementation strategies used in the `seeyou-cupx` library. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The library provides Rust APIs for reading and writing CUPX files, which are used in aviation/gliding applications to store waypoint data along with associated pictures. The core challenge is that CUPX files contain two concatenated ZIP archives in a single file, requiring careful parsing to separate and access each archive independently. |
| 8 | + |
| 9 | +### Design Goals |
| 10 | + |
| 11 | +- **Zero-copy where possible**: Avoid unnecessary data copying when reading archives |
| 12 | +- **Streaming-friendly**: Support both file and in-memory I/O through generic `Read + Seek` traits |
| 13 | +- **Type safety**: Leverage Rust's type system to prevent invalid file construction |
| 14 | +- **Minimal dependencies**: Keep the dependency tree small and focused |
| 15 | +- **Correct error handling**: Distinguish between parsing warnings and fatal errors |
| 16 | + |
| 17 | +## Module Organization |
| 18 | + |
| 19 | +``` |
| 20 | +seeyou-cupx/ |
| 21 | +├── src/ |
| 22 | +│ ├── lib.rs # Public API surface |
| 23 | +│ ├── reader.rs # CupxFile: Parsing and reading CUPX files |
| 24 | +│ ├── writer.rs # CupxWriter: Creating CUPX files |
| 25 | +│ ├── limited_reader.rs # LimitedReader: Byte range restriction wrapper |
| 26 | +│ └── error.rs # Error and Warning types |
| 27 | +``` |
| 28 | + |
| 29 | +### Module Responsibilities |
| 30 | + |
| 31 | +- **`reader.rs`**: Contains the `CupxFile` struct and all parsing logic, including the EOCD search algorithm |
| 32 | +- **`writer.rs`**: Contains `CupxWriter` builder pattern for constructing CUPX files with pictures |
| 33 | +- **`limited_reader.rs`**: Provides `LimitedReader<R, B>`, a critical abstraction for working with concatenated archives |
| 34 | +- **`error.rs`**: Defines `Error` (fatal) and `Warning` (non-fatal) types |
| 35 | + |
| 36 | +## Key Abstractions |
| 37 | + |
| 38 | +### LimitedReader |
| 39 | + |
| 40 | +`LimitedReader<R, B>` is a wrapper that restricts a `Read + Seek` implementation to only access a specific byte range. This is essential for parsing CUPX files because: |
| 41 | + |
| 42 | +1. CUPX files contain two ZIP archives concatenated together |
| 43 | +2. ZIP parsers expect to read an entire archive from start to end |
| 44 | +3. Without byte range limitation, the ZIP parser would read past the first archive into the second |
| 45 | + |
| 46 | +**How it works**: |
| 47 | +- Wraps any `R: Read + Seek` with a `RangeBounds<u64>` |
| 48 | +- Translates all read/seek operations to stay within the specified range |
| 49 | +- Returns EOF when attempting to read past the range boundary |
| 50 | +- Makes the underlying reader appear as if only the byte range exists |
| 51 | + |
| 52 | +**Example use**: |
| 53 | +```rust |
| 54 | +// Read only bytes 0..1000 from a file |
| 55 | +let limited = LimitedReader::new(file, 0..1000)?; |
| 56 | +let archive = ZipArchive::new(limited)?; // ZIP parser only sees those bytes |
| 57 | +``` |
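To make the range translation concrete, here is a minimal sketch of the idea using only the standard library. `RangeReader` is a hypothetical, simplified stand-in rather than the library's `LimitedReader`: it takes explicit `start` and `end` offsets instead of a `RangeBounds<u64>`, and its internals are an assumption about how such a wrapper could work.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for `LimitedReader`: exposes only bytes
/// `start..end` of the inner reader as if they were the whole stream.
struct RangeReader<R> {
    inner: R,
    start: u64,
    end: u64,
    /// Current position, relative to `start`.
    pos: u64,
}

impl<R: Read + Seek> RangeReader<R> {
    fn new(mut inner: R, start: u64, end: u64) -> io::Result<Self> {
        if start > end {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "start > end"));
        }
        inner.seek(SeekFrom::Start(start))?;
        Ok(Self { inner, start, end, pos: 0 })
    }

    /// Length of the visible window in bytes.
    fn len(&self) -> u64 {
        self.end - self.start
    }
}

impl<R: Read + Seek> Read for RangeReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never hand out more bytes than remain inside the window.
        let remaining = self.len().saturating_sub(self.pos);
        let max = remaining.min(buf.len() as u64) as usize;
        if max == 0 {
            return Ok(0); // EOF at the range boundary
        }
        let n = self.inner.read(&mut buf[..max])?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for RangeReader<R> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        let invalid = || io::Error::new(io::ErrorKind::InvalidInput, "seek outside u64 range");
        // Resolve the request to a position relative to the window start.
        let target = match from {
            SeekFrom::Start(n) => n,
            SeekFrom::End(d) => self.len().checked_add_signed(d).ok_or_else(invalid)?,
            SeekFrom::Current(d) => self.pos.checked_add_signed(d).ok_or_else(invalid)?,
        };
        let absolute = self.start.checked_add(target).ok_or_else(invalid)?;
        self.inner.seek(SeekFrom::Start(absolute))?;
        self.pos = target;
        Ok(target)
    }
}
```

Note that `SeekFrom::End` is resolved against the end of the window, not the end of the underlying file. That is what lets a ZIP parser, which locates its central directory by seeking from the end, work on the first of two concatenated archives.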
| 58 | + |
| 59 | +### Two-Phase Parsing Strategy |
| 60 | + |
| 61 | +The reader uses a two-phase approach: |
| 62 | + |
| 63 | +**Phase 1: Archive Boundary Detection** |
| 64 | +1. Search the file backwards for End of Central Directory (EOCD) signatures |
| 65 | +2. Find the boundary between the two ZIP archives |
| 66 | +3. Determine if pictures archive exists |
| 67 | + |
| 68 | +**Phase 2: Archive Reading** |
| 69 | +1. Create `LimitedReader` for the points archive (second ZIP) |
| 70 | +2. Parse `POINTS.CUP` file and extract waypoint/task data |
| 71 | +3. Create `LimitedReader` for the pics archive (first ZIP) if it exists |
| 72 | +4. Keep pics archive accessible for picture reading |
| 73 | + |
| 74 | +This separation ensures the file is scanned only once for boundaries, then accessed on-demand. |
| 75 | + |
| 76 | +## ZIP File Format & EOCD Search |
| 77 | + |
| 78 | +### Key ZIP Concept |
| 79 | + |
| 80 | +ZIP files end with an End of Central Directory (EOCD) record. It starts with the signature `PK\x05\x06` (offset 0), has a 22-byte fixed part, and stores the length of its trailing comment as a 2-byte little-endian field at offset 20. Each ZIP archive has exactly one EOCD record, so a CUPX file, being two concatenated ZIPs, contains two. |
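As an illustration of the record layout, the following sketch checks the signature and computes where an EOCD record ends. The function name `eocd_end` and the constants are hypothetical and not taken from the library; the sketch works on an in-memory slice for simplicity.

```rust
/// ZIP End of Central Directory signature, `PK\x05\x06`.
const EOCD_SIGNATURE: &[u8] = b"PK\x05\x06";
/// Size of the fixed part of an EOCD record (everything before the comment).
const EOCD_FIXED_LEN: usize = 22;

/// Returns the offset just past the EOCD record that starts at `eocd_offset`
/// in `data`, or `None` if there is no complete record at that position.
fn eocd_end(data: &[u8], eocd_offset: usize) -> Option<usize> {
    let fixed_end = eocd_offset.checked_add(EOCD_FIXED_LEN)?;
    let fixed = data.get(eocd_offset..fixed_end)?;
    if &fixed[..4] != EOCD_SIGNATURE {
        return None;
    }
    // Comment length: 2 bytes, little-endian, at offset 20 of the record.
    let comment_len = u16::from_le_bytes([fixed[20], fixed[21]]) as usize;
    let end = fixed_end + comment_len;
    // The comment must fit inside the data as well.
    (end <= data.len()).then_some(end)
}
```

For the first archive in a CUPX file, the value returned here is the offset at which the second archive begins.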
| 81 | + |
| 82 | +### Boundary Detection Algorithm |
| 83 | + |
| 84 | +The parser finds the boundary between archives by searching backwards for EOCD signatures: |
| 85 | + |
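| 86 | +1. **Chunked backward search**: Read 64 KB chunks starting from the end of the file, searching each for `PK\x05\x06` using `memchr::memmem` |
| 87 | +2. **Track positions**: Record the two EOCD positions closest to the end of the file |
| 88 | +3. **Calculate boundary**: `second_eocd_offset + 22 + comment_length`, where `second_eocd_offset` is the second EOCD found when searching backwards (the one nearer the start of the file) and the comment length is read from EOCD bytes 20-21 |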
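The search steps above can be sketched with the standard library alone. This is not the library's implementation: it swaps `memchr::memmem` for a plain `windows` scan so the example has no dependencies, and the function name, the `max_hits` parameter, and the 3-byte overlap used to catch a signature that straddles two chunks are all assumptions.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const EOCD_SIGNATURE: &[u8] = b"PK\x05\x06";
const CHUNK_SIZE: u64 = 64 * 1024;

/// Scans `reader` backwards in fixed-size chunks and returns the offsets of
/// up to `max_hits` EOCD signatures, ordered from the end of the file
/// towards the start.
fn find_eocd_offsets<R: Read + Seek>(reader: &mut R, max_hits: usize) -> io::Result<Vec<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    let overlap = (EOCD_SIGNATURE.len() - 1) as u64;
    let mut hits = Vec::new();
    let mut buf = Vec::new();
    let mut chunk_end = file_len;

    while chunk_end > 0 && hits.len() < max_hits {
        let chunk_start = chunk_end.saturating_sub(CHUNK_SIZE);
        // Read a few bytes past the chunk so that a signature beginning in
        // this chunk but ending in the next one is still seen in full.
        let read_end = (chunk_end + overlap).min(file_len);
        buf.resize((read_end - chunk_start) as usize, 0);
        reader.seek(SeekFrom::Start(chunk_start))?;
        reader.read_exact(&mut buf)?;

        // Visit candidate positions from high to low offsets.
        for (i, window) in buf.windows(EOCD_SIGNATURE.len()).enumerate().rev() {
            if window == EOCD_SIGNATURE {
                hits.push(chunk_start + i as u64);
                if hits.len() == max_hits {
                    break;
                }
            }
        }
        chunk_end = chunk_start;
    }
    Ok(hits)
}
```

Calling this with `max_hits = 2` yields at most the two positions nearest the end of the file; the second entry, if present, is the offset used in the boundary formula. The buffer never grows beyond one chunk plus the overlap.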
| 89 | + |
| 90 | +**Archive ranges**: |
| 91 | +- Two EOCDs found: Pics `[0, boundary)`, Points `[boundary, end)` |
| 92 | +- One EOCD found: No pics (warning), Points `[0, end)` |
| 93 | +- Zero EOCDs: Error |
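The three cases map naturally onto half-open `Range<u64>` values. The sketch below is illustrative only: `ArchiveRanges` and `archive_ranges` are hypothetical names, and returning `None` for the zero-EOCD case stands in for whatever error the library actually reports.

```rust
use std::ops::Range;

/// Byte ranges of the two archives inside one CUPX file (hypothetical type).
#[derive(Debug, PartialEq)]
struct ArchiveRanges {
    pics: Option<Range<u64>>,
    points: Range<u64>,
}

/// `boundary` is the offset just past the first archive's EOCD record; it is
/// only meaningful when two EOCDs were found.
fn archive_ranges(eocd_count: usize, boundary: u64, file_len: u64) -> Option<ArchiveRanges> {
    match eocd_count {
        // No EOCD at all: not a usable file, the caller reports an error.
        0 => None,
        // A single archive: treat the whole file as the points archive.
        1 => Some(ArchiveRanges { pics: None, points: 0..file_len }),
        // Two archives: split the file at the boundary.
        _ => Some(ArchiveRanges {
            pics: Some(0..boundary),
            points: boundary..file_len,
        }),
    }
}
```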
| 94 | + |
| 95 | +Because the search reads one chunk at a time, its buffer stays at about 64 KB regardless of file size. |
| 96 | + |
| 97 | +## Reading Flow |
| 98 | + |
| 99 | +``` |
| 100 | +User calls CupxFile::from_path() |
| 101 | + ↓ |
| 102 | +Open file as Read + Seek |
| 103 | + ↓ |
| 104 | +Search backwards for EOCD signatures (chunked) |
| 105 | + ↓ |
| 106 | +Calculate boundary between archives |
| 107 | + ↓ |
| 108 | +Create LimitedReader for points archive (from boundary to EOF) |
| 109 | + ↓ |
| 110 | +Parse POINTS.CUP using ZipArchive + seeyou-cup parser |
| 111 | + ↓ |
| 112 | +Create LimitedReader for pics archive (from 0 to boundary) if exists |
| 113 | + ↓ |
| 114 | +Return CupxFile with pics archive accessible |
| 115 | + ↓ |
| 116 | +User calls read_picture() or picture_names() |
| 117 | + ↓ |
| 118 | +Access pics archive on-demand via LimitedReader |
| 119 | +``` |
| 120 | + |
| 121 | +## Writing Flow |
| 122 | + |
| 123 | +``` |
| 124 | +User creates CupxWriter::new(&cup_file) |
| 125 | + ↓ |
| 126 | +User adds pictures via add_picture() |
| 127 | + ↓ |
| 128 | +Pictures stored as HashMap<filename, PictureSource> |
| 129 | + ↓ |
| 130 | +User calls write() or write_to_path() |
| 131 | + ↓ |
| 132 | +Validate all filenames (no empty, no path separators) |
| 133 | + ↓ |
| 134 | +Write pics archive: |
| 135 | + ├── Create ZipWriter |
| 136 | + ├── For each picture: add to ZIP as "pics/{filename}" |
| 137 | + └── Finish pics ZIP |
| 138 | + ↓ |
| 139 | +Write points archive: |
| 140 | + ├── Create in-memory ZipWriter |
| 141 | + ├── Add POINTS.CUP from CupFile |
| 142 | + ├── Finish points ZIP to buffer |
| 143 | + └── Append buffer to output |
| 144 | + ↓ |
| 145 | +Result: Valid CUPX file (pics.zip + points.zip concatenated) |
| 146 | +``` |
| 147 | + |
| 148 | +### Writer Design Notes |
| 149 | + |
| 150 | +**In-memory points buffer**: The points archive is built entirely in memory before writing. This is acceptable because: |
| 151 | +- CUP files are typically small (text-based waypoint data) |
| 152 | +- Building in memory simplifies the API (no need for two-pass writing) |
| 153 | +- Memory usage is predictable: it is bounded by the size of the finished points archive |
| 154 | + |
| 155 | +**Pictures from paths vs bytes**: `PictureSource` enum allows both: |
| 156 | +- `PictureSource::Path`: Read from filesystem during write (avoids loading into memory) |
| 157 | +- `PictureSource::Bytes`: Already in memory (useful for generated/modified images) |
| 158 | + |
| 159 | +**Duplicate handling**: Because pictures are stored in a `HashMap` keyed by filename, adding a picture under a filename that is already present replaces the earlier entry (the last call wins). This matches the behavior most users expect from a builder. |
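A small sketch of these writer notes follows. The variant names `Path` and `Bytes` come from the text above, but their payload types (`PathBuf`, `Vec<u8>`) are assumptions, as are the helper names `is_valid_picture_name` and `demo_pictures` and the choice to reject both `/` and `\` as path separators.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Where a picture's bytes come from (payload types are assumed).
#[derive(Debug, PartialEq)]
enum PictureSource {
    /// Read from the filesystem at write time.
    Path(PathBuf),
    /// Already in memory.
    Bytes(Vec<u8>),
}

/// Hypothetical validation rule: non-empty and free of path separators.
fn is_valid_picture_name(name: &str) -> bool {
    !name.is_empty() && !name.contains('/') && !name.contains('\\')
}

/// Demonstrates that a second insert under the same filename replaces the first.
fn demo_pictures() -> HashMap<String, PictureSource> {
    let mut pictures = HashMap::new();
    pictures.insert(
        "a.jpg".to_string(),
        PictureSource::Path(PathBuf::from("photos/a.jpg")),
    );
    // Same key again: the earlier `Path` entry is dropped.
    pictures.insert("a.jpg".to_string(), PictureSource::Bytes(vec![0xFF, 0xD8]));
    pictures
}
```

One consequence of using a `HashMap` is that iteration order is unspecified, so the order of entries in the written pics archive should not be relied upon.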
| 160 | + |
| 161 | +## Generic Design Patterns |
| 162 | + |
| 163 | +### Generic over Read + Seek |
| 164 | + |
| 165 | +Both `CupxFile<R>` and `LimitedReader<R, B>` are generic over the reader type: |
| 166 | + |
| 167 | +```rust |
| 168 | +pub struct CupxFile<R> { |
| 169 | + cup_file: CupFile, |
| 170 | + pics_archive: Option<ZipArchive<LimitedReader<R, Range<u64>>>>, |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +**Benefits**: |
| 175 | +- Works with `File`, `Cursor<Vec<u8>>`, `BufReader`, or any custom reader |
| 176 | +- Enables testing with in-memory data |
| 177 | +- Allows new reader types to be supported later without API breakage (async I/O would still need its own traits, since `Read` and `Seek` are synchronous) |
| 178 | + |
| 179 | +**Convenience methods**: `CupxFile<File>` gets special methods like `from_path()` to reduce boilerplate for common cases. |
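The pattern of a generic type with extra constructors for one concrete reader can be sketched as follows. `Container` is a hypothetical stand-in for `CupxFile`, and `from_reader` is an assumed name for the generic constructor; only `from_path()` is mentioned in this document.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical stand-in for a type that is generic over its reader.
struct Container<R> {
    reader: R,
    len: u64,
}

impl<R: Read + Seek> Container<R> {
    /// Generic constructor: works for files, cursors, and custom readers.
    fn from_reader(mut reader: R) -> io::Result<Self> {
        let len = reader.seek(SeekFrom::End(0))?;
        reader.seek(SeekFrom::Start(0))?;
        Ok(Self { reader, len })
    }

    /// Total length of the underlying data in bytes.
    fn len(&self) -> u64 {
        self.len
    }

    /// Gives the reader back to the caller.
    fn into_inner(self) -> R {
        self.reader
    }
}

/// Convenience constructor that exists only for the `File` instantiation.
impl Container<File> {
    fn from_path(path: impl AsRef<Path>) -> io::Result<Self> {
        Self::from_reader(File::open(path)?)
    }
}
```

Tests can then build a `Container<Cursor<Vec<u8>>>` from in-memory bytes, while applications call `from_path` and never name the type parameter.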
| 180 | + |
| 181 | +### Builder Pattern for Writing |
| 182 | + |
| 183 | +`CupxWriter` uses the builder pattern with method chaining: |
| 184 | + |
| 185 | +```rust |
| 186 | +CupxWriter::new(&cup_file) |
| 187 | + .add_picture("a.jpg", path_a) |
| 188 | + .add_picture("b.jpg", path_b) |
| 189 | + .write_to_path("output.cupx")?; |
| 190 | +``` |
| 191 | + |
| 192 | +This provides: |
| 193 | +- Fluent, readable API |
| 194 | +- Flexibility in picture sources |
| 195 | +- Compile-time enforcement of required data (CupFile must be provided) |
| 196 | + |
| 197 | +## Error Handling Philosophy |
| 198 | + |
| 199 | +The library distinguishes between **errors** (fatal) and **warnings** (non-fatal): |
| 200 | + |
| 201 | +### Errors (`Error` enum) |
| 202 | +- I/O failures |
| 203 | +- Malformed ZIP archives |
| 204 | +- Invalid CUPX structure (missing EOCD signatures) |
| 205 | +- Invalid filenames in writer |
| 206 | +- CUP parsing errors |
| 207 | + |
| 208 | +All operations return `Result<T, Error>` for propagation. |
| 209 | + |
| 210 | +### Warnings (`Warning` enum) |
| 211 | +- No pictures archive found (still valid CUPX) |
| 212 | +- CUP parse warnings (reported to the caller, but recoverable) |
| 213 | + |
| 214 | +Warnings are collected and returned alongside the result: `Result<(CupxFile, Vec<Warning>), Error>`. |
| 215 | + |
| 216 | +**Rationale**: Many CUPX files in the wild have minor issues but are still usable. Warnings allow users to: |
| 217 | +- Log issues without failing |
| 218 | +- Decide whether to treat warnings as errors in their context |
| 219 | +- Provide better user feedback than silent success or hard failure |
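The following sketch shows how a caller might consume the `Result<(T, Vec<Warning>), Error>` shape, including a strict mode that promotes warnings to errors. All names here (`Opened`, `open`, `open_strict`, and the enum variants) are hypothetical and do not reflect the library's actual `Error` and `Warning` definitions.

```rust
#[derive(Debug, PartialEq)]
enum Warning {
    NoPicturesArchive,
}

#[derive(Debug, PartialEq)]
enum Error {
    MissingEocd,
    /// Only produced by `open_strict`.
    WarningsAsErrors(Vec<Warning>),
}

#[derive(Debug, PartialEq)]
struct Opened {
    has_pictures: bool,
}

/// Lenient entry point: fatal problems become `Err`, everything else is
/// returned together with a list of warnings.
fn open(eocd_count: usize) -> Result<(Opened, Vec<Warning>), Error> {
    match eocd_count {
        0 => Err(Error::MissingEocd),
        1 => Ok((
            Opened { has_pictures: false },
            vec![Warning::NoPicturesArchive],
        )),
        _ => Ok((Opened { has_pictures: true }, Vec::new())),
    }
}

/// Caller-side policy: treat any warning as a failure.
fn open_strict(eocd_count: usize) -> Result<Opened, Error> {
    let (opened, warnings) = open(eocd_count)?;
    if warnings.is_empty() {
        Ok(opened)
    } else {
        Err(Error::WarningsAsErrors(warnings))
    }
}
```

Keeping the strictness decision in the caller means the parsing code itself never has to guess which issues matter for a given application.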
| 220 | + |
| 221 | +## Dependencies |
| 222 | + |
| 223 | +The library has minimal runtime dependencies: |
| 224 | + |
| 225 | +- **`zip`**: ZIP archive reading/writing (with only `deflate` feature enabled) |
| 226 | +- **`seeyou-cup`**: CUP file format parsing/writing |
| 227 | +- **`thiserror`**: Ergonomic error type derivation |
| 228 | +- **`memchr`**: Fast EOCD signature search using SIMD when available |
| 229 | + |
| 230 | +Dev dependencies include `criterion` (benchmarking) and `insta` (snapshot testing). |