Skip to content

Commit 34cdfc5

Browse files
authored
Merge pull request #233 from NOAA-GSL/staging
advanced overwrite and metadata-aware sync modes for FTP
2 parents 7a07345 + 7d8aa01 commit 34cdfc5

File tree

10 files changed

+1550
-44
lines changed

10 files changed

+1550
-44
lines changed
Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
# Implementation Plan: FTP Sync Overwrite Modes and Metadata-Aware Sync
2+
3+
**Issue:** [#11 - `acquire ftp`: add overwrite mode and metadata-aware sync for filled-frame use cases](https://github.com/zyra-project/zyra/issues/11)
4+
5+
**Branch:** `claude/github-issue-11-EESdo`
6+
7+
---
8+
9+
## Problem Statement
10+
11+
The `acquire ftp --sync-dir` feature prevents re-downloading existing files. However, when upstream FTP servers backfill missing data with real information (replacing interpolated/placeholder frames), the stale local data persists because filenames remain unchanged.
12+
13+
---
14+
15+
## Key Files to Modify
16+
17+
| File | Purpose |
18+
|------|---------|
19+
| `src/zyra/connectors/backends/ftp.py` | Core sync logic, add `SyncOptions` dataclass and decision function |
20+
| `src/zyra/connectors/ingest/__init__.py` | CLI argument registration and handler |
21+
| `src/zyra/api/schemas/domain_args.py` | API schema updates |
22+
| `src/zyra/connectors/clients.py` | Update `FTPConnector.sync_directory()` signature |
23+
| `tests/connectors/test_ftp_backend.py` | Unit tests for new functionality |
24+
25+
---
26+
27+
## Architecture Decisions
28+
29+
### 1. New `SyncOptions` Dataclass
30+
31+
A frozen dataclass to encapsulate all sync behavior configuration, following existing patterns in the codebase:
32+
33+
```python
34+
@dataclass(frozen=True)
35+
class SyncOptions:
36+
"""Configuration for FTP sync file replacement behavior."""
37+
overwrite_existing: bool = False
38+
recheck_existing: bool = False
39+
min_remote_size: int | str | None = None # bytes or percentage like "10%"
40+
prefer_remote: bool = False
41+
prefer_remote_if_meta_newer: bool = False
42+
skip_if_local_done: bool = False
43+
recheck_missing_meta: bool = False
44+
frames_meta_path: str | None = None
45+
```
46+
47+
### 2. New `should_download()` Function
48+
49+
Central decision logic implementing this precedence order:
50+
51+
1. `--skip-if-local-done`: Check for `.done` marker file first
52+
2. File doesn't exist locally → always download
53+
3. Zero-byte local file → always replace
54+
4. `--overwrite-existing`: Unconditional download
55+
5. `--prefer-remote`: Always download
56+
6. `--prefer-remote-if-meta-newer`: Check `frames-meta.json` timestamps
57+
7. `--recheck-missing-meta`: Download if companion metadata missing
58+
8. `--min-remote-size`: Size-based comparison
59+
9. `--recheck-existing`: Size comparison when mtime unavailable
60+
10. **Default**: MDTM-based timestamp comparison
61+
62+
### 3. New `get_remote_mtime()` Function
63+
64+
Uses FTP's `MDTM` command to get file modification time (fails gracefully when unsupported).
65+
66+
---
67+
68+
## Implementation Phases
69+
70+
### Phase 1: Core Backend (`ftp.py`)
71+
72+
1. Add `SyncOptions` dataclass
73+
2. Implement `get_remote_mtime()` function
74+
3. Implement helper functions:
75+
- `_parse_min_size()` - Parse bytes or percentage size spec
76+
- `_load_frames_meta()` - Load frames-meta.json file
77+
- `_has_done_marker()` - Check for .done marker file
78+
- `_is_missing_companion_meta()` - Check if file is missing from metadata
79+
4. Implement `should_download()` logic
80+
5. Update `sync_directory()` signature and loop
81+
6. Write unit tests for new functions
82+
83+
### Phase 2: CLI Integration (`ingest/__init__.py`)
84+
85+
1. Add new CLI arguments to FTP parser
86+
2. Update `_cmd_ftp()` to construct `SyncOptions`
87+
3. Pass options to `sync_directory()`
88+
4. Write CLI integration tests
89+
90+
### Phase 3: API Schema (`domain_args.py`)
91+
92+
1. Update `AcquireFtpArgs` model with new fields
93+
2. Update `normalize_and_validate()` if needed
94+
3. Test API endpoint with new parameters
95+
96+
### Phase 4: Connector Client (`clients.py`)
97+
98+
1. Update `FTPConnector.sync_directory()` signature
99+
2. Pass through `sync_options` parameter
100+
101+
### Phase 5: Documentation and Cleanup
102+
103+
1. Update CLI help strings
104+
2. Add docstrings with examples
105+
3. Update any sample workflows using FTP sync
106+
107+
---
108+
109+
## New CLI Flags
110+
111+
| Flag | Type | Description |
112+
|------|------|-------------|
113+
| `--overwrite-existing` | bool | Unconditional replacement regardless of timestamps |
114+
| `--recheck-existing` | bool | Compare file sizes when timestamps unavailable |
115+
| `--min-remote-size` | str/int | Replace if remote larger (bytes or "10%") |
116+
| `--prefer-remote` | bool | Always prioritize remote versions |
117+
| `--prefer-remote-if-meta-newer` | bool | Use frames-meta.json timestamps |
118+
| `--skip-if-local-done` | bool | Respect .done marker files |
119+
| `--recheck-missing-meta` | bool | Re-download files lacking metadata |
120+
| `--frames-meta` | path | Path to frames-meta.json |
121+
122+
---
123+
124+
## Potential Risks & Mitigations
125+
126+
| Risk | Impact | Mitigation |
127+
|------|--------|------------|
128+
| MDTM not supported by all servers | Medium | Graceful fallback with logging; document behavior |
129+
| Performance from extra MDTM calls | High | Cache connection; batch metadata queries |
130+
| frames-meta.json format changes | Low | Validate JSON structure; log warnings |
131+
| Timezone handling in MDTM | Medium | Document UTC assumption |
132+
133+
---
134+
135+
## Test Strategy
136+
137+
### Unit Tests (`test_ftp_backend.py`)
138+
139+
- `test_get_remote_mtime_success` - MDTM parsing when server supports it
140+
- `test_get_remote_mtime_not_supported` - Graceful handling when unsupported
141+
- `test_parse_min_size_bytes` - Parsing absolute byte values
142+
- `test_parse_min_size_percentage` - Parsing percentage values
143+
- `test_should_download_*` - All decision logic branches
144+
- `test_sync_directory_with_sync_options` - Integration test
145+
146+
### CLI Tests
147+
148+
- Verify new arguments are recognized
149+
- Test argument combinations
150+
151+
---
152+
153+
## Usage Examples
154+
155+
```bash
156+
# Default: use MDTM timestamps
157+
zyra acquire ftp ftp://server/data/ --sync-dir ./local/
158+
159+
# Force overwrite all files
160+
zyra acquire ftp ftp://server/data/ --sync-dir ./local/ --overwrite-existing
161+
162+
# Only replace if remote is significantly larger
163+
zyra acquire ftp ftp://server/data/ --sync-dir ./local/ --min-remote-size "20%"
164+
165+
# Use metadata-aware sync
166+
zyra acquire ftp ftp://server/frames/ --sync-dir ./frames/ \
167+
--frames-meta ./frames-meta.json \
168+
--prefer-remote-if-meta-newer
169+
170+
# Skip completed files
171+
zyra acquire ftp ftp://server/data/ --sync-dir ./local/ --skip-if-local-done
172+
```
173+
174+
---
175+
176+
*Ready for implementation upon approval.*

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[tool.poetry]
22
name = "zyra"
3-
version = "0.1.44"
3+
version = "0.1.45"
44
description = "A tool to ingest data from various sources and formats, create imagery or video based on that data, and send the results to various locations for dissemination."
55
authors = ["Eric Hackathorn <eric.j.hackathorn@noaa.gov>"]
66
include = [

src/zyra/api/schemas/domain_args.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -168,6 +168,16 @@ class AcquireFtpArgs(BaseModel):
168168
credentials: dict[str, str] | None = None
169169
credential: list[str] | None = None
170170
credential_file: str | None = None
171+
# Sync mode options
172+
sync_dir: str | None = None
173+
overwrite_existing: bool | None = None
174+
recheck_existing: bool | None = None
175+
min_remote_size: str | int | None = None
176+
prefer_remote: bool | None = None
177+
prefer_remote_if_meta_newer: bool | None = None
178+
skip_if_local_done: bool | None = None
179+
recheck_missing_meta: bool | None = None
180+
frames_meta: str | None = None
171181

172182
@model_validator(mode="after")
173183
def _require_path_or_listing(self): # type: ignore[override]

0 commit comments

Comments
 (0)