Codex Relay: zyra-project/zyra PR #105 — feat(ftp): add overwrite mode and metadata-aware sync for acquire ftp#220
Merged
Hackshaven merged 21 commits intostagingfrom Feb 17, 2026
Merged
Conversation
8af513e to
3eb109d
Compare
3eb109d to
946357c
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR enhances Zyra’s FTP acquisition “sync directory” flow by adding configurable overwrite/replacement behavior and optional metadata-aware syncing, enabling better handling of backfilled/updated upstream FTP content.
Changes:
- Add
SyncOptions+ decision logic (should_download) and wire it intoftp_backend.sync_directory()(including.donemarker preservation and optional frames-meta support). - Expose new FTP sync flags on the CLI (
zyra acquire ftp ... --sync-dir ...) and pass them through the connector client/API schema. - Add extensive unit + CLI-integration tests and update wizard capabilities + OpenAPI snapshot hash.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/zyra/connectors/backends/ftp.py |
Core implementation of sync options, metadata helpers, and updated sync loop using a shared FTP connection |
src/zyra/connectors/ingest/__init__.py |
Adds CLI flags and constructs/passes SyncOptions into the FTP backend |
src/zyra/connectors/clients.py |
Extends FTPConnector.sync_directory() to accept/pass sync_options |
src/zyra/api/schemas/domain_args.py |
Expands AcquireFtpArgs to include new sync-related fields |
tests/connectors/test_ftp_backend.py |
Adds tests for parsing, download decision logic, MDTM helper, sync behavior, and CLI option plumbing |
src/zyra/wizard/zyra_capabilities.json |
Updates wizard capability metadata to include new FTP flags |
src/zyra/wizard/zyra_capabilities/acquire.json |
Updates acquire-domain wizard capability metadata for new FTP flags |
tests/snapshots/openapi_sha256.txt |
Updates OpenAPI snapshot hash due to schema/CLI surface changes |
docs/plans/issue-11-ftp-sync-overwrite.md |
Adds an implementation plan document for the feature |
946357c to
e30e751
Compare
e30e751 to
c520002
Compare
c520002 to
9728e7f
Compare
010edb7 to
e18d46d
Compare
Add detailed implementation plan for adding overwrite mode and metadata-aware sync to the acquire ftp command, including: - Architecture decisions (SyncOptions dataclass, should_download logic) - Implementation phases - New CLI flags specification - Test strategy - Risk mitigations https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Implements issue #11: Add overwrite mode and metadata-aware sync for filled-frame use cases in the acquire ftp --sync-dir command. New CLI flags: - --overwrite-existing: unconditional replacement - --recheck-existing: compare file sizes when timestamps unavailable - --min-remote-size: replace if remote larger (bytes or percentage) - --prefer-remote: always prioritize remote versions - --prefer-remote-if-meta-newer: use frames-meta.json timestamps - --skip-if-local-done: respect .done marker files - --recheck-missing-meta: re-download files lacking metadata - --frames-meta: path to frames-meta.json Implementation: - Add SyncOptions dataclass to encapsulate sync behavior configuration - Add get_remote_mtime() function using FTP MDTM command - Add should_download() decision logic with precedence-based evaluation - Add helper functions for size parsing and metadata handling - Update sync_directory() to use new SyncOptions - Update CLI argument registration and handler - Update API schema (AcquireFtpArgs) for programmatic access - Update FTPConnector.sync_directory() signature Tests: - Add comprehensive unit tests for all new functionality - Test _parse_min_size with bytes and percentages - Test should_download with all option combinations - Test get_remote_mtime with MDTM parsing - Test sync_directory with SyncOptions https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Format code with ruff - Fix UP035: import Iterable from collections.abc - Fix SIM102: combine nested if statements - Remove unused import https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The new CLI arguments for FTP sync changed the OpenAPI schema, requiring an updated snapshot hash. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Pre-commit hook regenerated the manifest to include the new FTP sync options (--overwrite-existing, --recheck-existing, etc.) https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Fix docstring: clarify default SyncOptions behavior includes mtime-based replacement, not just missing/zero-byte files - Add logging to _load_frames_meta() instead of silently swallowing exceptions - Short-circuit remote metadata fetching when local-only conditions can decide (missing file, zero-byte, or .done marker present) to reduce FTP connections - Add NOTE comment about potential future connection pooling optimization https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Address remaining Copilot review feedback: - Add logging.debug() to OSError handler in prefer_remote_if_meta_newer check - Add logging.debug() to OSError handler in MDTM comparison check These handlers catch expected failures (file inaccessible) and now log at debug level instead of silently passing. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The pre-commit manifest regeneration removed narrate.json and other commands because the development environment lacks optional dependencies (like OpenAI). This broke tests in the upstream repo. This commit: - Restores all manifest files from before the automated regeneration - Manually adds the new FTP sync options to acquire.json - Skips the manifest regeneration hook to avoid the same issue https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Address Copilot review feedback about connection overhead. Previously, each get_size/get_remote_mtime call opened a new FTP connection, and downloads used yet another connection - up to 3 connections per file. Now sync_directory uses a single FTP connection for: - SIZE queries (file size checks) - MDTM queries (modification time checks) - RETR commands (file downloads) The connection is established lazily on first use and properly closed in a finally block. This significantly reduces latency and server load for large directory syncs. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Add debug logging to FTP SIZE/MDTM silent exception handlers - Fix precedence mismatch: skip_if_local_done now checked before missing/zero-byte check in sync_directory, matching should_download() - Add CLI integration tests for SyncOptions argument parsing https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Fix test comment wording: "10% larger" -> "10% increase" - Fix grammar in --min-remote-size help text: add missing "is" - Use round() instead of int() in _parse_min_size() for accuracy - Rename _has_companion_meta() to _is_missing_companion_meta() for clearer semantics (inverted return value) - Simplify get_mtime_via_conn() by removing duplicate return None Note: Skipped the needs_remote_mtime optimization suggestion as it would change behavior when prefer_remote_if_meta_newer doesn't trigger a download but MDTM comparison would. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The "remove locals not on server" cleanup step was deleting .done marker files because they don't exist on the remote. This broke the skip_if_local_done behavior across multiple sync runs - after the first sync, markers would be deleted and subsequent syncs would re-download files that should stay skipped. Now files ending in .done are excluded from cleanup, preserving the marker-based skip behavior. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Add missing sync_dir to FTP acquire args_schema.optional list - Update docs to reflect renamed function: _has_companion_meta() -> _is_missing_companion_meta() https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Properly regenerated using `zyra generate-manifest` with all dependencies installed. This syncs the manifest with the actual CLI argument definitions. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The argparse '%%' escape was leaking into generated manifests as literal '10%%' instead of '10%'. Reworded help text to avoid the percent character entirely: "Threshold for replacement (absolute bytes or relative percentage)" This keeps the documentation accurate without relying on argparse escape sequences that don't translate to manifest generation. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
Address timezone correctness issues flagged in code review: - Parse FTP MDTM timestamps as UTC-aware (RFC 3659 defines MDTM as UTC) in both get_mtime_via_conn() and get_remote_mtime() - Use datetime.fromtimestamp(st_mtime, tz=timezone.utc) for local file mtime so both sides of comparisons are UTC-aware - Normalize _get_meta_timestamp() to always return UTC-aware datetimes, handling both aware and naive ISO timestamps from frames-meta.json - Gate recheck_existing size comparison on remote_mtime is None so it only fires as a fallback when MDTM is unavailable, preventing downloads of older files that happen to differ in size https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
- Catch ValueError/TypeError in MDTM timestamp parsing (get_mtime_via_conn and get_remote_mtime) to handle malformed server responses gracefully - Validate _load_frames_meta() returns a dict before using it - Fix FTP connection leak in list_files() by adding try/finally with ftp.quit() https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The static analysis flagged that `filename` (derived from parse_ftp_path return tuple) could theoretically carry sensitive data. While `filename` is the path component (not credentials), the log message only needs the raw timestamp string to diagnose parse failures. Simplified the debug log accordingly. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
When a user provides --frames-meta with a path that doesn't exist, log a warning instead of silently falling back to non-metadata sync. This helps catch typos and misconfiguration. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
When prefer_remote_if_meta_newer is set, should_download() can fall through from the metadata check (step 6) to the default MDTM-based comparison (step 10). Include this option in the needs_remote_mtime calculation so the relationship is explicit and defensive against future refactoring. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
The previous commit added `or options.prefer_remote_if_meta_newer` to needs_remote_mtime, but this was redundant: when prefer_remote_if_meta_newer is set without overwrite_existing/prefer_remote, the expression already evaluates to True. When overwrite_existing or prefer_remote IS set, should_download() returns at steps 4-5 before step 10 (MDTM comparison) is reached, so fetching mtime is unnecessary. Revert to the original simple expression and add a comment explaining why this correctly handles all modes including metadata-aware sync. https://claude.ai/code/session_0197c48GcqyW92ZGQ3Gm1WJw Signed-off-by: Claude <noreply@anthropic.com>
e18d46d to
bd60328
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Mirrored from zyra-project/zyra PR #105
Source: zyra-project/zyra#105